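🕷️ Introduction
Playwright is a browser automation and testing tool developed by Microsoft. It drives multiple browser engines through a single, modern API.
🚀 Basic Example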
import csv
import os

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

# Directory of this script, e.g. for saving output files next to it
current_path = os.path.dirname(os.path.abspath(__file__))

# Extra headers sent with every request the page makes
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Content-Type': 'text/html;charset=UTF-8',
}
url = 'https://www.coingecko.com/en'

with sync_playwright() as playwright:
    # Launch a headless Chromium browser
    browser = playwright.chromium.launch(headless=True)
    # Open a new page
    page = browser.new_page()
    # Set the request headers
    page.set_extra_http_headers(headers)
    # Navigate to the target URL
    page.goto(url)

    # Simulate a mouse drag: press at the element's center, move 100 px right, release
    element = page.locator('#unobtrusive-flash-messages')
    box = element.bounding_box()
    if box is not None:  # bounding_box() returns None when the element is not visible
        center_x = box['x'] + box['width'] / 2
        center_y = box['y'] + box['height'] / 2
        page.mouse.move(center_x, center_y)
        page.mouse.down()
        page.mouse.move(center_x + 100, center_y)
        page.mouse.up()

    # Wait until the data table has been rendered
    page.wait_for_selector('table.table-scrollable')
    # Grab the rendered HTML
    page_content = page.content()
    # Parse the HTML
    soup = BeautifulSoup(page_content, 'html.parser')
    table = soup.find('table', {'class': 'table-scrollable'})
    # Process the data...

    browser.close()
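The data-processing step is left open in the example above. As one possible continuation, the sketch below flattens the parsed `table` into rows and writes them to a CSV file with the `csv` module the script already imports. The helper names `table_to_rows` and `save_rows`, and the file name `coins.csv` in the usage note, are assumptions made for illustration; they are not part of the original script.

```python
import csv


def table_to_rows(table):
    """Return the text of every header/data cell, one list per table row.

    `table` is assumed to be the BeautifulSoup Tag returned by soup.find(...).
    """
    rows = []
    for tr in table.find_all('tr'):
        cells = tr.find_all(['th', 'td'])
        if cells:  # skip rows that contain no cells
            rows.append([cell.get_text(' ', strip=True) for cell in cells])
    return rows


def save_rows(rows, path):
    """Write the rows to a UTF-8 CSV file at `path`."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)
```

Inside the `with` block, after `table` has been found, this could be called as `save_rows(table_to_rows(table), os.path.join(current_path, 'coins.csv'))`.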
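📦 Key Features
- ✅ Supports Chromium, Firefox, and WebKit
- ✅ Automatically waits for elements
- ✅ Supports mouse and keyboard actions
- ✅ Supports multiple pages and browser contexts
- ✅ Supports network interception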
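Two of the features listed above, browser contexts and network interception, can be combined in a few lines. The following is a minimal sketch, not a definitive setup: the names `BLOCKED_TYPES`, `block_heavy_resources`, and `fetch_html` are hypothetical, and the choice to block images, media, and fonts is only an assumption about what a scraper might want to skip. The Playwright import is deferred into `fetch_html` so that the route handler can be exercised without a browser installed.

```python
# Resource types to skip (an assumption for illustration; adjust as needed)
BLOCKED_TYPES = {'image', 'media', 'font'}


def block_heavy_resources(route):
    """Route handler: abort requests for heavy assets, let everything else through."""
    if route.request.resource_type in BLOCKED_TYPES:
        route.abort()
    else:
        route.continue_()


def fetch_html(url):
    """Load `url` in an isolated browser context and return the rendered HTML."""
    # Deferred import: only needed when a browser is actually launched
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=True)
        # A context is an isolated session (its own cookies and cache)
        context = browser.new_context()
        page = context.new_page()
        # Intercept every request made by this page
        page.route('**/*', block_heavy_resources)
        page.goto(url)
        html = page.content()
        context.close()
        browser.close()
    return html
```

Calling `fetch_html('https://www.coingecko.com/en')` would return the page HTML, ready to be passed to `BeautifulSoup` as in the basic example. Registering the handler with `context.route(...)` instead would apply it to every page opened in that context.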