Python Web Scraping - Playwright

🕷️ Introduction

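Playwright is a browser automation and testing tool developed by Microsoft. It drives multiple browser engines (Chromium, Firefox and WebKit) through a single, more modern API.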
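Because one API drives all three engines, switching browsers mostly comes down to picking a different `BrowserType` (`playwright.chromium`, `playwright.firefox` or `playwright.webkit`). Below is a minimal sketch, assuming the engines have already been installed with `playwright install`. The helpers `launch_browser` and `demo` are hypothetical names for illustration and are not part of Playwright.

```python
from typing import Any

# The three engines bundled with Playwright; each is exposed as an attribute
# of the object yielded by sync_playwright().
ENGINES = ("chromium", "firefox", "webkit")


def launch_browser(playwright: Any, engine: str = "chromium", headless: bool = True) -> Any:
    """Hypothetical helper: launch a browser by engine name."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine {engine!r}, expected one of {ENGINES}")
    # playwright.chromium / .firefox / .webkit are BrowserType objects,
    # and BrowserType.launch() returns a Browser.
    browser_type = getattr(playwright, engine)
    return browser_type.launch(headless=headless)


def demo(url: str = "https://example.com") -> None:
    """Hypothetical demo: open the same URL in every engine and print its title."""
    # Imported lazily so launch_browser() can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        for name in ENGINES:
            browser = launch_browser(p, name)
            page = browser.new_page()
            page.goto(url)
            print(name, page.title())
            browser.close()
```

Calling `demo()` should print one line per engine. Validating the engine name up front gives a clear error instead of an `AttributeError` from `getattr`.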

🚀 Basic Example

import csv
import os
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

current_path = os.path.dirname(os.path.abspath(__file__))

# Extra request headers. Content-Type describes a request body and is
# meaningless for a GET page load, so only the User-Agent is set here.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
}

url = 'https://www.coingecko.com/en'

with sync_playwright() as playwright:
    # Launch Chromium in headless mode
    browser = playwright.chromium.launch(headless=True)

    # Open a new page (tab)
    page = browser.new_page()

    # Attach the extra headers to every request made by this page
    page.set_extra_http_headers(headers)

    # Navigate to the target URL
    page.goto(url)

    # Simulate scrolling with the mouse wheel:
    # 0 px horizontally, 2000 px down (an arbitrary distance)
    page.mouse.wheel(0, 2000)

    # Wait until the target table has been rendered
    page.wait_for_selector('table.table-scrollable')

    # Grab the rendered HTML
    page_content = page.content()

    # Parse the HTML and locate the table
    soup = BeautifulSoup(page_content, 'html.parser')
    table = soup.find('table', {'class': 'table-scrollable'})

    # Data processing...

    browser.close()
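The example stops at a data-processing placeholder. The sketch below shows one possible next step. It is not the author's code. It flattens the `table` element found by BeautifulSoup into rows of cell text and writes them with the `csv` module. The function names are hypothetical, and the code assumes a plain `<tr>`/`<th>`/`<td>` table without nested tables.

```python
import csv

from bs4 import BeautifulSoup


def table_to_rows(table) -> list[list[str]]:
    """Hypothetical helper: flatten a <table> Tag into lists of cell text."""
    rows = []
    for tr in table.find_all("tr"):
        # Header and data cells are treated alike; strip=True trims whitespace
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows


def write_csv(rows: list[list[str]], path: str) -> None:
    """Hypothetical helper: write rows to a UTF-8 CSV file."""
    # newline="" is what the csv module expects when writing files
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    # Small inline table standing in for the scraped one
    sample = "<table><tr><th>Coin</th><th>Price</th></tr><tr><td>BTC</td><td>1</td></tr></table>"
    print(table_to_rows(BeautifulSoup(sample, "html.parser").find("table")))
```

Inside the `with` block, a call such as `write_csv(table_to_rows(table), os.path.join(current_path, 'coins.csv'))` would be one plausible use for the `csv` import and `current_path` defined at the top of the script. The filename is an assumption. Check that `table` is not `None` first, because `soup.find` returns `None` when nothing matches.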

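📦 Key Features

  • ✅ Supports Chromium, Firefox and WebKit
  • ✅ Automatically waits for elements before acting on them
  • ✅ Mouse and keyboard input
  • ✅ Multiple pages and browser contexts
  • ✅ Network request interception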
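To illustrate network interception, here is a minimal sketch in which a scraper blocks heavy resources to speed up page loads. It registers a handler with `page.route()`. The handler aborts a request when `route.request.resource_type` is in a block list and otherwise lets the request through with `route.continue_()`. Which resource types can safely be blocked is an assumption you would tune per site. The names `BLOCKED_RESOURCE_TYPES`, `block_heavy_resources` and `fetch_html` are hypothetical.

```python
from typing import Any

# Resource types assumed to be unnecessary for extracting text data
BLOCKED_RESOURCE_TYPES = {"image", "font", "stylesheet", "media"}


def block_heavy_resources(route: Any) -> None:
    """Route handler: abort blocked resource types, let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


def fetch_html(url: str) -> str:
    """Load a page with heavy resources blocked and return its rendered HTML."""
    # Imported lazily so the handler above can be reused without Playwright at import time
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # "**/*" matches every request issued by this page
        page.route("**/*", block_heavy_resources)
        page.goto(url)
        html = page.content()
        browser.close()
        return html
```

Keeping the handler a plain function of the route makes the blocking rule easy to check on its own. Blocking stylesheets can break sites whose content depends on CSS-driven layout, so drop `"stylesheet"` from the set if selectors stop matching.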

Author: spike

Category: Python

Created: 2026-02-23

Updated: 2026-02-23