Python Web Scraping with the Requests Library

🕷️ Introduction

Requests is the most popular HTTP library in Python, used for sending HTTP requests.
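The core workflow is: send a request, then read the status code, headers, and body off the returned `Response` object. A minimal sketch, which serves a tiny page from a throwaway local `http.server` so it runs without external network access (the handler and page content are illustrative only):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'<html><body>hello</body></html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


# Start a throwaway local server so the example is self-contained
server = HTTPServer(('127.0.0.1', 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f'http://127.0.0.1:{server.server_port}/'
response = requests.get(url, timeout=5)

print(response.status_code)              # 200
print(response.headers['Content-Type'])  # text/html; charset=utf-8
print(response.text)                     # the HTML body as a str
server.shutdown()
```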

🚀 A Basic Scraper Example

import requests
from bs4 import BeautifulSoup
import csv
import os

# Directory containing this script
current_path = os.path.dirname(os.path.abspath(__file__))

# Request headers -- a browser-like User-Agent helps avoid trivial bot blocking
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
}

# Target URL
url = 'https://www.coingecko.com/en'

# Send the request
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail fast on 4xx/5xx

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Locate the cryptocurrency table (the class name depends on CoinGecko's current markup)
table = soup.find('table', {'class': 'table-scrollable'})

# Collect every row of the table
rows = table.find_all('tr')

# Write the results to a CSV file
with open(os.path.join(current_path, 'coingecko.csv'), 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['name', 'price', '1h', '24h', '7d', 'volume', 'market_cap'])

    for row in rows:
        cols = row.find_all('td')
        if len(cols) == 0:
            continue  # skip the header row, which has <th> cells only
        # Column indices track the site's table layout and may change over time
        name = cols[2].find('a').find('span').text.strip()
        price = cols[3].find('span').text.replace('$', '')
        # ... other fields
        writer.writerow([name, price])

🔧 Advanced Usage

Handling Redirects

import requests

response = requests.get('http://github.com', allow_redirects=False)
print(response.status_code)  # 301 -- the redirect itself is returned
print(response.history)      # [] -- nothing was followed
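By default redirects *are* followed, and each intermediate response is kept in `response.history`. The sketch below makes both behaviors visible against a throwaway local server, so it runs offline (the `/old` and `/new` paths are illustrative):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            # Redirect /old to /new
            self.send_response(302)
            self.send_header('Location', '/new')
            self.end_headers()
        else:
            body = b'arrived'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/old'

# Default: the redirect is followed; the intermediate 302 lands in .history
followed = requests.get(url, timeout=5)
print(followed.status_code, [r.status_code for r in followed.history])  # 200 [302]

# allow_redirects=False: the 302 itself comes back and history stays empty
stopped = requests.get(url, timeout=5, allow_redirects=False)
print(stopped.status_code, stopped.history)  # 302 []
server.shutdown()
```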

Setting a Timeout

import requests

try:
    # timeout is in seconds; 0.001 is deliberately tiny so the request fails
    response = requests.get('http://github.com', timeout=0.001)
except requests.exceptions.Timeout:
    print('The request timed out')
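`timeout` also accepts a `(connect, read)` tuple to bound the two phases separately. A minimal sketch, using a deliberately slow local handler so the read timeout fires without touching the network (the 2-second delay and the timeout values are illustrative):

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # respond slower than the client's read timeout
        try:
            self.send_response(200)
            self.end_headers()
        except ConnectionError:
            pass  # client already gave up


    def log_message(self, *args):
        pass


server = HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

timed_out = False
try:
    # 1s to establish the connection, then 0.5s to receive the first byte
    requests.get(url, timeout=(1, 0.5))
except requests.exceptions.ReadTimeout:
    timed_out = True
    print('read timed out')
server.shutdown()
```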

Downloading Large Files

import requests

# stream=True defers downloading the body until iter_content() is called
response = requests.get('http://example.com/big_file', stream=True)

with open('big_file', 'wb') as fd:
    for chunk in response.iter_content(chunk_size=128):
        fd.write(chunk)
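Because a streamed response holds its connection open until the body is consumed, it is worth closing it when done; `Response` supports the `with` statement for exactly this. A self-contained sketch against a local server serving a known payload (the payload size, file name, and chunk size are illustrative):

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b'x' * 100_000  # stand-in for a large file


class FileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass


server = HTTPServer(('127.0.0.1', 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/big_file'

dest = os.path.join(tempfile.mkdtemp(), 'big_file')
# The with statement closes the response (and its connection) when done
with requests.get(url, stream=True, timeout=5) as response, open(dest, 'wb') as fd:
    for chunk in response.iter_content(chunk_size=8192):
        fd.write(chunk)

print(os.path.getsize(dest))  # 100000
server.shutdown()
```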

Exception Handling

import requests
from requests.exceptions import RequestException

try:
    response = requests.get('http://example.com')
except RequestException as e:
    print('Error:', e)
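`RequestException` is the base class of every error Requests raises (`ConnectionError`, `Timeout`, `HTTPError`, ...), so one handler covers them all. Note that 4xx/5xx responses are *not* exceptions unless you call `response.raise_for_status()`. A small offline sketch using a malformed URL:

```python
import requests
from requests.exceptions import MissingSchema, RequestException

caught = None
try:
    # No scheme ("http://" / "https://") -> Requests raises MissingSchema,
    # a subclass of RequestException, before any network I/O happens
    requests.get('example.com/page')
except RequestException as e:
    caught = e
    print('Error:', e)

print(isinstance(caught, MissingSchema))  # True
```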

📦 Common Parameters

| Parameter | Description |
| --- | --- |
| headers | Request headers |
| params | URL query parameters |
| data | Request body (form) data |
| json | JSON payload |
| timeout | Timeout in seconds |
| proxies | Proxy settings |
| allow_redirects | Whether to follow redirects |
| stream | Streamed download |
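The effect of `params` and `json` can be inspected without touching the network by preparing a request: Requests builds the final URL and body for you. A short sketch (the `api.example.com` URLs are placeholders):

```python
import json

import requests

# params is urlencoded into the query string
get_req = requests.Request(
    'GET', 'https://api.example.com/search',
    params={'q': 'python', 'page': 2},
).prepare()
print(get_req.url)  # https://api.example.com/search?q=python&page=2

# json= serializes the body and sets the Content-Type header automatically
post_req = requests.Request(
    'POST', 'https://api.example.com/items',
    json={'name': 'spike'},
).prepare()
print(post_req.headers['Content-Type'])  # application/json
print(json.loads(post_req.body))
```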

Author: spike

Category: Python

Created: 2026-02-23

Updated: 2026-02-23