🕷️ Introduction
Requests is one of the most popular HTTP libraries for Python and is used to send HTTP requests.
🚀 Basic Scraper Example
import requests
from bs4 import BeautifulSoup
import csv
import os
# Directory that contains this script
current_path = os.path.dirname(os.path.abspath(__file__))
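# Set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    # Accept states which response type we want; Content-Type would describe
    # a request body, and a GET request has none
    'Accept': 'text/html',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    # Non-standard header; servers are free to ignore it
    'Time-Zone': 'Asia/Shanghai'
}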
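# Build the request URL
url = 'https://www.coingecko.com/en'
# Send the request; without a timeout, requests may wait indefinitely
response = requests.get(url, headers=headers, timeout=10)
# Fail early on 4xx/5xx responses instead of parsing an error page
response.raise_for_status()
# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')
# Find the cryptocurrency table
table = soup.find('table', {'class': 'table-scrollable'})
if table is None:
    raise SystemExit('Table not found; the page layout may have changed')
# Collect every row in the table
rows = table.find_all('tr')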
# Open the CSV file
with open(os.path.join(current_path, 'coingecko.csv'), 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['name', 'price', '1h', '24h', '7d', 'volume', 'market_cap'])
    for row in rows:
        cols = row.find_all('td')
        # Header rows contain only <th> cells, so skip rows without <td>
        if len(cols) == 0:
            continue
        name = cols[2].find('a').find('span').text.strip()
        price = cols[3].find('span').text.replace('$', '').strip()
        # ... other fields
        writer.writerow([name, price])
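The scraper above indexes straight into each row, so it raises an exception as soon as one row lacks the expected cells. A more defensive version of the row-extraction step could look like the sketch below. It keeps the column positions from the example above (name in the third cell, price in the fourth) but, as a simplification, takes the first table on the page rather than matching a class. `extract_rows` and `SAMPLE_HTML` are hypothetical names, and the sample markup is invented for illustration; it is not CoinGecko's real HTML.

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return (name, price) tuples from the first table in html.

    Assumes the layout used in the scraper above: name in the third
    cell, price in the fourth. Rows that do not match are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []

    results = []
    for row in table.find_all("tr"):
        cols = row.find_all("td")
        if len(cols) < 4:  # header rows and malformed rows are skipped
            continue
        name = cols[2].get_text(strip=True)
        # Drop the currency sign and thousands separators
        price = cols[3].get_text(strip=True).replace("$", "").replace(",", "")
        results.append((name, price))
    return results


# Hypothetical sample markup, invented for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>#</th><th>Coin</th><th>Price</th></tr>
  <tr><td></td><td>1</td><td><a><span>Bitcoin</span></a></td><td><span>$65,000.00</span></td></tr>
  <tr><td>ad row</td></tr>
</table>
"""
```

Because the function takes an HTML string, it can be tried on a saved copy of the page without sending any request, for example `extract_rows(SAMPLE_HTML)`.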
🔧 Advanced Usage
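Handling Redirects
import requests
# Do not follow the redirect; inspect the first response as-is
response = requests.get('http://github.com', allow_redirects=False, timeout=10)
print(response.status_code)  # a 3xx code if the server redirects
print(response.headers.get('Location'))  # the redirect target, if any
print(response.history)  # empty here, because no redirect was followed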
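To compare both modes without depending on an external site, the sketch below starts a throwaway local server that redirects /old to /new, then requests /old once with redirects disabled and once with the default behaviour. `RedirectHandler` and `compare_redirect_modes` are hypothetical names made up for this illustration; the server exists only so the example can run offline.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the example output quiet
        pass


def compare_redirect_modes():
    # Port 0 asks the operating system for any free port
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/old"

    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables
    try:
        manual = session.get(url, allow_redirects=False, timeout=5)
        followed = session.get(url, timeout=5)
        return {
            "manual_status": manual.status_code,
            "manual_location": manual.headers.get("Location"),
            "manual_history": [r.status_code for r in manual.history],
            "followed_status": followed.status_code,
            "followed_history": [r.status_code for r in followed.history],
            "followed_url": followed.url,
        }
    finally:
        session.close()
        server.shutdown()
        server.server_close()
```

With redirects disabled, the 301 response itself is returned and its history is empty. With redirects followed, the final response is the 200 from /new and the 301 appears in its history.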
Setting a Timeout
import requests
try:
    # An unrealistically small timeout (in seconds) to force the error
    response = requests.get('http://github.com', timeout=0.001)
except requests.exceptions.Timeout:
    print('The request timed out')
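Requests applies no timeout unless one is passed, and the value may also be a (connect, read) tuple so the two phases get separate limits. The sketch below shows one way to tell the two failures apart. `fetch_status` is a hypothetical helper, and its `get` parameter is an assumption added only so the function can be exercised with a stand-in instead of a real network call.

```python
import requests


def fetch_status(url, connect_timeout=3.05, read_timeout=10, get=requests.get):
    """Return the status code, or a short label if the request timed out."""
    try:
        # First value: seconds allowed to establish the connection.
        # Second value: seconds allowed between bytes of the response.
        response = get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"
    return response.status_code
```

Both exceptions derive from `requests.exceptions.Timeout`, so the single `except` clause in the example above still catches either one when the distinction does not matter.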
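Handling Large Files
import requests
# stream=True postpones downloading the body until it is iterated
response = requests.get('http://example.com/big_file', stream=True)
with open('big_file', 'wb') as fd:
    # Write the body in 128-byte chunks instead of loading it all into memory
    for chunk in response.iter_content(chunk_size=128):
        fd.write(chunk)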
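The same idea can be wrapped in a reusable function. The sketch below is one possible shape, not a definitive implementation: `save_stream` and `download` are hypothetical names, and the 64 KiB chunk size and the timeout values are arbitrary assumptions chosen to cut down on the number of small writes.

```python
import requests


def save_stream(response, path, chunk_size=64 * 1024):
    """Write a streamed response body to path and return the byte count."""
    written = 0
    with open(path, "wb") as fd:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written


def download(url, path, timeout=(3.05, 30)):
    """Stream url to path; raises for 4xx/5xx responses."""
    # The with-block releases the connection even if writing fails
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return save_stream(response, path)
```

A call such as `download('http://example.com/big_file', 'big_file')` would then return the number of bytes written.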
Exception Handling
import requests
from requests.exceptions import RequestException
try:
    response = requests.get('http://example.com')
except RequestException as e:
    # RequestException is the base class of the errors requests raises
    print('Error:', e)
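Note that a 404 or 500 response does not raise anything by itself; `raise_for_status()` is what turns an error status into an `HTTPError`. The sketch below separates the common failure kinds. `describe_failure` is a hypothetical helper, and its `get` parameter is an assumption added so the function can be tried with a stand-in instead of a live request.

```python
import requests


def describe_failure(url, get=requests.get):
    """Return None on success, or a short description of what went wrong."""
    try:
        response = get(url, timeout=10)
        response.raise_for_status()  # turn 4xx/5xx statuses into HTTPError
    except requests.exceptions.HTTPError as e:
        return f"bad status: {e.response.status_code}"
    except requests.exceptions.Timeout:
        return "timed out"
    except requests.exceptions.ConnectionError:
        return "connection failed"
    except requests.exceptions.RequestException as e:
        # Anything else requests can raise, such as an invalid URL
        return f"request failed: {e}"
    return None
```

The specific exceptions are listed before `RequestException` because Python uses the first matching `except` clause, and the base class would otherwise catch everything.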
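📦 Common Parameters
| Parameter | Description |
|---|---|
| headers | Request headers |
| params | URL query parameters |
| data | Request body data (form-encoded when given a dict) |
| json | JSON-serializable object sent as the request body |
| timeout | Timeout in seconds |
| proxies | Proxy settings |
| allow_redirects | Whether to follow redirects |
| stream | Stream the response body instead of downloading it all at once |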
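To see what params, data, and json actually do to a request, it can be built and inspected without being sent. The sketch below uses `requests.Request(...).prepare()` for that; `preview_request` is a hypothetical helper name and the example.com URLs are placeholders.

```python
import requests


def preview_request(method, url, **kwargs):
    """Build a request without sending it and return the prepared form."""
    return requests.Request(method, url, **kwargs).prepare()


# params is encoded into the query string of the URL
query = preview_request(
    "GET", "https://example.com/search", params={"q": "bitcoin", "page": 2}
)

# data (a dict) is form-encoded into the request body
form = preview_request("POST", "https://example.com/submit", data={"name": "btc"})

# json is serialized into the body and sets a JSON Content-Type header
payload = preview_request("POST", "https://example.com/api", json={"name": "btc"})
```

Inspecting `query.url`, `form.body`, and `payload.headers['Content-Type']` shows the encoded result for each parameter. Nothing here touches the network, which makes it a convenient way to check a request before sending it.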