zcy 893dab53b7 d		2024-11-17 03:09:14 +08:00
.idea	🐛🔨🆕	2021-02-27 23:01:45 +08:00
.vscode	ee	2024-07-03 17:49:47 +08:00
assets	add cookie login	2019-11-08 11:57:43 +08:00
blog/ds19991999	ee	2024-07-03 17:49:47 +08:00
csdn	d	2024-11-17 03:09:14 +08:00
.gitignore	ee	2024-07-03 17:49:47 +08:00
LICENSE	Initial commit	2019-10-19 18:54:26 +08:00
README.md	🐛🔨🆕 Fix bug, remove crude multithreaded crawling	2021-02-27 23:06:35 +08:00
requirements.txt	Bump requests from 2.22.0 to 2.31.0	2023-05-22 22:30:53 +00:00
single_article.py	d	2024-11-17 03:09:14 +08:00
test.html	d	2024-11-17 03:09:14 +08:00
test.md	add	2024-08-28 15:23:08 +08:00
upload_file.py	d	2024-11-17 03:09:14 +08:00
user.py	d	2024-11-17 03:09:14 +08:00

README.md

CSDN Spider

Main feature: crawl every blog post of a specified CSDN user, convert each post to Markdown, and save the results locally.
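The final "save locally" step could be sketched as follows. This is only an illustration, not the project's actual code: `safe_filename` and `save_markdown` are hypothetical helper names, and the only detail taken from the README is the default output folder `blog`.

```python
import re
from pathlib import Path


def safe_filename(title: str) -> str:
    """Replace characters that are invalid on common filesystems (hypothetical helper)."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", title).strip()
    return cleaned or "untitled"


def save_markdown(title: str, markdown: str, folder_name: str = "blog") -> Path:
    """Write one converted post to <folder_name>/<title>.md and return its path."""
    folder = Path(folder_name)
    folder.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    path = folder / f"{safe_filename(title)}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```

Sanitizing the title first avoids write failures when a post title contains characters such as `/` or `?`.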

Download the script

git clone https://github.com/ds19991999/csdn-spider.git
cd csdn-spider
python3 -m pip install -r requirements.txt

# Test
python3 test.py  # requires configuring the login cookie first

Crawl all of a user's posts

import csdn
csdn.spider("ds19991999", "cookie.txt")
# Parameters: usernames: str, cookie_path: str, folder_name: str = "blog"
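The README does not describe the format of `cookie.txt`. As a minimal sketch, assuming the file holds the raw `Cookie` header string copied from the browser's developer tools (`name=value; name2=value2`), it could be turned into a dictionary like this. Both function names are hypothetical and are not part of the `csdn` module.

```python
from pathlib import Path


def parse_cookie_header(raw: str) -> dict:
    """Split a raw "name=value; name2=value2" string into a dict."""
    cookies = {}
    for pair in raw.strip().split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def load_cookie_file(cookie_path: str) -> dict:
    """Read the cookie file (assumed format: one raw Cookie header string)."""
    return parse_cookie_header(Path(cookie_path).read_text(encoding="utf-8"))
```

A dictionary in this shape can be passed as the `cookies=` argument of `requests.get`, the HTTP library listed in `requirements.txt`.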

Example of crawled output: ds19991999's posts

LICENSE

PS: This is a casually written crawler script, updated only occasionally.
