commit demo
parent 604709b719
commit db79d3c171

README.md
@@ -1 +1,62 @@
# CSDN Crawler Script

Main feature: crawl all blog posts of a specified `csdn` user, convert them to `markdown`, and save them locally.

## 1. Runtime environment

You need to install a `WebDriver`: download the `chrome` driver matching your local browser from https://chromedriver.chromium.org/downloads , then add it to your `$PATH`.
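A minimal install sketch for Linux (the release number below is only an example; pick the build matching your installed Chrome):

```shell
# illustrative version; check the downloads page for the right build
wget https://chromedriver.storage.googleapis.com/77.0.3865.40/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/   # /usr/local/bin is normally already on $PATH
chromedriver --version                 # confirm the driver is found
```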
Install the Python dependencies:

```shell
# requires Python 3
python3 -m pip install -r requirements.txt
```

## 2. Get the script

```shell
git clone https://github.com/ds19991999/csdn-spider.git
```

## 3. Usage

### 1. Get a cookie

Log in to your `csdn` account and open https://blog.csdn.net , press `F12` to open the browser's developer tools, copy all of the `Request Headers`, and save them into the `cookie.txt` file.

![1571482112632](assets/1571482112632.png)

### 2. Add the `csdn` users to crawl

Add usernames to `username.txt`, one per line.
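For example:

```shell
echo "ds19991999" >> username.txt
```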
### 3. Run the script

```shell
python3 csdn.py
```

## 4. Results

**Run log**

![1571483423256](assets/1571483423256.png)

**Generated article index**: `./articles/username/README.md`

![1571483552438](assets/1571483552438.png)

**Crawled posts**: `./articles/username/`

![1571483479356](assets/1571483479356.png)

**Conversion result**:

![1571483777703](assets/1571483777703.png)

## 5. LICENSE

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>

`PS`: a casually written crawler script; it gets updated whenever the mood strikes.

@@ -0,0 +1,186 @@

# 1. Original: Quick manual install of JupyterLab on Debian, with HTTPS

A long while ago I wrote a very detailed `Jupyter lab` installation tutorial ([link](https://www.creat.kim/archives/25/)). It felt a bit over-complicated, and the `nginx` part in particular wasn't written clearly, so quite a few people ended up unable to run `python`, although following that tutorial step by step does work. Also, being able to reach the site over `https` doesn't guarantee code will run, because `jupyter lab` talks over `websocket`, not plain `http`. So here is a simplified run-through: install `Jupyter Lab` on a `Debian` system and put `https` in front of it with `caddy`; I've verified that programs actually execute. This tutorial covers only the `Python2` kernel; to add `Python3` as well, see the [original post](https://www.creat.kim/archives/25/). The steps are kept brief so you can get through them quickly and succeed in one pass; it really doesn't take long. Demo: [https://jupyter.creat.kim](https://jupyter.creat.kim)<br/>
<img alt="" src="http://image.creat.kim/picgo/20190326142651.png"/><br/>
<img alt="" src="http://image.creat.kim/picgo/20190326151655.png"/>

```
sudo apt-get install software-properties-common
```

## Install the `Python` environment

```
sudo apt-get install python-pip python-dev build-essential
sudo pip install --upgrade pip
sudo pip install --upgrade virtualenv
sudo apt-get install python-setuptools python-dev build-essential
sudo easy_install pip
sudo pip install --upgrade virtualenv
sudo apt-get install python3-pip
sudo apt-get install python-pip
sudo pip3 install --upgrade pip
sudo pip2 install --upgrade pip
sudo pip install --upgrade pip
```

## Check where `pip` points

```
~ $which pip
/usr/local/bin/pip
21:36 alien@alien-Inspiron-3443:
~ $which pip2
/usr/local/bin/pip2
21:36 alien@alien-Inspiron-3443:
~ $which pip3
/usr/local/bin/pip3
```

## Install `yarn`

```
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt-get update
sudo apt-get install yarn
```

## Install `nodejs`

```
curl -sL https://deb.nodesource.com/setup_10.x | bash -
apt-get install -y nodejs
```
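Before building JupyterLab's frontend assets later, it doesn't hurt to confirm both toolchains are visible:

```
node -v
yarn --version
```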

## Install `jupyterlab`

```
sudo pip2 install jupyterlab
```

## Configure `jupyterlab`

```
jupyter-notebook password
```

Then go into `ipython` to generate a hashed password. The password you type here is the one you'll use to log in to `jupyter lab`; note down the hash it produces.

```
ipython
from notebook.auth import passwd
passwd()
# type the password you want for the JupyterLab login page;
# it produces a hash like the one below, save it for later
'sha1:b92f3fb7d848:a5d40ab2e26aa3b296ae1faa17aa34d3df351704'
```

## Edit the config file

It is usually at `/root/.jupyter/jupyter_notebook_config.py`; find and change the following options.

```
c.NotebookApp.allow_root = True
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.notebook_dir = u'/root/JupyterLab'
c.NotebookApp.open_browser = False
c.NotebookApp.password = u'sha1:b92f3fb7d848:a5d40ab2e26aa3b296ae1faa17aa34d3df351704'
c.NotebookApp.port = 8888

# What each line above does:
# allow running jupyterlab as root
# allow access from any IP range
# root directory shown in the jupyterlab UI
# don't open a browser on startup (a server only has a terminal, after all)
# the hashed password generated earlier
# the port to serve on; must match the caddy config below
```
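If that config file doesn't exist yet, generate it first (a standard jupyter command; the path assumes you run it as root):

```
jupyter notebook --generate-config
# writes /root/.jupyter/jupyter_notebook_config.py
```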

## Run `Jupyter Lab`

```
jupyter-lab --version
jupyter lab build

mkdir ~/JupyterLab
cd ~/JupyterLab

# screen makes backgrounding easy
apt install screen
screen -S jupyterlab
jupyter lab
```

Press `ctrl+A+D` to detach from the screen window.
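To get back to the session later, reattach (standard screen usage):

```
screen -r jupyterlab
```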

## HTTPS reverse proxy with `caddy`

Change the domain to your own; for details on `caddy` see [this post](https://www.creat.kim/archives/18/)

```
wget -N --no-check-certificate https://raw.githubusercontent.com/ds19991999/shell.sh/shell/caddy_install.sh && chmod +x caddy_install.sh && bash caddy_install.sh

echo "jupyter.creat.kim
gzip
tls cva.engineer.ding@gmail.com
proxy / 127.0.0.1:8888 {
    transparent
    websocket
}" > /usr/local/caddy/Caddyfile
```
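Once caddy is up (however the install script manages the service), a quick check from any machine confirms the proxy and certificate are answering; the domain here is the demo one from above:

```
curl -I https://jupyter.creat.kim
```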

## Scheduled backups to `GitHub`

See this rather detailed write-up: [link](https://www.moerats.com/archives/858/)

## Configure the `python2` and `python3` kernels

Might as well see this through to the end, since plenty of people trip up here... When installing packages with `pip`, never use `pip3 install ***` or `pip2 install ***`; go through the interpreter instead:

```
python2 -m pip install ipykernel ipython matplotlib scipy pandas numpy
python3 -m pip install ipykernel ipython matplotlib scipy pandas numpy
```
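If a kernel still doesn't appear afterwards, you can register it explicitly (standard ipykernel usage; the names shown are just examples):

```
python2 -m ipykernel install --name python2 --display-name "Python 2"
python3 -m ipykernel install --name python3 --display-name "Python 3"
```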

Check the kernels:

```
root@google:~/JupyterLab# jupyter kernelspec list
Available kernels:
  python2    /usr/local/share/jupyter/kernels/python2
  python3    /usr/local/share/jupyter/kernels/python3
```

Done. Visit your domain and start using it.

---

## A few parting thoughts

This is probably the last post I'll publish on `CSDN`; it comes from [https://www.creat.kim/archives/40/](https://www.creat.kim/archives/40/). Not bad: I've finally left the public blogging platforms behind. I wrote on `CSDN` for about a year and a half: `107` posts in all, of which `97` were "original" (read: cribbed), `7` reposts, `2` private, and `1` set private by an administrator for violating some policy ... A `CSDN` rank of `10k+`, `225k+` views, `48` followers: an unremarkable record of unremarkable posts, which probably makes it representative of most people here.

The domestic blogging platforms are all actually decent, and the writing experience on `CSDN` is very good. For a while I kept drifting between my own blog and the public platforms, and somewhere along the way the original point of blogging quietly changed; still, having been through it, I think I understand a few things now.

After trying `WordPress`, `Zhihu`, `Jianshu`, `cnblogs`, `Sina`, `GitHub-Jekyll`, `coding-jekyll`, `hexo`, `Typecho`... I picked up some basics of running a website; at minimum, that anything hosted in China needs an ICP filing ...<br/>
For image hosting, I went from plain copy-paste to `GitHub`+`PicGo`, `Upyun` (filing required), `Qiniu` (filing required), and a self-hosted image bed... and learned a few `CDN` acceleration tricks along the way ...<br/>
For documents, from editing things directly, to `CSDN`'s `MarkDown` editor, `Youdao Note`, `Evernote` (with its separate international and Chinese editions), `GitHub-README`, `GitBook`, `MkDoc`, `Read the Docs`, `Sphinx`, `Docsify`. Practice breeds fluency: once you're fluent, any document can be made to look good, even though to this day I can't use `Vim` ...<br/>
On choosing servers, I learned plenty about the gap between domestic and overseas hosting, and grew ever more exasperated watching an `install` of a package or a `program` crawl along at a few `k` or a few `b` per second. Swap domestic mirrors all you like, they never match the speed of an overseas source. Some sites aren't even blocked, yet can you stand the local speeds? I'm amazed I ever tolerated that turtle-paced network. Only after you've seen and experienced things yourself can you view a problem from another angle, which beats forever consuming pre-filtered information.

Then look at the education benefits abroad. Some say foreign vendors got burned by Chinese users milking the freebies, so they stopped offering educational perks to China. But look at the education deals from the big domestic cloud vendors: servers so cheap even I got envious and hurried to register an account with every one of them. Real-name verification required? Fine, I verify, I upload a photo. A filing required? What, that too? Fine, I file, I upload another photo, and there goes another week. And now there's ongoing monitoring? That's where I draw the line ... Doesn't it resemble a predatory loan: hand over your ID and a nice photo, and in return you get a cheap server. OK, I'm overstating it, ha. Not long ago Google, too, began requiring photo verification for sign-ups from Chinese IPs, for China alone. When it comes to investing in education, we really should learn from abroad ...

After the `12306` incident, the `Lantern` incident, and this or that database leak, where true and false blur into each other: living here, you have no choice but to trade privacy for convenience.

@@ -0,0 +1,33 @@

# 2. Original: Fixing "Waiting for headers" and 404 errors on apt update on a fresh Alibaba Cloud Debian box

First remove the pre-baked Aliyun pip/apt source configuration and the stale package lists:

```
rm -rf /root/.pip /root/.pydistutils.cfg /etc/apt/sources.list.d/sources-aliyun-0.list /etc/apt/sources.list.d/sources-aliyun* /var/lib/apt/lists/*
```

Then put the following into `/etc/apt/sources.list`:

```
deb http://mirrors.cloud.aliyuncs.com/debian/ jessie main contrib non-free
deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie main contrib non-free
deb http://mirrors.cloud.aliyuncs.com/debian/ jessie-proposed-updates main non-free contrib
deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie-proposed-updates main non-free contrib
deb http://mirrors.cloud.aliyuncs.com/debian/ jessie-updates main contrib non-free
deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie-updates main contrib non-free

## Uncomment the following two lines to add software from the 'backports'
## repository.
##
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
#deb http://mirrors.cloud.aliyuncs.com/debian/ jessie-backports main contrib non-free
#deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie-backports main contrib non-free
```

Then refresh apt:

```
apt-get clean
apt-get update
```

Alibaba Cloud being Alibaba Cloud; you really have to hand it to them!!!

@@ -0,0 +1,3 @@

# 3. Original: Deploying a Netlify CMS backend for a Jekyll blog

### Article contents
@@ -0,0 +1,71 @@

# 4. Original: Applying for a Let's Encrypt wildcard certificate

> github: [https://github.com/Neilpang/acme.sh](https://github.com/Neilpang/acme.sh)

DNS providers supported for issuing Let's Encrypt certificates through acme (those with the most users in China): `cloudxns, dnspod, aliyun, cloudflare, linode, he, digitalocean, namesilo, aws, namecom, freedns, godaddy, yandex`, among others.

### Contents

## Install acme.sh

```
curl https://get.acme.sh | sh
```

`acme.sh` is installed into `~/.acme.sh`; create a `bash` `alias` for convenience: `alias acme.sh=~/.acme.sh/acme.sh`

Certificates issued through `acme.sh` get a `cronjob` created for you automatically: every day at 0:00 it checks all certificates, and any that are close to expiring and need renewal are renewed automatically.

## Verifying domain ownership via DNS

```
acme.sh --issue --dns -d mydomain.com
```

`acme.sh` generates and prints the DNS record to add; you only need to create that `txt` record in your domain management panel.
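Once the record is in place, re-run issuance to complete validation (a sketch; newer acme.sh versions may ask you to confirm manual DNS mode with an extra flag):

```
acme.sh --renew -d mydomain.com
```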

## Getting a `DNS API` key

Get the `DNS API` credentials from your DNS provider; with the `api` configured, the `txt` record above gets added at the provider automatically. For example, Aliyun's `api`: [https://ak-console.aliyun.com/#/accesskey](https://ak-console.aliyun.com/#/accesskey) , then follow the per-provider instructions at [https://github.com/Neilpang/acme.sh/tree/master/dnsapi](https://github.com/Neilpang/acme.sh/tree/master/dnsapi) . For Aliyun it is:

```
export Ali_Key="sdfsdfsdfljlbjkljlkjsdfoiwje"
export Ali_Secret="jlsdflanljkljlfdsaklkjflsa"
acme.sh --issue --dns dns_ali -d example.com -d *.example.com
```

The `*` is what makes it a wildcard. After one run, Ali_Key and Ali_Secret are saved into `~/.acme.sh/account.conf`, and the generated SSL certificate lives under `~/.acme.sh/example.com`
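You can list everything that has been issued as a sanity check (a standard acme.sh command):

```
acme.sh --list
```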

## Installing the certificate

> Details: [copy/install certificate](https://github.com/Neilpang/acme.sh/wiki/%E8%AF%B4%E6%98%8E#3-copy%E5%AE%89%E8%A3%85-%E8%AF%81%E4%B9%A6)

Use the `--installcert` command and point it at the target locations; the certificate files are then copied there. For example:

```
acme.sh --installcert -d <domain>.com \
        --key-file /etc/nginx/ssl/<domain>.key \
        --fullchain-file /etc/nginx/ssl/fullchain.cer \
        --reloadcmd "service nginx force-reload"
```

BaoTa (宝塔) panel users: choose "other certificate" under the SSL options and paste the certificate contents in<br/>
<img alt="" src="http://image.creat.kim/picgo/20190314132922.png"/><br/>
Change the certificate path here<br/>
<img alt="" src="http://image.creat.kim/picgo/20190314132617.png"/><br/>
At present certificates renew automatically after 60 days with no action on your part. That interval may be shortened in the future, but it stays automatic either way, so don't worry about it.

## Updating `acme.sh`

Enable auto-upgrade: `acme.sh --upgrade --auto-upgrade`<br/>
Disable auto-upgrade: `acme.sh --upgrade --auto-upgrade 0`

If you run into problems, see the [wiki](https://github.com/Neilpang/acme.sh/wiki) and the [debug guide](https://github.com/Neilpang/acme.sh/wiki/How-to-debug-acme.sh)

@@ -0,0 +1,181 @@

# 5. Original: Rclone notes

### Contents

## Some basic commands

### Mounting

```
# windows mount command
rclone mount OD:/ H: --cache-dir E:\ODPATH --vfs-cache-mode writes &

# linux mount command
nohup rclone mount GD:/ /root/GDPATH --copy-links --no-gzip-encoding --no-check-certificate --allow-other --allow-non-empty --umask 000 &

# unmount (works generally on linux)
fusermount -qzu /root/GDPATH
# or
fusermount -u /path/to/local/mount

# unmount on windows
umount /path/to/local/mount
```
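Before mounting, it's worth a quick check that the remote responds at all (using the `lsd` subcommand covered below; `GD:` is the remote name from the examples above):

```
rclone lsd GD:
```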

### rclone commands

```
rclone ls
# e.g. rclone ls remote:path [flags]
ls      # recursively list every file in the remote with its size, a bit like tree
lsl     # recursively list every file with size and modification time
lsd     # list only directories, with modification time and entry count

lsf     # list files and directories at the current level only
lsjson  # list files and directories as JSON


rclone copy
# e.g. rclone copy OD:/SOME/PATH GD:/OTHER/PATH
--no-traverse    # when /path/to/src has many files but few change per day, this speeds up transfers
-P               # show live transfer statistics
--max-age 24h    # only transfer files modified within 24 hours (off by default)
rclone copy --max-age 24h --no-traverse /path/to/src remote:/PATH -P

rclone sync
# e.g. rclone sync source:path dest:path [flags]
# test with --dry-run first so you know exactly what will be copied and deleted

rclone delete
# list files larger than 100M
rclone --min-size 100M lsl remote:path
# dry-run the deletion
rclone --dry-run --min-size 100M delete remote:path
# delete
rclone --min-size 100M delete remote:path

# remove a path and everything in it; filters do not apply here, unlike delete
rclone purge

# remove an empty path
rclone rmdir

# remove empty directories under a path
rclone rmdirs

# move files
rclone move
# delete empty source directories after the move
--delete-empty-src-dirs

# check that files in source and destination match
rclone check
# download from both sides and compare on the fly instead of comparing hashes
--download

rclone md5sum
# produce an md5sum file for all files in the path
rclone sha1sum
# produce a sha1sum file for all files in the path
rclone size
# print the total size and number of files under remote:path
--json    # JSON output
rclone version --check    # check for newer versions
rclone cleanup    # empty the remote's trash / prune old file versions

rclone dedupe    # interactively find duplicate files and delete/rename them
--dedupe-mode newest    # non-interactive: delete identical duplicates, keep the newest

rclone cat
# same as on linux

rclone copyto
# copy files from source to dest, skipping files already copied

rclone gendocs output_directory [flags]
# generate rclone's documentation

rclone listremotes    # list every remote in the config file
--long    # show type and name (name only by default)

rclone moveto
# does not transfer unchanged files

rclone cryptcheck /path/to/files encryptedremote:path
# check the integrity of an encrypted remote

rclone about
# show the remote's quota, e.g.
$ rclone about ODA1P1:
Total:   5T
Used:    284.885G
Free:    4.668T
Trashed: 43.141G
--json    # JSON output


rclone mount    # the mount command
# on Windows this additionally needs winfsp
--vfs-cache-mode    # without it, files can only be written sequentially and only sought while reading, so Windows programs can't work on them; the flag turns on the caching layer
# four modes: off|minimal|writes|full; the higher the mode, the more rclone caches locally, at the cost of disk space (default: off)
--vfs-cache-max-age 24h     # cache files modified within 24 hours
--vfs-cache-max-size 10g    # cap the total cache at 10g (may be exceeded)
--cache-dir                 # where to keep the cache
--umask                     # override filesystem permissions
--allow-non-empty           # allow mounting onto a non-empty directory
--allow-other               # allow other users to access the mount
--no-check-certificate      # don't verify the server's SSL certificate
--no-gzip-encoding          # don't request gzip encoding
```

## Re-uploading to gd with your own api

> See this write-up for details: [https://www.moerats.com/archives/877/](https://www.moerats.com/archives/877/)

With so many people using `rclone` there's a problem: everyone shares the same default `client_id`, so at peak times you hit `403`s, or run into `Limitations` before ever reaching the `750G` quota. Anyone who uses `rclone` heavily to move files into Google Drive should therefore use their own `api`. Follow the article above to obtain a Google API client `ID` and client secret; `rclone config` will prompt you for them at the right step, so just paste them in and you're set.
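In the config file the result looks roughly like this (a sketch: the remote name `GD` and the key values are placeholders, and `rclone config` writes this section for you):

```
# ~/.config/rclone/rclone.conf (illustrative)
[GD]
type = drive
client_id = YOUR_CLIENT_ID.apps.googleusercontent.com
client_secret = YOUR_CLIENT_SECRET
scope = drive
```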

The mount command then becomes:

```
# these flags are mainly for uploading
/usr/bin/rclone mount DriveName:Folder LocalFolder \
  --umask 0000 \
  --default-permissions \
  --allow-non-empty \
  --allow-other \
  --transfers 4 \
  --buffer-size 32M \
  --low-level-retries 200

# if you also read from the mount, e.g. playing media through H5ai, add these three flags too (same style as above)
--dir-cache-time 12h
--vfs-read-chunk-size 32M
--vfs-read-chunk-size-limit 1G
```

## Getting past Google Drive's server-side 750g limit

Google explicitly limits third-party `api` transfers to `750G` of server-side copying per day; that `750G` happens entirely on Google's servers, without the data passing through the client. Uploading from a client to `gd` has its own, separate official `750G` limit, so combined you get `1.5T` of upload allowance per day

```
# normal usage: server-side API, no local bandwidth consumed
rclone copy GD1:/PATH GD2:/PATH

# disable server side copies: client API, traffic flows through the client
rclone copy --disable copy GD1:/PATH GD2:/PATH
```

That way it's `1.5T` per day.

## Google Docs limitations

In `rclone ls`, Google Docs show a size of `-1`, while through the `VFS` layer they show `0`, for example files accessed via `rclone mount` or `rclone serve`. Commands like `rclone sync` and `rclone copy` simply ignore the document size and operate on them anyway. In other words, until you download a Google Doc you don't know how big it is, which makes little practical difference ...

@@ -0,0 +1,7 @@

# 6. Repost: Changing the update channel of Office 365 for PC

Office 365 for PC defaults to the semi-annual update channel; you can switch it to the monthly channel (or another one) to try the newest features.

> Original post: [https://www.mr-technos.com/forum.php?mod=viewthread&tid=79](https://www.mr-technos.com/forum.php?mod=viewthread&tid=79)
@@ -0,0 +1,91 @@

# 7. Original: A solution for moving Baidu Pan files into gd/od

**Home:** [HomePage](https://telegra.ph/HomePage-01-03)<br/>[https://telegra.ph/Fuck-PanBaidu-02-19](https://telegra.ph/Fuck-PanBaidu-02-19) <br/>[https://graph.org/Fuck-PanBaidu-02-19](https://graph.org/Fuck-PanBaidu-02-19)

### 1. Install aria2

```
wget -N https://git.io/aria2.sh && chmod +x aria2.sh && bash aria2.sh
```

- Start: `/etc/init.d/aria2 start`
- Stop: `/etc/init.d/aria2 stop`
- Restart: `/etc/init.d/aria2 restart`
- Status: `/etc/init.d/aria2 status`
- Config file: `/root/.aria2/aria2.conf` (it contains Chinese comments, which some systems may not display)
- RPC token: randomly generated (changeable in the config file)
- Default download directory: `/root/Download`

### 2. Offline downloads to gd/od with aria2

1. Install rclone

```
curl https://rclone.org/install.sh | sudo bash
```

For rclone configuration see: [https://rclone.org/drive/](https://rclone.org/drive/)

2. Edit the script **/root/.aria2/autoupload.sh**:

```
name='Onedrive'              # the remote name you chose in rclone config
folder='/DRIVEX/Download'    # target folder in the drive; leave empty for the drive root
```

3. Edit the aria2 config file **/root/.aria2/aria2.conf** and enable the post-download hook:

```
# call rclone to upload (move) finished downloads to the drive
on-download-complete=/root/.aria2/autoupload.sh
```

4. Restart aria2

```
/root/aria2.sh            # choose option 6 to restart
# or run:
service aria2 restart
```

5. Use an aria2 web frontend to start downloads: [aria2.ml](http://aria2.ml/)

Fill in the connection details of the aria2 instance on your VPS
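Those details correspond to the RPC options in the aria2 config on the VPS, roughly these (illustrative; the token is whatever `rpc-secret` is set to in /root/.aria2/aria2.conf):

```
# /root/.aria2/aria2.conf, the RPC-related options
enable-rpc=true
rpc-listen-all=true
rpc-listen-port=6800
rpc-secret=YOUR_TOKEN
```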

Click "new", paste in the download links, and the download starts

The finished files are uploaded to gd/od automatically

### 3. Using a third-party Baidu Pan client

I recommend SpeedPan (速盘); unfortunately PanDownload doesn't expose aria2 settings

As shown in the screenshot, change the download directory. It can't be changed from the GUI: quit the program first and edit it in config.ini:

The download directory must match the aria2 setup on the remote server; for an aria2 installed as above, that's **/root/Download**

With that, your Baidu Pan files download straight into gd/od.

### 4. Screenshots

1. Downloading a file to the VPS with the AriaNG frontend, with the **autoupload.sh script turning gd into an offline downloader for movies**

2. Using SpeedPan's remote-aria2 feature to pull Baidu Pan files down to the VPS, where the **autoupload.sh script re-uploads them to gd automatically**
@@ -0,0 +1,15 @@

# Posts by ds19991999

1. [Original: Quick manual install of JupyterLab on Debian, with HTTPS](https://blog.csdn.net/ds19991999/article/details/88935996)
2. [Original: Fixing "Waiting for headers" and 404 errors on apt update on a fresh Alibaba Cloud Debian box](https://blog.csdn.net/ds19991999/article/details/88659452)
3. [Original: Deploying a Netlify CMS backend for a Jekyll blog](https://blog.csdn.net/ds19991999/article/details/88651187)
4. [Original: Applying for a Let's Encrypt wildcard certificate](https://blog.csdn.net/ds19991999/article/details/88553810)
5. [Original: Rclone notes](https://blog.csdn.net/ds19991999/article/details/88370053)
6. [Repost: Changing the update channel of Office 365 for PC](https://blog.csdn.net/ds19991999/article/details/87973325)
7. [Original: A solution for moving Baidu Pan files into gd/od](https://blog.csdn.net/ds19991999/article/details/87736377)
8. [Original: Mounting OneDrive via WebDav](https://blog.csdn.net/ds19991999/article/details/86506042)
9. [Original: SMS verification-code platforms](https://blog.csdn.net/ds19991999/article/details/86505762)
10. [Original: A custom friend-links sidebar for CSDN](https://blog.csdn.net/ds19991999/article/details/86505686)
11. [Original: Resource roundup](https://blog.csdn.net/ds19991999/article/details/85225611)
12. [Original: Mounting OneDrive as a local drive on Windows](https://blog.csdn.net/ds19991999/article/details/85008885)
13. [Original: Ubuntu daily-use notes](https://blog.csdn.net/ds19991999/article/details/83719417)
14. [Original: Fixing Ubuntu's network problems for good](https://blog.csdn.net/ds19991999/article/details/83715489)

Binary file not shown. (new image, 440 KiB)
Binary file not shown. (new image, 19 KiB)
Binary file not shown. (new image, 67 KiB)
Binary file not shown. (new image, 44 KiB)
Binary file not shown. (new image, 94 KiB)

cookie.txt
@@ -0,0 +1,13 @@

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Cookie: acw_tc=2760829715714827204377171e8e9dc3a79185500e46805511b2c277adf1fb; acw_sc__v3=5daaec608ce6c5ba1fab0c4137c00ecb0cd34525; uuid_tt_dd=10_2450623130-1571482720624-229726; dc_session_id=10_1571482720624.999633; acw_sc__v2=5daaec6067f5ec51b728d2bd7660bf7372ed8903; TY_SESSION_ID=c82ca68f-e408-4c15-b681-71da67f637c2; dc_tos=pzmbtt; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1571482722; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1571482722; Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_2450623130-1571482720624-229726; c-login-auto=1; announcement=%257B%2522announcementUrl%2522%253A%2522https%253A%252F%252Fblogdev.blog.csdn.net%252Farticle%252Fdetails%252F102605809%2522%252C%2522announcementCount%2522%253A1%252C%2522announcementExpire%2522%253A527116621%257D
Host: blog.csdn.net
Referer: https://blog.csdn.net/
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36

csdn.py
@@ -0,0 +1,208 @@

#!/usr/bin/env python
# coding: utf-8

import os, time, re
import requests
import threading
import logging
from bs4 import BeautifulSoup, Comment
from selenium import webdriver
from tomd import Tomd


def result_file(folder_name, file_name):
    folder = os.path.join(os.path.dirname(os.path.realpath(__file__)), "articles", folder_name)
    if not os.path.exists(folder):
        os.makedirs(folder)
        path = os.path.join(folder, file_name)
        file = open(path, "w")
        file.close()
    else:
        path = os.path.join(folder, file_name)
    return path


def get_headers(cookie_path: str):
    cookies = {}
    with open(cookie_path, "r", encoding="utf-8") as f:
        cookie_list = f.readlines()
        for line in cookie_list:
            # split on the first colon only, so header values containing ":" survive
            cookie = line.split(":", 1)
            cookies[cookie[0]] = str(cookie[1]).strip()
    return cookies


def delete_ele(soup: BeautifulSoup, tags: list):
    for ele in tags:
        for useless_tag in soup.select(ele):
            useless_tag.decompose()


def delete_ele_attr(soup: BeautifulSoup, attrs: list):
    for attr in attrs:
        for useless_attr in soup.find_all():
            del useless_attr[attr]


def delete_blank_ele(soup: BeautifulSoup, eles_except: list):
    for useless_attr in soup.find_all():
        try:
            if useless_attr.name not in eles_except and useless_attr.text == "":
                useless_attr.decompose()
        except Exception:
            pass


class TaskQueue(object):
    def __init__(self):
        self.VisitedList = []
        self.UnVisitedList = []

    def getVisitedList(self):
        return self.VisitedList

    def getUnVisitedList(self):
        return self.UnVisitedList

    def InsertVisitedList(self, url):
        if url not in self.VisitedList:
            self.VisitedList.append(url)

    def InsertUnVisitedList(self, url):
        if url not in self.UnVisitedList:
            self.UnVisitedList.append(url)

    def RemoveVisitedList(self, url):
        self.VisitedList.remove(url)

    def PopUnVisitedList(self, index=0):
        url = ""
        if index and self.UnVisitedList:
            url = self.UnVisitedList[index]
            del self.UnVisitedList[:index]
        elif self.UnVisitedList:
            url = self.UnVisitedList.pop()
        return url

    def getUnVisitedListLength(self):
        return len(self.UnVisitedList)


class Article(object):
    def __init__(self):
        self.options = webdriver.ChromeOptions()
        self.options.add_experimental_option('excludeSwitches', ['enable-logging'])
        self.options.add_argument('headless')
        self.browser = webdriver.Chrome(options=self.options)
        # global implicit wait
        self.browser.implicitly_wait(30)

    def get_content(self, url):
        self.browser.get(url)
        try:
            self.browser.find_element_by_xpath('//a[@class="btn-readmore"]').click()
        except Exception:
            pass
        content = self.browser.find_element_by_xpath('//div[@id="content_views"]').get_attribute("innerHTML")
        return content

    def get_md(self, url):
        """
        Convert the article to markdown.
        """
        content = self.get_content(url)
        soup = BeautifulSoup(content, 'lxml')
        # remove HTML comments
        for useless_tag in soup(text=lambda text: isinstance(text, Comment)):
            useless_tag.extract()
        # remove useless tags
        tags = ["svg", "ul", ".hljs-button.signin"]
        delete_ele(soup, tags)
        # remove tag attributes
        attrs = ["class", "name", "id", "onclick", "style", "data-token", "rel"]
        delete_ele_attr(soup, attrs)
        # remove empty tags
        eles_except = ["img", "br", "hr"]
        delete_blank_ele(soup, eles_except)
        # convert to markdown
        md = Tomd(str(soup)).markdown
        return md


class CSDN(object):
    def __init__(self, cookie_path):
        self.headers = get_headers(cookie_path)
        self.TaskQueue = TaskQueue()

    def get_articles(self, username: str):
        """Fetch article titles and links."""
        num = 0
        while True:
            num += 1
            url = u'https://blog.csdn.net/' + username + '/article/list/' + str(num)
            response = requests.get(url=url, headers=self.headers)
            html = response.text
            soup = BeautifulSoup(html, "html.parser")
            articles = soup.find_all('div', attrs={"class":"article-item-box csdn-tracking-statistics"})
            if len(articles) > 0:
                for article in articles:
                    article_title = article.a.text.strip().replace(' ',':')
                    article_href = article.a['href']
                    yield article_title, article_href
            else:
                break

    def write_articals(self, username: str):
        """Write the posts to local files."""
        print("[++] Crawling posts of {} ......".format(username))
        artical = Article()
        reademe_path = result_file(username, file_name="README.md")
        with open(reademe_path, 'w', encoding='utf-8') as reademe_file:
            i = 1
            readme_head = "# Posts by " + username + "\n"
            reademe_file.write(readme_head)
            for article_title, article_href in self.get_articles(username):
                print("[++++] {}. processing URL: {}".format(str(i), article_href))
                text = str(i) + '. [' + article_title + '](' + article_href + ')\n'
                reademe_file.write(text)
                file_name = str(i) + "." + re.sub(r'[\/::*?"<>|]', '-', article_title) + ".md"
                artical_path = result_file(folder_name=username, file_name=file_name)
                md_content = artical.get_md(article_href)
                md_head = "# " + str(i) + "." + article_title + "\n"
                md = md_head + md_content
                with open(artical_path, "w", encoding="utf-8") as artical_file:
                    artical_file.write(md)
                i += 1
                time.sleep(2)

    def spider(self):
        """Save the crawled articles locally."""
        # busy-waits on the task queue; runs until the process is killed
        while True:
            if self.TaskQueue.getUnVisitedListLength():
                username = self.TaskQueue.PopUnVisitedList()
                self.write_articals(username)

    def check_user(self, user_path: str):
        with open(user_path, 'r', encoding='utf-8') as f:
            users = f.readlines()
        for user in users:
            self.TaskQueue.InsertUnVisitedList(user.strip())

    def run(self, user_path):
        UserThread = threading.Thread(target=self.check_user, args=(user_path,))
        SpiderThread = threading.Thread(target=self.spider, args=())
        UserThread.start()
        SpiderThread.start()
        UserThread.join()
        SpiderThread.join()


def main():
    user_path = 'username.txt'
    csdn = CSDN('cookie.txt')
    csdn.run(user_path)


if __name__ == "__main__":
    main()

requirements.txt
@@ -0,0 +1,4 @@

bs4==0.0.1
selenium==3.141.0
requests==2.22.0
lxml

tomd.py
@@ -0,0 +1,155 @@

import re

__all__ = ['Tomd', 'convert']

MARKDOWN = {
    'h1': ('\n# ', '\n'),
    'h2': ('\n## ', '\n'),
    'h3': ('\n### ', '\n'),
    'h4': ('\n#### ', '\n'),
    'h5': ('\n##### ', '\n'),
    'h6': ('\n###### ', '\n'),
    'code': ('`', '`'),
    'ul': ('', ''),
    'ol': ('', ''),
    'li': ('- ', ''),
    'blockquote': ('\n> ', '\n'),
    'em': ('**', '**'),
    'strong': ('**', '**'),
    'block_code': ('\n```\n', '\n```\n'),
    'span': ('', ''),
    'p': ('\n', '\n'),
    'p_with_out_class': ('\n', '\n'),
    'inline_p': ('', ''),
    'inline_p_with_out_class': ('', ''),
    'b': ('**', '**'),
    'i': ('*', '*'),
    'del': ('~~', '~~'),
    'hr': ('\n---', '\n\n'),
    'thead': ('\n', '|------\n'),
    'tbody': ('\n', '\n'),
    'td': ('|', ''),
    'th': ('|', ''),
    'tr': ('', '\n')
}

BlOCK_ELEMENTS = {
    'h1': '<h1.*?>(.*?)</h1>',
    'h2': '<h2.*?>(.*?)</h2>',
    'h3': '<h3.*?>(.*?)</h3>',
    'h4': '<h4.*?>(.*?)</h4>',
    'h5': '<h5.*?>(.*?)</h5>',
    'h6': '<h6.*?>(.*?)</h6>',
    'hr': '<hr/>',
    'blockquote': '<blockquote.*?>(.*?)</blockquote>',
    'ul': '<ul.*?>(.*?)</ul>',
    'ol': '<ol.*?>(.*?)</ol>',
    'block_code': '<pre.*?><code.*?>(.*?)</code></pre>',
    'p': '<p\s.*?>(.*?)</p>',
    'p_with_out_class': '<p>(.*?)</p>',
    'thead': '<thead.*?>(.*?)</thead>',
    'tr': '<tr>(.*?)</tr>'
}

INLINE_ELEMENTS = {
    'td': '<td>(.*?)</td>',
    'tr': '<tr>(.*?)</tr>',
    'th': '<th>(.*?)</th>',
    'b': '<b>(.*?)</b>',
    'i': '<i>(.*?)</i>',
    'del': '<del>(.*?)</del>',
    'inline_p': '<p\s.*?>(.*?)</p>',
    'inline_p_with_out_class': '<p>(.*?)</p>',
    'code': '<code.*?>(.*?)</code>',
    'span': '<span.*?>(.*?)</span>',
    'ul': '<ul.*?>(.*?)</ul>',
    'ol': '<ol.*?>(.*?)</ol>',
    'li': '<li.*?>(.*?)</li>',
    'img': '<img.*?src="(.*?)".*?>(.*?)</img>',
    'a': '<a.*?href="(.*?)".*?>(.*?)</a>',
    'em': '<em.*?>(.*?)</em>',
    'strong': '<strong.*?>(.*?)</strong>'
}

DELETE_ELEMENTS = ['<span.*?>', '</span>', '<div.*?>', '</div>']


class Element:
    def __init__(self, start_pos, end_pos, content, tag, is_block=False):
        self.start_pos = start_pos
        self.end_pos = end_pos
        self.content = content
        self._elements = []
        self.is_block = is_block
        self.tag = tag
        self._result = None

        if self.is_block:
            self.parse_inline()

    def __str__(self):
        wrapper = MARKDOWN.get(self.tag)
        self._result = '{}{}{}'.format(wrapper[0], self.content, wrapper[1])
        return self._result

    def parse_inline(self):
        for tag, pattern in INLINE_ELEMENTS.items():
            if tag == 'a':
                self.content = re.sub(pattern, '[\g<2>](\g<1>)', self.content)
            elif tag == 'img':
                self.content = re.sub(pattern, '![\g<2>](\g<1>)', self.content)
            elif self.tag == 'ul' and tag == 'li':
                self.content = re.sub(pattern, '- \g<1>', self.content)
            elif self.tag == 'ol' and tag == 'li':
                self.content = re.sub(pattern, '1. \g<1>', self.content)
            elif self.tag == 'thead' and tag == 'tr':
                self.content = re.sub(pattern, '\g<1>\n', self.content.replace('\n', ''))
            elif self.tag == 'tr' and tag == 'th':
                self.content = re.sub(pattern, '|\g<1>', self.content.replace('\n', ''))
            elif self.tag == 'tr' and tag == 'td':
                self.content = re.sub(pattern, '|\g<1>', self.content.replace('\n', ''))
            else:
                wrapper = MARKDOWN.get(tag)
                self.content = re.sub(pattern, '{}\g<1>{}'.format(wrapper[0], wrapper[1]), self.content)


class Tomd:
    def __init__(self, html='', options=None):
        self.html = html
        self.options = options
        self._markdown = ''

    def convert(self, html, options=None):
        elements = []
        for tag, pattern in BlOCK_ELEMENTS.items():
            for m in re.finditer(pattern, html, re.I | re.S | re.M):
                element = Element(start_pos=m.start(),
                                  end_pos=m.end(),
                                  content=''.join(m.groups()),
                                  tag=tag,
                                  is_block=True)
                can_append = True
                for e in elements:
                    if e.start_pos < m.start() and e.end_pos > m.end():
                        can_append = False
                    elif e.start_pos > m.start() and e.end_pos < m.end():
                        elements.remove(e)
                if can_append:
                    elements.append(element)

        elements.sort(key=lambda element: element.start_pos)
        self._markdown = ''.join([str(e) for e in elements])

        for index, element in enumerate(DELETE_ELEMENTS):
            self._markdown = re.sub(element, '', self._markdown)
        return self._markdown

    @property
    def markdown(self):
        self.convert(self.html, self.options)
        return self._markdown


_inst = Tomd()
convert = _inst.convert

username.txt
@@ -0,0 +1 @@

ds19991999