commit demo

master
ds19991999 2019-10-19 19:23:00 +08:00
parent 604709b719
commit db79d3c171
19 changed files with 1029 additions and 1 deletion

View File

@ -1 +1,62 @@
# csdn-spider
# CSDN spider script
Main features: crawl all blog posts of a given `csdn` user, convert them to `markdown`, and save them locally.
## 1. Environment
You need a `WebDriver`: download the `chrome` driver matching your local browser from https://chromedriver.chromium.org/downloads and add it to your `$PATH` environment variable.
```shell
# Python 3 is required
python3 -m pip install -r requirements.txt
```
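To verify the driver is reachable before a full run, here is a minimal sketch using the same headless `ChromeOptions` that `csdn.py` itself uses:
```python
# quick sanity check: raises if chromedriver is not on $PATH
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('headless')
browser = webdriver.Chrome(options=options)
browser.get('https://blog.csdn.net')
print(browser.title)
browser.quit()
```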
## 2. Get the script
```shell
git clone https://github.com/ds19991999/csdn-spider.git
```
## 3. Usage
### 1. Get a cookie
Log in to your `csdn` account and open https://blog.csdn.net. Press `F12` to open the developer tools, copy all of the `Request Headers`, and save them into a `cookie.txt` file.
![1571482112632](assets/1571482112632.png)
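For reference, `csdn.py` turns this file into a `requests` headers dict — one `Header: value` pair per line, splitting only on the first colon. A minimal sketch of the same parsing logic:
```python
# parse cookie.txt into a dict usable as requests headers;
# split on the first ":" only, since values (URLs, etc.) contain colons too
def parse_headers(path="cookie.txt"):
    headers = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if ":" in line:
                key, value = line.split(":", 1)
                headers[key.strip()] = value.strip()
    return headers
```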
### 2. Add the `csdn` users to crawl
Add usernames to `username.txt`, one per line.
### 3. Run the script
```shell
python3 csdn.py
```
## 4. Results
**While running**
![1571483423256](assets/1571483423256.png)
**Article index generated at** `./articles/username/README.md`
![1571483552438](assets/1571483552438.png)
**Crawled posts under** `./articles/username/`
![1571483479356](assets/1571483479356.png)
**Conversion result**
![1571483777703](assets/1571483777703.png)
## 5. LICENSE
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
`PS`: a spider script written on a whim; updates happen when they happen.

View File

@ -0,0 +1,186 @@
# 1. (Original) Quick manual JupyterLab install on Debian with HTTPS
Quite a while ago I wrote a very detailed `Jupyter Lab` installation tutorial ([`link`](https://www.creat.kim/archives/25/)). It felt overly complicated — the `nginx` part in particular wasn't explained clearly — so a fair number of readers ended up unable to run `python`, although following that tutorial step by step does work. Also, being able to open the site over `https` doesn't guarantee code will run: `jupyter lab` communicates over `websocket`, not plain `http`. So here is a leaner version: install `Jupyter Lab` on `Debian` and serve it over `https` with `caddy` — tested, and programs actually run. This tutorial covers only the `Python 2` kernel; to install `Python 3` alongside it, see [`this post`](https://www.creat.kim/archives/25/). The steps below are kept brief so you can get through them quickly, in one pass. demo: [https://jupyter.creat.kim](https://jupyter.creat.kim)<br/>
<img alt="" src="http://image.creat.kim/picgo/20190326142651.png"/><br/>
<img alt="" src="http://image.creat.kim/picgo/20190326151655.png"/>
Preparation:
```
sudo apt-get install software-properties-common
```
## Install the `Python` environment
```
sudo apt-get install python-pip python-dev build-essential
sudo pip install --upgrade pip
sudo pip install --upgrade virtualenv
sudo apt-get install python-setuptools python-dev build-essential
sudo easy_install pip
sudo pip install --upgrade virtualenv
sudo apt-get install python3-pip
sudo apt-get install python-pip
sudo pip3 install --upgrade pip
sudo pip2 install --upgrade pip
sudo pip install --upgrade pip
```
## Check where `pip` points
```
~ $ which pip
/usr/local/bin/pip
~ $ which pip2
/usr/local/bin/pip2
~ $ which pip3
/usr/local/bin/pip3
```
## Install `yarn`
```
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt-get update
sudo apt-get install yarn
```
## Install `nodejs`
```
curl -sL https://deb.nodesource.com/setup_10.x | bash -
apt-get install -y nodejs
```
## Install `jupyterlab`
```
sudo pip2 install jupyterlab
```
## Configure `jupyterlab`
```
jupyter-notebook password
```
Enter `ipython` and generate a hashed password. The password you type here is the one you will use to log in to `jupyter lab`; write down the generated hash.
```
ipython
from notebook.auth import passwd
passwd()
# type the password you will use to log in to the JupyterLab UI
# it prints a hash like the one below; write it down for later
'sha1:b92f3fb7d848:a5d40ab2e26aa3b296ae1faa17aa34d3df351704'
```
## Edit the configuration file
It is usually at `/root/.jupyter/jupyter_notebook_config.py`; find and change the following options.
```
c.NotebookApp.allow_root = True
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.notebook_dir = u'/root/JupyterLab'
c.NotebookApp.open_browser = False
c.NotebookApp.password = u'sha1:b92f3fb7d848:a5d40ab2e26aa3b296ae1faa17aa34d3df351704'
c.NotebookApp.port = 8888
# Meaning of the options above:
# allow running jupyterlab as root
# allow access from any IP range
# the root directory served by jupyterlab
# don't open a browser on startup (a server only has a terminal anyway)
# the hashed password generated earlier
# the access port; it must match the caddy config below
```
## Run `Jupyter Lab`
```
jupyter-lab --version
jupyter lab build
mkdir ~/JupyterLab
cd ~/JupyterLab
# screen keeps it running in the background
apt install screen
screen -S jupterlab
jupyter lab
```
Press `ctrl+A+D` to detach from the screen session.
## HTTPS reverse proxy with `caddy`
Replace the domain with your own; for detailed `caddy` usage see: [`[link]`](https://www.creat.kim/archives/18/)
```
wget -N --no-check-certificate https://raw.githubusercontent.com/ds19991999/shell.sh/shell/caddy_install.sh && chmod +x caddy_install.sh && bash caddy_install.sh
echo "jupyter.creat.kim
gzip
tls cva.engineer.ding@gmail.com
proxy / 127.0.0.1:8888 {
transparent
websocket
}" &gt; /usr/local/caddy/Caddyfile
```
## Scheduled backups to `GitHub`
See this detailed write-up: [`[link]`](https://www.moerats.com/archives/858/). A minimal sketch of the idea follows.
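Assuming `/root/JupyterLab` is already a git repository with an authenticated GitHub remote (an assumption — adapt the path and remote to your setup), a commit-and-push script you could schedule with `cron` might look like this:
```
#!/usr/bin/env python3
# minimal backup sketch: stage, commit and push the notebook directory;
# assumes /root/JupyterLab is a git repo with an authenticated remote
import subprocess
import time

REPO = "/root/JupyterLab"

subprocess.run(["git", "-C", REPO, "add", "-A"], check=True)
# the commit may fail harmlessly when nothing changed, hence check=False
subprocess.run(["git", "-C", REPO, "commit", "-m",
                "backup " + time.strftime("%Y-%m-%d %H:%M")], check=False)
subprocess.run(["git", "-C", REPO, "push"], check=True)
```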
## Configure the `python2` and `python3` kernels
One more thing, since plenty of people trip here… when installing packages, never run a bare `pip3 install ***` or `pip2 install ***`; call pip through the interpreter instead:
```
python2 -m pip install ipykernel ipython matplotlib scipy pandas numpy
python3 -m pip install ipykernel ipython matplotlib scipy pandas numpy
```
Check the kernels (if one is missing, registering it with `python2 -m ipykernel install` or `python3 -m ipykernel install` usually fixes it):
```
root@google:~/JupyterLab# jupyter kernelspec list
Available kernels:
python2 /usr/local/share/jupyter/kernels/python2
python3 /usr/local/share/jupyter/kernels/python3
```
Done — open your domain and start using it.
---
## A few parting thoughts
This is probably the last post I'll publish on `CSDN`. It originally appeared at [https://www.creat.kim/archives/40/](https://www.creat.kim/archives/40/) — yes, I've finally left the public blogging platforms. I wrote on `CSDN` for about a year and a half: `107` posts, of which `97` are "original" (ahem, borrowed), `7` reposts, `2` private, and `1` set private by an admin for violating some policy… `CSDN` rank `10k+`, `225k+` views, `48` followers. An unremarkable record with unremarkable writing — probably representative of most people here.
Chinese blogging platforms are actually all decent, and writing on `CSDN` is a pleasant experience. For a while I kept wavering between my own blog and the public platforms, and the original point of blogging slowly got lost; having been through that, though, I think I understand a few things now.
After trying `WordPress`, `Zhihu`, `Jianshu`, `Cnblogs`, `Sina`, `GitHub-Jekyll`, `coding-jekyll`, `hexo`, `Typecho`… I picked up some basics of running a website — at the very least, that anything hosted in China needs an ICP filing…<br/>
For image hosting, I went from plain copy-and-paste to `GitHub`+`PicGo`, `UPYUN` (filing required), `Qiniu` (filing required), and self-hosted image beds… and learned a few `CDN` acceleration tricks along the way…<br/>
For documents, I went from editing in place to `CSDN`'s `MarkDown` editor, `Youdao Note`, `Evernote` (separate Chinese and international editions), `GitHub-README`, `GitBook`, `MkDoc`, `Read the Docs`, `Sphinx`, `Docsify` — practice makes perfect: once fluent, you can make any document look good, even though I still can't use `Vim`…<br/>
On servers, I learned a lot about the gap between domestic and foreign hosting, and grew to hate watching an `install` of a package or `program` crawl along at a few `k` or `b` per second. Swap domestic mirrors however you like — they never match the foreign ones. Some sites aren't even blocked, yet the local speed is unbearable; I'm amazed I ever endured that snail's pace. Only after seeing the other side can you look at a problem from another angle — better than a diet of pre-filtered information.
Then there are the education benefits abroad. Some say foreign providers got burned by Chinese freeloaders and stopped offering education benefits to China. But look at the education deals from the big domestic vendors — servers so cheap even I was tempted, so off I went to register with every one of them. Real-name verification required? Fine, I verify, I upload a photo. A filing required? What, a filing too? Fine, I file, I upload another photo, and there goes a week. And now there's monitoring on top of it? I give up… Doesn't it feel a bit like a predatory loan — hand over your ID card and a nice photo, and in exchange you get a cheap server (okay, that's an exaggeration, haha). Not long ago Google, too, started demanding photo verification for sign-ups from Chinese IPs — for China alone. We really ought to learn from how other countries invest in education…
After the `12306` incident, the `Lantern` incident, and assorted `database leaks`, the true and the false blur into each other. Living here, you have little choice but to trade privacy for convenience.

View File

@ -0,0 +1,33 @@
# 2. (Original) Fixing "Waiting for headers" and 404 errors when updating a fresh Aliyun ("Tricky Cloud") Debian machine
First remove the leftover Aliyun pip and apt configuration:
```
rm -rf /root/.pip /root/.pydistutils.cfg /etc/apt/sources.list.d/sources-aliyun-0.list /etc/apt/sources.list.d/sources-aliyun* /var/lib/apt/lists/*
```
Then rewrite `/etc/apt/sources.list` with the following entries:
```
deb http://mirrors.cloud.aliyuncs.com/debian/ jessie main contrib non-free
deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie main contrib non-free
deb http://mirrors.cloud.aliyuncs.com/debian/ jessie-proposed-updates main non-free contrib
deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie-proposed-updates main non-free contrib
deb http://mirrors.cloud.aliyuncs.com/debian/ jessie-updates main contrib non-free
deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie-updates main contrib non-free
## Uncomment the following two lines to add software from the 'backports'
## repository.
##
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
#deb http://mirrors.cloud.aliyuncs.com/debian/ jessie-backports main contrib non-free
#deb-src http://mirrors.cloud.aliyuncs.com/debian/ jessie-backports main contrib non-free
```
Finally clean the apt lists and update again:
```
apt-get clean
apt-get update
```
Tricky Cloud will be Tricky Cloud. Unbelievable!!!

View File

@ -0,0 +1,3 @@
# 3. (Original) Deploying a Netlify CMS backend for a Jekyll blog
### Table of contents

View File

@ -0,0 +1,71 @@
# 4. (Original) Applying for a Let's Encrypt wildcard certificate
> github: [https://github.com/Neilpang/acme.sh](https://github.com/Neilpang/acme.sh)
Let's Encrypt certificates requested through acme.sh support many DNS providers; the ones most used in China include `cloudxns`, `dnspod`, `aliyun`, `cloudflare`, `linode`, `he`, `digitalocean`, `namesilo`, `aws`, `namecom`, `freedns`, `godaddy`, `yandex`, and more.
### Table of contents
## Install acme.sh
```
curl https://get.acme.sh | sh
```
`acme.sh` is installed into `~/.acme.sh`. Create a `bash` `alias` to make it convenient: `alias acme.sh=~/.acme.sh/acme.sh`
Certificates issued through `acme.sh` get a `cronjob` created automatically: every day at 0:00 it checks all certificates and renews any that are close to expiring.
## Verifying domain ownership via DNS
```
acme.sh --issue --dns -d mydomain.com
```
`acme.sh` generates and prints the required DNS record; add that `txt` record in your domain management panel and you're done.
## Getting a `DNS API`
With your DNS provider's `DNS API`, the `api` adds the `txt` record above to the DNS provider automatically. For example, Alibaba's `api`: [https://ak-console.aliyun.com/#/accesskey](https://ak-console.aliyun.com/#/accesskey) — then configure it following the instructions at [https://github.com/Neilpang/acme.sh/tree/master/dnsapi](https://github.com/Neilpang/acme.sh/tree/master/dnsapi). For Alibaba it is:
```
export Ali_Key="sdfsdfsdfljlbjkljlkjsdfoiwje"
export Ali_Secret="jlsdflanljkljlfdsaklkjflsa"
acme.sh --issue --dns dns_ali -d example.com -d *.example.com
```
The `*` is the wildcard part. After one run, Ali_Key and Ali_Secret are saved in `~/.acme.sh/account.conf`, and the issued SSL certificate lands in `~/.acme.sh/example.com`.
## Installing the certificate
> See: [copy/install the certificate](https://github.com/Neilpang/acme.sh/wiki/%E8%AF%B4%E6%98%8E#3-copy%E5%AE%89%E8%A3%85-%E8%AF%81%E4%B9%A6)
Use the `--installcert` command and specify the target locations; the certificate files are then copied there, for example:
```
acme.sh --installcert -d <domain>.com \
--key-file /etc/nginx/ssl/<domain>.key \
--fullchain-file /etc/nginx/ssl/fullchain.cer \
--reloadcmd "service nginx force-reload"
```
BT panel (宝塔) users: under the SSL tab choose "Other certificate" and paste the certificate contents there.<br/>
<img alt="" src="http://image.creat.kim/picgo/20190314132922.png"/><br/>
Adjust the certificate path here:<br/>
<img alt="" src="http://image.creat.kim/picgo/20190314132617.png"/><br/>
Certificates currently renew automatically after 60 days; you don't need to do anything. This interval may be shortened in the future, but it stays automatic — not your concern.
## Updating `acme.sh`
Upgrade automatically: `acme.sh --upgrade --auto-upgrade`<br/>
Turn auto-upgrade off: `acme.sh --upgrade --auto-upgrade 0`
If anything goes wrong, see the [wiki](https://github.com/Neilpang/acme.sh/wiki) and the [debug guide](https://github.com/Neilpang/acme.sh/wiki/How-to-debug-acme.sh).

View File

@ -0,0 +1,181 @@
# 5. (Original) Rclone notes
### Table of contents
## Basic commands
### Mounting
```
# mount on windows
rclone mount OD:/ H: --cache-dir E:\ODPATH --vfs-cache-mode writes &
# mount on linux
nohup rclone mount GD:/ /root/GDPATH --copy-links --no-gzip-encoding --no-check-certificate --allow-other --allow-non-empty --umask 000 &
# unmount (works on any linux)
fusermount -qzu /root/GDPATH
# or:
fusermount -u /path/to/local/mount
# unmount on windows
umount /path/to/local/mount
```
### rclone commands
```
rclone ls
e.g. rclone ls remote:path [flags]
ls       # recursively list all files in remote with their sizes, a bit like tree
lsl      # recursively list all files with size and modification time
lsd      # list only directories, with modification time and number of contained files
lsf      # list files and directories at the current level only
lsjson   # list files and directories in JSON format
rclone copy
e.g. rclone copy OD:/SOME/PATH GD:/OTHER/PATH
--no-traverse    # speeds things up when /path/to/src holds many files but few change per day
-P               # show live transfer statistics
--max-age 24h    # only transfer files modified within the last 24 hours (off by default)
rclone copy --max-age 24h --no-traverse /path/to/src remote:/PATH -P
rclone sync
e.g. rclone sync source:path dest:path [flags]
# run with --dry-run first to see exactly what will be copied and deleted
rclone delete
# list files larger than 100M
rclone --min-size 100M lsl remote:path
# dry-run the deletion
rclone --dry-run --min-size 100M delete remote:path
# delete for real
rclone --min-size 100M delete remote:path
# remove a path and all of its contents (filters are ignored here, unlike delete)
rclone purge
# remove an empty path
rclone rmdir
# remove all empty directories under a path
rclone rmdirs
# move files
rclone move
# delete empty source directories after the move
--delete-empty-src-dirs
# check that the files in source and destination match
rclone check
# download both sides and compare on the fly instead of comparing hashes
--download
rclone md5sum
# produce an md5sum file for all files in the path
rclone sha1sum
# produce a sha1sum file for all files in the path
rclone size
# print the total size and number of files under remote:path
--json   # JSON output
rclone version --check   # check for a newer version
rclone cleanup   # empty the remote's trash / remove old file versions
rclone dedupe    # interactively find duplicate files and delete/rename them
--dedupe-mode newest   # keep only the newest of identical files, non-interactive
rclone cat
# same as on linux
rclone copyto
# copy a file from source to dest, skipping files already copied
rclone gendocs output_directory [flags]
# generate rclone's documentation
rclone listremotes   # list all remotes from the config file
--long   # show type and name; by default only names are shown
rclone moveto
# does not transfer unchanged files
rclone cryptcheck /path/to/files encryptedremote:path
# check the integrity of an encrypted remote
rclone about
# show the remote's quota, e.g.:
$ rclone about ODA1P1:
Total:   5T
Used:    284.885G
Free:    4.668T
Trashed: 43.141G
--json   # JSON output
rclone mount   # the mount command
# on Windows this additionally requires winfsp
--vfs-cache-mode   # without it, files can only be written sequentially and only sought while reading, so Windows programs cannot edit them; the flag turns on caching
# four modes: off|minimal|writes|full — higher modes make rclone cache more, at the cost of disk space (default: off)
--vfs-cache-max-age 24h    # cache files modified within the last 24 hours
--vfs-cache-max-size 10g   # cap the total cache at 10g (it may be exceeded)
--cache-dir                # where to keep the cache
--umask int                # override filesystem permissions
--allow-non-empty          # allow mounting over a non-empty directory
--allow-other              # allow other users to access the mount
--no-check-certificate     # do not verify the server's SSL certificate
--no-gzip-encoding         # do not request gzip encoding
```
## Using your own api for gd transfers
> See this detailed post: [https://www.moerats.com/archives/877/](https://www.moerats.com/archives/877/)
Too many people use `rclone`, and that creates a problem: we all share the same `client_id`, so at peak times you hit `403`s, or run into `Limitations` before ever reaching the `750G` quota. Anyone who transfers Google Drive files with `rclone` heavily needs their own `api`. Obtain a Google API client `ID` and client secret as described in the post above; when `rclone config` prompts for them, paste them in and you're done.
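When `rclone config` completes, the remote section it writes to `~/.config/rclone/rclone.conf` looks roughly like this — a sketch with placeholder values (the remote name `GD`, the ids, and the token are illustrative, not real):
```
[GD]
type = drive
client_id = xxxxxxxx.apps.googleusercontent.com
client_secret = xxxxxxxx
scope = drive
token = {"access_token":"xxxx","token_type":"Bearer","refresh_token":"xxxx","expiry":"2019-01-01T00:00:00Z"}
```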
The mount command then becomes:
```
# these flags are mainly for uploading
/usr/bin/rclone mount DriveName:Folder LocalFolder \
--umask 0000 \
--default-permissions \
--allow-non-empty \
--allow-other \
--transfers 4 \
--buffer-size 32M \
--low-level-retries 200
# if the mount is also read from (e.g. streaming via H5ai), consider adding these 3 flags too (same format as above)
--dir-cache-time 12h
--vfs-read-chunk-size 32M
--vfs-read-chunk-size-limit 1G
```
## Getting past Google Drive's server-side 750g limit
Google officially limits third-party-`api` transfers to `750G` per day. That `750G` applies to copies performed directly on Google's servers, where the file never touches your client. Uploads that pass through the client to `gd` don't count against server-side transfers and carry their own `750G` limit — so the total daily allowance is `1.5T`:
```
# normal usage: server-side API, no local traffic
rclone copy GD1:/PATH GD2:/PATH
# disable server-side copies: client API, traffic flows through the client
rclone copy --disable copy GD1:/PATH GD2:/PATH
```
That makes `1.5T` per day.
## Google Docs limits
In `rclone ls`, Google Docs show a size of `-1`, while at the `VFS` layer — files seen through `rclone mount` or `rclone serve` — they show `0`. Commands like `rclone sync` and `rclone copy` simply ignore the document size and operate anyway. In other words, you can't tell how big a Google Doc is until you download it — which hardly matters…

View File

@ -0,0 +1,7 @@
# 6. (Reposted) Changing the Office 365 for PC update channel
Office 365 for PC defaults to the semi-annual update channel; you can switch to the monthly channel (or another one) to try the newest features.
> Original link: [https://www.mr-technos.com/forum.php?mod=viewthread&tid=79](https://www.mr-technos.com/forum.php?mod=viewthread&tid=79)

View File

@ -0,0 +1,91 @@
# 7. (Original) A solution for moving Baidu Pan files to gd/od
**Home page:** [HomePage](https://telegra.ph/HomePage-01-03)<br/>[https://telegra.ph/Fuck-PanBaidu-02-19](https://telegra.ph/Fuck-PanBaidu-02-19)<br/>[https://graph.org/Fuck-PanBaidu-02-19](https://graph.org/Fuck-PanBaidu-02-19)
### 1. Install aria2
```
wget -N https://git.io/aria2.sh && chmod +x aria2.sh && bash aria2.sh
```
Start: /etc/init.d/aria2 start
Stop: /etc/init.d/aria2 stop
Restart: /etc/init.d/aria2 restart
Status: /etc/init.d/aria2 status
Config file: /root/.aria2/aria2.conf (contains Chinese comments; some systems may not display them)
RPC token: randomly generated (can be changed in the config file)
Default download directory: /root/Download
### 2. Offline downloads to gd/od with aria2
1. Install rclone:
```
curl https://rclone.org/install.sh | sudo bash
```
rclone configuration is covered at [https://rclone.org/drive/](https://rclone.org/drive/)
2. Edit the script **/root/.aria2/autoupload.sh**:
```
name='Onedrive'            # the remote name chosen during rclone config
folder='/DRIVEX/Download'  # destination folder in the drive; leave empty for the root
```
3. In the aria2 config file **/root/.aria2/aria2.conf**, enable the on-download-complete hook:
```
# call rclone to upload (move) finished files to the drive
on-download-complete=/root/.aria2/autoupload.sh
```
4. Restart aria2:
```
/root/aria2.sh              # choose option 6 to restart
# or: service aria2 restart
```
5. Download files through an aria2 web frontend such as [aria2.ml](http://aria2.ml/).
Fill in your VPS's aria2 connection details.
Click "New", paste the download links, and start the download.
Finished downloads are uploaded to gd/od automatically.
### 3. Using a third-party Baidu Pan client
SpeedPan (速盘) is recommended here; unfortunately PanDownload does not expose aria2 settings.
Change the download directory as shown; if the GUI won't let you, quit the program and edit config.ini directly.
The download directory must match the aria2 configuration on the remote server — for an aria2 installed as above, that is **/root/Download**.
With that, your Baidu Pan files download straight into gd/od.
### 4. Screenshots
1. Downloading files to the VPS through the AriaNG frontend, with **autoupload.sh pushing the movie offline into gd**
2. Using SpeedPan's remote-aria2 feature to pull Baidu Pan files onto the VPS, where **autoupload.sh then moves them into gd automatically**

View File

@ -0,0 +1,15 @@
# Posts by ds19991999
1. [(Original) Quick manual JupyterLab install on Debian with HTTPS](https://blog.csdn.net/ds19991999/article/details/88935996)
2. [(Original) Fixing "Waiting for headers" and 404 errors when updating a fresh Aliyun Debian machine](https://blog.csdn.net/ds19991999/article/details/88659452)
3. [(Original) Deploying a Netlify CMS backend for a Jekyll blog](https://blog.csdn.net/ds19991999/article/details/88651187)
4. [(Original) Applying for a Let's Encrypt wildcard certificate](https://blog.csdn.net/ds19991999/article/details/88553810)
5. [(Original) Rclone notes](https://blog.csdn.net/ds19991999/article/details/88370053)
6. [(Reposted) Changing the Office 365 for PC update channel](https://blog.csdn.net/ds19991999/article/details/87973325)
7. [(Original) A solution for moving Baidu Pan files to gd/od](https://blog.csdn.net/ds19991999/article/details/87736377)
8. [(Original) Mounting OneDrive via WebDav](https://blog.csdn.net/ds19991999/article/details/86506042)
9. [(Original) SMS verification-code platforms](https://blog.csdn.net/ds19991999/article/details/86505762)
10. [(Original) A custom friend-links sidebar for CSDN](https://blog.csdn.net/ds19991999/article/details/86505686)
11. [(Original) Shared resources](https://blog.csdn.net/ds19991999/article/details/85225611)
12. [(Original) Mounting OneDrive as a local drive on Windows](https://blog.csdn.net/ds19991999/article/details/85008885)
13. [(Original) Everyday Ubuntu usage](https://blog.csdn.net/ds19991999/article/details/83719417)
14. [(Original) Fixing Ubuntu's network problems for good — blazing speeds](https://blog.csdn.net/ds19991999/article/details/83715489)

BIN
assets/1571482112632.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 440 KiB

BIN
assets/1571483423256.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 19 KiB

BIN
assets/1571483479356.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 67 KiB

BIN
assets/1571483552438.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

BIN
assets/1571483777703.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

13
cookie.txt Normal file
View File

@ -0,0 +1,13 @@
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Cookie: acw_tc=2760829715714827204377171e8e9dc3a79185500e46805511b2c277adf1fb; acw_sc__v3=5daaec608ce6c5ba1fab0c4137c00ecb0cd34525; uuid_tt_dd=10_2450623130-1571482720624-229726; dc_session_id=10_1571482720624.999633; acw_sc__v2=5daaec6067f5ec51b728d2bd7660bf7372ed8903; TY_SESSION_ID=c82ca68f-e408-4c15-b681-71da67f637c2; dc_tos=pzmbtt; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1571482722; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1571482722; Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_2450623130-1571482720624-229726; c-login-auto=1; announcement=%257B%2522announcementUrl%2522%253A%2522https%253A%252F%252Fblogdev.blog.csdn.net%252Farticle%252Fdetails%252F102605809%2522%252C%2522announcementCount%2522%253A1%252C%2522announcementExpire%2522%253A527116621%257D
Host: blog.csdn.net
Referer: https://blog.csdn.net/
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36

208
csdn.py Normal file
View File

@ -0,0 +1,208 @@
#!/usr/bin/env python
# coding: utf-8
import os, time, re
import requests
import threading
import logging
from bs4 import BeautifulSoup, Comment
from selenium import webdriver
from tomd import Tomd


def result_file(folder_name, file_name):
    """Create articles/<folder_name>/<file_name> if needed and return its path."""
    folder = os.path.join(os.path.dirname(os.path.realpath(__file__)), "articles", folder_name)
    if not os.path.exists(folder):
        os.makedirs(folder)
        path = os.path.join(folder, file_name)
        file = open(path, "w")
        file.close()
    else:
        path = os.path.join(folder, file_name)
    return path


def get_headers(cookie_path: str):
    """Parse cookie.txt into a headers dict for requests."""
    cookies = {}
    with open(cookie_path, "r", encoding="utf-8") as f:
        cookie_list = f.readlines()
        for line in cookie_list:
            # split on the first ":" only, so header values containing colons survive
            cookie = line.split(":", 1)
            cookies[cookie[0]] = str(cookie[1]).strip()
    return cookies


def delete_ele(soup: BeautifulSoup, tags: list):
    """Remove every element matching the given CSS selectors."""
    for ele in tags:
        for useless_tag in soup.select(ele):
            useless_tag.decompose()


def delete_ele_attr(soup: BeautifulSoup, attrs: list):
    """Strip the given attributes from all elements."""
    for attr in attrs:
        for useless_attr in soup.find_all():
            del useless_attr[attr]


def delete_blank_ele(soup: BeautifulSoup, eles_except: list):
    """Remove empty elements, except for the whitelisted tag names."""
    for useless_attr in soup.find_all():
        try:
            if useless_attr.name not in eles_except and useless_attr.text == "":
                useless_attr.decompose()
        except Exception:
            pass


class TaskQueue(object):
    def __init__(self):
        self.VisitedList = []
        self.UnVisitedList = []

    def getVisitedList(self):
        return self.VisitedList

    def getUnVisitedList(self):
        return self.UnVisitedList

    def InsertVisitedList(self, url):
        if url not in self.VisitedList:
            self.VisitedList.append(url)

    def InsertUnVisitedList(self, url):
        if url not in self.UnVisitedList:
            self.UnVisitedList.append(url)

    def RemoveVisitedList(self, url):
        self.VisitedList.remove(url)

    def PopUnVisitedList(self, index=0):
        url = ""
        if index and self.UnVisitedList:
            url = self.UnVisitedList[index]
            del self.UnVisitedList[:index]
        elif self.UnVisitedList:
            url = self.UnVisitedList.pop()
        return url

    def getUnVisitedListLength(self):
        return len(self.UnVisitedList)


class Article(object):
    def __init__(self):
        self.options = webdriver.ChromeOptions()
        self.options.add_experimental_option('excludeSwitches', ['enable-logging'])
        self.options.add_argument('headless')
        self.browser = webdriver.Chrome(options=self.options)
        # global implicit wait
        self.browser.implicitly_wait(30)

    def get_content(self, url):
        self.browser.get(url)
        try:
            self.browser.find_element_by_xpath('//a[@class="btn-readmore"]').click()
        except Exception:
            pass
        content = self.browser.find_element_by_xpath('//div[@id="content_views"]').get_attribute("innerHTML")
        return content

    def get_md(self, url):
        """Convert an article to markdown."""
        content = self.get_content(url)
        soup = BeautifulSoup(content, 'lxml')
        # remove HTML comments
        for useless_tag in soup(text=lambda text: isinstance(text, Comment)):
            useless_tag.extract()
        # remove useless tags
        tags = ["svg", "ul", ".hljs-button.signin"]
        delete_ele(soup, tags)
        # remove tag attributes
        attrs = ["class", "name", "id", "onclick", "style", "data-token", "rel"]
        delete_ele_attr(soup, attrs)
        # remove blank tags
        eles_except = ["img", "br", "hr"]
        delete_blank_ele(soup, eles_except)
        # convert to markdown
        md = Tomd(str(soup)).markdown
        return md


class CSDN(object):
    def __init__(self, cookie_path):
        self.headers = get_headers(cookie_path)
        self.TaskQueue = TaskQueue()

    def get_articles(self, username: str):
        """Yield (title, url) for each of the user's posts, page by page."""
        num = 0
        while True:
            num += 1
            url = u'https://blog.csdn.net/' + username + '/article/list/' + str(num)
            response = requests.get(url=url, headers=self.headers)
            html = response.text
            soup = BeautifulSoup(html, "html.parser")
            articles = soup.find_all('div', attrs={"class": "article-item-box csdn-tracking-statistics"})
            if len(articles) > 0:
                for article in articles:
                    article_title = article.a.text.strip().replace(' ', '')
                    article_href = article.a['href']
                    yield article_title, article_href
            else:
                break

    def write_articals(self, username: str):
        """Write the user's posts to disk."""
        print("[++] Crawling the posts of {}......".format(username))
        artical = Article()
        reademe_path = result_file(username, file_name="README.md")
        with open(reademe_path, 'w', encoding='utf-8') as reademe_file:
            i = 1
            readme_head = "# Posts by " + username + "\n"
            reademe_file.write(readme_head)
            for article_title, article_href in self.get_articles(username):
                print("[++++] {}. Processing URL: {}".format(str(i), article_href))
                text = str(i) + '. [' + article_title + '](' + article_href + ')\n'
                reademe_file.write(text)
                file_name = str(i) + "." + re.sub(r'[\/:*?"<>|]', '-', article_title) + ".md"
                artical_path = result_file(folder_name=username, file_name=file_name)
                md_content = artical.get_md(article_href)
                md_head = "# " + str(i) + "." + article_title + "\n"
                md = md_head + md_content
                with open(artical_path, "w", encoding="utf-8") as artical_file:
                    artical_file.write(md)
                i += 1
                time.sleep(2)

    def spider(self):
        """Consume the task queue and save crawled articles to disk."""
        while True:
            if self.TaskQueue.getUnVisitedListLength():
                username = self.TaskQueue.PopUnVisitedList()
                self.write_articals(username)

    def check_user(self, user_path: str):
        with open(user_path, 'r', encoding='utf-8') as f:
            users = f.readlines()
        for user in users:
            self.TaskQueue.InsertUnVisitedList(user.strip())

    def run(self, user_path):
        UserThread = threading.Thread(target=self.check_user, args=(user_path,))
        SpiderThread = threading.Thread(target=self.spider, args=())
        UserThread.start()
        SpiderThread.start()
        UserThread.join()
        SpiderThread.join()


def main():
    user_path = 'username.txt'
    csdn = CSDN('cookie.txt')
    csdn.run(user_path)


if __name__ == "__main__":
    main()
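# Expected output layout after a run (per the README):
#   articles/<username>/README.md        - index of the user's posts
#   articles/<username>/<n>.<title>.md   - one converted post per article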

3
requirements.txt Normal file
View File

@ -0,0 +1,3 @@
bs4==0.0.1
selenium==3.141.0
requests==2.22.0
lxml  # used by BeautifulSoup(content, 'lxml') in csdn.py

155
tomd.py Normal file
View File

@ -0,0 +1,155 @@
import re

__all__ = ['Tomd', 'convert']

# tag -> (prefix, suffix) wrappers used when rendering markdown
MARKDOWN = {
    'h1': ('\n# ', '\n'),
    'h2': ('\n## ', '\n'),
    'h3': ('\n### ', '\n'),
    'h4': ('\n#### ', '\n'),
    'h5': ('\n##### ', '\n'),
    'h6': ('\n###### ', '\n'),
    'code': ('`', '`'),
    'ul': ('', ''),
    'ol': ('', ''),
    'li': ('- ', ''),
    'blockquote': ('\n> ', '\n'),
    'em': ('**', '**'),
    'strong': ('**', '**'),
    'block_code': ('\n```\n', '\n```\n'),
    'span': ('', ''),
    'p': ('\n', '\n'),
    'p_with_out_class': ('\n', '\n'),
    'inline_p': ('', ''),
    'inline_p_with_out_class': ('', ''),
    'b': ('**', '**'),
    'i': ('*', '*'),
    'del': ('~~', '~~'),
    'hr': ('\n---', '\n\n'),
    'thead': ('\n', '|------\n'),
    'tbody': ('\n', '\n'),
    'td': ('|', ''),
    'th': ('|', ''),
    'tr': ('', '\n')
}

BlOCK_ELEMENTS = {
    'h1': r'<h1.*?>(.*?)</h1>',
    'h2': r'<h2.*?>(.*?)</h2>',
    'h3': r'<h3.*?>(.*?)</h3>',
    'h4': r'<h4.*?>(.*?)</h4>',
    'h5': r'<h5.*?>(.*?)</h5>',
    'h6': r'<h6.*?>(.*?)</h6>',
    'hr': r'<hr/>',
    'blockquote': r'<blockquote.*?>(.*?)</blockquote>',
    'ul': r'<ul.*?>(.*?)</ul>',
    'ol': r'<ol.*?>(.*?)</ol>',
    'block_code': r'<pre.*?><code.*?>(.*?)</code></pre>',
    'p': r'<p\s.*?>(.*?)</p>',
    'p_with_out_class': r'<p>(.*?)</p>',
    'thead': r'<thead.*?>(.*?)</thead>',
    'tr': r'<tr>(.*?)</tr>'
}

INLINE_ELEMENTS = {
    'td': r'<td>(.*?)</td>',
    'tr': r'<tr>(.*?)</tr>',
    'th': r'<th>(.*?)</th>',
    'b': r'<b>(.*?)</b>',
    'i': r'<i>(.*?)</i>',
    'del': r'<del>(.*?)</del>',
    'inline_p': r'<p\s.*?>(.*?)</p>',
    'inline_p_with_out_class': r'<p>(.*?)</p>',
    'code': r'<code.*?>(.*?)</code>',
    'span': r'<span.*?>(.*?)</span>',
    'ul': r'<ul.*?>(.*?)</ul>',
    'ol': r'<ol.*?>(.*?)</ol>',
    'li': r'<li.*?>(.*?)</li>',
    'img': r'<img.*?src="(.*?)".*?>(.*?)</img>',
    'a': r'<a.*?href="(.*?)".*?>(.*?)</a>',
    'em': r'<em.*?>(.*?)</em>',
    'strong': r'<strong.*?>(.*?)</strong>'
}

DELETE_ELEMENTS = ['<span.*?>', '</span>', '<div.*?>', '</div>']


class Element:
    def __init__(self, start_pos, end_pos, content, tag, is_block=False):
        self.start_pos = start_pos
        self.end_pos = end_pos
        self.content = content
        self._elements = []
        self.is_block = is_block
        self.tag = tag
        self._result = None
        if self.is_block:
            self.parse_inline()

    def __str__(self):
        wrapper = MARKDOWN.get(self.tag)
        self._result = '{}{}{}'.format(wrapper[0], self.content, wrapper[1])
        return self._result

    def parse_inline(self):
        # rewrite the inline HTML inside this block element as markdown
        for tag, pattern in INLINE_ELEMENTS.items():
            if tag == 'a':
                self.content = re.sub(pattern, r'[\g<2>](\g<1>)', self.content)
            elif tag == 'img':
                self.content = re.sub(pattern, r'![\g<2>](\g<1>)', self.content)
            elif self.tag == 'ul' and tag == 'li':
                self.content = re.sub(pattern, r'- \g<1>', self.content)
            elif self.tag == 'ol' and tag == 'li':
                self.content = re.sub(pattern, r'1. \g<1>', self.content)
            elif self.tag == 'thead' and tag == 'tr':
                self.content = re.sub(pattern, r'\g<1>\n', self.content.replace('\n', ''))
            elif self.tag == 'tr' and tag == 'th':
                self.content = re.sub(pattern, r'|\g<1>', self.content.replace('\n', ''))
            elif self.tag == 'tr' and tag == 'td':
                self.content = re.sub(pattern, r'|\g<1>', self.content.replace('\n', ''))
            else:
                wrapper = MARKDOWN.get(tag)
                self.content = re.sub(pattern, r'{}\g<1>{}'.format(wrapper[0], wrapper[1]), self.content)


class Tomd:
    def __init__(self, html='', options=None):
        self.html = html
        self.options = options
        self._markdown = ''

    def convert(self, html, options=None):
        # collect block-level matches, dropping nested duplicates
        elements = []
        for tag, pattern in BlOCK_ELEMENTS.items():
            for m in re.finditer(pattern, html, re.I | re.S | re.M):
                element = Element(start_pos=m.start(),
                                  end_pos=m.end(),
                                  content=''.join(m.groups()),
                                  tag=tag,
                                  is_block=True)
                can_append = True
                for e in elements:
                    if e.start_pos < m.start() and e.end_pos > m.end():
                        can_append = False
                    elif e.start_pos > m.start() and e.end_pos < m.end():
                        elements.remove(e)
                if can_append:
                    elements.append(element)
        elements.sort(key=lambda element: element.start_pos)
        self._markdown = ''.join([str(e) for e in elements])
        for index, element in enumerate(DELETE_ELEMENTS):
            self._markdown = re.sub(element, '', self._markdown)
        return self._markdown

    @property
    def markdown(self):
        self.convert(self.html, self.options)
        return self._markdown


_inst = Tomd()
convert = _inst.convert
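# Usage sketch (hypothetical input; the exact output follows the regexes above):
# >>> from tomd import convert
# >>> convert('<h2>Title</h2><p>Some <b>bold</b> text</p>')
# '\n## Title\n\nSome **bold** text\n'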

1
username.txt Normal file
View File

@ -0,0 +1 @@
ds19991999