安装请求库
1 2 3 4 5 6 7 8 9 10 11 12
| [root@localhost Python-3.6.6]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.4 (Maipo) [root@localhost Python-3.6.6]# uname -a Linux localhost.localdomain 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux [root@localhost Python-3.6.6]# getenforce Disabled [root@localhost Python-3.6.6]# systemctl status firewalld.service ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:firewalld(1) [root@localhost Python-3.6.6]#
|
1 2
| pip3 install requests pip3 install selenium
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
| yum install Xvfb yum install libXfont yum install xorg-x11-fonts* vim /etc/yum.repos.d/google.repo [google] name=Google-x86_64 baseurl=http://dl.google.com/linux/rpm/stable/x86_64 enabled=1 gpgcheck=0 gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub yum install google-chrome-stable yum install GConf2-3.2.6-8.el7.x86_64 wget http://chromedriver.storage.googleapis.com/70.0.3538.67/chromedriver_linux64.zip unzip chromedriver_linux64.zip mv chromedrive /usr/bin chmod +x /usr/bin/chromedrive
chromedriver Starting ChromeDriver (v2.9.248304) on port 9515
#验证 python3 >>> from selenium import webdriver >>> browser = webdriver.Chrome() #会弹出一个空白的chrome
#默认情况下root用户不能调用chrome,建议为chrome建立一个单独用户
|
1 2 3 4 5 6 7 8 9 10
| yum install firefox wget https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-linux64.tar.gz tar xf geckodriver-v0.23.0-linux64.tar.gz -C /usr/bin chmod +x geckodriver
#验证 python3 >>> from selenium import webdriver >>> browser = webdriver.Firefox() #会弹出一个空白的Firefox
|
以上,我们就可以利用chrome或者firefox进行网页抓取了,但是这样会有一个问题:因为程序的运行过程中需要一直开着浏览器。所以我们可以选用无界面的浏览器PhantomJS。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
| wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 tar xf https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 cd phantomjs-2.1.1-linux-x86_64/bin mv phantomjs /usr/bin/ chmod +x /usr/bin/phantomjs
#验证 python3 >>> from selenium import webdriver >>> browser = webdriver.PhantomJS() >>> browser.get('https://www.baidu.com') >>> print (browser.current_url) https://www.baidu.com/ >>> #此时,不会打开浏览器,但是通过print打印了请求地址。说明可以正常使用。
|
aiohttp是一种类似requests的请求库,区别在于,aiohttp是一个提供异步web服务的库。
安装方式如下:
1 2
| pip3 install aiohttp pip3 install cchardet aiodns #字符编码检测库及加速DNS解析的库
|