This article introduces how to use PhantomJS and Selenium for headless browser testing and web crawling.
When writing a web spider, we often run into awkward websites whose data is hard to crawl, so we need to simulate a browser. Selenium is a very powerful tool for this kind of crawling, but it has some shortcomings. On Linux and on cloud systems, it is not easy to install a browser for Selenium to drive. Starting a full browser is also usually much less efficient for scraping. This article introduces PhantomJS together with Selenium, which helps developers run browser tests quickly and crawl the web efficiently.
Install the required software
First, install Selenium: pip install selenium
For PhantomJS, we can use brew or npm (Node.js) to install it: npm -g install phantomjs-prebuilt
Note that my Node modules are in “C:\Users\username\AppData\Roaming\npm\node_modules”
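Because npm installs into a user-specific directory, it can help to confirm that the PhantomJS executable is reachable before starting Selenium. The following is a minimal sketch, assuming the executable is exposed on PATH under one of the listed names; find_phantomjs and the candidate names are hypothetical and not part of Selenium or PhantomJS.

```python
import shutil


def find_phantomjs(names=("phantomjs", "phantomjs.cmd", "phantomjs.exe")):
    """Return the first PhantomJS executable found on PATH, or None.

    The candidate names are assumptions; adjust them to your installation.
    """
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_phantomjs()
    if location:
        print("PhantomJS found at:", location)
    else:
        print("PhantomJS is not on PATH; pass its full path to Selenium instead.")
```

If nothing is found, the full path to the executable can be given to Selenium explicitly instead of relying on PATH.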
Once this is done, we can use PhantomJS freely from Selenium.
## python 3.5
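A script of this kind can be sketched as follows. This is an illustrative sketch, not the original listing: it assumes Selenium 3.x or older (the webdriver.PhantomJS class was removed in Selenium 4) and that the phantomjs executable is on PATH. The helper name fetch_title and the URL in the example call are hypothetical.

```python
def fetch_title(url, driver=None):
    """Load url in a headless browser and return the page title.

    If no driver is passed in, a PhantomJS driver is created.
    """
    if driver is None:
        # Imported lazily so the function can also be used with another driver.
        from selenium import webdriver

        # Assumption: Selenium 3.x or older. With no executable_path argument,
        # Selenium looks for an executable named "phantomjs" on PATH.
        driver = webdriver.PhantomJS()
    try:
        driver.get(url)      # navigate; no browser window is opened
        return driver.title  # title of the loaded page
    finally:
        driver.quit()        # always shut down the PhantomJS process


# Example call (needs PhantomJS and Selenium 3.x installed):
# print(fetch_title("https://example.com"))
```

Passing the driver as an optional argument keeps the page logic separate from the choice of browser, so the same function can be tried with a different WebDriver without changes.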
We can see that it runs as a headless browser and gives us the results directly. (Personally, I do not find it very fast; it still seems to take a long time.)
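To put a number on that impression, the call can be timed. The following is a minimal sketch using only the standard library; timed is a hypothetical helper, and the sleep call is a stand-in workload to be replaced by the Selenium call being measured.

```python
import time


def timed(func, *args, **kwargs):
    """Call func with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload; replace with the Selenium call you want to measure.
    _, seconds = timed(time.sleep, 0.1)
    print("elapsed: %.2f s" % seconds)
```

Timing driver start-up and page loading separately would show which of the two accounts for most of the wait.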