Scraping with WebHarvy
WebHarvy works well as a scraping tool with ProxyMesh rotating proxies. It uses a point-and-click interface, so you don't need to write code or scripts: you select the data to scrape with mouse clicks. You can also scrape data by automatically submitting a list of input keywords to search forms.
Setup for Scraping via Proxy Servers
You can set up WebHarvy to scrape websites via proxy servers. Scraping via proxy servers helps you maintain a level of anonymity by hiding your IP address while you extract data from websites. To edit proxy settings, click the Settings button in the Home menu and select the 'Proxy Settings' tab.
To add a proxy, provide the proxy server details in the 'Add proxy' box and click the '+' button.
Example:
fr.proxymesh.com:31280
Please also see the discussion in Proxy Authentication.
WebHarvy supports the following protocols:
- HTTP
- HTTPS
- SOCKS4
- SOCKS4a
- SOCKS5
For web scraping, you can use either a single proxy server or a list of proxy servers. If you select the 'Rotate proxies' option, WebHarvy will automatically rotate and use each proxy server in the list for a period of time. Otherwise, it will use the first proxy in the list.
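The rotation behavior described above can be pictured with a short Python sketch. This is not WebHarvy's implementation, which is not documented here; it only illustrates the idea of holding each proxy for a period of time and falling back to the first proxy when rotation is off. The class name ProxyRotator, the hold_seconds parameter, and the second proxy address are hypothetical.

```python
import itertools
import time


class ProxyRotator:
    """Hold each proxy for a fixed time window, then move to the next one."""

    def __init__(self, proxies, hold_seconds=60.0, rotate=True, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._rotate = rotate
        self._hold = hold_seconds        # assumed rotation period, in seconds
        self._clock = clock              # injectable so the timing can be tested
        self._cycle = itertools.cycle(self._proxies)
        self._current = next(self._cycle)
        self._since = clock()

    def current(self):
        """Return the proxy to use for the next request."""
        if not self._rotate:
            # Rotation disabled: always use the first proxy in the list.
            return self._proxies[0]
        if self._clock() - self._since >= self._hold:
            # The hold period has elapsed: advance one step, wrapping around.
            self._current = next(self._cycle)
            self._since = self._clock()
        return self._current


rotator = ProxyRotator(
    ["fr.proxymesh.com:31280", "proxy2.example.com:8080"],  # second entry is a placeholder
    hold_seconds=60.0,
)
proxy_for_next_request = rotator.current()
```

Calling current() before every request keeps the selection logic in one place, so the scraping loop never needs to know whether rotation is enabled.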
Disable cookies while mining
While using proxy servers for mining, it is recommended that you also select the 'Disable cookies while mining' option in Browser settings. That's because websites can get details about your previous visits by using cookies stored locally by the browser. But when this option is selected, WebHarvy will periodically delete browser cookies during mining.
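As a rough analogy for periodic cookie deletion, the Python sketch below clears a cookie jar after every few pages, so a site cannot link later requests to earlier ones through stored cookies. It uses only the standard library and does not reflect how WebHarvy works internally; the function name clear_cookies_if_due and the every_n_pages interval are assumptions made for illustration.

```python
import http.cookiejar
import urllib.request

# Cookies set by visited sites accumulate in this jar.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def clear_cookies_if_due(cookie_jar, pages_fetched, every_n_pages=10):
    """Empty the jar after every `every_n_pages` pages; return True if cleared."""
    if pages_fetched > 0 and pages_fetched % every_n_pages == 0:
        cookie_jar.clear()  # with no arguments, clear() removes all cookies
        return True
    return False
```

In a scraping loop, the function would be called once per fetched page with a running page count.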
Importing proxies from a file
To import a list of proxy addresses from a file (CSV or text), click the Import button.
Each line of the proxy list file describes one proxy server in the following format:
proxy-address:port username password
As shown above, use blank spaces to separate proxy-address:port, username, and password. The username and password fields are optional; if they are absent, the line contains only the proxy address and port. Instead of placing each proxy on its own line, you may also separate proxy entries with commas (,) or semicolons (;).
Proxy List File Example
http://12.345.67.89:8080 w000sa pwd123
http://111.019.765.43:21
http://nnn.nnn.nnn.nn:3128
In the above example, the first proxy has login credentials (username and password), while the last two are open.
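To check a proxy list file before importing it, a small Python parser for the format described above can help. This is a minimal sketch, not WebHarvy's own parser: the names ProxyEntry and parse_proxy_list are hypothetical, and it assumes an entry has either one field (address and port) or three fields (address and port, username, password).

```python
import re
from typing import NamedTuple, Optional


class ProxyEntry(NamedTuple):
    address: str                    # proxy-address:port, kept exactly as written
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy_list(text):
    """Parse proxy entries separated by newlines, commas, or semicolons."""
    entries = []
    for chunk in re.split(r"[\n,;]+", text):
        fields = chunk.split()      # fields within an entry are space-separated
        if not fields:
            continue                # skip blank lines and trailing separators
        if len(fields) not in (1, 3):
            # Assumption: username and password must appear together.
            raise ValueError(f"unexpected proxy entry: {chunk!r}")
        entries.append(ProxyEntry(*fields))
    return entries


sample = """http://12.345.67.89:8080 w000sa pwd123
http://111.019.765.43:21
http://nnn.nnn.nnn.nn:3128
"""
entries = parse_proxy_list(sample)
```

Here entries holds three ProxyEntry values; only the first has a username and password, and the other two leave those fields as None.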
On our proxyserver blog:
For WebHarvy use cases and general discussion, see the "Web Scraping Articles" section under WebHarvy Visual Web Scraping Software.