Scraping with WebHarvy
WebHarvy works well as a scraping tool with ProxyMesh rotating proxies. It uses a point-and-click interface, so you can scrape data without writing code or scripts: you select the data to be scraped with mouse clicks. You can also scrape data by automatically submitting a list of input keywords to search forms.
Setup for Scraping via Proxy Servers
You can set up WebHarvy to scrape websites via proxy servers. Scraping via proxy servers helps you maintain a level of anonymity, by hiding your IP address, while extracting data from websites. To edit proxy settings, click the Settings button in the Home menu and select the 'Proxy Settings' tab.
To add a proxy, provide the proxy server details in the 'Add proxy' box and click the '+' button.
Example:
fr.proxymesh.com:31280
Please also see the discussion in Proxy Authentication.
WebHarvy supports the following protocols:
- HTTP
- HTTPS
- SOCKS4
- SOCKS4a
- SOCKS5
Please note: Of the protocols listed above, ProxyMesh currently supports only HTTP connections to its proxy servers. The proxy server can still securely relay HTTPS/SSL connections between you and an HTTPS server by using the CONNECT method.
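Outside WebHarvy, the same arrangement (an HTTP connection to the proxy, with HTTPS sites tunneled through CONNECT) can be sketched in Python using only the standard library. This is a minimal illustration, not part of WebHarvy or ProxyMesh: the helper names `build_proxy_map` and `build_proxy_opener` are hypothetical, the host and port are taken from the example above, and authentication is left out (see Proxy Authentication).

```python
import urllib.request


def build_proxy_map(host: str, port: int) -> dict:
    """Map both target schemes to one proxy that is reached over plain HTTP.

    The 'https' entry still uses an http:// proxy URL: the client connects
    to the proxy over HTTP and asks it to tunnel to the HTTPS site (CONNECT).
    """
    proxy_url = f"http://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def build_proxy_opener(host: str, port: int) -> urllib.request.OpenerDirector:
    """Create an opener that sends requests through the given proxy."""
    handler = urllib.request.ProxyHandler(build_proxy_map(host, port))
    return urllib.request.build_opener(handler)


# Host and port from the example above; no network traffic happens here.
proxies = build_proxy_map("fr.proxymesh.com", 31280)
opener = build_proxy_opener("fr.proxymesh.com", 31280)

# Assumption: an actual request needs network access and an authorized
# ProxyMesh account, for example:
# opener.open("https://example.com/", timeout=30)
```

Note that both entries in the mapping point at an `http://` proxy URL, which mirrors the note above: only the connection to the proxy is plain HTTP, while traffic to an HTTPS site stays encrypted inside the tunnel.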
The 'Rotate proxies' checkbox in the proxy settings does not pertain to ProxyMesh's 12-hour rotation period for all IPs on a given proxy. Instead, it governs WebHarvy's own rotation among the proxies in the Proxy List.
For web scraping, you can use either a single proxy server or a list of proxy servers. If you select the 'Rotate proxies' option, WebHarvy will automatically rotate and use each proxy server in the list for a period of time. Otherwise, it will use the first proxy in the list.
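The selection rule described above can be sketched as a small Python function. This is only an illustration under stated assumptions, not WebHarvy's actual implementation: the function name `proxy_for_elapsed`, the fixed rotation period, and the second and third proxy addresses are hypothetical.

```python
from typing import Sequence


def proxy_for_elapsed(
    proxies: Sequence[str],
    elapsed_seconds: float,
    period_seconds: float,
    rotate: bool = True,
) -> str:
    """Pick the proxy to use after `elapsed_seconds` of scraping.

    With rotation on, each proxy is used for one period, in list order,
    wrapping around at the end. With rotation off, the first proxy is
    always used.
    """
    if not proxies:
        raise ValueError("proxy list is empty")
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if not rotate:
        return proxies[0]
    slot = int(elapsed_seconds // period_seconds)  # which period we are in
    return proxies[slot % len(proxies)]


# First address is from the example above; the others are hypothetical.
proxy_list = [
    "fr.proxymesh.com:31280",
    "proxy-b.example.com:31280",
    "proxy-c.example.com:31280",
]

current = proxy_for_elapsed(proxy_list, elapsed_seconds=61, period_seconds=60)
```

With a 60-second period, the call above falls in the second period and so selects the second proxy in the list; passing `rotate=False` would always select the first.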
Disable cookies while mining
While using proxy servers for mining, it is recommended that you also select the 'Disable cookies while mining' option in Browser settings. That's because websites can get details about your previous visits by using cookies stored locally by the browser. But when this option is selected, WebHarvy will periodically delete browser cookies during mining.
Importing proxies from a file
To import a list of proxy addresses from a file (CSV or text), click the Import button.
Each line of the proxy list file describes one proxy server in the following format:
proxy-address:port username password
As shown above, you must insert blank spaces to separate proxy-address:port, username, and password. The username and password fields are optional, so if they're absent, only proxy-address:port will be present on a given line. Instead of placing each proxy server on its own line, you may also separate proxy server entries with commas (,) or semicolons (;).
Proxy List File Example
http://12.345.67.89:8080 w000sa pwd123
http://111.019.765.43:21
http://nnn.nnn.nnn.nn:3128
In the above example, the first proxy has login credentials (username and password), while the last two are open.
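To check a proxy list file before importing it, the format described above can be parsed with a short Python sketch. This is not WebHarvy's own parser: the names `ProxyEntry` and `parse_proxy_list` are hypothetical, and the sketch assumes that an entry has either one field (address only) or three fields (address, username, password).

```python
import re
from typing import List, NamedTuple, Optional


class ProxyEntry(NamedTuple):
    address: str                    # proxy-address:port, as written in the file
    username: Optional[str] = None  # absent for open proxies
    password: Optional[str] = None


def parse_proxy_list(text: str) -> List[ProxyEntry]:
    """Parse proxy entries separated by newlines, commas, or semicolons.

    Fields inside an entry are separated by blank spaces.
    """
    entries = []
    for chunk in re.split(r"[\n,;]+", text):
        fields = chunk.split()
        if not fields:
            continue  # skip blank lines and empty chunks
        if len(fields) not in (1, 3):
            # Assumption: username and password must appear together.
            raise ValueError(f"unexpected entry: {chunk.strip()!r}")
        entries.append(ProxyEntry(*fields))
    return entries


# The placeholder entries from the example above.
sample = (
    "http://12.345.67.89:8080 w000sa pwd123\n"
    "http://111.019.765.43:21\n"
    "http://nnn.nnn.nnn.nn:3128\n"
)
entries = parse_proxy_list(sample)
```

The same entries written on one line and separated by semicolons or commas parse to the same result, since all three separators are treated alike.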
For WebHarvy use cases and general discussion, see the "Web Scraping Articles" section under WebHarvy Visual Web Scraping Software.