Scraping with WebHarvy

WebHarvy is a scraping tool that works well with ProxyMesh rotating proxies. It uses a point-and-click interface, so you can scrape data without writing code or scripts: you select the data to be scraped with mouse clicks. You can also scrape data by having WebHarvy automatically submit a list of input keywords to search forms.

See the WebHarvy website for a description of its features and download steps. Below are the steps for configuring WebHarvy to scrape with ProxyMesh.

Setup for Scraping via Proxy Servers

You can set up WebHarvy to scrape websites through proxy servers. Scraping through proxy servers helps you maintain a level of anonymity by hiding your IP address while you extract data from websites. To edit proxy settings, click the Settings button in the Home menu and select the 'Proxy Settings' tab.

To add a proxy, enter the proxy server details in the 'Add proxy' box and click the '+' button.

Instead of entering an IP address as the proxy address, enter the ProxyMesh proxy hostname followed by a colon and the proxy port, as listed on the proxies page of the ProxyMesh dashboard.

Example: fr.proxymesh.com:31280
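WebHarvy only needs the hostname:port string above. If you also want to route requests from your own scripts through the same proxy, the following minimal Python sketch (standard library only) shows one way to do it. The function name `proxymesh_opener` is hypothetical, and the sketch assumes your IP address or credentials are already authorized, as discussed in Proxy Authentication.

```python
import urllib.request


def proxymesh_opener(proxy_hostport: str) -> urllib.request.OpenerDirector:
    """Return a urllib opener that routes HTTP and HTTPS requests through one proxy.

    proxy_hostport is a "hostname:port" string, e.g. "fr.proxymesh.com:31280".
    (Hypothetical helper; authentication is not handled here.)
    """
    # The connection to the proxy itself is plain HTTP, so both schemes use an
    # http:// proxy URL; urllib tunnels https:// requests with CONNECT.
    proxy_url = f"http://{proxy_hostport}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Example usage (makes a real network request, so it is left commented out):
# opener = proxymesh_opener("fr.proxymesh.com:31280")
# with opener.open("https://example.com/") as response:
#     print(response.status)
```

Both entries point at an `http://` proxy URL on purpose: the client always talks plain HTTP to the proxy, even when the target site uses HTTPS.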

Please also see the discussion in Proxy Authentication.

WebHarvy supports the following protocols:

  • HTTP
  • HTTPS
  • SOCKS4
  • SOCKS4a
  • SOCKS5

Please note: of the protocols listed above, ProxyMesh currently supports only HTTP connections to its proxy servers, so choose HTTP when adding a ProxyMesh proxy in WebHarvy. The proxy can still securely relay HTTPS/SSL connections between you and an HTTPS server by using the HTTP CONNECT method to open a tunnel.
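To show what the CONNECT method involves, the following Python sketch builds the request a client sends to an HTTP proxy to open a tunnel to an HTTPS site. WebHarvy does this for you; the helper name `build_connect_request` is hypothetical, and the optional `Proxy-Authorization` header assumes HTTP Basic username/password authentication rather than IP-based authorization.

```python
import base64
from typing import Optional


def build_connect_request(target_host: str, target_port: int = 443,
                          username: Optional[str] = None,
                          password: Optional[str] = None) -> bytes:
    """Build the CONNECT request a client sends to an HTTP proxy (hypothetical helper)."""
    # CONNECT names the target as host:port (authority form), not as a full URL.
    authority = f"{target_host}:{target_port}"
    lines = [f"CONNECT {authority} HTTP/1.1", f"Host: {authority}"]
    if username is not None and password is not None:
        # HTTP Basic credentials: base64 of "username:password".
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        lines.append(f"Proxy-Authorization: Basic {token}")
    # Each header line ends with CRLF, and an empty line ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_connect_request("example.com").decode("ascii"))
```

Once the proxy answers with a 2xx status, the client starts TLS over that same connection, so the proxy relays encrypted bytes without seeing the HTTPS content.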

The 'Rotate proxies' checkbox in the Proxy Settings tab is unrelated to ProxyMesh's 12-hour rotation period for all IPs on a given proxy. Instead, it enables WebHarvy's own rotation through the proxies in the Proxy List, with each proxy used for the rotation period you choose.

For web scraping, you can use either a single proxy server or a list of proxy servers. If you select the 'Rotate proxies' option, WebHarvy automatically rotates through the list, using each proxy server for a period of time. Otherwise, it uses only the first proxy in the list.
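WebHarvy's exact rotation logic is not documented here, so the following Python sketch only illustrates the behavior described above under an assumed round-robin scheme: with rotation on, each proxy in the list is used for a fixed period before moving to the next; with rotation off, the first proxy is always used. The function name, the period value, and the `example.com` hostnames are hypothetical.

```python
from typing import Sequence


def proxy_for_elapsed(proxies: Sequence[str], elapsed_seconds: float,
                      period_seconds: float, rotate: bool = True) -> str:
    """Pick the proxy in use after elapsed_seconds (assumed round-robin rotation)."""
    if not proxies:
        raise ValueError("proxy list is empty")
    if not rotate:
        # Rotation off: always the first proxy in the list.
        return proxies[0]
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # Count completed periods, then wrap around the list.
    slot = int(elapsed_seconds // period_seconds)
    return proxies[slot % len(proxies)]


# Hypothetical proxy list and a 10-minute rotation period.
proxies = ["proxy1.example.com:31280", "proxy2.example.com:31280", "proxy3.example.com:31280"]
print(proxy_for_elapsed(proxies, 650, 600))
```

This rotation happens on WebHarvy's side, between entries in its list; it is independent of the IP rotation ProxyMesh performs behind each individual proxy hostname.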

Disable cookies while mining

When mining through proxy servers, we recommend also selecting the 'Disable cookies while mining' option in Browser settings. Websites can use cookies stored locally by the browser to learn about your previous visits, which undermines the anonymity the proxy provides. With this option selected, WebHarvy periodically deletes browser cookies during mining.

Importing proxies from a file

To import a list of proxy addresses from a file (CSV or text), click the Import button.

Each line of the proxy list file describes one proxy server, in the following format:

proxy-address:port username password

As shown above, separate proxy-address:port, username, and password with blank spaces. The username and password fields are optional; if they are absent, the line contains only the proxy address and port. Instead of putting each proxy on its own line, you may also separate proxy server entries with commas (,) or semicolons (;).

Proxy List File Example


http://12.345.67.89:8080 w000sa pwd123
http://111.019.765.43:21
http://nnn.nnn.nnn.nn:3128

In the example above (the addresses are placeholders), the first proxy has login credentials (username and password), while the last two are listed without credentials.
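The following Python sketch parses a proxy list in the format described above, accepting newlines, commas, or semicolons between entries and whitespace between fields. It is an illustration, not WebHarvy's importer: the `ProxyEntry` class and `parse_proxy_list` function are hypothetical, and the sketch assumes the username and password always appear together (so an entry with exactly two fields is rejected) and that no field contains spaces, commas, or semicolons.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProxyEntry:
    address: str                    # proxy-address:port, e.g. "http://12.345.67.89:8080"
    username: Optional[str] = None  # optional; present only together with password
    password: Optional[str] = None


def parse_proxy_list(text: str) -> List[ProxyEntry]:
    """Parse 'proxy-address:port [username password]' entries (hypothetical parser)."""
    entries = []
    # Entries may be separated by newlines, commas, or semicolons.
    for chunk in re.split(r"[\r\n,;]+", text):
        fields = chunk.split()  # fields within an entry are separated by whitespace
        if not fields:
            continue  # skip blank lines and trailing separators
        if len(fields) == 1:
            entries.append(ProxyEntry(fields[0]))
        elif len(fields) == 3:
            entries.append(ProxyEntry(fields[0], fields[1], fields[2]))
        else:
            raise ValueError(f"expected 1 or 3 fields, got {len(fields)}: {chunk!r}")
    return entries


sample = """http://12.345.67.89:8080 w000sa pwd123
http://111.019.765.43:21
http://nnn.nnn.nnn.nn:3128
"""
for entry in parse_proxy_list(sample):
    print(entry)
```

Run against the example file above, it returns three entries, and only the first one carries a username and password.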

See also:


For WebHarvy use cases and general discussion, see the "Web Scraping Articles" section under WebHarvy Visual Web Scraping Software.
