PhantomBuster Strategies with ProxyMesh

Phantombuster is a platform that automates website scraping and provides many APIs for social media research. Using Phantombuster with ProxyMesh can be an effective web scraping strategy that combines the key use cases of both.

The ProxyMesh service is great for working with a platform like Phantombuster. This article discusses the benefits of using Phantombuster and ProxyMesh together. Then it describes how to install and implement Phantombuster with ProxyMesh. You'll also find links to helpful Phantombuster pages.

The Benefits

from Phantombuster:

  • Growth hacking to help businesses acquire and retain customers
  • Cloud marketing with specialized APIs
  • Research and outreach for job recruitment
  • Social media research

…and from ProxyMesh:

  • Anonymity
  • Distribution of requests over multiple proxies
  • Scraping
  • Avoidance of rate-limits

Remote sites can easily detect your browser in one location and Phantombuster tools in another, and would most likely block you. A proxy hides your identity behind an alternative IP address that rotates on a schedule, so that no single IP stays on a site for too long. That makes a proxy server crucial for gathering research and data anonymously without the risk of websites blocking you.

You'll find details in our blog Get High Anonymity without Using Numerous Proxy Servers.

ProxyMesh offers a wide range of locations in the US and in other countries across the world.

Installation

This article assumes that you already have a Phantombuster account and a ProxyMesh account.

If you haven't done so already, please check the proxy settings in your browser. Every browser provides a way for users to configure these settings. 

Please see How to Change Web Browser Proxy Settings for links to articles that can help you with the major browsers.

Create a Proxy Pool in Phantombuster

For this step, while keeping this article open, you can also open Phantombuster's Guide to using proxies for additional information.

In the Phantombuster header, click on your name.

Then scroll down in the Phantombuster guide to A) New proxy pool and follow the steps for naming your pool. This is your source for a Phantombuster proxy IP address.

New proxy pool

You will need to fill in the following information:

  • the proxy address you choose from among the authorized proxies on your dashboard 
  • your login details.

Make sure your username and password don't contain any special characters such as "@" or "#".

Click on Add proxy to save.
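Before typing the details into the form, it can help to check them in a script. The following is a minimal Python sketch, not part of Phantombuster or ProxyMesh: the function names and the exact set of "risky" characters are assumptions made for illustration, and the host, port, and credentials shown are placeholders for the values on your own dashboard.

```python
# Characters the article warns about ("@", "#"), plus a few others that have
# special meaning inside URLs. The exact set is an assumption for illustration.
RISKY_CHARS = set("@#:/?%&")


def find_risky_chars(value):
    """Return the sorted list of risky characters found in a credential."""
    return sorted(set(value) & RISKY_CHARS)


def build_proxy_url(host, port, username, password):
    """Build an http://user:pass@host:port string, refusing risky credentials."""
    for label, value in (("username", username), ("password", password)):
        bad = find_risky_chars(value)
        if bad:
            raise ValueError(f"{label} contains special characters: {' '.join(bad)}")
    return f"http://{username}:{password}@{host}:{port}"


if __name__ == "__main__":
    # Placeholder values; substitute the proxy address and login you actually use.
    print(build_proxy_url("proxy.example.com", 31280, "alice", "s3cret"))
```

If a credential fails this check, changing it in your ProxyMesh account before creating the pool is likely simpler than trying to escape it.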

Set your Phantom to launch with a proxy

Go to your Phantom's configuration Step 2 (Settings) and click on the three little dots to show advanced settings.

Scroll down to the 'Proxy' options and tick Random proxy from pool.

Remember to click Save.

Now you're ready to use Phantombuster and ProxyMesh together.

Best Practices

Here are some recommended practices that can speed your proxy responses and minimize timeouts.

  • Especially for social media research, we recommend choosing a proxy near your geographical location. Social media sites monitor the geographical locations of IP addresses. If you access a site both through Phantombuster (whose servers are located in the western U.S.) and from your own, far-distant location, the two addresses can look suspicious together. Using a proxy in your own geographical area for the Phantombuster requests can help you avoid detection and bans.
  • Reduce the number of concurrent requests from a single IP. For example, use an additional IP for crawling, or slow down the crawl rate on your current requests.
  • Add more proxies to gain more IPs and widen the connection strategies available to you.
  • To connect to sites – especially in large numbers – in a specific geographical area, use proxies located near that area.
  • For optimal bandwidth use, minimize requests to pull images, JavaScript, and CSS files.

  • Before trying a proxy, authenticate it via your ProxyMesh Dashboard > Change Proxies.
  • The proxy and the port may have separate fields in Phantombuster’s interface.
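Two of the practices above, slowing the crawl rate and skipping images, JavaScript, and CSS, can be sketched in a custom script. This is a minimal Python illustration under stated assumptions: the names `Throttle`, `is_static_asset`, and `plan_crawl` are hypothetical, the extension list is only an example, and none of it is a Phantombuster or ProxyMesh API.

```python
import time
from urllib.parse import urlparse

# Example extensions for files that usually add bandwidth without adding data.
STATIC_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def is_static_asset(url):
    """Return True if the URL path ends in an image, CSS, or JavaScript extension."""
    path = urlparse(url).path.lower()
    return path.endswith(STATIC_EXTENSIONS)


def plan_crawl(urls):
    """Keep only the URLs worth requesting, preserving their order."""
    return [url for url in urls if not is_static_asset(url)]


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self):
        """Sleep just long enough to respect min_interval; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

In a crawl loop, you would call `wait()` on a shared `Throttle` before each request sent through a given proxy, and pass the candidate URLs through `plan_crawl` first.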

501 Error Messages

At times when using Phantombuster with ProxyMesh, you may receive a 501 error response with the following message body: "Proxy accepts request but does not seem to support SSL (HTTP 501)"

Generally, you can ignore this message because your requests are actually working. However, if the messages persist, you can contact us for assistance.

Timeout Error Messages

On occasion, the Phantombuster console may display an error message about a proxy whose IP address and other access information you have entered. The message might read: “Connection has timed out. Your proxy may not be working. Make sure to test it in your web browser first.” You can generally ignore this error, because the proxy is likely to be functioning properly with Phantombuster despite the message.

Users who wish to double-check the message by running the proxy URL or IP address through a tool such as https://www.sitelike.org/ should be aware that the tool is designed for open proxies and not for ProxyMesh proxies. With ProxyMesh, you can get an accurate reading by going to the Proxy Status Page from your dashboard and then clicking the Uptime link.
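If you would also like to test the proxy from your own machine, one option is to send a single request through it with your own credentials. The following is a minimal Python sketch using only the standard library; the function name `check_proxy` and the default test URL are assumptions for illustration, and the result reflects only one request from your location, so the Proxy Status Page remains the more reliable reading.

```python
import http.client
import urllib.request


def check_proxy(proxy_url, test_url="http://example.com/", timeout=10.0):
    """Return True if test_url can be fetched through proxy_url, else False."""
    # Route both HTTP and HTTPS traffic through the given proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections, and HTTP error responses.
        return False


if __name__ == "__main__":
    # Placeholder proxy URL; substitute your own proxy host, port, and login.
    print(check_proxy("http://USERNAME:PASSWORD@proxy.example.com:31280"))
```

A `False` result here tells you that this one request failed, not why; checking the proxy's uptime from your dashboard is the next step.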

If you have any remaining questions about these messages, please contact us.

Phantombuster with PhantomJS Headless & CasperJS

Be aware that PhantomJS, a headless web browser often used with CasperJS, has been suspended and archived, although version 2.1.1 remains available for continued use. CasperJS is available as a testing framework and for scripting full navigation scenarios in a simple interface. CasperJS also provides a download link for research.

This link leads you to a tutorial on web scraping with CasperJS and Phantombuster.

Headless Chrome

Similar to PhantomJS is Google's Headless Chrome, which enables automated control of web pages. With this tool, you can automate tasks, scripts, and user interface tests against a browser without opening the browser's user interface.

Phantombuster has published a blog post, Web Scraping in 2017: Advanced Headless Chrome Tips & Tricks, about ways to use Headless Chrome with Phantombuster.

Useful Links

You may want to follow these Phantombuster links for further details:

  • Phantombuster GitHub repository of cloud-based APIs, the SDK, and more.
  • Phantombuster API. An API for control of your Phantombuster account. The API is composed of HTTPS endpoints returning JSON data.
  • Phantombuster blog. From this home page of the blog, you can drill down to topics that fit your particular interests and goals.
  • Developer Quick Start. A page describing the agents you'll use as a developer and the scripts you'll code. Also describes your catalog of Phantoms, those you can share or not share.

Also, check out these articles on our blog site, proxyserver.com:
