Phantombuster Strategies with ProxyMesh
Phantombuster is a platform for automating website scraping, and it provides many APIs for social media research. Using Phantombuster with ProxyMesh can be an effective web scraping strategy, combining key use cases of both.
This article discusses the benefits of using Phantombuster and ProxyMesh together. Then it describes how to install and implement Phantombuster with ProxyMesh. You'll also find links to helpful Phantombuster pages.
Key use cases from Phantombuster:
- Growth hacking to help businesses acquire and retain customers
- Cloud marketing with specialized APIs
- Research and outreach for job recruitment
- Social media research
…and from ProxyMesh:
- Distribution of requests over multiple proxies
- Avoidance of rate-limits
Remote sites could easily detect your browser in one location and Phantombuster tools in another, and you would most likely get blocked. A proxy hides your identity behind an alternative IP address and rotates that address on a schedule, so that no single IP stays on a site for too long. That makes a proxy server crucial for gathering research and data anonymously with less risk of websites blocking you.
ProxyMesh offers a wide range of locations in the US and in other countries across the world.
Installation
This article assumes that you already have:
- Opened a ProxyMesh account
- Set up subscription payments
- Authorized an IP address
- Chosen the ProxyMesh proxy that best fits your geographical location.
Please see How to Change Web Browser Proxy Settings for links to articles that can help you with the major browsers.
Creating a Proxy in Phantombuster
Create a proxy pool in Phantombuster
In the Phantombuster header, click on your name, then on 'Proxies':
Then click on '+ New proxy pool'. Call it anything you like.
You will need to fill in the following information:
- the proxy address you choose from among the authorized proxies on your dashboard
- your login details.
Make sure your username and password don't contain any special characters, such as "@" or "#".
Click on 'Add proxy' to save.
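If you want to check your login details before pasting them into the proxy pool form, a small script can flag characters that may cause trouble. The following is a minimal sketch, not part of Phantombuster or ProxyMesh: the function name `find_special_chars` is hypothetical, and it assumes that only ASCII letters and digits are safe, which is stricter than the rule above strictly requires.

```python
import string

# Assumption: only ASCII letters and digits count as "safe" characters.
# The exact set that Phantombuster accepts is not stated in this article.
SAFE_CHARS = set(string.ascii_letters + string.digits)


def find_special_chars(value):
    """Return the distinct non-alphanumeric characters in value,
    in order of first appearance."""
    found = []
    for ch in value:
        if ch not in SAFE_CHARS and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    # Example values only; substitute your own username and password.
    for label, value in [("username", "exampleuser"), ("password", "p@ss#word")]:
        bad = find_special_chars(value)
        if bad:
            print(f"{label}: contains special characters {bad}")
        else:
            print(f"{label}: looks fine")
```

If the script reports special characters, consider changing the affected username or password to one that uses only letters and digits.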
Set your Phantom to launch with a proxy
Go to your Phantom's configuration Step 2 (Settings) and click on the three little dots to 'Show advanced settings':
Scroll down to the 'Proxy' options and tick 'Random proxy from pool':
Don’t forget to 'Save'.
Now you're ready to use Phantombuster and ProxyMesh together.
Here are some recommended practices that can speed your proxy responses and minimize timeouts.
- Especially for social media research, we recommend choosing a proxy near your own geographical location. Social media sites monitor the geographical locations of IP addresses. If you access a site both through Phantombuster (whose servers are located in the western U.S.) and from your own, far-distant location, route the Phantombuster requests through a proxy in your own area. This can help you avoid detection and bans.
- Reduce the number of concurrent requests from a single IP. For example, use an additional IP for crawling, or slow down the crawl rate on your current requests.
- Add proxies to get more IPs and widen the connection strategies available.
- To connect to sites – especially in large numbers – in a specific geographical area, use proxies located near that area.
- Before trying a proxy, authenticate it via your ProxyMesh Dashboard > Change Proxies.
- The proxy and the port may have separate fields in Phantombuster’s interface.
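For readers who run their own crawl scripts alongside Phantombuster, two of the practices above (spreading requests over several proxies and slowing the crawl rate) can be sketched as follows. This is an illustration under stated assumptions, not Phantombuster's or ProxyMesh's own mechanism: `assign_proxies`, `crawl`, and the `fetch` callback are hypothetical names, and the proxy addresses are placeholders.

```python
import itertools
import time


def assign_proxies(urls, proxies):
    """Pair each URL with a proxy in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return list(zip(urls, itertools.cycle(proxies)))


def crawl(urls, proxies, fetch, delay_seconds=1.0):
    """Call fetch(url, proxy) for each URL, pausing between requests.

    fetch is supplied by the caller (hypothetical), so this sketch
    stays independent of any particular HTTP library.
    """
    results = []
    for index, (url, proxy) in enumerate(assign_proxies(urls, proxies)):
        if index > 0:
            time.sleep(delay_seconds)  # slow the crawl rate
        results.append(fetch(url, proxy))
    return results


if __name__ == "__main__":
    # Placeholder proxy addresses; use the proxies from your dashboard.
    pool = ["proxy-a.example.com:8080", "proxy-b.example.com:8080"]
    pages = ["https://example.com/1", "https://example.com/2", "https://example.com/3"]

    def fake_fetch(url, proxy):
        # Stand-in for a real request, so the sketch runs offline.
        return f"{url} via {proxy}"

    for line in crawl(pages, pool, fake_fetch, delay_seconds=0.5):
        print(line)
```

Raising `delay_seconds` or adding proxies to the pool both lower the number of requests any single IP sends in a given period.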
501 Error Messages
At times when using Phantombuster with ProxyMesh, you may receive a 501 error response with the following message body: "Proxy accepts request but does not seem to support SSL (HTTP 501)"
Generally, you can ignore this message because your requests are actually working. However, if the message persists, you can contact us for assistance.
Timeout Error Messages
On occasion, the Phantombuster console may display an error message about a proxy you have configured. The message might read: “Connection has timed out. Your proxy may not be working. Make sure to test it in your web browser first.” This error can generally be ignored, as the proxy is likely functioning properly with Phantombuster despite the message.
Users who wish to double-check the message by running the proxy URL or IP address through a tool such as https://www.proxy-checker.org/ should be aware that the tool is designed for open proxies and not for ProxyMesh proxies. With ProxyMesh, you can get an accurate reading by going to the Proxy Status Page from your dashboard and then clicking the Uptime link.
If you have any remaining questions about these messages, please contact us.
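Besides the Proxy Status Page, you can confirm that a proxy responds by sending one request through it yourself. The sketch below uses only the Python standard library. The host `proxy.example.com`, the port `8080`, and the test URL are placeholders, and the helper names `build_proxy_url` and `check_proxy` are hypothetical. It assumes the proxy accepts either an authorized IP address or username and password credentials.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_proxy_url(host, port, username=None, password=None):
    """Return an http:// proxy URL, percent-encoding any credentials."""
    if username is None:
        return f"http://{host}:{port}"
    user = urllib.parse.quote(username, safe="")
    secret = urllib.parse.quote(password or "", safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def check_proxy(proxy_url, test_url="https://example.com/", timeout=10):
    """Fetch test_url through the proxy and return (ok, detail)."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The proxy or the target site answered with an error status.
        return False, f"HTTP {exc.code} {exc.reason}"
    except OSError as exc:
        # Covers URLError, timeouts, refused connections, failed tunnels.
        return False, f"connection problem: {exc}"


if __name__ == "__main__":
    # Placeholder host and port: substitute a proxy from your dashboard.
    proxy = build_proxy_url("proxy.example.com", 8080)
    ok, detail = check_proxy(proxy)
    print("proxy OK" if ok else "proxy check failed", "-", detail)
```

A successful fetch here suggests that a timeout message in the Phantombuster console can be ignored. A repeated failure is worth reporting when you contact us.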
Phantombuster with PhantomJS Headless & CasperJS
Be aware that PhantomJS, a headless web browser often used with CasperJS, has been suspended and archived, although version 2.1.1 remains available for continued use. CasperJS is available as a testing framework and for scripting full navigation scenarios in a simple interface. CasperJS also provides a download link for research.
This link leads you to a tutorial on web scraping with CasperJS and Phantombuster.
Similar to PhantomJS is Google's Headless Chrome, which enables automated control of web pages. With this tool, you can automate tasks, scripts, and user interface tests against a browser without opening the browser's user interface.
Phantombuster has published a blog post, Web Scraping in 2017: Advanced Headless Chrome Tips & Tricks, about ways to use Headless Chrome with Phantombuster.
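As a rough illustration of running Headless Chrome through a proxy outside Phantombuster, the sketch below builds a command line from Chrome's `--headless`, `--proxy-server`, and `--dump-dom` flags. The binary name `google-chrome`, the proxy host and port, and the helper name `headless_chrome_command` are assumptions for this example. Chrome's `--proxy-server` flag does not carry a username or password, so the sketch assumes the proxy admits you by authorized IP address.

```python
import shlex


def headless_chrome_command(url, proxy_host, proxy_port, binary="google-chrome"):
    """Build an argument list that loads url in Headless Chrome via a proxy
    and prints the rendered DOM to standard output."""
    return [
        binary,  # assumed binary name; it differs between systems
        "--headless",
        "--disable-gpu",
        f"--proxy-server=http://{proxy_host}:{proxy_port}",
        "--dump-dom",
        url,
    ]


if __name__ == "__main__":
    # Placeholder proxy address; substitute one from your dashboard.
    cmd = headless_chrome_command("https://example.com/", "proxy.example.com", 8080)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` to launch the browser; printing it first makes it easy to review the flags.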
You may want to follow these Phantombuster links for further details:
- Github Phantombuster repository of cloud-based APIs, the SDK, and more.
- Phantombuster API. An API for control of your Phantombuster account. The API is composed of HTTPS endpoints returning JSON data.
- Phantombuster.com. An online store with a range of APIs for specific media and project types.
- Phantombuster blog. From this home page of the blog, you can drill down to topics that fit your particular interests.
- Phantombuster documentation. A PDF containing scripts, modules, and packages, plus the Phantombuster API and SDK.
Also, check out these articles on our blog site, proxyserver.com: