Request Retry Strategy

Any web-scraping operation, with or without a proxy server, benefits from an effective retry strategy. It is also good practice to minimize the need for retries in the first place. ProxyMesh's anonymous IP changer proxy helps with this by rotating IP addresses for you while automatically hiding your own IP address.

Here is how you can take advantage of this feature to implement two common use cases:

  • crawling from multiple IP addresses
  • getting around IP bans and rate limits

Avoiding Error Responses

Distribute requests over many IPs to reduce delayed responses and timeouts. By default, the proxy servers choose a random IP for each request. When a site or API applies IP throttling or per-IP rate limiting, this frequent change of IP address helps you avoid rate limits and blocking.

The proxies will not allow standard identifying headers to pass through. Be aware, however, of additional headers, especially the User-Agent header, that could make your request identifiable as a crawler script.

If a remote site regularly sends 503 responses indicating rate limits, try using a different proxy server or slowing down your crawl.

Timing of Retries

When retries are necessary (for example, after 4xx or 5xx error responses), we recommend retrying requests at least 3 times. Avoid timing your retries too closely together, especially if you're targeting a busy server that's already handling near-maximum request volume. Give the condition that caused the error time to clear on the target server.

Best practice is to increase the delay between retries, as in the two-retry example below:

  1. Make first request
  2. Receive 500 error response code
  3. Wait 1 second
  4. Retry request
  5. Receive 500 error response code
  6. Wait 2 seconds
  7. Retry request
  8. Receive 200 success response code
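The sequence above can be sketched in Python as a small retry loop that doubles the wait after each failed attempt. This is a minimal sketch, not a definitive implementation: the function name `retry_with_backoff`, the `send` callable (which is assumed to perform one request and return the HTTP status code), and the doubling schedule are all illustrative assumptions.

```python
import time
from typing import Callable


def retry_with_backoff(
    send: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-error status or retries run out.

    send is a hypothetical callable that performs one request and returns
    the HTTP status code. The delay doubles after every failed attempt:
    base_delay, 2 * base_delay, 4 * base_delay, and so on.
    """
    status = send()  # first request
    for attempt in range(max_retries):
        if status < 400:
            return status  # success or redirect: stop retrying
        sleep(base_delay * 2 ** attempt)  # wait longer before each retry
        status = send()  # retry request
    return status  # last status seen, which may still be an error
```

With the default settings, a target that answers 500, 500, 200 produces waits of 1 second and then 2 seconds, matching the list above. Passing `sleep` as a parameter is only a convenience for testing the schedule without real delays.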

Minimizing Retries

Consider the following retry-minimization techniques in light of all your use cases:

  • Reduce the time and number of requests needed for a given site by using rotating proxies, which help avoid rate limits and blocking.
  • Avoid timeouts by using proxies located near your target sites and, if possible, in the same domain. Try setting the X-ProxyMesh-Country request header to achieve this when using the open and world proxies.
  • For requests limited to IPs from a particular country, try the world proxy along with the X-ProxyMesh-Country header.
  • Make sure the User-Agent string in your request headers is appropriate to the browser you're using. Some sites block user agents that don't look like web browsers.
  • Anticipate 301 responses (which indicate that a page has moved permanently) by scripting your request to follow redirects.
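Several of these techniques come down to how the request is built. The sketch below uses only the Python standard library and rests on stated assumptions: the proxy address is a placeholder, the country value is assumed to be a two-letter country code, and the helper names `build_headers` and `build_opener` are hypothetical. It targets plain HTTP URLs. HTTPS requests are tunneled through the proxy, and passing custom proxy headers in that case is not covered here.

```python
import urllib.request
from typing import Dict

# Placeholder: substitute the host and port of your own proxy server.
PROXY_URL = "http://proxy.example.com:31280"


def build_headers(country: str, user_agent: str) -> Dict[str, str]:
    """Return request headers that pick a country and a browser-like agent."""
    return {
        # Country format is an assumption (a two-letter code such as "US").
        "X-ProxyMesh-Country": country,
        "User-Agent": user_agent,
    }


def build_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Create an opener that sends plain HTTP requests through the proxy.

    urllib's default handler chain includes HTTPRedirectHandler, so 301
    and 302 responses are followed without extra code.
    """
    proxy = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy)


# Example usage (requires a reachable proxy, so it is left commented out):
# agent = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
#          "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
# request = urllib.request.Request("http://example.com/",
#                                  headers=build_headers("US", agent))
# with build_opener().open(request, timeout=30) as response:
#     print(response.status)
```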

Using Custom Headers

Sometimes, despite a well-considered approach, specific IPs get blocked, especially when crawling at high volumes. One mitigation is to combine a retry strategy with custom headers, in order to control which IP addresses are not used. One possible flow is:

  1. Make first request
  2. Receive 500 error response code
  3. Get the IP from the X-ProxyMesh-IP response header
  4. Add this IP to a "not IP" list
  5. Retry the request with an X-ProxyMesh-Not-IP header containing the IP
  6. Receive 200 success response code
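The flow above could look roughly like the following in Python. It is a sketch under assumptions: `send` is a hypothetical callable that performs one request with the given extra headers and returns the status code together with the response headers as a plain dictionary (using the header spelling shown above), and `fetch_avoiding_bad_ips` is an illustrative name. Delays between attempts are left out to keep the example short.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: takes extra request headers, returns
# (status_code, response_headers).
Sender = Callable[[Dict[str, str]], Tuple[int, Dict[str, str]]]


def fetch_avoiding_bad_ips(
    send: Sender, max_retries: int = 3
) -> Tuple[int, List[str]]:
    """Retry a request, excluding each IP that produced an error response."""
    not_ips: List[str] = []  # the "not IP" list built up across attempts
    status = 0
    for _ in range(max_retries + 1):  # first request plus the retries
        headers: Dict[str, str] = {}
        if not_ips:
            # Comma-separated list of IPs the proxy should not use.
            headers["X-ProxyMesh-Not-IP"] = ",".join(not_ips)
        status, response_headers = send(headers)
        if status < 400:
            break  # success: stop retrying
        bad_ip = response_headers.get("X-ProxyMesh-IP")
        if bad_ip and bad_ip not in not_ips:
            not_ips.append(bad_ip)
    return status, not_ips
```

The function returns the final status along with the IPs it excluded, so the caller can reuse that list for later requests to the same site.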

The X-ProxyMesh-Not-IP header can take a comma-separated list of IP addresses, so you can accumulate bad IPs to skip in future requests. If you use it:

  • Make sure the list is specific to the target domain.
  • Do not cache the IPs for more than 1 day, as they will be out of date or offline after 24 hours.
  • Remember to check that the User-Agent string in your custom headers matches an appropriate web browser.
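One way to follow the first two points is to keep a separate list per target domain and drop entries once they are a day old. The class below is a minimal sketch of that idea. The name `NotIPList`, its methods, and the in-memory storage are assumptions made for illustration, not part of any ProxyMesh client.

```python
import time
from typing import Callable, Dict, Optional


class NotIPList:
    """Per-domain record of bad IPs that forgets entries after max_age seconds."""

    def __init__(
        self,
        max_age: float = 24 * 60 * 60,  # one day, per the advice above
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.max_age = max_age
        self.clock = clock
        # domain -> {ip: timestamp when the IP was recorded}
        self._seen: Dict[str, Dict[str, float]] = {}

    def add(self, domain: str, ip: str) -> None:
        """Record that this IP gave an error for this domain."""
        self._seen.setdefault(domain, {})[ip] = self.clock()

    def header_value(self, domain: str) -> Optional[str]:
        """Return the X-ProxyMesh-Not-IP value for a domain, or None if empty."""
        now = self.clock()
        entries = self._seen.get(domain, {})
        # Keep only entries younger than max_age; older ones are discarded.
        fresh = {ip: t for ip, t in entries.items() if now - t < self.max_age}
        self._seen[domain] = fresh
        return ",".join(fresh) if fresh else None
```

A caller would call `add()` whenever a request to a domain fails, and set the X-ProxyMesh-Not-IP header from `header_value()` only when it returns a string.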

"Alert 80" Messages

At times, when a script makes a request over HTTPS, you may encounter a problem with the SSL protocol. This may be indicated by an "Alert 80" internal error message, with symptoms such as:

  • You may have trouble connecting to a specific server, but not to others, over HTTPS.
  • The message body includes "SSL alert number 80" and "internal error".
  • The error message description includes "Sending server not found".

The Alert 80 message tends to appear randomly or intermittently. In most cases, it is difficult to determine whether the cause of the internal error is the proxy, the remote site, or the network in between.

Here are suggestions for addressing the error:

If the problem is not resolved, please contact us.

CDNs

With the increasing use of content delivery networks (CDNs) – geographically distributed networks of proxy servers that correlate traffic from many different sites – it has become easier to detect scraping IPs. Currently, the available solutions are to use more proxies and to slow down your requests as much as possible.
