Request Retry Strategy

What information do you need right now? Click on the applicable links.

Information needed:

  • Avoiding Error Responses – solutions
  • Timing of Retries – method
  • Minimizing Retries – techniques
  • Using Custom Headers – method
  • Alert 80 Message – reasons and solution
  • CDNs – description and suggestions

When using any proxy server, and indeed in any web-scraping operation, an effective retry strategy is good practice. Where possible, it's also good to minimize the need for retries.

You can do so with ProxyMesh's anonymous IP changer proxy, which rotates IP addresses for you while automatically hiding your IP address.

You can take advantage of this feature to implement two common use cases:

  • crawling from multiple IP addresses
  • getting around IP bans and rate limits

Avoiding Error Responses

Distribute requests over many IPs to reduce delayed responses and timeouts. By default, the proxy servers choose a random IP for each request.

A rotating proxy server helps you avoid rate limits and blocking: when you encounter a site or API that throttles or rate-limits by IP address, changing your IP address frequently lets you work around it.

The proxies do not allow standard identifying headers to pass through, but be aware of additional headers, especially the User-Agent header, that could make your request identifiable as a crawler script.
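
For example, here is a minimal sketch using the Python requests library; the proxy URL, credentials, and target URL below are placeholders to substitute with your own:

    import requests

    # Placeholder proxy URL; substitute your own proxy host, port, and credentials.
    PROXY = "http://USERNAME:PASSWORD@proxy-host.example.com:31280"

    # A browser-like User-Agent makes the request less likely to be flagged as a crawler script.
    HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
        )
    }

    response = requests.get(
        "http://example.com/",
        headers=HEADERS,
        proxies={"http": PROXY, "https": PROXY},
        timeout=30,
    )
    print(response.status_code)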

If you’ve gotten many 503 errors when attempting to access certain sites, it’s possible that the target server is dropping the connection for some reason. It could be excessive load, poor network connections between the proxy and the target site, or some form of IP blocking.

Try sending requests to the site from other proxies or slowing down your crawl.
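
One way to apply both suggestions is to rotate through more than one proxy endpoint and pause between requests. A rough sketch, assuming the Python requests library and placeholder proxy URLs:

    import itertools
    import time

    import requests

    # Placeholder proxy endpoints; substitute the proxy hosts you actually use.
    PROXIES = itertools.cycle([
        "http://USERNAME:PASSWORD@proxy-host-1.example.com:31280",
        "http://USERNAME:PASSWORD@proxy-host-2.example.com:31280",
    ])

    for url in ["http://example.com/page1", "http://example.com/page2"]:
        proxy = next(PROXIES)  # alternate between proxy servers
        response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
        if response.status_code == 503:
            print(f"503 from {url} via {proxy}; try the next proxy or slow down further")
        time.sleep(2)  # slow the crawl to reduce load on the target server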

Timing of Retries

When retries are necessary – for example, with 4xx or 5xx response errors – we recommend retrying requests at least 3 times.

Avoid timing your retries too closely together, especially if you're targeting a busy server that's already handling near-maximum request volume. Give the error response time to expire from the target server.

Best practice is to increase the delay between retries, as in the two-retry example below:

  1. Make first request
  2. Receive 500 error response code
  3. Wait 1 second
  4. Retry request
  5. Receive 500 error response code
  6. Wait 2 seconds
  7. Retry request
  8. Receive 200 success response code
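
A minimal sketch of this pattern in Python, assuming the requests library; the doubling delay and retry count are illustrative:

    import time

    import requests

    def fetch_with_backoff(url, max_retries=3, initial_delay=1.0):
        """Retry a request with an increasing delay between attempts, as in the steps above."""
        delay = initial_delay
        for attempt in range(max_retries + 1):
            response = requests.get(url, timeout=30)
            if response.status_code < 400:
                return response          # e.g. a 200 success response
            if attempt == max_retries:
                break                    # give up after the final retry
            time.sleep(delay)            # wait before retrying
            delay *= 2                   # increase the delay each time
        return response
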
You can perform retries manually, but Celery, an open-source job queue for Python (also usable from other languages), includes retry support with exponential backoff, which automates serial retries with increasing delays.
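
As a sketch of how that looks, a Celery task can declare automatic retries with exponential backoff; the broker URL and task body here are placeholders:

    import requests
    from celery import Celery

    app = Celery("crawler", broker="redis://localhost:6379/0")  # placeholder broker URL

    @app.task(
        bind=True,
        autoretry_for=(requests.RequestException,),  # retry when the request fails
        retry_backoff=True,                          # wait longer after each failure
        max_retries=3,
    )
    def fetch(self, url):
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # raise on error responses so the task is retried
        return response.text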

Minimizing Retries

Consider the following retry-minimization techniques in light of all your use cases:

  • Reduce the time spent on, and the number of requests to, a given site with rotating proxies, which help you avoid rate limits and blocking.
  • Avoid timeouts by using proxies located near your target sites and, if possible, in the same country. When using the open and world proxies, you can configure the X-ProxyMesh-Country request header for this purpose (see the sketch after this list).
  • For requests limited to IPs from a particular country, try the world proxy along with the X-ProxyMesh-Country header.
  • Make sure the user agent in your header string is appropriate to the browser you're emulating. Some sites block user agents that don't look like web browsers.
  • Anticipate 301 responses (i.e., a page has permanently moved) by scripting your requests to follow redirects.
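
For instance, here is a sketch combining the country header and redirect handling with the Python requests library, using a plain-HTTP URL so the proxy sees the custom header directly; the proxy URL and country code are placeholders:

    import requests

    PROXY = "http://USERNAME:PASSWORD@world-proxy.example.com:31280"  # placeholder world proxy URL

    headers = {
        "X-ProxyMesh-Country": "US",  # ask for an IP address from a specific country
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
        ),
    }

    response = requests.get(
        "http://example.com/",
        headers=headers,
        proxies={"http": PROXY, "https": PROXY},
        allow_redirects=True,  # follow 301 redirects automatically (the requests default)
        timeout=30,
    )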

Using Custom Headers

Sometimes, even with a well-considered approach, specific IPs can get blocked, especially when crawling at high volumes. One mitigation is to combine a retry strategy with custom headers, so you can control which IP addresses are avoided. The workflow could look like this (a code sketch follows at the end of this section):

  1. Make first request
  2. Receive 500 error response code
  3. Get the IP from the X-ProxyMesh-IP response header
  4. Add this IP to a "not IP" list
  5. Retry the request with an X-ProxyMesh-Not-IP header containing the IP
  6. Receive 200 success response code

The X-ProxyMesh-Not-IP header can take a comma-separated list of IP addresses, letting you accumulate bad IPs to skip in future requests. If you use it:

  • Make sure the list is specific to the target domain.
  • Do not cache the IPs for more than 1 day, as they will be out of date or offline after 12 hours.
  • Remember to check that the user agent in your custom header string is associated with the appropriate web browser.
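
A rough sketch of this workflow in Python with the requests library, keeping a per-domain list of bad IPs and using a plain-HTTP URL so the proxy sees the custom headers directly; the proxy URL is a placeholder:

    import requests
    from urllib.parse import urlparse

    PROXY = "http://USERNAME:PASSWORD@proxy-host.example.com:31280"  # placeholder proxy URL
    bad_ips = {}  # domain -> IPs to avoid; clear entries after no more than a day

    def fetch_avoiding_bad_ips(url, max_retries=3):
        domain = urlparse(url).netloc
        for _ in range(max_retries + 1):
            headers = {}
            if bad_ips.get(domain):
                # Skip IPs that previously failed for this domain.
                headers["X-ProxyMesh-Not-IP"] = ",".join(bad_ips[domain])
            response = requests.get(
                url,
                headers=headers,
                proxies={"http": PROXY, "https": PROXY},
                timeout=30,
            )
            if response.status_code < 400:
                return response
            # Record the IP that produced the error so the next attempt avoids it.
            failed_ip = response.headers.get("X-ProxyMesh-IP")
            if failed_ip and failed_ip not in bad_ips.setdefault(domain, []):
                bad_ips[domain].append(failed_ip)
        return response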

"Alert 80" Messages

At times, when running a script that makes requests over HTTPS, you may encounter a problem with the SSL protocol. This may be indicated by an "Alert 80" internal error message.

  • You may have trouble connecting to a specific server, but not to others, over HTTPS.
  • The message body includes "SSL alert number 80" and "internal error".
  • The error message description includes "Sending server not found".

The Alert 80 message tends to appear randomly or intermittently. In most cases, it is difficult to determine whether the cause of the internal error is the proxy, the remote site, or the network in between.

To address the error, try the same approaches described above for avoiding error responses: retry the request after a delay, send it through a different proxy, or slow down your crawl.

If the problem is not resolved, please contact Support.

CDNs

The use of content delivery networks (CDNs) – geographically distributed networks of proxy servers that cache content and see traffic from many different sites – continues to increase. While CDNs provide excellent data security, they also make it easier to detect scraping IPs.

Proxy servers can help you get around location challenges and circumvent CDN restrictions. Our solution is to use more proxies to make many IP addresses available, so you can more easily bypass geo-restrictions.

The proxies relay data between your device and the CDN, hiding your IP address so you can access content securely and anonymously.

Further Information

For details, please see our blog article Ways to Work around CDNs Using Proxy Servers.

Also, a useful library for Python users performing retries is Tenacity.
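
For example, a minimal sketch using Tenacity's retry decorator with exponentially increasing waits between attempts:

    import requests
    from tenacity import retry, stop_after_attempt, wait_exponential

    @retry(stop=stop_after_attempt(4), wait=wait_exponential(multiplier=1, max=10))
    def fetch(url):
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx responses so Tenacity retries
        return response

    # fetch("http://example.com/") makes up to 4 attempts, waiting longer (up to 10 seconds) each time.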

Still need help? Contact Us