Python Proxy Configuration Examples

Requests

Requests is a great Python library for making HTTP requests; use version 2.7.0 or higher. The example below shows the most reliable way to use proxy authentication. If you're using IP authentication, remove USERNAME:PASSWORD@ from the proxies dictionary.

>>> import requests
>>> proxies = {'http': 'http://USERNAME:PASSWORD@HOST:PORT',
               'https': 'http://USERNAME:PASSWORD@HOST:PORT'}
>>> response = requests.get('http://example.com', proxies=proxies)
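If your username or password contains characters such as '@' or ':', embedding them directly in the proxy URL can confuse URL parsing. The sketch below shows one way to build the proxies dictionary with percent-encoded credentials. The helper name build_proxies is our own invention for illustration and is not part of Requests.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a Requests-style proxies dict for a single proxy server."""
    auth = ""
    if username is not None and password is not None:
        # Percent-encode credentials so characters such as '@' or ':'
        # cannot be mistaken for URL delimiters.
        auth = "%s:%s@" % (quote(username, safe=""), quote(password, safe=""))
    proxy_url = "http://%s%s:%s" % (auth, host, port)
    # The same proxy URL serves both plain HTTP and HTTPS requests.
    return {"http": proxy_url, "https": proxy_url}
```

You would then pass the result as the proxies argument, for example requests.get('http://example.com', proxies=build_proxies('HOST', PORT, 'USERNAME', 'PASSWORD')). With IP authentication, omit the username and password.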

To use multiple proxy servers, you can randomly choose one for each request. Your code might look like this:

>>> import random
>>> import requests
>>> proxy_choices = ['HOST1:PORT', 'HOST2:PORT']
...
>>> proxy = random.choice(proxy_choices)
>>> proxies = {'http': 'http://%s' % proxy, 'https': 'http://%s' % proxy}
>>> response = requests.get('http://example.com', proxies=proxies)
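If you would rather spread requests evenly across proxies than pick one at random, a possible variation is to cycle through the list in order. This is only a sketch: the generator name proxy_rotation is hypothetical, and the HOST1:PORT and HOST2:PORT placeholders are the same ones used above.

```python
import itertools


def proxy_rotation(proxy_choices):
    """Yield a Requests-style proxies dict for each proxy in turn, forever."""
    for proxy in itertools.cycle(proxy_choices):
        yield {"http": "http://%s" % proxy, "https": "http://%s" % proxy}


rotation = proxy_rotation(["HOST1:PORT", "HOST2:PORT"])
first = next(rotation)   # uses HOST1:PORT
second = next(rotation)  # uses HOST2:PORT
third = next(rotation)   # wraps around to HOST1:PORT
```

Each call to next(rotation) returns a dictionary you can pass as the proxies argument of requests.get.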

Below is an example of a test you can run for a request on a single proxy server. Note that, in addition to the proxy address and port, you must define the protocol. If you're defining more than one protocol, you can use the same proxy.

import requests
proxies = {
 'http': 'http://PROXYHOST:PORT',
 'https': 'http://PROXYHOST:PORT'
}
response = requests.get('http://xxxxx.xxx', proxies=proxies)

print(response.headers)
print(response.encoding)
print(response.status_code)
print(response.text)
print(response.links)

You should get a 200 response code indicating the test was successful.

If instead you get an error response, the message might look something like this:

ProxyError:

HTTPConnectionPool(host='PROXYHOST', port=PORT): Max retries exceeded with url: http://xxxxx.xxx/ (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024D2567BC50>: Failed to establish a new connection: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions')))

The likely cause of this error is a firewall issue. Please see Proxy Connection Problems for details.
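Before digging into firewall rules, it can help to confirm whether your machine can open a TCP connection to the proxy at all. The following is a minimal sketch using only the standard library. The function name proxy_is_reachable is our own, and a successful connection only shows that the port is reachable, not that authentication will succeed.

```python
import socket


def proxy_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to the proxy can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and
        # permission errors such as WinError 10013.
        return False
```

If proxy_is_reachable('PROXYHOST', PORT) returns False, the problem is likely at the network level (firewall, antivirus, or a blocked port) rather than in your Requests code.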

Scrapy

For the Scrapy crawling framework, you must set the http_proxy environment variable:

$ export http_proxy=http://USERNAME:PASSWORD@HOST:PORT

For HTTPS requests, use IP authentication and remove USERNAME:PASSWORD@ from the http_proxy variable.

After setting the environment variable, you can activate middlewares that work with Scrapy.

Downloader Middleware

This middleware provides a framework that hooks into Scrapy's request and response processing. For a description of the middleware and instructions for activating it, see the Scrapy documentation on downloader middleware.
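To illustrate how a downloader middleware hooks into request processing, here is a minimal sketch of a custom middleware that assigns a proxy to each request. Scrapy's built-in HttpProxyMiddleware reads the proxy from request.meta['proxy']. The class name SingleProxyMiddleware, the PROXY_URL setting, and the myproject module path are all hypothetical names chosen for this example.

```python
class SingleProxyMiddleware:
    """Hypothetical downloader middleware that routes requests via one proxy."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL is a made-up setting name for this sketch.
        return cls(crawler.settings.get("PROXY_URL"))

    def process_request(self, request, spider):
        # Only set a proxy if one is configured and the request
        # does not already carry its own.
        if self.proxy_url:
            request.meta.setdefault("proxy", self.proxy_url)
        return None  # let Scrapy continue processing the request


# In settings.py (assumed layout):
#     PROXY_URL = 'http://HOST:PORT'
#     DOWNLOADER_MIDDLEWARES = {
#         'myproject.middlewares.SingleProxyMiddleware': 350,
#     }
```

The priority of 350 is an assumption. Any value below 750, the default position of HttpProxyMiddleware, should let this middleware run first.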

Rotating Proxies Middleware

This Scrapy middleware package enables you to use rotating proxies, to check that the proxies are alive, and to adjust crawling speed. To install, do the following:

pip install scrapy-rotating-proxies

You can easily set up this middleware to use multiple proxies. Add the ROTATING_PROXY_LIST option, with a list of proxies, to settings.py:

ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:PORT',
    # ...
]
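The proxy list alone is not enough; the package's middlewares also have to be enabled in settings.py. The fragment below follows the package README as we understand it, so treat the class paths and priority values as assumptions to verify against the version you install.

```python
# settings.py (sketch): enable the rotating-proxies middlewares.
DOWNLOADER_MIDDLEWARES = {
    # ... your other middlewares ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}
```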

For alternative setup methods and more information about the middleware, see Scrapy Rotating Proxies Middleware.

Notes on Rotating Proxies Middleware

  • These notes reference the Rotating Proxies Middleware, but you may also find the suggestions helpful with other middlewares, especially those enabling multiple proxy use. It's generally easier to debug with Scrapy's proxy settings than with middleware settings.

Although the Rotating Proxies Middleware works well with multiple proxies, its multiproxy feature can make your code hard to debug. For efficient debugging, we recommend using Scrapy's normal proxy settings rather than the middleware settings.

Also, some Scrapy users who activate the middleware may receive error messages indicating that their chosen proxy is "dead" even though they have authenticated to a live proxy server. If you're only using a single proxy, you don't need the multiproxy feature of the middleware. Here, too, we recommend turning off the middleware and using Scrapy's normal proxy settings.

Your request logs may show 200 "success" responses for many requests. For others, you may be getting 403 error codes indicating possible errors in your code or configuration. If you receive such errors, we suggest checking to make sure you're not generating requests directly from your code, that is, bypassing the needed proxy settings.

Random Proxy Middleware

The Rotating Proxies Middleware described above includes options for multiple proxies, but as an alternative you can also use RandomProxyMiddleware. This middleware processes Scrapy requests using a random proxy from a list to improve crawling speed and avoid IP bans. For quick installation, do this:

pip install scrapy_proxies

For more information on this middleware, see the scrapy_proxies project documentation.

Response Header

With every response, the remote site includes an X-ProxyMesh-IP header whose value is the IP used for the request. To use the same IP for a subsequent request, pass in this header unchanged.

If needed, you can parse the value of the X-ProxyMesh-IP header for future use, as in this example:

def parse_response(self, response):
    # Print all response headers, including X-ProxyMesh-IP.
    print(response.headers)
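To reuse the IP, you need to pull out the header value and send it back with the next request. The sketch below shows one way to do that for a header mapping with string names and values, such as response.headers in Requests. Scrapy stores header values as bytes, so they would need decoding first. The function names extract_proxy_ip and sticky_headers are hypothetical.

```python
def extract_proxy_ip(headers):
    """Return the X-ProxyMesh-IP value from a header mapping, or None.

    Header names are compared case-insensitively; names and values are
    assumed to be str, as in Requests' response.headers.
    """
    for name, value in headers.items():
        if name.lower() == "x-proxymesh-ip":
            return value.strip()
    return None


def sticky_headers(headers):
    """Build request headers that ask for the same outgoing IP again."""
    ip = extract_proxy_ip(headers)
    return {"X-ProxyMesh-IP": ip} if ip else {}
```

With Requests, a follow-up call might look like requests.get(url, proxies=proxies, headers=sticky_headers(response.headers)). This assumes a plain HTTP request. HTTPS requests are tunnelled through the proxy, so headers sent this way are not visible to it.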

Scrapy with Splash Request

For a Splash request via the proxy, add a 'proxy' argument to the SplashRequest object. Without this argument, you may receive a 503 Service Unavailable response.

Click here to view sample code for a splash request.
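As a rough illustration of the 'proxy' argument, the sketch below builds the args dictionary that a SplashRequest could carry. The helper name splash_args and the default wait of 0.5 seconds are assumptions made for this example. The commented usage lines require the scrapy-splash package.

```python
def splash_args(proxy_url, wait=0.5):
    """Build the args dict for a SplashRequest routed through a proxy."""
    return {
        "wait": wait,        # seconds Splash waits after the page loads
        "proxy": proxy_url,  # for example 'http://HOST:PORT'
    }


# Inside a spider callback or start_requests():
#     from scrapy_splash import SplashRequest
#     yield SplashRequest(url, self.parse,
#                         args=splash_args('http://HOST:PORT'))
```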

Selenium + Chrome

To configure the Python WebDriver for Selenium to use a proxy with Chrome, see "How do I set proxy for Chrome in Python WebDriver". Be sure to set up IP authentication before configuring Selenium.
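One common approach is to pass Chrome's --proxy-server command-line switch through ChromeOptions. The sketch below assumes Selenium 4 and uses hypothetical helper names (chrome_proxy_argument, make_chrome_driver). The switch does not carry credentials, which is why IP authentication is needed here.

```python
def chrome_proxy_argument(host, port):
    """Return Chrome's command-line switch for an HTTP proxy."""
    return "--proxy-server=http://%s:%s" % (host, port)


def make_chrome_driver(host, port):
    """Start Chrome through the proxy (requires the selenium package)."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(chrome_proxy_argument(host, port))
    return webdriver.Chrome(options=options)
```

Calling make_chrome_driver('HOST', PORT) would return a driver whose traffic goes through the proxy.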

Selenium + Firefox

To set the network proxy preferences for Selenium with Firefox, configure a profile as follows. Be sure to set up IP authentication before configuring Selenium.

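from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.http", 'HOST')
profile.set_preference("network.proxy.http_port", 31280)
profile.set_preference("network.proxy.ssl", 'HOST')  # proxy for HTTPS traffic
profile.set_preference("network.proxy.ssl_port", 31280)
# Newer Selenium 4 releases drop firefox_profile; there, set the same
# preferences on webdriver.FirefoxOptions() and pass options=... instead.
driver = webdriver.Firefox(firefox_profile=profile)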
