Python Proxy Configuration Examples

What information do you need right now? Click on the applicable links.

Information needed
Requests – examples
Scrapy – Scrapy environment variable
Rotating Proxies Middleware – Scrapy package
Response Header – using the same IP
Scrapy with Splash
Selenium – settings for Chrome & Firefox

Requests

Requests is a popular Python library for making HTTP requests. The examples below assume Requests version 2.7.0 or higher.

Configuration

Your proxies configuration should look like the example below. Note that even for HTTPS requests, the proxy URL itself should use the http:// scheme, not https://.

proxies = {
  'http': 'http://HOST:PORT',
  'https': 'http://HOST:PORT'
}

Authentication

This code example shows the most reliable way to use proxy authentication. If you're using IP authentication instead, you can remove USERNAME:PASSWORD@ from the proxies dictionary.

>>> import requests
>>> proxies = {'http': 'http://USERNAME:PASSWORD@HOST:PORT',
               'https': 'http://USERNAME:PASSWORD@HOST:PORT'}
>>> response = requests.get('http://example.com', proxies=proxies)

Multiple Proxies

To use multiple proxy servers, you can randomly choose one for each request. Your code might look like this:

>>> import random
>>> import requests
>>> proxy_choices = ['HOST1:PORT', 'HOST2:PORT']
>>> proxy = random.choice(proxy_choices)
>>> proxies = {'http': 'http://%s' % proxy, 'https': 'http://%s' % proxy}
>>> response = requests.get('http://example.com', proxies=proxies)

Below is an example of a test you can run for a request through a single proxy server. Note that, in addition to the proxy host and port, you must define the protocol. If you're defining more than one protocol, you can use the same proxy for both.

import requests

proxies = {
  'http': 'http://PROXYHOST:PORT',
  'https': 'http://PROXYHOST:PORT'
}
response = requests.get('http://xxxxx.xxx', proxies=proxies)

# Inspect the response to confirm the request succeeded through the proxy
print(response.headers)
print(response.encoding)
print(response.status_code)
print(response.text)
print(response.links)

If instead you get an error response like the one below, the likely cause is a firewall issue:

ProxyError:

HTTPConnectionPool(host='PROXYHOST:PORT'): Max retries exceeded with url:
http://xxxxx.xxx/ (Caused by ProxyError('Cannot connect to proxy.',
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024D2567BC50>:
Failed to establish a new connection:
[WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions')))

Further Information

Please see Proxy Connection Problems for details.

Scrapy

For the Scrapy crawling framework, you must set the http_proxy environment variable:
$ export http_proxy=http://USERNAME:PASSWORD@HOST:PORT

For HTTPS requests, use IP authentication and remove USERNAME:PASSWORD@ from the http_proxy variable.
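
For example, with IP authentication the exports would look like this (Scrapy also honors an https_proxy variable, set the same way, for HTTPS URLs):
$ export http_proxy=http://HOST:PORT
$ export https_proxy=http://HOST:PORT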

After setting the environment variable, you can activate middlewares that work with Scrapy.

Exception: You do not need the environment variable when you use the Rotating Proxies Middleware (scrapy-rotating-proxies).

Downloader Middleware

This middleware provides a framework that hooks into the Scrapy request and response processes.

Further Information

Follow the link for a description of the middleware and how it works.

To activate it, see Activating a downloader middleware.
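
As a rough illustration, a minimal custom downloader middleware that routes every request through a single proxy might look like the sketch below; the module path, class name, and priority are hypothetical, and the proxy URL is a placeholder:

# middlewares.py (hypothetical module in your Scrapy project)
class CustomProxyMiddleware:
    def process_request(self, request, spider):
        # Setting request.meta['proxy'] routes this request through the proxy
        request.meta['proxy'] = 'http://USERNAME:PASSWORD@HOST:PORT'

# settings.py -- activate the middleware (the priority 350 is arbitrary)
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.CustomProxyMiddleware': 350,
}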

Rotating Proxies Middleware

This Scrapy middleware package (https://pypi.org/project/scrapy-rotating-proxies/) enables you to use rotating proxies, check that the proxies are alive, and adjust crawling speed. To install it:
pip install scrapy-rotating-proxies

You do not need the environment variable when you use scrapy-rotating-proxies. The package keeps track of working and non-working proxies, and periodically re-checks the non-working ones.

You can easily set up this middleware to use multiple proxies. Add the ROTATING_PROXY_LIST option with a list of proxies to settings.py:
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:8031',
    # ...
]
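
You also activate the middleware in settings.py; the entries and priorities below follow the package's README:
DOWNLOADER_MIDDLEWARES = {
    # ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    # ...
}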
Further Information

For alternative setup methods and more information about the middleware, see Scrapy Rotating Proxies Middleware.

These notes reference the Rotating Proxies Middleware, but you may also find the suggestions helpful with other middlewares, especially those enabling multiple proxy use.

Although the Rotating Proxies Middleware works well with multiple proxies, using multiple proxies can make your code hard to debug. For efficient debugging, we recommend using Scrapy's normal proxy settings rather than the middleware settings.

Also, some Scrapy users who activate the middleware may receive error messages indicating that their chosen proxy is "dead" even though they have authenticated to a live proxy server. If you're only using a single proxy, you don't need the multi-proxy feature of the middleware. Here, too, we recommend turning off the middleware and using Scrapy's normal proxy settings.

Your request logs may show 200 "success" responses for many requests, while others return 403 errors that indicate possible problems in your code or configuration. If you receive such errors, check that you're not generating requests directly from your code, that is, bypassing the needed proxy settings.

Random Proxy Middleware

The Rotating Proxies Middleware described above includes options for multiple proxies, but as an alternative you can also use RandomProxyMiddleware.

This middleware processes Scrapy requests using a random proxy from a list, to improve crawling speed and avoid IP bans. To install it:
pip install scrapy_proxies
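
As a rough sketch based on the scrapy_proxies README, the corresponding settings.py entries look like this (the proxy list path is a placeholder):
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Text file with one proxy URL per line
PROXY_LIST = '/path/to/proxy/list.txt'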
Further Information

For more information on this middleware, see the scrapy_proxies project page.

Response Header

Every response returned through the proxy includes an X-ProxyMesh-IP header whose value is the IP used for the request.

To use the same IP for a subsequent request, pass in this header unchanged.

If needed, you can parse the value of the X-ProxyMesh-IP header for future use, as in this example:
def parse_response(self, response):
    # The X-ProxyMesh-IP response header holds the IP used for this request
    print(response.headers)
    return response.headers.get('X-ProxyMesh-IP')
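
For example, a Scrapy spider could pass the header back unchanged to stay on the same IP. This is a minimal sketch; the spider name and follow-up URL are hypothetical:

import scrapy

class SameIPSpider(scrapy.Spider):
    name = 'same_ip'                      # hypothetical spider
    start_urls = ['http://example.com']   # hypothetical start URL

    def parse(self, response):
        proxy_ip = response.headers.get('X-ProxyMesh-IP')
        # Send the header back unchanged so the next request uses the same IP
        yield scrapy.Request(
            'http://example.com/next',    # hypothetical follow-up URL
            headers={'X-ProxyMesh-IP': proxy_ip},
            callback=self.parse,
        )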

Scrapy with Splash Request

For a Splash request via the proxy, add a 'proxy' argument to the SplashRequest. Without this argument, you may receive a 503 Service Unavailable response.

Click the link to view sample code for a Splash request; a rough sketch also follows below.
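
This minimal sketch assumes the scrapy-splash package is installed and configured; the spider details are hypothetical and the proxy URL is a placeholder:

import scrapy
from scrapy_splash import SplashRequest

class SplashProxySpider(scrapy.Spider):
    name = 'splash_proxy'   # hypothetical spider

    def start_requests(self):
        yield SplashRequest(
            'http://example.com',
            self.parse,
            # Without the 'proxy' argument, you may get a 503 response
            args={'proxy': 'http://USERNAME:PASSWORD@HOST:PORT'},
        )

    def parse(self, response):
        self.logger.info(response.url)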

Selenium + Chrome

To configure the Python webdriver for Selenium to use Chrome, see How do i set proxy for chrome in python webdriver. Be sure to use IP authentication before configuring Selenium.
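
A minimal sketch, assuming IP authentication (Chrome's --proxy-server flag does not accept a username and password):

from selenium import webdriver

options = webdriver.ChromeOptions()
# HOST and PORT are placeholders, as in the examples above
options.add_argument('--proxy-server=http://HOST:PORT')
driver = webdriver.Chrome(options=options)
driver.get('http://example.com')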

Selenium + Firefox

To set the network proxy settings for Selenium to use Firefox, you can do something like this (be sure to use IP authentication before configuring Selenium):

from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.http", 'HOST')
profile.set_preference("network.proxy.http_port", 31280)
profile.set_preference("network.proxy.ssl", 'HOST')
profile.set_preference("network.proxy.ssl_port", 31280)
driver = webdriver.Firefox(firefox_profile=profile)

Still need help? Contact Us