Python Proxy Configuration Examples

You can find various Python code examples in our proxy-examples project.

What information do you need right now? Click on the applicable links.

Information needed
Requests – examples
Scrapy – Scrapy environment variable
Rotating Proxy Middleware – Scrapy package
Response Header – using the same IP
Scrapy with Splash
Selenium – setting for Chrome
Selenium – setting for Firefox
Other Libraries - urllib3, httpx, aiohttp

Requests

Requests is a great Python library for making HTTP requests. The examples below assume version 2.7.0 or higher.

Configuration

Your proxies configuration should look like the example below. Note that even when making a request over HTTPS, the proxy server URL should begin with http://, not https://.

proxies = {
  'http': 'http://USERNAME:PASSWORD@HOST:PORT',
  'https': 'http://USERNAME:PASSWORD@HOST:PORT'
}

Authentication

This code example shows the most reliable way to use proxy authentication. But if you're using IP authentication, then you can remove USERNAME:PASSWORD@ in the proxies dictionary.

import requests
proxies = {'http': 'http://USERNAME:PASSWORD@HOST:PORT',
           'https': 'http://USERNAME:PASSWORD@HOST:PORT'}
response = requests.get('https://example.com', proxies=proxies)

Multiple Proxies

To use multiple proxy servers, you can randomly choose one for each request. Your code might look like this:

import random
import requests
proxy_choices = ['HOST1:PORT', 'HOST2:PORT']
proxy = random.choice(proxy_choices)
proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}'
}
response = requests.get('https://example.com', proxies=proxies)
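
As an alternative to random selection, you can rotate through the list round-robin with itertools.cycle, so each request uses the next proxy in order. This is a sketch using the same placeholder hosts:

```python
from itertools import cycle

proxy_choices = ['HOST1:PORT', 'HOST2:PORT']
proxy_pool = cycle(proxy_choices)  # endless round-robin iterator

def next_proxies():
    # Advance to the next proxy in the list, wrapping around at the end.
    proxy = next(proxy_pool)
    return {'http': f'http://{proxy}', 'https': f'http://{proxy}'}

# response = requests.get('https://example.com', proxies=next_proxies())
```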

Proxy Headers

If you use our requests_adapter module from python-proxy-headers, you can pass in and receive our custom proxy headers, like this:

from python_proxy_headers import requests_adapter

proxies = {'http': 'http://PROXYHOST:PORT',
           'https': 'http://PROXYHOST:PORT'}
r = requests_adapter.get('https://api.ipify.org?format=json',
                         proxies=proxies,
                         proxy_headers={'X-ProxyMesh-Country': 'US'})
r.headers['X-ProxyMesh-IP']

Single Proxy Example

Below is an example of a test you can run for a request on a single proxy server. Note that, in addition to the proxy host and port, you must specify the protocol. When defining more than one protocol, you can use the same proxy for both.

import requests

proxies = {
  'http': 'http://PROXYHOST:PORT',
  'https': 'http://PROXYHOST:PORT'
}
response = requests.get('http://xxxxx.xxx', proxies=proxies)

print(response.headers)
print(response.encoding)
print(response.status_code)
print(response.text)
print(response.links)

If you instead get an error response whose message looks something like the one below, the likely cause is a firewall issue:

ProxyError:

HTTPConnectionPool(host='PROXYHOST:PORT'): Max retries exceeded with url:
http://xxxxx.xxx/ (Caused by ProxyError('Cannot connect to proxy.',
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024D2567BC50>:
Failed to establish a new connection:
[WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions')))

Further Information

Please see Proxy Connection Problems for details.

Scrapy

For the Scrapy crawling framework, you must set the http_proxy environment variable:

$ export http_proxy=http://USERNAME:PASSWORD@HOST:PORT

For HTTPS requests, use IP authentication and remove USERNAME:PASSWORD@ from the http_proxy variable.
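
If you prefer to configure this from Python rather than the shell, you can set the same variable with os.environ before the crawl starts (placeholder credentials):

```python
import os

# Equivalent to `export http_proxy=...`; must be set before Scrapy starts,
# since the framework reads the environment at startup.
os.environ['http_proxy'] = 'http://USERNAME:PASSWORD@HOST:PORT'
```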

After setting the environment variable, you can activate middlewares that work with Scrapy.

Exception: You do not need the environment variable when you use the Rotating Proxies Middleware (scrapy-rotating-proxies).

Downloader Middleware for Custom Headers

By default, Scrapy does not provide a way to send custom headers to a proxy when making HTTPS requests. So we developed a downloader middleware you can use to do that: scrapy-proxy-headers. You can install it from PyPI, then add it to your settings like this:

DOWNLOAD_HANDLERS = {
  "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
}

Now when you want to make a request with a custom proxy header, instead of using request.headers, use request.meta["proxy_headers"], like this:

request.meta["proxy_headers"] = {"X-ProxyMesh-Country": "US"}

You can also get custom proxy headers from the response, like this:

response.headers["X-ProxyMesh-IP"]

Rotating Proxies Middleware

The scrapy-rotating-proxies middleware package enables you to use rotating proxies, to check that the proxies are alive, and to adjust crawling speed.

You do not need the environment variable when you use scrapy-rotating-proxies. scrapy-rotating-proxies keeps track of working and non-working proxies, and periodically re-checks the non-working ones.

You can easily set up this middleware to use multiple proxies. Add the ROTATING_PROXY_LIST option with a list of proxies to settings.py:

ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:3128'
]
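
Note that the proxy list alone does not activate anything; per the scrapy-rotating-proxies documentation, the middleware also needs to be enabled in DOWNLOADER_MIDDLEWARES in settings.py:

```python
DOWNLOADER_MIDDLEWARES = {
    # Priorities suggested by the scrapy-rotating-proxies README
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}
```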

Further Information
For alternative setup methods and more information about the middleware, see Scrapy Rotating Proxies Middleware.

These notes reference the Rotating Proxies Middleware, but you may also find the suggestions helpful with other middlewares, especially those enabling multiple proxy use.

  • It's generally easier to debug with Scrapy's proxy settings than with middleware settings.
  • Some Scrapy users who activate the middleware may receive error messages indicating that their chosen proxy is "dead" although they have authenticated to an alive proxy server. If you're only using a single proxy, you don't need the multi-proxy feature of the middleware. Here, too, we would recommend turning off the middleware and using Scrapy's normal proxy settings.
  • Your request logs may show 200 "success" responses for many requests but 403 errors for others. A 403 can indicate an error in your code or configuration; if you receive such errors, check that you're not generating requests directly from your code, bypassing the needed proxy settings.

Random Proxy Middleware

The Rotating Proxies Middleware described above includes options for multiple proxies, but as an alternative you can also use RandomProxyMiddleware. This middleware processes Scrapy requests using a random proxy from a list to improve crawling speed and avoid IP bans.
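
If you use RandomProxyMiddleware (from the scrapy-proxies package), its README suggests settings.py entries along these lines; the proxy list path is a placeholder:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxyMiddleware': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

PROXY_LIST = '/path/to/proxy/list.txt'  # one proxy URL per line
PROXY_MODE = 0  # 0 = pick a random proxy for every request
```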

Response Header

Every response includes an X-ProxyMesh-IP header whose value is the IP used for the request. To access this header, use our scrapy-proxy-headers package. Then, to use the same IP for a subsequent request, pass the header back unchanged.
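
The round-trip can be sketched with plain dictionaries; the IP value below is hypothetical, standing in for whatever the proxy returned on the first response:

```python
# Headers from a previous response (hypothetical IP value)
first_response_headers = {'X-ProxyMesh-IP': '10.1.2.3'}

# Pass the same header back in the next request's proxy headers so the
# proxy routes that request through the same IP.
ip = first_response_headers['X-ProxyMesh-IP']
next_request_meta = {'proxy_headers': {'X-ProxyMesh-IP': ip}}
```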

Scrapy with Splash Request

For a Splash request via the proxy, add a proxy argument to the SplashRequest object. Without this argument, you may receive a 503 Service Unavailable response. See our sample code for a Splash request.

Selenium + Chrome

To configure the Python webdriver for Selenium to use Chrome, see How do i set proxy for chrome in python webdriver. Be sure to use IP authentication before configuring Selenium.

Selenium + Firefox

To set the network proxy settings for Selenium to use Firefox, you can do something like the following. (Be sure to use IP authentication before configuring Selenium.)

from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.http", 'HOST')
profile.set_preference("network.proxy.http_port", 31280)
profile.set_preference("network.proxy.ssl", 'HOST')
profile.set_preference("network.proxy.ssl_port", 31280)
driver = webdriver.Firefox(firefox_profile=profile)

Other Python Libraries

Our python-proxy-headers library has examples for using proxies with various Python libraries, and provides custom proxy header support. Currently it provides extension modules for the following libraries:

  • urllib3
  • httpx
  • aiohttp

Still need help? Contact Us