Python Proxy Configuration Examples
You can find various Python code examples in our proxy-examples project.
What information do you need right now? Click on the applicable links.
- Requests – examples
- Scrapy – Scrapy environment variable
- Rotating Proxy Middleware – Scrapy package
- Response Header – using the same IP
- Scrapy with Splash
- Selenium – settings for Chrome & Firefox
- Other Libraries – urllib3, httpx, aiohttp
Requests
Requests is a great Python library for making HTTP requests; the examples below apply to version 2.7.0 and higher.
Configuration
Your proxies configuration should look like the example below. Note that even when making requests over HTTPS, you should specify the HTTP protocol (not HTTPS) at the beginning of the proxy server host.
proxies = {
    'http': 'http://USERNAME:PASSWORD@HOST:PORT',
    'https': 'http://USERNAME:PASSWORD@HOST:PORT'
}
Authentication
Putting the credentials in the proxy URL, as above, is the most reliable way to use proxy authentication. If you're using IP authentication instead, you can remove USERNAME:PASSWORD@ from the proxies dictionary, like this:
import requests

proxies = {'http': 'http://HOST:PORT', 'https': 'http://HOST:PORT'}
response = requests.get('https://example.com', proxies=proxies)
Multiple Proxies
To use multiple proxy servers, you can randomly choose one for each request. Your code might look like this:
import random
import requests

proxy_choices = ['HOST1:PORT', 'HOST2:PORT']
proxy = random.choice(proxy_choices)
proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}'
}
response = requests.get('https://example.com', proxies=proxies)
Proxy Headers
If you use our requests_adapter module from python-proxy-headers, you can pass in and receive our custom proxy headers, like this:
from python_proxy_headers import requests_adapter

r = requests_adapter.get(
    'https://api.ipify.org?format=json',
    proxies={'http': 'http://PROXYHOST:PORT', 'https': 'http://PROXYHOST:PORT'},
    proxy_headers={'X-ProxyMesh-Country': 'US'}
)
r.headers['X-ProxyMesh-IP']
Single Proxy Example
Below is an example of a test you can run for a request on a single proxy server. Note that, in addition to the proxy address and port, you must define the protocol. If you're defining more than one protocol, you can use the same proxy for both.
import requests

proxies = {
    'http': 'http://PROXYHOST:PORT',
    'https': 'http://PROXYHOST:PORT'
}
response = requests.get('http://xxxxx.xxx', proxies=proxies)

print(response.headers)
print(response.encoding)
print(response.status_code)
print(response.text)
print(response.links)
If instead you get an error response like the one below, the likely cause is a firewall issue:
ProxyError: HTTPConnectionPool(host='PROXYHOST', port=PORT): Max retries exceeded with url: http://xxxxx.xxx/ (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024D2567BC50>: Failed to establish a new connection: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions')))
Please see Proxy Connection Problems for details.
Scrapy
For the Scrapy crawling framework, you must set the http_proxy environment variable:
$ export http_proxy=http://USERNAME:PASSWORD@HOST:PORT
For HTTPS requests, use IP authentication and remove USERNAME:PASSWORD@ from the http_proxy variable.
After setting the environment variable, you can activate middlewares that work with Scrapy.
Exception: You do not need the environment variable when you use the Rotating Proxies Middleware (scrapy-rotating-proxies).
Downloader Middleware for Custom Headers
By default, Scrapy does not provide a way to send custom headers to a proxy when making HTTPS requests. So we developed a downloader middleware you can use to do that: scrapy-proxy-headers. You can install it from PyPI, then add it to your settings like this:
DOWNLOAD_HANDLERS = {
    "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
}
Now when you want to make a request with a custom proxy header, instead of using request.headers, use request.meta["proxy_headers"] like this:
request.meta["proxy_headers"] = {"X-ProxyMesh-Country": "US"}
You can also get custom proxy headers from the response, like this:
response.headers["X-ProxyMesh-IP"]
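Putting the handler setting and the meta key together, a minimal spider sketch might look like this. The spider name and target URL are illustrative, and it assumes the http_proxy environment variable is set as described above:

import scrapy

class IPSpider(scrapy.Spider):
    name = "ip_spider"  # placeholder spider name

    def start_requests(self):
        request = scrapy.Request('https://api.ipify.org?format=json', callback=self.parse)
        # Custom proxy header asking for a US IP
        request.meta["proxy_headers"] = {"X-ProxyMesh-Country": "US"}
        yield request

    def parse(self, response):
        # The IP the proxy used comes back as a response header
        self.logger.info("used IP: %s", response.headers["X-ProxyMesh-IP"])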
Rotating Proxies Middleware
The scrapy-rotating-proxies middleware package enables you to use rotating proxies, to check that the proxies are alive, and to adjust crawling speed.
You do not need the environment variable when you use scrapy-rotating-proxies. scrapy-rotating-proxies keeps track of working and non-working proxies, and periodically re-checks the non-working ones.
You can easily set up this middleware to use multiple proxies. Add a ROTATING_PROXY_LIST option with a list of proxies to settings.py:
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:3128'
]
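You'll also need to enable the package's middlewares in settings.py; the module paths and priority numbers below follow the scrapy-rotating-proxies README:

DOWNLOADER_MIDDLEWARES = {
    # Rotates proxies from ROTATING_PROXY_LIST across requests
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    # Detects banned/dead proxies so they can be re-checked later
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}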
For alternative setup methods and more information about the middleware, see Scrapy Rotating Proxies Middleware.
These notes reference the Rotating Proxies Middleware, but you may also find the suggestions helpful with other middlewares, especially those enabling multiple proxy use.
- It's generally easier to debug with Scrapy's proxy settings than with middleware settings (see the sketch after this list).
- Some Scrapy users who activate the middleware may receive error messages indicating that their chosen proxy is "dead" although they have authenticated to an alive proxy server. If you're only using a single proxy, you don't need the multi-proxy feature of the middleware. Here, too, we would recommend turning off the middleware and using Scrapy's normal proxy settings.
- Your request logs may show 200 "success" responses for many requests. For others, you may be getting 403 error codes indicating possible errors in your code or configuration. If you receive such errors, we suggest checking to make sure you're not generating requests directly from your code, that is, bypassing the needed proxy settings.
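By "Scrapy's normal proxy settings" we mean the http_proxy environment variable shown earlier, or a per-request proxy via Scrapy's built-in HttpProxyMiddleware. A minimal sketch of the latter, with a placeholder spider name and URL:

import scrapy

class SingleProxySpider(scrapy.Spider):
    name = "single_proxy"  # placeholder spider name

    def start_requests(self):
        # Scrapy's built-in HttpProxyMiddleware reads request.meta['proxy']
        yield scrapy.Request(
            'https://example.com',  # placeholder URL
            callback=self.parse,
            meta={'proxy': 'http://USERNAME:PASSWORD@HOST:PORT'},
        )

    def parse(self, response):
        self.logger.info("status: %s", response.status)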
Random Proxy Middleware
The Rotating Proxies Middleware described above includes options for multiple proxies, but as an alternative you can also use RandomProxyMiddleware. This middleware processes Scrapy requests using a random proxy from a list to improve crawling speed and avoid IP bans.
Response Header
Every response that comes through the proxy includes an X-ProxyMesh-IP header whose value is the IP used for the request. To access this header, you should use our scrapy-proxy-headers package. Then, to use the same IP for a subsequent request, pass in this header unchanged.
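Here is a sketch of that pattern, assuming the scrapy-proxy-headers setup shown above; the follow-up path and callback name are placeholders:

def parse(self, response):
    # Reuse the IP that served this response for the follow-up request
    ip = response.headers["X-ProxyMesh-IP"]
    yield response.follow(
        '/next-page',  # placeholder path
        callback=self.parse_next,
        meta={"proxy_headers": {"X-ProxyMesh-IP": ip}},
    )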
Scrapy with Splash Request
For a Splash request via the proxy, add a proxy argument to the SplashRequest object. Without this argument, you may receive a 503 Service Unavailable response. Below is sample code for a Splash request.
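This minimal sketch assumes scrapy-splash is installed and configured; the spider name and URL are placeholders:

import scrapy
from scrapy_splash import SplashRequest

class SplashProxySpider(scrapy.Spider):
    name = "splash_proxy"  # placeholder spider name

    def start_requests(self):
        yield SplashRequest(
            'https://example.com',  # placeholder URL
            callback=self.parse,
            # Splash forwards its outgoing requests through this proxy
            args={'proxy': 'http://USERNAME:PASSWORD@HOST:PORT'},
        )

    def parse(self, response):
        self.logger.info("rendered %d bytes", len(response.body))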
Selenium + Chrome
To configure the Python webdriver for Selenium to use Chrome, see "How do I set a proxy for Chrome in Python webdriver". Be sure to use IP authentication before configuring Selenium.
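For example, a minimal sketch using Chrome's --proxy-server flag (host and port are placeholders):

from selenium import webdriver

options = webdriver.ChromeOptions()
# Route Chrome's traffic through the proxy; IP authentication assumed,
# since this flag does not carry a username or password
options.add_argument('--proxy-server=http://HOST:PORT')
driver = webdriver.Chrome(options=options)
driver.get('https://example.com')
driver.quit()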
Selenium + Firefox
To set the network proxy settings for Selenium to use Firefox, you can do something like the following (be sure to use IP authentication before configuring Selenium):
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.http", 'HOST')
profile.set_preference("network.proxy.http_port", 31280)
profile.set_preference("network.proxy.ssl", 'HOST')
profile.set_preference("network.proxy.ssl_port", 31280)
driver = webdriver.Firefox(firefox_profile=profile)
Please see our blog for more Python topics.
Other Python Libraries
Our python-proxy-headers library has examples for using proxies with various Python libraries, and provides custom proxy header support. Currently it provides extension modules for the following libraries: urllib3, httpx, and aiohttp.
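For comparison, here is a sketch of plain urllib3 proxy usage without the extension modules; the host, port, and credentials are placeholders:

import urllib3

# Proxy credentials go in a Proxy-Authorization header
proxy = urllib3.ProxyManager(
    'http://HOST:PORT',
    proxy_headers=urllib3.make_headers(proxy_basic_auth='USERNAME:PASSWORD'),
)
response = proxy.request('GET', 'http://example.com')
print(response.status)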