Python Proxy Configuration Examples
Requests
Requests is a great Python library for making HTTP requests. We recommend version 2.7.0 or higher.
Configuration
Your proxies configuration should look like the example below. Even when you're making a request over HTTPS, the proxy URL should begin with http://, not https://.
proxies = { 'http': 'http://HOST:PORT', 'https': 'http://HOST:PORT' }
Authentication
This code example shows the most reliable way to use proxy authentication. If you're using IP authentication, you can remove USERNAME:PASSWORD@ from the proxies dictionary.
>>> import requests
>>> proxies = {'http': 'http://USERNAME:PASSWORD@HOST:PORT', 'https': 'http://USERNAME:PASSWORD@HOST:PORT'}
>>> response = requests.get('http://example.com', proxies=proxies)
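If your username or password contains characters such as '@' or ':', the proxy URL can become ambiguous. The sketch below shows one way to build the proxies dictionary with percent-encoded credentials. The helper name build_proxies and its parameters are hypothetical and are not part of Requests.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a proxies dict for Requests (hypothetical helper).

    Leave out the credentials when you use IP authentication.
    """
    if username is not None and password is not None:
        # Percent-encode so characters such as '@' or ':' in the
        # credentials cannot break the proxy URL.
        auth = "%s:%s@" % (quote(username, safe=""), quote(password, safe=""))
    else:
        auth = ""
    proxy_url = "http://%s%s:%s" % (auth, host, port)
    # The same http:// proxy URL is used for both protocols.
    return {"http": proxy_url, "https": proxy_url}
```

You could then pass the result straight to Requests, for example requests.get('http://example.com', proxies=build_proxies('HOST', 31280)).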
Multiple Proxies
To use multiple proxy servers, you can randomly choose one for each request. Your code might look like this:
>>> import random
>>> import requests
>>> proxy_choices = ['HOST1:PORT', 'HOST2:PORT']
...
>>> proxy = random.choice(proxy_choices)
>>> proxies = {'http': 'http://%s' % proxy, 'https': 'http://%s' % proxy}
>>> response = requests.get('http://example.com', proxies=proxies)
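A random choice can land on a proxy that happens to be unavailable. As a rough sketch, you might try the proxies in random order and move on to the next one when a request fails. The function fetch_with_failover and its fetch parameter are hypothetical; fetch stands for whatever callable performs the request and raises an exception on failure.

```python
import random


def fetch_with_failover(url, proxy_choices, fetch, rng=random):
    """Try proxies in random order until one request succeeds.

    `fetch` is any callable taking (url, proxies) and returning a response;
    it is expected to raise an exception when the proxy cannot be used.
    """
    candidates = list(proxy_choices)
    rng.shuffle(candidates)
    last_error = None
    for proxy in candidates:
        proxies = {"http": "http://%s" % proxy, "https": "http://%s" % proxy}
        try:
            return fetch(url, proxies)
        except Exception as error:  # narrow this to your client's errors
            last_error = error
    # Every candidate failed (or the list was empty).
    raise RuntimeError("all proxies failed") from last_error
```

With Requests, fetch could be something like lambda url, proxies: requests.get(url, proxies=proxies, timeout=10).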
Below is an example of a test you can run for a request on a single proxy server. Note that, in addition to the proxy address and port, you must define the protocol. If you're defining more than one protocol, you can use the same proxy.
import requests

proxies = { 'http': 'http://PROXYHOST:PORT', 'https': 'http://PROXYHOST:PORT' }
response = requests.get('http://xxxxx.xxx', proxies=proxies)
print(response.headers)
print(response.encoding)
print(response.status_code)
print(response.text)
print(response.links)
If you get an error instead, the message might look something like the one below. The likely cause is a firewall issue:
ProxyError: HTTPConnectionPool(host='PROXYHOST', port=PORT): Max retries exceeded with url: http://xxxxx.xxx/ (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024D2567BC50>: Failed to establish a new connection: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions')))
Please see Proxy Connection Problems for details.
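To tell a firewall problem apart from a problem in your Requests code, it can help to test whether a plain TCP connection to the proxy opens at all. The following is a minimal sketch using only the standard library; the function name can_reach_proxy is hypothetical.

```python
import socket


def can_reach_proxy(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and
        # permission errors such as WinError 10013.
        return False
```

If this returns False for your proxy host and port, the request is probably being blocked before it reaches the proxy, which points to local firewall or network settings rather than to your proxies dictionary.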
Scrapy
To use a proxy with Scrapy, set the http_proxy environment variable:
$ export http_proxy=http://USERNAME:PASSWORD@HOST:PORT
For HTTPS requests, use IP authentication and remove USERNAME:PASSWORD@ from the http_proxy variable.
After setting the environment variable, you can activate middlewares that work with Scrapy.
Exception: You do not need the environment variable when you use the Rotating Proxies Middleware (scrapy-rotating-proxies).
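Before starting a crawl, you may want to confirm that the variable is actually visible to Python. The sketch below uses the standard library's environment-based proxy lookup; the helper name proxy_from_environment is hypothetical, and the check only reports what the current process sees.

```python
from urllib.request import getproxies_environment


def proxy_from_environment(scheme="http"):
    """Return the proxy URL found in <scheme>_proxy, or None if unset."""
    # getproxies_environment() scans environment variables whose names
    # end in "_proxy" and returns them keyed by scheme.
    return getproxies_environment().get(scheme)
```

Running print(proxy_from_environment()) in the same shell where you exported http_proxy should show the proxy URL. If it prints None, the variable was not exported into that environment.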
Downloader Middleware
This middleware provides a framework that hooks into the Scrapy request and response processes.
Follow the link for a description of the middleware and its workings.
To activate it, see Activating a downloader middleware.
Rotating Proxies Middleware
pip install scrapy-rotating-proxies
You do not need the environment variable when you use scrapy-rotating-proxies. scrapy-rotating-proxies keeps track of working and non-working proxies, and periodically re-checks the non-working ones.
Add the ROTATING_PROXY_LIST option with a list of proxies to settings.py:
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:PORT',
    # ...
]
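The proxy list alone does not switch the middleware on. As a sketch based on the scrapy-rotating-proxies README, the package's two middlewares are also enabled in settings.py. The priority values shown are the ones suggested there, so check them against the version you install. The proxy entry is a placeholder.

```python
# settings.py (sketch): proxy list plus the middlewares shipped with
# scrapy-rotating-proxies.
ROTATING_PROXY_LIST = [
    "proxy1.com:8000",  # placeholder host:port entry
]

DOWNLOADER_MIDDLEWARES = {
    # Picks a proxy for each request and tracks which ones are working.
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    # Marks proxies as dead when responses look like bans.
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```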
For alternative setup methods and more information about the middleware, see Scrapy Rotating Proxies Middleware.
These notes reference the Rotating Proxies Middleware, but you may also find the suggestions helpful with other middlewares, especially those enabling multiple proxy use.
It's generally easier to debug with Scrapy's normal proxy settings than with middleware settings. The Rotating Proxies Middleware works well with multiple proxies, but switching among them can make problems in your code hard to trace. For efficient debugging, we recommend that you use Scrapy's normal proxy settings rather than the middleware settings.
Also, some Scrapy users who activate the middleware may receive error messages indicating that their chosen proxy is "dead" even though they have authenticated to a live proxy server. If you're only using a single proxy, you don't need the multiproxy feature of the middleware. Here, too, we recommend turning off the middleware and using Scrapy's normal proxy settings.
Your request logs may show 200 "success" responses for many requests while others return 403 error codes, which indicate possible errors in your code or configuration. If you receive such errors, check that you're not generating requests directly from your code, that is, bypassing the needed proxy settings.
Random Proxy Middleware
The Rotating Proxies Middleware described above includes options for multiple proxies, but as an alternative you can also use RandomProxyMiddleware.
pip install scrapy_proxies
For more information on this middleware, click the link.
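For orientation, a settings.py fragment for this middleware might look like the sketch below. It follows the scrapy_proxies README as best recalled, so treat the setting names, the middleware path scrapy_proxies.RandomProxy, and the priorities as assumptions to verify against the installed version. The file path is a placeholder.

```python
# settings.py (sketch): assumed configuration for scrapy_proxies.

# Retry often, since individual proxies can fail for many reasons.
RETRY_TIMES = 10
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 90,
    "scrapy_proxies.RandomProxy": 100,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
}

# Text file with one proxy URL per line, such as http://HOST:PORT
PROXY_LIST = "/path/to/proxy/list.txt"  # placeholder path

# 0 = a different random proxy for every request
PROXY_MODE = 0
```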
Response Header
With every response, the proxy server includes an X-ProxyMesh-IP header whose value is the IP used for the request.
To use the same IP for a subsequent request, pass in this header unchanged.
def parse_response(self, response):
    # Print the response headers, including X-ProxyMesh-IP
    print(response.headers)
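Passing the header back can be sketched as a small helper that copies the value from one response's headers into the headers for the next request. The name sticky_ip_headers is hypothetical, and the sketch assumes a mapping of string names to string values, such as the headers attribute of a Requests response.

```python
PROXY_IP_HEADER = "X-ProxyMesh-IP"


def sticky_ip_headers(response_headers):
    """Build request headers that ask for the IP used by the last response."""
    for name, value in response_headers.items():
        # Header names are case-insensitive, so compare them lowercased.
        if name.lower() == PROXY_IP_HEADER.lower():
            # Pass the value through unchanged.
            return {PROXY_IP_HEADER: value}
    # No proxy IP header was present, so there is nothing to pin.
    return {}
```

With Requests, a follow-up call might look like requests.get(url, proxies=proxies, headers=sticky_ip_headers(response.headers)).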
Scrapy with Splash Request
For a splash request via the proxy, add a 'proxy' argument to the SplashRequest object. Without this argument, you may receive a 503 service unavailable response.
Click the link to view sample code for a splash request.
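As a rough illustration of the 'proxy' argument, the sketch below builds the args dictionary for a Splash request and wraps it in a SplashRequest. The helper names and the wait value are hypothetical; SplashRequest itself comes from the scrapy-splash package, which is imported lazily here so the first helper can be used on its own.

```python
def splash_args_with_proxy(proxy_url, wait=0.5):
    """Return Splash arguments that route rendering through a proxy."""
    # 'proxy' tells Splash which proxy URL to use; 'wait' is an
    # illustrative render delay in seconds.
    return {"wait": wait, "proxy": proxy_url}


def make_splash_request(url, callback, proxy_url, wait=0.5):
    """Create a SplashRequest that includes the 'proxy' argument."""
    # Imported lazily so this module loads without scrapy-splash installed.
    from scrapy_splash import SplashRequest

    return SplashRequest(url, callback, args=splash_args_with_proxy(proxy_url, wait))
```

Inside a spider, you might yield make_splash_request(url, self.parse, 'http://HOST:PORT') from start_requests.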
Selenium + Chrome
To configure the Python webdriver for Selenium to use Chrome, see How do i set proxy for chrome in python webdriver. Be sure to use IP authentication before configuring Selenium.
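One common approach is to pass Chrome's --proxy-server command-line switch through ChromeOptions. The sketch below assumes IP authentication (no credentials in the URL) and a Selenium version that accepts the options keyword. The helper names are hypothetical.

```python
def chrome_proxy_argument(host, port):
    """Return the Chrome command-line switch for an HTTP proxy."""
    return "--proxy-server=http://%s:%s" % (host, port)


def make_chrome_driver(host, port):
    """Start Chrome with all traffic routed through the given proxy."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(chrome_proxy_argument(host, port))
    return webdriver.Chrome(options=options)
```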
Selenium + Firefox
To set the network proxy settings for Selenium to use Firefox, you can do something like the following. Be sure to use IP authentication before configuring Selenium.
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.http", 'HOST')
profile.set_preference("network.proxy.http_port", 31280)
profile.set_preference("network.proxy.ssl", 'HOST')
profile.set_preference("network.proxy.ssl_port", 31280)
driver = webdriver.Firefox(firefox_profile=profile)
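Newer Selenium releases favor an Options object over the firefox_profile argument. Assuming such a version, the same preferences might be applied as sketched below. The helper names are hypothetical, and the preference keys are the ones used in the profile example.

```python
def firefox_proxy_preferences(host, port):
    """Return the Firefox preferences for a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }


def make_firefox_driver(host, port):
    """Start Firefox with the proxy preferences applied through Options."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_proxy_preferences(host, port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```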