CherryProxy - a filtering HTTP proxy extensible in Python

CherryProxy is a simple HTTP proxy written in Python 2.x, based on the CherryPy WSGI server and httplib, extensible for content analysis and filtering.

It was not designed for operational use, and the current version lacks some HTTP features (such as HTTPS support), so some websites will not display properly. However, it should be useful for testing, demos, prototyping and teaching.

Why a new proxy

There are already quite a few HTTP proxies written in Python, as shown by the excellent list maintained by xhaus. I needed a simple proxy with filtering features, so I looked at several of them. They were either too simple and lacked features, or too complex to extend, so I decided to develop a different one.

I chose the CherryPy WSGI server because it provides a robust, thread-pooled HTTP server with good HTTP 1.1 support.

News:

  • 2011-11-22: moved CherryProxy to its own bitbucket project
  • 2011-11-15 v0.12: added parent proxy support

Download:

Get the zip archive from here, or use Mercurial to get the latest source code from here.

Install

On Windows, double-click on install.bat. On other systems, run "python setup.py install" from a shell.

License:

Open-source, BSD-style

Usage as a tool (simple proxy):

1) run CherryProxy.py [options]

Options:
  -h, --help            show this help message and exit
  -p PORT, --port=PORT  port for HTTP proxy, 8070 by default
  -a ADDRESS, --address=ADDRESS
                        IP address of interface for HTTP proxy (0.0.0.0 for
                        all, default=localhost)
  -f PROXY, --forward=PROXY
                        Forward requests to parent proxy, specified as
                        hostname[:port] or IP address[:port]
  -v, --verbose

2) set up your browser to use localhost:8070 as its HTTP proxy
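For quick tests without a browser, a script can send its requests through the proxy instead. The following is a minimal sketch, assuming the proxy runs with its default options on localhost:8070. The helper names make_proxy_opener and fetch_status are illustrative and not part of CherryProxy, and the import fallback is only there so the snippet works on both Python 2 and Python 3.

```python
try:
    # Python 3
    from urllib.request import ProxyHandler, build_opener
except ImportError:
    # Python 2
    from urllib2 import ProxyHandler, build_opener

# Default address and port taken from the options listed above.
PROXY_URL = 'http://localhost:8070'


def make_proxy_opener(proxy_url=PROXY_URL):
    """Return a urllib opener that routes plain HTTP requests through the proxy."""
    # Only 'http' is mapped, since the current version has no HTTPS support.
    return build_opener(ProxyHandler({'http': proxy_url}))


def fetch_status(url, proxy_url=PROXY_URL, timeout=10):
    """Fetch url through the proxy and return the HTTP status code."""
    response = make_proxy_opener(proxy_url).open(url, timeout=timeout)
    try:
        return response.getcode()
    finally:
        response.close()
```

With the proxy running, fetch_status('http://example.com/') should go through it (the URL is a placeholder), and the request should appear in the proxy log when -v is used.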

Usage in a Python Application:

- import cherryproxy
- create a subclass of cherryproxy.CherryProxy
- implement methods filter_request and/or filter_response to enable filtering as
  needed.
- see provided examples

Filtering API:

CherryProxy: class implementing a filtering HTTP proxy
 
To use it, create a class inheriting from CherryProxy and implement the
methods filter_request and filter_response as desired.
Then call the start method to start the proxy.
Note: the logging module needs to be initialized before creating a
CherryProxy object.
See the example scripts for more information.
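The steps above can be sketched as follows. This is a minimal sketch and is not taken from the bundled example scripts: the class name LoggingProxy and the logger name are illustrative, and the constructor arguments simply repeat the documented defaults. The try/except fallback exists only so that the snippet can be imported on a machine where CherryProxy is not installed.

```python
import logging

try:
    import cherryproxy
    ProxyBase = cherryproxy.CherryProxy
except ImportError:
    # Fallback so the sketch can be imported where CherryProxy is not installed.
    ProxyBase = object

log = logging.getLogger('loggingproxy')  # illustrative logger name


class LoggingProxy(ProxyBase):
    """Proxy that logs each request line and forwards the request unchanged."""

    def filter_request_headers(self):
        # method and full_url are documented attributes of self.req.
        log.info('%s %s', self.req.method, self.req.full_url)


def main():
    # The logging module must be initialized before the proxy object is created.
    logging.basicConfig(level=logging.INFO)
    proxy = LoggingProxy(address='localhost', port=8070)
    proxy.start()
```

Calling main() starts the proxy on localhost:8070. Since the filter never calls set_response(), every request is passed on to the server.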

__init__(self, address='localhost', port=8070, server_name='CherryProxy/0.12', debug=False, log_level=20, options=None, parent_proxy=None)
CherryProxy constructor
 
address: IP address of interface to listen to, or 0.0.0.0 for all
         (localhost by default)
port: TCP port for the proxy (8070 by default)
server_name: server name used in HTTP responses
debug: enable debugging messages if set to True
log_level: logging level (use constants from logging module)
options: None or optparse.OptionParser object to provide additional options
parent_proxy: parent proxy, either IP address or hostname, with optional
    port (example: 'myproxy.local:8080')
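When the parent_proxy value comes from user input, it can help to check it before passing it to the constructor. The helper below is a hypothetical sketch of how a 'hostname[:port]' string could be validated. It is not CherryProxy's own parsing, which may behave differently, and it does not handle IPv6 literals.

```python
def split_host_port(value):
    """Split 'hostname[:port]' or 'IP[:port]' into (host, port).

    port is None when the string contains no port.
    Hypothetical helper, not part of CherryProxy; IPv6 literals are not handled.
    """
    value = value.strip()
    host, sep, port = value.rpartition(':')
    if not sep:
        # No colon: the whole value is the host and no port was given.
        host, port = value, None
    elif port.isdigit() and 0 < int(port) < 65536:
        port = int(port)
    else:
        raise ValueError('invalid port in %r' % value)
    if not host:
        raise ValueError('missing host in %r' % value)
    return host, port
```

A value that passes this check, such as 'myproxy.local:8080', can then be given unchanged as parent_proxy.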

filter_request(self)
Method to be overridden:
Called to analyse/filter/modify the request received from the client,
after reading the full request with its body if there is one,
before it is sent to the server.
 
This method may call set_response() if the request needs to be blocked
before being sent to the server.
 
The following attributes can be read and MODIFIED:
    self.req.data: data sent with the request (POST or PUT)
    (and also all listed in filter_request_headers)
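As an illustration of filter_request, the sketch below blocks any request whose body contains a given marker. The class name BodyFilterProxy and the marker string are hypothetical. It is also an assumption that self.req.data is empty or None when the request has no body, which is why the code checks for both. The try/except fallback only keeps the snippet importable without CherryProxy.

```python
try:
    import cherryproxy
    ProxyBase = cherryproxy.CherryProxy
except ImportError:
    # Fallback so the sketch can be imported where CherryProxy is not installed.
    ProxyBase = object


class BodyFilterProxy(ProxyBase):
    """Proxy that rejects requests whose body contains a forbidden marker."""

    # Hypothetical marker; on Python 2 a plain str is a byte string.
    FORBIDDEN_MARKER = 'internal-only'

    def filter_request(self):
        body = self.req.data
        # Requests without a body (most GET requests) pass through untouched.
        if body and self.FORBIDDEN_MARKER in body:
            # Block the request before it is sent to the server.
            self.set_response_forbidden(reason='Request body blocked')
```

Because the body is only available here, checks that need only the headers or the URL are better placed in filter_request_headers, which runs before the body is read.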

filter_request_headers(self)
Method to be overridden:
Called to analyse/filter/modify the request received from the client,
before reading the full request with its body if there is one,
before it is sent to the server.
 
This method may call set_response() if the request needs to be blocked
before being sent to the server.
 
The following attributes can be read and MODIFIED:
    self.req.headers: dictionary of HTTP headers, with lowercase names
    self.req.method: HTTP method, e.g. 'GET', 'POST', etc
    self.req.scheme: protocol from URL, e.g. 'http' or 'https'
    self.req.netloc: IP address or hostname of server, with optional
                     port, for example 'www.google.com' or '1.2.3.4:8000'
    self.req.path: path in URL, for example '/folder/index.html'
    self.req.query: query string, found after question mark in URL
 
The following attributes can be READ only:
    self.req.environ: dictionary of request attributes following WSGI
                      format (PEP 333)
    self.req.url: partial URL containing 'path?query'
    self.req.full_url: full URL containing 'scheme://netloc/path?query'
    self.req.length: length of request data in bytes, 0 if none
    self.req.content_type: content-type, for example 'text/html'
    self.req.charset: charset, for example 'UTF-8'
    self.req.url_filename: filename extracted from URL path
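The attributes listed above are enough for simple host-based filtering. The following sketch blocks a hypothetical set of hosts and removes the Referer header from all other requests. The class name HostFilterProxy and the host names are illustrative only, and the try/except fallback just keeps the snippet importable without CherryProxy.

```python
try:
    import cherryproxy
    ProxyBase = cherryproxy.CherryProxy
except ImportError:
    # Fallback so the sketch can be imported where CherryProxy is not installed.
    ProxyBase = object


class HostFilterProxy(ProxyBase):
    """Proxy that blocks some hosts and strips the Referer header."""

    # Hypothetical blocklist, compared against lowercase host names.
    BLOCKED_HOSTS = frozenset(['ads.example.com', 'tracker.example.net'])

    def filter_request_headers(self):
        # netloc may carry a port ('host:8000'); compare the host part only.
        host = self.req.netloc.split(':')[0].lower()
        if host in self.BLOCKED_HOSTS:
            self.set_response_forbidden(reason='Host blocked')
            return
        # Header names are lowercase; drop the Referer header if present.
        self.req.headers.pop('referer', None)
```

Blocking at this stage means the decision is taken before the request body is read.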

filter_response(self)
Method to be overridden:
Called to analyse/filter/modify the response received from the server,
after reading the full response with its body if there is one,
before it is sent back to the client.
 
This method may call set_response() if the response needs to be blocked
(e.g. replaced by a simple response) before being sent to the client.

filter_response_headers(self)
Method to be overridden:
Called to analyse/filter/modify the response received from the server,
before reading the full response with its body if there is one,
before it is sent back to the client.
 
This method may call set_response() if the response needs to be blocked
(e.g. replaced by a simple response) before being sent to the client.

set_response(self, status, reason=None, data=None, content_type='text/plain')
Set an HTTP response to be sent to the client instead of the one from
the server.
 
- status: int, HTTP status code (see RFC 2616)
- reason: str, optional text for the response line, standard text by default
- data: str, optional body for the response, default="status reason"
- content_type: str, content-type corresponding to data
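As an example of set_response with all of its parameters, the sketch below turns the proxy into a read-only one by answering anything other than GET or HEAD with a small HTML page. The class name ReadOnlyProxy, the choice of status 405 and the page text are illustrative assumptions, and the try/except fallback only keeps the snippet importable without CherryProxy.

```python
try:
    import cherryproxy
    ProxyBase = cherryproxy.CherryProxy
except ImportError:
    # Fallback so the sketch can be imported where CherryProxy is not installed.
    ProxyBase = object


class ReadOnlyProxy(ProxyBase):
    """Proxy that only lets GET and HEAD requests through."""

    ALLOWED_METHODS = ('GET', 'HEAD')
    # Illustrative body; content_type must match the data that is sent.
    BLOCK_PAGE = ('<html><body><h1>Only GET and HEAD requests are allowed '
                  'through this proxy.</h1></body></html>')

    def filter_request_headers(self):
        if self.req.method not in self.ALLOWED_METHODS:
            # Answer the client directly instead of contacting the server.
            self.set_response(405, reason='Method Not Allowed',
                              data=self.BLOCK_PAGE, content_type='text/html')
```

If reason and data are left out, the documented defaults apply: the standard reason text for the status code, and a plain-text body of the form "status reason".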

set_response_forbidden(self, status=403, reason='Forbidden', data=None, content_type='text/plain')
Set an HTTP 403 Forbidden response to be sent to the client instead of
the one from the server.
 
- status: int, HTTP status code (see RFC 2616)
- reason: str, optional text for the response line, standard text by default
- data: str, optional body for the response, default="status reason"
- content_type: str, content-type corresponding to data

start(self)
Start the proxy server.

stop(self)
Stop the proxy server.

Bugs and suggestions

If you would like to report a bug or to send me suggestions, please send an e-mail to decalage at laposte.net, or use the project issue tracker.