Thursday, June 2, 2011

Universal Selective Proxying with Privoxy

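Selective proxying means applying a specific proxy scheme to specific URLs. Take getting over the Wall as an example: for sites inside the Wall, we certainly don't want to climb out and then climb back in. It's too much effort, and we might not even make it back in.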

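AutoProxy is popular among Firefox users precisely because it offers smart proxy selection tailored to the GFW. It is a great extension, and the GFWList maintained by the project is just as great. Using a few simple rules, this list covers countless URLs that are currently blocked.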

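AutoProxy's portability is rather limited, though: as far as I know, only Chrome is supported apart from Firefox. So someone wrote a tool called AutoProxy2Pac that converts AutoProxy's GFWList rules into a PAC file. Since all major browsers support PAC, GFWList became much more widely usable.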

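But one problem remains: what about programs other than browsers? Most software copes fine with an HTTP proxy, but hand it a PAC file and the best you'll get is: "Huh?"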

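The solution is to hand the job of selective proxying over to the proxy server. More precisely, this is a two-tier proxy setup: applications talk to one local proxy, which in turn decides which upstream proxy, if any, to use. To use an analogy, it's like online shopping: you just place your order with the shop girl, and where she gets her stock is none of your business. But the shop girl does have to think about it. This horse is available at home, and it's a local specialty to boot, so it has to be sourced domestically. That bird is a different story: it's extinct at home, so she has to get a friend abroad to buy it for her.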

The name of this shop girl is Privoxy.

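What's good about Privoxy is that it supports both HTTP and SOCKS upstream proxies, and lets you decide through flexible rules when to use a proxy. Another great point: when the upstream proxy is SOCKS4a or SOCKS5, Privoxy has the upstream proxy do the DNS resolution, which effectively sidesteps DNS poisoning. This is something PAC alone cannot do.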

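First, a quick introduction to using Privoxy. Note that the file paths mentioned below apply to Linux; for other systems, please refer to the Privoxy user manual. A simple example: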

{+forward-override{forward-socks5 127.0.0.1:7127 .}}
.youtube.com

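This rule says: for youtube.com and its subdomains, use the SOCKS5 proxy at 127.0.0.1:7127 (provided by SSH, Tor, or the like). Save the snippet above as /etc/privoxy/gfw.action, then edit the configuration file /etc/privoxy/config and add: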

actionsfile gfw.action
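To cover more than one site, the action file can simply list more patterns under the same action line. The helper below is a minimal sketch written for this post, not something shipped with Privoxy (make_gfw_action is a hypothetical name). It prints such a file for a given SOCKS5 proxy and list of domains.

```bash
# make_gfw_action: hypothetical helper, not part of Privoxy.
# Usage: make_gfw_action PROXY DOMAIN... > /etc/privoxy/gfw.action
make_gfw_action() {
  local proxy=$1 d
  shift
  # Everything matched by the patterns below is forwarded to the SOCKS5
  # proxy; the trailing "." means no HTTP parent proxy after it.
  printf '{+forward-override{forward-socks5 %s .}}\n' "$proxy"
  for d in "$@"; do
    # A leading dot matches the domain itself and all of its subdomains.
    printf '.%s\n' "${d#.}"
  done
}
```

For example, make_gfw_action 127.0.0.1:7127 youtube.com twitter.com > gfw.action produces the rule above plus a second pattern line for twitter.com.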

Now start the privoxy service:

# /etc/init.d/privoxy start

Privoxy listens on 127.0.0.1:8118 by default. Now open your browser, set this address as its HTTP proxy, and try visiting YouTube.

If all goes well, YouTube is now accessed through the proxy, while other blocked sites remain unreachable. Naturally, this is when GFWList comes to mind.

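AutoProxy2Privoxy is a shell script that converts AutoProxy rules into Privoxy rules. Even if you can't run the script, you can still download a prebuilt gfw.action. Remember to change the proxy address on its first line, then copy it to /etc/privoxy/gfw.action.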
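For a rough idea of what such a conversion involves, here is a much-simplified sketch. It is not the AutoProxy2Privoxy script: it only handles AutoProxy's "||domain" rules (which match a domain and its subdomains) and silently drops everything else, including keyword rules, URL-prefix rules and "@@" exceptions. The function name is hypothetical.

```bash
# autoproxy_domains_to_privoxy: hypothetical, much-simplified converter.
# Reads AutoProxy rules on stdin and prints Privoxy host patterns on stdout.
# Only "||domain" rules are translated; all other rule types are skipped.
autoproxy_domains_to_privoxy() {
  local line host
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}                  # tolerate CRLF line endings
    case $line in
      '!'* | '['* | '@@'*) ;;           # comment, header, exception: skip
      '||'*)
        host=${line:2}                  # strip the leading "||"
        host=${host%%/*}                # drop any path after the host
        [ -z "$host" ] || printf '.%s\n' "$host"
        ;;
    esac
  done
}
```

Usage, assuming a plain-text copy of the list saved as gfwlist.txt (if your copy is base64-encoded, decode it first, e.g. with base64 -d):
{ echo '{+forward-override{forward-socks5 127.0.0.1:7127 .}}'; autoproxy_domains_to_privoxy < gfwlist.txt; } > gfw.action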

And with that, our shop girl has finished her apprenticeship and officially takes over.

Wednesday, May 25, 2011

Set Up SSH to Bypass GFW - The Definitive Guide

In this article, I'm gonna show you how to set up SSH to bypass GFW, and how to make the most out of an SSH connection in various ways.

The presumed platform is Linux, but OS X should probably work as well.

Configure SSH

Suppose you have acquired an SSH account. You usually get a username/password pair, but typing the password every time you log in quickly gets tedious. The good news is that you don't have to: use public key authentication instead of password authentication.

Generate a new public/private key pair locally:

$ ssh-keygen -t rsa -C "Bye bye GFW" -f ~/.ssh/id_rsa_gfw
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cckpg/.ssh/id_rsa_gfw.
Your public key has been saved in /home/cckpg/.ssh/id_rsa_gfw.pub.
The key fingerprint is:
01:0f:f4:3b:ca:85:d6:17:a1:7d:f0:68:9d:f0:a2:db Bye bye GFW
The key's randomart image is:
+--[ RSA 2048]----+
|     .+   +      |
|       = o O .   |
|        = * *    |
|       o = +     |
|      o S .      |
|     o o =       |
|      o . E      |
|                 |
|                 |
+-----------------+

Here we leave the passphrase empty for convenience.

Now upload the public key to the server (sshd(8), scp(1)):

$ scp -p ~/.ssh/id_rsa_gfw.pub ssh.gfw:.ssh/authorized_keys

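Note that this overwrites any existing ~/.ssh/authorized_keys on the server (and fails if ~/.ssh does not exist there yet); if you already have keys installed, ssh-copy-id -i ~/.ssh/id_rsa_gfw.pub ssh.gfw appends the key instead. Either way, you will be prompted for your password (for the last time, yeah!). In the scp command above, ssh.gfw is a Host I defined in ~/.ssh/config: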

Host ssh.gfw
  HostName 75.186.63.204
  Port 22
  User cckpg
  IdentityFile ~/.ssh/id_rsa_gfw
  HashKnownHosts yes
  StrictHostKeyChecking ask

See ssh_config(5) if something is unclear.

OK. SSH configuration is done. Now you don't ever have to type that password again. Just keep your private key ~/.ssh/id_rsa_gfw safe.
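Keeping the key safe also means strict file permissions: OpenSSH ignores a private key that other users can read and warns that its permissions are too open. A minimal sketch for checking this (the function name is hypothetical, and stat -c is GNU coreutils syntax, so this assumes Linux):

```bash
# key_perms_ok: hypothetical helper. Succeeds only if FILE grants no
# permissions at all to group or others (e.g. mode 600 or 400).
key_perms_ok() {
  local mode
  mode=$(stat -c %a "$1") || return 1
  # Interpret the mode as octal and test the group/other bits (mask 077).
  (( (8#$mode & 8#077) == 0 ))
}
```

Usage: key_perms_ok ~/.ssh/id_rsa_gfw || chmod 600 ~/.ssh/id_rsa_gfw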

SSH As SOCKS Proxy

You have probably heard of the famous ssh -D. Basically, the -D option makes ssh act as a SOCKS proxy listening at the specified address:port, tunneling connections through the SSH server.

Here is an example with several additional sensible flags:

$ ssh -v24NnD 127.0.0.1:7127 ssh.gfw

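See ssh(1) for details: -v prints debugging messages, -2 forces protocol version 2, -4 restricts ssh to IPv4 addresses, -N tells ssh not to execute a remote command (we only want the forwarding), and -n redirects stdin from /dev/null. You may also want -C (compression) for a slow connection.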

To quickly test it out, set your browser to use a SOCKS5 proxy at 127.0.0.1:7127.

For Firefox, the setting goes like this:

Edit -> Preferences -> Advanced -> Network -> Settings ->
Manual proxy configuration -> SOCKS Host:127.0.0.1 Port:7127 SOCKS v5

Now, open some long-time-no-see websites, like twitter.com.
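From the command line, curl can run a similar quick test, since it speaks SOCKS natively. With --socks5-hostname, curl hands the host name to the proxy, so the DNS lookup happens on the SSH server; plain --socks5 would resolve it locally. The wrapper name and the SOCKS_PROXY variable are hypothetical.

```bash
# via_socks: hypothetical wrapper around curl's built-in SOCKS support.
# Usage: via_socks URL [extra curl options...]
# SOCKS_PROXY defaults to the ssh -D address used in this post.
via_socks() {
  local url=$1
  shift
  # --socks5-hostname: let the proxy resolve the host name (remote DNS).
  curl --silent --show-error --socks5-hostname "${SOCKS_PROXY:-127.0.0.1:7127}" "$@" "$url"
}
```

For example, via_socks http://ip.appspot.com should print the SSH server's address rather than your own.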

Use Proxy Auto-Config

Since our purpose is just to unblock the blocked contents, we don't really need to connect to all web sites through the proxy. In fact, some contents inside the GFW are not available from outside of it.

It is wise to only proxify connections that otherwise would be blocked or faked. How do you do that? One answer is Proxy Auto-Config.
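A PAC file is just a JavaScript function, FindProxyForURL(url, host), which the browser calls for every request and which returns either a proxy or DIRECT. As a minimal sketch (the shell helper make_pac is hypothetical, and the hand-picked domains are only for illustration), this prints a PAC that sends the given domains through the SOCKS proxy and everything else directly:

```bash
# make_pac: hypothetical helper that prints a minimal PAC file.
# Usage: make_pac HOST:PORT DOMAIN... > autoproxy.pac
make_pac() {
  local proxy=$1 d
  shift
  echo 'function FindProxyForURL(url, host) {'
  for d in "$@"; do
    d=${d#.}
    # Match the domain itself, or any subdomain (dnsDomainIs needs the dot).
    printf '  if (host == "%s" || dnsDomainIs(host, ".%s"))\n' "$d" "$d"
    printf '    return "SOCKS5 %s";\n' "$proxy"
  done
  # Everything else bypasses the proxy.
  echo '  return "DIRECT";'
  echo '}'
}
```

Writing such a list by hand does not scale, of course.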

OK, I see, but just how the GFW do I write a PAC file that covers all blocked sites? There're simply too many of them!

The answer is, as usual, Open Source: the autoproxy-gfwlist project maintains a GFW list for the Firefox AutoProxy addon, covering a large number of blocked URLs. You could just use this addon, but I'd like something lighter, and since it is Firefox-only, the addon is not as universal as a PAC file.

Next into the spotlight is, tada! the autoproxy2pac project! What it does is convert the autoproxy-gfwlist list to a PAC file. The URL to the PAC file is:

http://autoproxy2pac.appspot.com/pac/TYPE/IP/PORT

TYPE being either proxy or socks, and IP and PORT being the IP and port of your proxy. So in my case:

http://autoproxy2pac.appspot.com/pac/socks/127.0.0.1/7127

There is also an autoproxy2pac mirror provided by @yegle. Say thanks :)

The PAC file retrieved is cryptically encoded in order to bypass the GFW keyword filtering. It's OK to use it as is, but for the sake of readability and possible further modifications, I always decode it:

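#!/bin/bash
# Produce an anti-GFW PAC (Proxy Auto Configuration), the easy way.
# Usage: <this script> 127.0.0.1:7127 > autoproxy.pac (needs wget and SpiderMonkey's js)

[ -n "$1" ] || { echo "usage: $0 HOST:PORT" >&2; exit 1; }
host=${1%:*} port=${1#*:} # e.g. 127.0.0.1:7127 as $1
url=https://yegle.net/autoproxy2pac/socks/$host/$port
# The downloaded PAC hands its encoded payload to eval(); turning eval(
# into print( makes js print the decoded source instead of running it.
js <(wget -qO- "$url" | sed 's/eval(/print(/')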

Look at line 7 of the decoded PAC file:

var PROXY = "SOCKS 127.0.0.1:7127";

which corresponds to what we specified in the autoproxy2pac URL. For Firefox, the proxy type specified can be PROXY | SOCKS | SOCKS4 | SOCKS5 (nsIProxyAutoConfig). PROXY is for HTTP(S) proxies, the rest for SOCKS.

OK. The hard part is resolved. Now you just point your browser to this PAC file. All major browsers support this feature. For Firefox 4, the setting goes:

Edit -> Preferences -> Advanced -> Network -> Settings ->
Automatic proxy configuration URL: file:///home/cckpg/gfw/autoproxy.pac

Proxify DNS Queries

Setting a SOCKS proxy for your browser does not necessarily proxify your DNS queries, meaning you still may not reach some of the sites (e.g. twitter.com) due to DNS cache pollution or hijacking. In Firefox, this can be eliminated by setting the following preference through about:config:

network.proxy.socks_remote_dns = true

For manual proxy configuration, this bypasses local DNS resolution entirely and has the remote host resolve names instead, improving reliability and privacy at the same time.

For Proxy Auto-Config setups, even with socks_remote_dns, URLs that should be requested through SOCKS proxy still trigger local DNS queries. This does not affect browsing, since the effective DNS resolution is done remotely, but the counterintuitive behavior may be a privacy concern for some. To solve this issue, set the preference network.dns.disablePrefetch to true through about:config to disable DNS prefetch. Hosts that are not proxied will, of course, be resolved by your local DNS resolver.
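If you would rather not click through about:config on every machine, the same two preferences can go into a user.js file in your Firefox profile directory, which Firefox reads at startup. A minimal sketch; the profile path is an assumption (usually a randomly named directory under ~/.mozilla/firefox/), and the function name is hypothetical:

```bash
# set_firefox_dns_prefs: hypothetical helper; pass your profile directory.
# Appends each preference to user.js unless a line for it already exists.
set_firefox_dns_prefs() {
  local profile=$1 pref
  local userjs="$profile/user.js"
  touch "$userjs" || return 1
  for pref in network.proxy.socks_remote_dns network.dns.disablePrefetch; do
    grep -qF "\"$pref\"" "$userjs" ||
      printf 'user_pref("%s", true);\n' "$pref" >> "$userjs"
  done
}
```

Restart Firefox afterwards so that it picks up the file.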

There are other more general ways to do DNS queries on the remote side. We will cover those later.

Proxify Any Program

Not everything is done inside your browser, and not every program supports every type of proxy. Luckily, most problems already have an answer waiting for us ;)

For example, the excellent downloader wget does not understand SOCKS. This is where tsocks comes in:

$ tsocks wget -qO- http://ip.appspot.com

will print the IP address of the remote host. Cool? Indeed.
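For this to work, tsocks has to know where the SOCKS server is, which it reads from /etc/tsocks.conf. A minimal sketch that prints such a config for the ssh -D proxy above. The helper name is hypothetical, and the second "local" line assumes a 192.168.1.0/24 home network.

```bash
# make_tsocks_conf: hypothetical helper that prints a tsocks.conf.
# Usage: make_tsocks_conf HOST PORT | sudo tee /etc/tsocks.conf
make_tsocks_conf() {
  # Networks listed as "local" are reached directly, not through SOCKS;
  # the 192.168.1.0 line assumes a typical home LAN, adjust as needed.
  cat <<EOF
local = 127.0.0.0/255.0.0.0
local = 192.168.1.0/255.255.255.0
server = $1
server_port = $2
server_type = 5
EOF
}
```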

And there's something even cooler: proxychains. Apart from its ability to chain mixed types of proxies all into one proxy chain, it supports DNS tunneling. This is important, as I explained earlier regarding the Firefox preference network.proxy.socks_remote_dns.

To see the difference:

$ tsocks curl twitter.com --head --verbose
* About to connect() to twitter.com port 80 (#0)
*   Trying 159.24.3.173...
^C
$ proxychains curl twitter.com --head --verbose
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| twitter.com
|S-chain|-<>-127.0.0.1:7127-<><>-4.2.2.2:53-<><>-OK
|DNS-response| twitter.com is 199.59.148.10
* About to connect() to twitter.com port 80 (#0)
*   Trying 199.59.148.10...
|S-chain|-<>-127.0.0.1:7127-<><>-199.59.148.10:80-<><>-OK
connected
* Connected to twitter.com (199.59.148.10) port 80 (#0)
...

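A WHOIS query reveals 159.24.3.173 to belong to "MCI Telecommunications Corporation", way too long a name for a microblogging service ("Twitter Inc.", that is). In other words, the local DNS lookup in the tsocks run returned a bogus address, whereas proxychains resolved twitter.com through the tunnel.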

The config /etc/proxychains.conf is quite self-explanatory:

strict_chain
proxy_dns
tcp_read_time_out 15000
tcp_connect_time_out 8000
[ProxyList]
socks5  127.0.0.1 7127

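A potential glitch: both tsocks and proxychains work by preloading a library (LD_PRELOAD), which the dynamic linker ignores for setuid programs, so those will not be proxified. I haven't tested it.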

Privoxy As HTTP Proxy

So far, we have focused on proxying connections directly through the SOCKS proxy, and have managed to do so for almost all types of applications. In this section, I'm gonna introduce you to an alternative approach.

Privoxy is an HTTP proxy that supports forwarding requests to HTTP or SOCKS proxies. It is widely used in conjunction with Tor, for example.

Basically, we put privoxy between applications and the SSH SOCKS server; as far as the applications are concerned, privoxy is just an HTTP proxy.

This alternative approach simplifies things for HTTP(S) connections:

  • Any program that understands HTTP proxies can talk to privoxy directly.
  • With SOCKS5, all DNS resolution will happen on the remote server, rendering network.proxy.socks_remote_dns moot, and all programs using this HTTP proxy will get correct DNS resolutions as well.
  • Privoxy supports flexible forwarding rules, effectively voiding the need for a PAC file, and being universal for any program.
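Many command-line programs, wget and curl among them, pick up an HTTP proxy from the http_proxy and https_proxy environment variables, so pointing them at privoxy can be as simple as the sketch below. use_privoxy is a hypothetical name, and 127.0.0.1:8118 is privoxy's default listening address. https_proxy also uses an http:// URL, because HTTPS traffic goes through privoxy as a CONNECT tunnel.

```bash
# use_privoxy: hypothetical helper; source it into your shell.
# Exports the proxy variables that wget, curl and many other programs honor.
use_privoxy() {
  local addr=${1:-127.0.0.1:8118}
  export http_proxy="http://$addr" https_proxy="http://$addr"
  # Some programs only look at the upper-case variants.
  export HTTP_PROXY=$http_proxy HTTPS_PROXY=$https_proxy
}
```

After use_privoxy, a plain wget -qO- http://ip.appspot.com goes through privoxy.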

If that sounds interesting, read on.

To set up privoxy to forward all requests to your SSH SOCKS5 server, put this in /etc/privoxy/config:

forward-socks5 / 127.0.0.1:7127 .

Now make sure you have ssh -D running, and start the privoxy daemon, e.g. with rc start privoxy or /etc/init.d/privoxy start, depending on your distribution.
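Before starting privoxy, it can help to confirm that the ssh -D tunnel is really listening. A minimal sketch using bash's /dev/tcp pseudo-device (bash-only; the function name is hypothetical, and timeout comes from GNU coreutils):

```bash
# port_listening: hypothetical helper; succeeds if HOST:PORT accepts a TCP
# connection. Relies on bash's /dev/tcp redirection and coreutils' timeout.
port_listening() {
  local host=${1:-127.0.0.1} port=${2:-7127}
  # The child bash opens fd 3 to the target and exits, closing it again.
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

Usage: port_listening 127.0.0.1 7127 || echo "ssh -D is not running" >&2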

By default, privoxy listens on localhost:8118. Quickly test it out with your browser; for Firefox users:

Edit -> Preferences -> Advanced -> Network -> Settings ->
Manual proxy configuration -> HTTP Host:127.0.0.1 Port:8118

For PAC setups, you need to set the proxy type to PROXY. Note also that for PAC setups, DNS prefetch will always query local DNS resolvers regardless of whether the proxy does DNS resolution. Set network.dns.disablePrefetch to true to disable this behavior, as mentioned before.

Wait, screw Proxy Auto-Config! (Yes I promised.) Let's opt for privoxy's builtin, flexible pattern-based forward rules. I'm not going to cover the details here, but give you the answer to the question that is now on your mind: Yes, there is a written list of GFW proxy forward rules for privoxy! And of course, it's based on autoproxy-gfwlist, no doubt about that :)

You should read the README on the AutoProxy2Privoxy page for instructions, but here is what you need to do in brief. First, edit the following line in gfw.action to match the address:port of your SOCKS proxy:

{+forward-override{forward-socks5 127.0.0.1:7127 .}}

Then issue the following commands as root:

# cp gfw.action /etc/privoxy/
# chown privoxy:privoxy /etc/privoxy/gfw.action
# chmod 660 /etc/privoxy/gfw.action

Now edit /etc/privoxy/config, adding this line:

actionsfile gfw.action

Also, comment out the SOCKS5 forward rule which we set before:

#forward-socks5 / 127.0.0.1:7127 .
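If you prefer to script these two config changes, here is a minimal sketch. The function name is hypothetical, and it assumes the exact line formats shown above. It comments out the catch-all forward-socks5 / line and adds the actionsfile line only if it is missing, so running it twice is harmless.

```bash
# use_gfw_action: hypothetical helper. Usage: use_gfw_action /etc/privoxy/config
use_gfw_action() {
  local config=$1 tmp
  tmp=$(mktemp) || return 1
  # Comment out the catch-all rule so only gfw.action decides what is proxied.
  sed 's|^forward-socks5 / |#&|' "$config" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back with cat (not mv) to keep the file's owner and permissions.
  cat "$tmp" > "$config"
  rm -f "$tmp"
  grep -qx 'actionsfile gfw.action' "$config" ||
    echo 'actionsfile gfw.action' >> "$config"
}
```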

Privoxy should automatically pick up the new config. Now just point your program to privoxy, who will automatically determine whether to forward to SOCKS or not.

As a sidenote, privoxy originally stands for "Privacy Enhancing Proxy". It has advanced filtering capabilities and can be used to enhance your privacy, or block obnoxious advertisements. I suggest you read more about it.

Share Your Proxy

In the examples above, we always set up the proxy to listen on the loopback interface, serving only our own computer. In reality, you probably want to share this proxy with others, for example to let your iPhone use Twitter through it.

Usually it's safe to allow access from clients in your local home network, but if you want to serve clients from the internet, make sure you set up your firewall correctly.

Example for a privoxy setup (/etc/privoxy/config):

listen-address 192.168.1.100:8118

192.168.1.100 is the static address assigned to my computer by my router, so that I don't have to change this address every now and then.

Of course, you can put the proxy on your router, which is better. It just might take a little more work to set it up.

Now if you have your iPhone connected to the same wireless network, point it to the HTTP proxy at 192.168.1.100:8118. If it doesn't work, try restarting privoxy.

Another example to share the SOCKS proxy on all interfaces:

$ ssh -v24NnD :7127 ssh.gfw

Be careful with that.
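One way to be careful is to let only loopback and your local network reach the shared port. The sketch below only prints iptables commands (a dry run) for you to review and then run as root. The function name and the 192.168.1.0/24 subnet are assumptions, and rules added this way are lost on reboot unless you save them with your distribution's mechanism.

```bash
# lan_only_rules: hypothetical helper. Prints (does not run) iptables
# commands that let loopback and SUBNET reach PORT and drop everyone else.
# Usage: lan_only_rules 7127 192.168.1.0/24 | sudo sh
lan_only_rules() {
  local port=$1 subnet=$2
  # Inserting at fixed positions keeps the three rules in this order,
  # ahead of whatever the INPUT chain already contains.
  echo "iptables -I INPUT 1 -i lo -p tcp --dport $port -j ACCEPT"
  echo "iptables -I INPUT 2 -s $subnet -p tcp --dport $port -j ACCEPT"
  echo "iptables -I INPUT 3 -p tcp --dport $port -j DROP"
}
```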

That's about it, briefly. For more, read the man pages, and Google is your friend.

Other Things to Try

sshuttle turns SSH into a poor man's VPN.