Wednesday, May 25, 2011

Set Up SSH to Bypass GFW - The Definitive Guide

In this article, I'm gonna show you how to set up SSH to bypass GFW, and how to make the most out of an SSH connection in various ways.

The assumed platform is Linux, but OS X should probably work too.

Configure SSH

Now you have acquired an SSH account. You usually get a username/password pair, but typing the password every time you log in gets inconvenient. The good news is you don't have to, if you use public key authentication instead of password authentication.

Generate a new public/private key pair locally:

$ ssh-keygen -t rsa -C "Bye bye GFW" -f ~/.ssh/id_rsa_gfw
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cckpg/.ssh/id_rsa_gfw.
Your public key has been saved in /home/cckpg/.ssh/id_rsa_gfw.pub.
The key fingerprint is:
01:0f:f4:3b:ca:85:d6:17:a1:7d:f0:68:9d:f0:a2:db Bye bye GFW
The key's randomart image is:
+--[ RSA 2048]----+
|     .+   +      |
|       = o O .   |
|        = * *    |
|       o = +     |
|      o S .      |
|     o o =       |
|      o . E      |
|                 |
|                 |
+-----------------+

Here we leave the passphrase empty for convenience.

Now upload the public key to the server (sshd(8), scp(1)):

$ scp -p ~/.ssh/id_rsa_gfw.pub ssh.gfw:.ssh/authorized_keys

You will be prompted for your password (for the last time, yeah!). In the command above, ssh.gfw is a Host I defined in ~/.ssh/config:

Host ssh.gfw
  HostName 75.186.63.204
  Port 22
  User cckpg
  IdentityFile ~/.ssh/id_rsa_gfw
  HashKnownHosts yes
  StrictHostKeyChecking ask

See ssh_config(5) if something is unclear.
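
Note that the scp command above replaces any authorized_keys file already on the server, and it expects ~/.ssh to exist there. A gentler variant appends the key instead. The sketch below assumes a POSIX-like login shell on the remote side; the function name install_pubkey is invented for this example.

```shell
# install_pubkey HOST PUBKEY_FILE
# Append a public key to ~/.ssh/authorized_keys on HOST without
# overwriting keys that are already there. (Hypothetical helper.)
install_pubkey() {
    # umask 077 keeps ~/.ssh and authorized_keys private if they get created.
    ssh "$1" 'umask 077; mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < "$2"
}
```

Usage would be install_pubkey ssh.gfw ~/.ssh/id_rsa_gfw.pub. Where it is installed, ssh-copy-id -i ~/.ssh/id_rsa_gfw.pub ssh.gfw does much the same job.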

OK. SSH configuration is done. Now you don't ever have to type that password again. Just keep your private key ~/.ssh/id_rsa_gfw safe.

SSH As SOCKS Proxy

The famous ssh -D is hard not to have heard of. Basically, the -D option opens a SOCKS proxy listening at the address:port specified.

Here I just give you an example with several additional sane flags:

$ ssh -v24NnD 127.0.0.1:7127 ssh.gfw

See ssh(1) for explanations. You may also want -C for a slow connection.
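
For readers who would rather not decode the bundled flags, here is the same command with each option written separately. The wrapper name socks_proxy and its defaults are only illustrative. On current OpenSSH releases, protocol 2 is the only one left, so -2 is probably redundant there.

```shell
# socks_proxy [HOST] [LISTEN_ADDR:PORT]
# Same as `ssh -v24NnD 127.0.0.1:7127 ssh.gfw`, one flag at a time.
socks_proxy() {
    # -v  verbose output, handy for spotting connection trouble
    # -2  use SSH protocol version 2 only
    # -4  use IPv4 addresses only
    # -N  do not run a remote command, just forward
    # -n  redirect stdin from /dev/null
    # -D  open a SOCKS listener on the given local address:port
    ssh -v -2 -4 -N -n -D "${2:-127.0.0.1:7127}" "${1:-ssh.gfw}"
}
```

Once it is running, something like curl --socks5-hostname 127.0.0.1:7127 --head http://twitter.com should fetch headers through the tunnel, assuming curl is installed.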

To quickly test it out, set your browser to use a SOCKS5 proxy at 127.0.0.1:7127.

For Firefox, the setting goes like this:

Edit -> Preferences -> Advanced -> Network -> Settings ->
Manual proxy configuration -> SOCKS Host:127.0.0.1 Port:7127 SOCKS v5

Now, open some long-time-no-see websites, like twitter.com.

Use Proxy Auto-Config

Since our purpose is just to unblock blocked content, we don't really need to connect to every web site through the proxy. In fact, some content inside the GFW is not available from outside of it.

It is wise to only proxify connections that otherwise would be blocked or faked. How do you do that? One answer is Proxy Auto-Config.

OK, I see, but just how the GFW do I write a PAC file that covers all blocked sites? There are simply too many of them!

The answer is, as usual, Open Source: the autoproxy-gfwlist project maintains a GFW list, which covers a large list of blocked URLs, for the Firefox AutoProxy addon. You may just use this addon, but I'd like something lighter, and being Firefox-only, the addon is not as universal as a PAC file.

Next into the spotlight is, tada! the autoproxy2pac project! What it does is convert the autoproxy-gfwlist list to a PAC file. The URL to the PAC file is:

http://autoproxy2pac.appspot.com/pac/TYPE/IP/PORT

TYPE being either proxy or socks, and IP and PORT being the IP and port of your proxy. So in my case:

http://autoproxy2pac.appspot.com/pac/socks/127.0.0.1/7127

There is also an autoproxy2pac mirror provided by @yegle. Say thanks :)

The PAC file retrieved is cryptically encoded in order to bypass the GFW keyword filtering. It's OK to use it as is, but for the sake of readability and possible further modifications, I always decode it:

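#!/bin/bash
# Produce an anti-GFW PAC (Proxy Auto Configuration), the easy way.
# Usage: pass HOST:PORT of your SOCKS proxy as $1, e.g. 127.0.0.1:7127.
# Needs wget and a command-line JavaScript shell named js.

host=${1%:*} port=${1#*:} # split $1 into host and port
url=https://yegle.net/autoproxy2pac/socks/$host/$port
# Turn eval( into print( so js prints the decoded PAC instead of running it.
js <(wget -O- "$url" | sed 's/eval(/print(/')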
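
The script splits its HOST:PORT argument with two shell parameter expansions. A minimal sketch of just that step, with a made-up function name, in case the syntax looks cryptic:

```shell
# split_hostport HOST:PORT
# Print the host and the port on one line, separated by a space.
split_hostport() {
    host=${1%:*}   # strip the shortest suffix matching ":*", leaving the host
    port=${1#*:}   # strip the shortest prefix matching "*:", leaving the port
    printf '%s %s\n' "$host" "$port"
}

split_hostport 127.0.0.1:7127   # prints "127.0.0.1 7127"
```

If the argument contains no colon, neither pattern matches and both variables end up holding the whole argument, so the script does rely on being given HOST:PORT.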

Look at line 7 of the decoded PAC file:

var PROXY = "SOCKS 127.0.0.1:7127";

which corresponds to what we specified in the autoproxy2pac URL. For Firefox, the proxy type specified can be PROXY | SOCKS | SOCKS4 | SOCKS5 (nsIProxyAutoConfig). PROXY is for HTTP(S) proxies, the rest for SOCKS.

OK. The hard part is resolved. Now you just point your browser to this PAC file. All major browsers support this feature. For Firefox 4, the setting goes:

Edit -> Preferences -> Advanced -> Network -> Settings ->
Automatic proxy configuration URL: file:///home/cckpg/gfw/autoproxy.pac

Proxify DNS Queries

Setting a SOCKS proxy for your browser does not necessarily proxify your DNS queries, meaning you still may not reach some of the sites (e.g. twitter.com) due to DNS cache pollution or hijacking. In Firefox, this can be eliminated by setting the following preference through about:config:

network.proxy.socks_remote_dns = true

For manual proxy configuration, this will bypass local DNS resolution totally, and request DNS resolutions from the remote host, ensuring better reliability and privacy at the same time.

For Proxy Auto-Config setups, even with socks_remote_dns, URLs that should be requested through SOCKS proxy still trigger local DNS queries. This does not affect browsing, since the effective DNS resolution is done remotely, but the counterintuitive behavior may be a privacy concern for some. To solve this issue, set the preference network.dns.disablePrefetch to true through about:config to disable DNS prefetch. Hosts that are not proxied will, of course, be resolved by your local DNS resolver.

There are other more general ways to do DNS queries on the remote side. We will cover those later.

Proxify Any Program

Not everything is done inside your browser, and not every program supports every type of proxy. Luckily, every problem tends to have an answer ;)

For example, the excellent downloader wget does not understand SOCKS. This is where tsocks comes in:

$ tsocks wget -qO- http://ip.appspot.com

will print the IP address of the remote host. Cool? Indeed.
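
tsocks has to be told where the SOCKS server is. Its configuration is not shown here, so the following is only a sketch based on the tsocks.conf format: it writes a minimal config pointing at the ssh -D listener. The helper name and the ~/.tsocks.conf path used below are arbitrary choices; tsocks normally reads /etc/tsocks.conf and, unless that was disabled at build time, honors the TSOCKS_CONF_FILE environment variable.

```shell
# write_tsocks_conf FILE [SERVER] [PORT]
# Write a minimal tsocks config that sends everything to one SOCKS5 server.
write_tsocks_conf() {
    cat > "$1" <<EOF
server = ${2:-127.0.0.1}
server_port = ${3:-7127}
server_type = 5
EOF
}
```

After write_tsocks_conf ~/.tsocks.conf, the wget test could be run as TSOCKS_CONF_FILE=~/.tsocks.conf tsocks wget -qO- http://ip.appspot.com.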

And there's something even cooler: proxychains. Apart from its ability to chain mixed types of proxies all into one proxy chain, it supports DNS tunneling. This is important, as I explained earlier regarding the Firefox preference network.proxy.socks_remote_dns.

To see the difference:

$ tsocks curl twitter.com --head --verbose
* About to connect() to twitter.com port 80 (#0)
*   Trying 159.24.3.173...
^C
$ proxychains curl twitter.com --head --verbose
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| twitter.com
|S-chain|-<>-127.0.0.1:7127-<><>-4.2.2.2:53-<><>-OK
|DNS-response| twitter.com is 199.59.148.10
* About to connect() to twitter.com port 80 (#0)
*   Trying 199.59.148.10...
|S-chain|-<>-127.0.0.1:7127-<><>-199.59.148.10:80-<><>-OK
connected
* Connected to twitter.com (199.59.148.10) port 80 (#0)
...

A WHOIS query reveals 159.24.3.173 to be "MCI Telecommunications Corporation", way too long a name for a microblogging system ("Twitter Inc." that is.)

The config /etc/proxychains.conf is quite self-explanatory:

strict_chain
proxy_dns
tcp_read_time_out 15000
tcp_connect_time_out 8000
[ProxyList]
socks5  127.0.0.1 7127

A potential glitch: proxychains hooks into programs via LD_PRELOAD, which the dynamic linker ignores for setuid programs, so those will not be proxified. I haven't tested it.

Privoxy As HTTP Proxy

So far, we have been focusing on proxying connections directly through the SOCKS proxy, and have managed to do so for almost all types of applications. In this section, I'm gonna introduce you to an alternative approach.

Privoxy is an HTTP proxy that supports forwarding requests to HTTP or SOCKS proxies. It is widely used in conjunction with Tor, for example.

Basically, we put privoxy between applications and the SSH SOCKS server, acting as an HTTP proxy as far as the applications are concerned.

This alternative approach simplifies things for HTTP(S) connections:

  • Any program that understands HTTP proxies can talk to privoxy directly.
  • With SOCKS5, all DNS resolution will happen on the remote server, rendering network.proxy.socks_remote_dns moot, and all programs using this HTTP proxy will get correct DNS resolutions as well.
  • Privoxy supports flexible forwarding rules, which removes the need for a PAC file and works for any program, not just browsers.

If that sounds interesting, read on.

To set up privoxy to forward requests to your SSH SOCKS5 server, put this in /etc/privoxy/config:

forward-socks5 / 127.0.0.1:7127 .

Now make sure you have ssh -D running, and start the privoxy daemon with the command rc start privoxy, for example.

By default, privoxy listens on localhost:8118. Quickly test it out with your browser; for Firefox users:

Edit -> Preferences -> Advanced -> Network -> Settings ->
Manual proxy configuration -> HTTP Host:127.0.0.1 Port:8118
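
Outside the browser, many command-line programs (wget and curl among them) pick up an HTTP proxy from the http_proxy and https_proxy environment variables. A small wrapper along these lines, assuming privoxy on its default 127.0.0.1:8118 and using an invented name, sets them for a single command only:

```shell
# via_privoxy COMMAND [ARGS...]
# Run one command with the proxy variables pointing at privoxy,
# leaving the environment of the calling shell untouched.
via_privoxy() {
    env http_proxy=http://127.0.0.1:8118 \
        https_proxy=http://127.0.0.1:8118 \
        "$@"
}
```

For example, via_privoxy wget -qO- http://ip.appspot.com should print the IP address of the remote host, just like the tsocks test earlier.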

As for PAC, you need to configure the proxy type to PROXY. Note also that, for PAC setups, DNS prefetch will always query local DNS resolvers regardless of whether the proxy will do DNS resolution or not. Set network.dns.disablePrefetch to true to disable this behavior, as mentioned before.

Wait, screw Proxy Auto-Config! (Yes I promised.) Let's opt for privoxy's builtin, flexible pattern-based forward rules. I'm not going to cover the details here, but give you the answer to the question that is now on your mind: Yes, there is a written list of GFW proxy forward rules for privoxy! And of course, it's based on autoproxy-gfwlist, no doubt about that :)

You should read the README on the AutoProxy2Privoxy page for instructions, but here is what you need to do in brief. First you need to edit the following line in gfw.action according to address:port of your SOCKS proxy:

{+forward-override{forward-socks5 127.0.0.1:7127 .}}
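
If you would rather script that edit, sed can rewrite the address. This sketch assumes the line in gfw.action looks exactly like the one above and sits on a line of its own; the function name is invented for illustration.

```shell
# set_gfw_socks ADDR:PORT [FILE]
# Point the forward-override line in FILE (default: gfw.action) at ADDR:PORT.
set_gfw_socks() {
    file=${2:-gfw.action}
    # Match the whole action line and swap in the new proxy address.
    sed "s|^{+forward-override{forward-socks5 [^ ]* \.}}|{+forward-override{forward-socks5 $1 .}}|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}
```

Running set_gfw_socks 127.0.0.1:7127 in the directory holding gfw.action leaves every other line of the file as it was.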

Then issue the following commands as root:

# cp gfw.action /etc/privoxy/
# chown privoxy:privoxy /etc/privoxy/gfw.action
# chmod 660 /etc/privoxy/gfw.action

Now edit /etc/privoxy/config, adding this line:

actionsfile gfw.action

Also, comment out the SOCKS5 forward rule which we set before:

#forward-socks5 / 127.0.0.1:7127 .

Privoxy should automatically pick up the new config. Now just point your program to privoxy, which will automatically determine whether to forward to SOCKS or not.

As a side note, the name privoxy comes from "Privacy Enhancing Proxy". It has advanced filtering capabilities and can be used to enhance your privacy, or block obnoxious advertisements. I suggest you read more about it.

Share Your Proxy

In the examples above, we always set up the proxy to listen on the loopback interface, providing service exclusively to our own computer. In reality, you probably want to share this proxy with others, for example to let your iPhone reach Twitter through it.

Usually it's safe to allow access from clients in your local home network, but if you want to serve clients from the internet, make sure you set up your firewall correctly.

Example for a privoxy setup (/etc/privoxy/config):

listen-address 192.168.1.100:8118

192.168.1.100 is the static address assigned to my computer by my router, so that I don't have to change this address every now and then.

Of course, you can put the proxy on your router, which is better. It just might take a little more work to set it up.

Now if you have your iPhone connected to the same wireless network, point it to the HTTP proxy at 192.168.1.100:8118. If it doesn't work, try restarting privoxy.

Another example to share the SOCKS proxy on all interfaces:

$ ssh -v24NnD :7127 ssh.gfw

Be careful with that: anyone who can reach port 7127 on your machine can then use your proxy.

That's about it, briefly. For more, read the manpages; Google is your friend.

Other Things to Try

sshuttle turns SSH into a poor man's VPN.