2006-04-29

Porn site? What porn site?

One of the advantages of running your own home network is how much control you have over how things behave. For instance, a Linux box serves as the gateway between my entire home network and the internet. On this server, I run my own DNS.

One day, as I was checking my Hotmail account, I thought to myself, "I wonder if someone has registered hotmale.com. It's probably some 'Am I Hot or Not' kind of thing." Well, needless to say, it wasn't. It was much more explicit. So, after closing it quickly, I started wondering if there was a way to add a list of known porn sites to my DNS server, so that it would resolve these names to an unreachable address, the same way it does with known ad servers now.

I googled for a porn site list, and eventually I found one that would work nicely at Internet Filter's site. The only problem was, it wasn't in a readily consumable format for me to put into my DNS. Further searching, though, found this site (which describes using a proxy auto-config file in IE to selectively block sites). The method described there is pretty unsuitable for my purposes, but the author did have a .vbs file available that would parse through Internet Filter's web site and extract the list of porn sites, exporting them to a file. I modified it slightly so that it would wrap each URL in a BIND zone statement, prepending zone " and appending " {type master; file "/etc/bind/db.empty";}; to each one, and away it went.
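For illustration, here is a rough sketch of that wrapping step in Python rather than the VBScript I actually modified. The function names are made up for this example, and it assumes the input is already a plain list of bare host names, one per entry; only the zone statement text and the /etc/bind/db.empty path come from what I described above.

```python
def zone_stanza(domain, zone_file="/etc/bind/db.empty"):
    """Wrap one host name in a BIND zone statement pointing at an empty zone file."""
    return 'zone "%s" {type master; file "%s";};' % (domain, zone_file)


def build_blocklist(domains):
    """Return the text of a BIND include file, one zone statement per unique name."""
    seen = set()
    lines = []
    for raw in domains:
        # Normalise case and whitespace (assumption: entries are bare host names).
        domain = raw.strip().lower().rstrip(".")
        if not domain or domain in seen:
            continue  # skip blanks and duplicates; BIND rejects duplicate zones
        seen.add(domain)
        lines.append(zone_stanza(domain))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # example.com and example.net are placeholder names, not entries from the real list.
    print(build_blocklist(["example.com", "Example.com ", "example.net"]), end="")
```

The output can be saved to a file and pulled into the bind config. Writing with Unix line endings from the start would also sidestep the CR-LF conversion later on.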

The only issue was, Internet Filter's site included IP addresses in its list. A request made directly to an IP address never goes through DNS, so putting them in would be pointless. With more time and patience, I could have modified Eric Phelps's script to exclude those, but I just decided to do it by hand. It's only 429,318 records to parse through, and with a regular expression, it's easy to find the IP addresses.
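The regular expression I have in mind is nothing fancy. A minimal Python sketch of the filtering, assuming each entry is either a bare host name or a dotted-quad IPv4 address (the helper names are hypothetical, and the pattern does not check that each number is 255 or less):

```python
import re

# Four groups of one to three digits separated by dots, and nothing else.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_ip_address(entry):
    """True if the entry looks like a dotted-quad IPv4 address."""
    return IPV4_RE.match(entry.strip()) is not None


def drop_ip_entries(entries):
    """Keep only the entries that DNS could actually be asked to resolve."""
    return [e for e in entries if not is_ip_address(e)]


if __name__ == "__main__":
    # Placeholder data: documentation-range addresses and example domains.
    sample = ["example.com", "192.0.2.10", "1.2.3.example.com"]
    print(drop_ip_entries(sample))
```

Host names that merely start with digits, like the last sample entry, are kept, because the pattern is anchored at both ends.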

All that was left was to copy it to the server, translate the MS-DOS CR-LF to a Unix LF (I was surprised I could use cat -T [edit: a better way is tr -d "\015" < dos_file > unix_file, as it doesn't mess with any other non-printable characters]), and add it to my bind config.

Unfortunately, now it's not resolving any addresses. All requests time out. There are just too many sites in the list. Ah well, it was worth a shot. But I'm holding on to this, because there may be a way to use this in the future.
