Pi-Hole in Go

Quick Update

I’ve not been writing here much lately, and I think it’s for a pretty good reason. My daughter was born about a month ago, and I have been very busy there! In getting ready for the baby, my “office” has become the baby’s room, which means that my hardware projects are safely stashed away in the basement (for now). I do intend to get back to them once I get a decent schedule in place.

Pi-hole

Earlier this year I set up Pi-hole in my home. For those unaware, Pi-hole is a network-wide ad blocker. This means that any device using my network at home will benefit from ad blocking. I’ve had a wildly unnecessary Dell server running in my basement for the last few years, and initially set it up there, since it had a static IP already. Configuring my router to use the server as a DNS server was fairly straightforward, and for the most part it worked flawlessly.

I had some minor issues when updating it, which Google helped me to solve. These issues were generally due to nsswitch.conf, and for some reason it seemed like I had something else on the server that was fighting with the Pi-hole over config files. I finally decided to shut down the server and set up a new instance of Pi-hole from scratch on an old Raspberry Pi I had around, and that issue seems to have resolved itself (for now?).

Out of curiosity, I started to take a peek at the codebase and realized that a good chunk of the code is simply shell scripts. On the one hand, this is impressive and fairly portable! On the other, it seemed like something I could set out to write for myself, as a learning exercise.

Background Research

Going into this, I knew the very high-level basics of what DNS is: essentially a way to convert a name (e.g. google.com) into a resolvable IP address (e.g. 172.217.10.78). I knew that it relied on UDP for communication, for speed and simplicity (I learned later that TCP is also used in some cases, for example when a response is too large to fit in a UDP packet).

I found the “DNS Protocol” article on NS1’s website to be pretty informative, and used this as a good starting point. I came up with a pretty simple sketch for what I wanted to do:

  1. Listen for UDP packets on port 53
  2. When a DNS request is received, determine whether to block it or not using a black/white list
  3. Respond to the client with a valid response

The Code

As I’ve found over time, Go has some pretty great packages available. In this case, the Go team maintains the dnsmessage package (golang.org/x/net/dns/dnsmessage, which lives in the golang.org/x/net module rather than the standard library). It handles a lot of the packing/unpacking of packets, and makes it easy to work with the protocol.
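
To give a sense of what that packing/unpacking involves, here is a hand-rolled sketch that decodes just the fixed 12-byte header at the front of every DNS message, using only the standard library. It is not how dnsmessage is implemented and not code from my project; the header struct and parseHeader function are names made up for this illustration, and a real message also carries variable-length question and answer sections, which this skips entirely.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// header holds a few fields of the fixed DNS header (hypothetical type,
// for illustration only).
type header struct {
	ID        uint16 // identifier copied from request to response
	Response  bool   // QR bit: false for a query, true for a response
	Questions uint16 // QDCOUNT: number of entries in the question section
	Answers   uint16 // ANCOUNT: number of entries in the answer section
}

// parseHeader decodes the big-endian fields at the start of a DNS packet.
func parseHeader(packet []byte) (header, error) {
	if len(packet) < 12 {
		return header{}, errors.New("packet shorter than the 12-byte DNS header")
	}
	flags := binary.BigEndian.Uint16(packet[2:4])
	return header{
		ID:        binary.BigEndian.Uint16(packet[0:2]),
		Response:  flags&0x8000 != 0,
		Questions: binary.BigEndian.Uint16(packet[4:6]),
		Answers:   binary.BigEndian.Uint16(packet[6:8]),
	}, nil
}

func main() {
	// Header of a query with ID 0x14d2, the RD flag set, and one question.
	packet := []byte{0x14, 0xd2, 0x01, 0x00, 0x00, 0x01, 0, 0, 0, 0, 0, 0}
	h, err := parseHeader(packet)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```

Everything after the header (names, record types, compression pointers) is where a library really earns its keep.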

Listen for UDP packets

This is where I started. I created an event loop that would listen for requests and farm them out to handlers. The code for this is quite straightforward, and really didn’t change much from the first time I wrote it:

	// Bind to UDP port 53 on all interfaces.
	s.conn, err = net.ListenUDP("udp", &net.UDPAddr{Port: 53})
	if err != nil {
		log.Fatalf("Failed to listen on UDP port: %v", err)
	}
	defer s.conn.Close()

	for {
		// Allocate a fresh buffer per packet, since the handler goroutine keeps using it.
		packetBuffer := make([]byte, 512)
		n, remoteAddr, err := s.conn.ReadFromUDP(packetBuffer)
		if err != nil {
			log.Fatalf("Failed to read packets from UDP port: %v", err)
		}

		// Pass only the n bytes that were actually received.
		go handleReceivedDnsRequest(packetBuffer[:n], remoteAddr)
	}

That’s it! The code here is fairly self-explanatory, so I won’t say much except to mention that I chose to call handleReceivedDnsRequest(...) as a goroutine, so that this code can be parallelized. I don’t expect to have much of a load at home, but it seemed like good practice to do it this way.

Black/White Lists

To get started, I decided to take a very simple approach to filtering domains: no regular expressions. I’d love to add them down the road, but for now I think it’s a good simplification to use explicit lists. To this end, I decided that the blacklist and whitelist would simply be []string slices. Eventually it may make sense to use a database for this, but as a starting point this should work nicely.

Blacklist

I decided to use Pi-hole’s approach of using Hosts files as a way to define block lists. Hosts files are well-known and simple to understand, and it will allow me to quickly build lists of domains to block.

Again to simplify things, I chose to discard any IP information in hosts files, for two reasons: I only plan to “block” the domains, so I don’t need them to redirect to specific places, and it’s less data to carry around.

I wrote a quick parser to retrieve Hosts files from a given URL (e.g. https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts) and parse it into a map of domain name to IP address. Then I simply iterated through the map to retrieve the keys (i.e. domain names), discarding the IP address information.

Whitelist

The whitelist is expected to be much shorter, and probably more personalized. As such, I don’t expect this to be parsed from several remote lists, and even if it is, I suspect the parsing will be of a simple list, not a map, so using a Hosts file for this is unnecessary.

Logic

So we’ve received the DNS request and parsed it into a Go structure. The next step is to figure out whether or not to block this request. The logic I’ve chosen is simply:

allowedRequest = whitelisted or not(blacklisted)

Put another way: we will only block requests that are not on the whitelist and are on the blacklist.
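
In Go, that rule could be sketched as below. This is an illustration rather than the code from my repository: the names isAllowed, contains, and normalize are made up here, and the normalization step (lower-casing and dropping a trailing dot) is an assumption about how names should be compared.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize lower-cases a name and drops the trailing dot of a
// fully-qualified name, so "Example.COM." and "example.com" compare equal.
func normalize(domain string) string {
	return strings.ToLower(strings.TrimSuffix(domain, "."))
}

// contains reports whether the already-normalized domain appears in list.
func contains(list []string, domain string) bool {
	for _, entry := range list {
		if normalize(entry) == domain {
			return true
		}
	}
	return false
}

// isAllowed implements: allowed = whitelisted or not(blacklisted).
func isAllowed(domain string, whitelist, blacklist []string) bool {
	domain = normalize(domain)
	return contains(whitelist, domain) || !contains(blacklist, domain)
}

func main() {
	whitelist := []string{"good.example.com"}
	blacklist := []string{"good.example.com", "ads.example.com"}

	for _, name := range []string{"good.example.com.", "ads.example.com", "example.org"} {
		fmt.Printf("%s allowed: %v\n", name, isAllowed(name, whitelist, blacklist))
	}
}
```

A linear scan over a slice is fine for a short list; for a large block list, a map keyed by domain would make each lookup much cheaper.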

A likely unnecessary thing I did was to support multiple “DNS Questions” (a DNS Question is essentially one domain name to look up) per DNS Request. In practice this is apparently fairly rare, but it was pretty simple to implement, so I went ahead and did it anyway. Basically the logic is to filter the questions into “block” and “allow” lists, creating DNS Answers for the blocked Questions that point to 0.0.0.0 (or :: for an IPv6 Question), and forwarding the allowed DNS Questions to an upstream DNS Resolver (e.g. 8.8.8.8).
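
The filtering step can be sketched roughly as follows. The question struct here is a hypothetical, stripped-down stand-in for a real DNS Question (it is not the dnsmessage type), and partition and sinkholeFor are names invented for the example. The unspecified addresses come from the standard net package.

```go
package main

import (
	"fmt"
	"net"
)

// question is a simplified stand-in for a DNS Question (illustration only).
type question struct {
	Name string
	IPv6 bool // true for an AAAA question, false for an A question
}

// sinkholeFor returns the unspecified address to answer a blocked question with.
func sinkholeFor(q question) net.IP {
	if q.IPv6 {
		return net.IPv6unspecified // "::"
	}
	return net.IPv4zero // "0.0.0.0"
}

// partition splits questions into those to block and those to forward upstream.
func partition(questions []question, isBlocked func(name string) bool) (block, allow []question) {
	for _, q := range questions {
		if isBlocked(q.Name) {
			block = append(block, q)
		} else {
			allow = append(allow, q)
		}
	}
	return block, allow
}

func main() {
	isBlocked := func(name string) bool { return name == "ads.example.com" }
	questions := []question{
		{Name: "ads.example.com"},
		{Name: "ads.example.com", IPv6: true},
		{Name: "example.org"},
	}

	block, allow := partition(questions, isBlocked)
	for _, q := range block {
		fmt.Printf("block %s -> %s\n", q.Name, sinkholeFor(q))
	}
	for _, q := range allow {
		fmt.Printf("forward %s upstream\n", q.Name)
	}
}
```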

The way I implemented this upstream communication isn’t great. Since each incoming DNS request is handled in its own goroutine, I chose to have each goroutine talk to the upstream server over its own UDP socket. This means that if several DNS requests come in at once, there will be several concurrent UDP exchanges with the upstream server. In a future version, I may funnel all upstream traffic through a single dedicated goroutine. That said, a single goroutine could itself become a bottleneck, and UDP sockets are cheap, so there isn’t a strong reason to change it. Another thing I am relatively unhappy with is what happens when something goes wrong in communicating with the upstream server: right now, the goroutine will block indefinitely if the request or response UDP packet is dropped.

Once we have all of this information, we construct the response to the DNS request, which means essentially putting a list of DNS Answers into a packet that correspond to the DNS Questions that were asked of us, and sending a UDP packet back to the requester.

Testing

I tested this using dig in Ubuntu on Windows (WSL):

jfisher@JFisher-Desktop:~ $ dig @127.0.0.1 google.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5330
;; flags: qr aa; QUERY: 0, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; ANSWER SECTION:
google.com.             140     IN      A       172.217.10.110

;; Query time: 20 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Sep 20 21:04:14 EDT 2019
;; MSG SIZE  rcvd: 38

jfisher@JFisher-Desktop:~ $ dig @127.0.0.1 itunes.net

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 itunes.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8573
;; flags: qr aa; QUERY: 0, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; ANSWER SECTION:
itunes.net.             14399   IN      A       165.160.13.20
itunes.net.             14399   IN      A       165.160.15.20

;; Query time: 29 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Sep 20 21:04:21 EDT 2019
;; MSG SIZE  rcvd: 54    

and then with a blocked domain (using this block list):

jfisher@JFisher-Desktop:~ $ dig @127.0.0.1 googletagservices.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 googletagservices.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53880
;; flags: qr aa; QUERY: 0, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; ANSWER SECTION:
googletagservices.com.  0       IN      A       0.0.0.0

;; Query time: 14 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Sep 20 21:08:19 EDT 2019
;; MSG SIZE  rcvd: 49

Note the response is 0.0.0.0, as expected of a blocked domain.

My Code

If you want to take a peek at my code, it is on my GitHub: https://github.com/jonathanfisher/DnsFilter.

Going Forward

Some next steps, if I ever get to it:

  1. Track statistics of blocked domains
  2. Add a UI similar to Pi-hole’s
  3. Fix the above-mentioned bug where a dropped UDP packet will effectively leak resources

I am not sure that this method of blocking ads/tracking/etc. will be around very long. A simple way for some developers to get around this is to embed their own calls to DNS servers in their applications, bypassing the need for network-level DNS servers.