The Effect of DNS on Tor’s Anonymity

Overview

The domain name system (DNS) is a fundamental part of the Internet, mapping human-readable domains to machine-readable IP addresses. When fetching a web page in a browser, a DNS request almost always precedes the actual web traffic. This is also the case when using Tor Browser, the privacy-enhanced browser developed by The Tor Project to provide millions of users with anonymity online.

A lot of research has gone into improving the Tor network, but its use of DNS has received little attention. In this research project, we set out to learn how DNS can harm the anonymity of Tor users, and how adversaries can leverage the DNS protocol to deanonymize users, as illustrated by the diagram to the right. We study (i) how exposed the DNS protocol is compared to web traffic, (ii) how Tor exit relays are configured to use DNS, (iii) how existing website fingerprinting attacks can be enhanced with DNS, and (iv) how effective these enhanced website fingerprinting attacks are at Internet scale.

We show how an attacker can use DNS requests to mount highly precise website fingerprinting attacks: mapping DNS traffic to websites is highly accurate even with simple techniques, and correlating the observed websites with a website fingerprinting attack greatly improves the precision when monitoring relatively unpopular websites. Our results show that DNS requests from Tor exit relays traverse numerous autonomous systems that subsequent web traffic does not traverse. We also find that a set of exit relays, at times comprising 40% of Tor’s exit bandwidth, uses Google’s public DNS servers, an alarmingly high number for a single organization. We believe that Tor relay operators should take steps to ensure that the network maintains more diversity in how exit relays resolve domains.
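One simple way to picture the correlation step is to accept a website fingerprinting classifier’s guess only if the same site also showed up in the observed DNS traffic close to the time of the visit. The sketch below illustrates that idea under stated assumptions; it is not the exact attack evaluated in the paper. The function name, the 60-second window, and the data layout are all hypothetical.

```python
def confirm_with_dns(predicted_site, visit_time, dns_observations, window=60.0):
    """Keep a classifier's guess only if DNS traffic for that site was seen nearby in time.

    dns_observations maps a site name to a list of timestamps (in seconds) at
    which a DNS request attributed to that site was observed. The layout and
    the default window are assumptions made for this illustration.
    """
    for t in dns_observations.get(predicted_site, []):
        if abs(t - visit_time) <= window:
            return predicted_site
    return None  # no DNS evidence: discard the guess


# Made-up observations for illustration only.
dns_observations = {"example.org": [1000.0, 5000.0]}

confirmed = confirm_with_dns("example.org", 1020.0, dns_observations)
rejected = confirm_with_dns("example.net", 1020.0, dns_observations)
print(confirmed, rejected)
```

Discarding unconfirmed guesses trades recall for precision, which matters most for rarely visited sites where a classifier's false positives would otherwise dominate.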

What does our work mean for Tor users? As we outline in our blog post, we don’t believe that there is any immediate cause for concern. While our attacks work well in simulations, not many entities are in a position to mount them. Besides, they require non-trivial engineering effort to be reliable, and The Tor Project is already working on improved website fingerprinting defenses.

Writing

The main outcome of this research project is a paper that is going to be published at the Network and Distributed System Security Symposium in February 2017. In addition, we published detailed replication instructions, to make it easier to reproduce our results. All our writing is listed below.

Code

We have developed a tool, ddptr, which stands for “DNS Delegation Path Traceroute.” The tool determines the DNS delegation path for a fully qualified domain name, and then runs UDP traceroutes to all DNS servers on the path. These traceroutes are then compared to a TCP traceroute to the web server behind the same fully qualified domain name.
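A delegation path follows the DNS hierarchy from the root downward. As a rough illustration (this is not ddptr’s implementation), the sketch below lists the names at which a zone cut may exist for a given domain. A real tool would additionally query the name servers at each level to learn which of these names are actually delegated and which servers are authoritative for them. The function name candidate_zones is hypothetical.

```python
def candidate_zones(fqdn):
    """Return the chain of names from the root down to fqdn.

    Each entry is a name at which a delegation *may* exist; whether it does
    can only be determined by querying the DNS.
    """
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    zones = ["."]
    # Walk from the rightmost label (the TLD) toward the full name.
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


print(candidate_zones("www.example.com"))
```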

Now imagine that our machine is trying to establish a TCP connection to baidu.com. How many autonomous systems will our network packets traverse? The two images to the right show an example. First, our machine has to resolve the domain before it can send packets to the IP address. The left image shows UDP traceroutes to all DNS servers in the delegation path for “baidu.com,” namely 192.58.128.30, 192.43.172.30, and 202.108.22.220. In total, these traceroutes traversed 13 different autonomous systems, illustrated by the rectangular boxes. The right image shows a TCP traceroute to “baidu.com,” which traversed at least four autonomous systems. In this simple example, the DNS resolution process for baidu.com exposes our traffic to more autonomous systems than the actual TCP connection, provided we run our own DNS resolver.
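The comparison in this example boils down to a set difference: the autonomous systems seen on any DNS traceroute, minus those seen on the web traceroute. The following is a minimal sketch of that computation, assuming each traceroute has already been mapped to a list of AS numbers. The function name is hypothetical, and the AS numbers are placeholders from the range reserved for documentation, not measured data.

```python
def dns_only_ases(dns_paths, web_path):
    """Return the ASes that appear on any DNS traceroute but not on the web traceroute."""
    dns_ases = set()
    for path in dns_paths:
        dns_ases.update(path)
    return dns_ases - set(web_path)


# Placeholder AS numbers (documentation range), one list per traceroute.
dns_paths = [
    [64496, 64497, 64498],  # traceroute to the first name server on the path
    [64496, 64499, 64500],  # traceroute to the second name server
    [64496, 64497, 64501],  # traceroute to the third name server
]
web_path = [64496, 64497, 64502]  # TCP traceroute to the web server

extra = dns_only_ases(dns_paths, web_path)
print(sorted(extra))
```

Every AS in the result is in a position to observe the DNS request without ever seeing the subsequent web traffic.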

We also publish the (mostly Python and R) scripts that we used to analyse and plot our data. The git repository also contains the LaTeX source of our paper and the project page you are looking at.

git clone https://github.com/NullHypothesis/tor-dns.git

Data

We publish the following datasets. Each tarball contains a README.txt file that explains the respective dataset. We also want to encourage you to replicate our work and reproduce all our datasets. Our replication guide is meant to ease this task.

Exit resolver dataset

The following dataset is a collection of .pcap files that we captured on the authoritative DNS server for tor.nymity.ch. We used this dataset to identify the DNS resolvers of Tor exit relays. The tarball contains a README file that provides more details.

DNS exposure dataset

The following dataset contains the output of the tool ddptr, which we ran on a VPS operated by OVH. The tarball contains a README file that provides more details.

DNS request number dataset

The following dataset contains the number of DNS requests per five-minute interval as recorded on our exit relay. The dataset contains two files, one for a reduced exit policy, and one for an exit policy containing only ports 80 and 443.
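For readers who want to produce a comparable count from their own measurements, the sketch below bins request timestamps into five-minute intervals. It assumes the input is a plain list of Unix timestamps in seconds; this is not necessarily the format of the files in the tarball, and the function name is hypothetical.

```python
from collections import Counter

INTERVAL = 5 * 60  # five minutes, in seconds


def requests_per_interval(timestamps, interval=INTERVAL):
    """Count timestamps per fixed-width interval, keyed by the interval's start time."""
    counts = Counter((int(ts) // interval) * interval for ts in timestamps)
    return dict(sorted(counts.items()))


# Made-up timestamps for illustration only.
timestamps = [0, 10, 299, 300, 301, 900]
binned = requests_per_interval(timestamps)
print(binned)
```

Note that intervals without any requests do not appear in the result; fill them with zeros before plotting if a continuous time series is needed.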

Internet-scale simulation dataset

The following dataset contains data for (i) the fraction of compromised streams and (ii) the time until first compromise for 10,000 simulated Tor users. We generated the data with TorPS and by running traceroutes.
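The two metrics can be computed per simulated user as sketched below. The sketch assumes each stream is represented as a (time, compromised) pair, with time measured from the start of the simulation and compromised given as a boolean. This layout is an assumption for illustration and does not reflect the output format of TorPS; the function name is hypothetical.

```python
def user_metrics(streams):
    """Compute both metrics for one simulated user.

    streams: list of (time, compromised) tuples, in any order.
    Returns (fraction of compromised streams, time of the first compromised
    stream or None if no stream was compromised).
    """
    if not streams:
        return 0.0, None
    compromised_times = [t for t, compromised in streams if compromised]
    fraction = len(compromised_times) / len(streams)
    first = min(compromised_times) if compromised_times else None
    return fraction, first


# Made-up streams for illustration only.
streams = [(0, False), (10, False), (25, True), (40, False)]
fraction, first = user_metrics(streams)
print(fraction, first)
```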

Popularity of Alexa’s top 10,000 domains

The following dataset contains the popularity of Alexa’s top 10,000 websites. We obtained the data from the respective Amazon AWS API.

DNS requests for Alexa top 1,000,000 domains

The following datasets contain all DNS requests recorded by Tor Browser 5.5.4, configured not to browse over Tor, for the Alexa top 1,000,000 domains on April 15, 2016. The data was collected using tbdnsw as part of the DefecTor toolset.

Website fingerprinting dataset for Alexa top 9,000x100 + Alexa 909,000x1

The following datasets contain a website fingerprinting dataset with 100 samples for each of the Alexa top 9,000 sites (monitored) and one sample for each of the Alexa top 909,000 sites (unmonitored), collected with Tor Browser 5.5.4. The data was collected using tbw as part of the DefecTor toolset. The toolset also contains tools for extracting data. We use the same format for cells and extracted features as Wang et al.

Contact

We are a team of five researchers from three universities. Feel free to copy all of us if you have any questions or remarks.

At Princeton University:

At Karlstad University:

At KTH Royal Institute of Technology:


Last update: 2016-12-19