Anonymity networks are designed to give users privacy and are often used to circumvent censorship. To hide the origin of internet traffic, they encrypt it and relay it through multiple intermediate nodes. These networks conceal what an Internet user does, but their own operation is not secret, so access to them can be blocked. To get around this weakness, an anonymity service can disguise its traffic as another type of service, such as a Skype call. This is not a perfect solution for those seeking confidentiality, however, because different kinds of traffic can still be told apart by analysing the flow of the traffic.
To better understand the capabilities of this technique, Shahbar and Zincir-Heywood developed a machine-learning-based approach to flow analysis. Their study explored how well flow analysis can detect traffic from anonymity networks.
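As a rough illustration of what flow analysis works with, the sketch below groups packets into flows by their 5-tuple and summarises each flow using only sizes and timing, never payload contents. The packet records, the field layout and the chosen features are assumptions made for this example; they are not the feature set used in the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet records:
# (timestamp_s, src, dst, src_port, dst_port, protocol, size_bytes)
packets = [
    (0.00, "10.0.0.2", "203.0.113.5", 50000, 443, "TCP", 586),
    (0.05, "10.0.0.2", "203.0.113.5", 50000, 443, "TCP", 586),
    (0.10, "10.0.0.2", "203.0.113.5", 50000, 443, "TCP", 586),
    (0.02, "10.0.0.2", "198.51.100.7", 50001, 80, "TCP", 120),
    (0.90, "10.0.0.2", "198.51.100.7", 50001, 80, "TCP", 1400),
]


def flow_features(packets):
    """Group packets by 5-tuple and summarise each flow from metadata only."""
    flows = defaultdict(list)
    for ts, src, dst, sport, dport, proto, size in packets:
        flows[(src, dst, sport, dport, proto)].append((ts, size))

    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order packets by timestamp
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "duration": times[-1] - times[0],
            "mean_size": mean(sizes),
            "mean_gap": mean(gaps) if gaps else 0.0,
        }
    return features


for key, feats in flow_features(packets).items():
    print(key, feats)
```

Because these statistics survive encryption, a classifier trained on them can still separate traffic types even when the payload is unreadable.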
Collecting privacy network data might adversely affect network users’ privacy, which complicates research on privacy networks. Researchers have therefore often relied on data collected in a simulated environment or generated by the researchers themselves. This study was based on the Anon17 dataset, which was produced by the NIMS lab over a period of three years and contains traffic from three well-known anonymity networks: Tor, JonDonym and I2P. The dataset has been modified to anonymize the data while maintaining its research value, and it has been made available to other researchers.
The researchers used a selection of machine learning algorithms to identify the extracted traffic flows. They identified the traffic with accuracy above 90%, reaching up to 99%. The accuracy of detection varied with the application, the anonymity network, and the user’s configuration. Because anonymity networks are diverse and flow analysis is widely used to study them, it is important to explore the limits of what flow analysis can achieve. Using a category-based approach, the study found that flow analysis identifies encrypted anonymity network traffic with high accuracy.
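The general idea of classifying flows from their statistics can be sketched with a very simple model. The example below uses a nearest-centroid classifier, chosen here only for brevity; it is not one of the algorithms from the study, which this summary does not name. The two classes, the two features (mean packet size and mean inter-arrival time), the scaling constants and the synthetic data are all assumptions made for illustration.

```python
import random
from statistics import mean


def make_flows(rng, label, size_mu, gap_mu, n):
    """Draw n synthetic flows as ((mean_size_bytes, mean_gap_s), label) pairs."""
    return [((rng.gauss(size_mu, 30.0), rng.gauss(gap_mu, 0.01)), label)
            for _ in range(n)]


def scale(x):
    # Put bytes and seconds on comparable scales (assumed constants).
    return (x[0] / 1000.0, x[1] / 0.1)


def fit_centroids(samples):
    """Compute the mean feature vector of each class."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(scale(x))
    return {y: tuple(mean(col) for col in zip(*xs)) for y, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    sx = scale(x)
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(sx, centroids[y])))


rng = random.Random(0)  # fixed seed so the run is repeatable
data = (make_flows(rng, "anonymity", 550.0, 0.05, 200)
        + make_flows(rng, "background", 900.0, 0.20, 200))
rng.shuffle(data)
train, test = data[:300], data[300:]

centroids = fit_centroids(train)
accuracy = mean([1.0 if predict(centroids, x) == y else 0.0 for x, y in test])
print(f"held-out accuracy on synthetic flows: {accuracy:.2f}")
```

The synthetic classes are deliberately well separated, so the accuracy printed here says nothing about real traffic; the figures reported above come from the study's own experiments on Anon17.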
Flow-based traffic analysis can identify concealed privacy network traffic on a connection.