Predicting Dark Web Connections Using Machine Learning Models
Introduction
The dark web, a hidden part of the internet accessible through technologies like the TOR browser, provides users with a high degree of privacy and anonymity. While these features are beneficial for legitimate purposes, they are also exploited for illegal activities, making it essential to classify and monitor traffic originating from such sources.
This project uses machine learning, specifically Random Forests and XGBoost, to classify internet connections as TOR (dark web) or non-TOR (regular web) traffic. By analyzing patterns in darknet data, it aims to identify suspicious connections and contribute to cybersecurity efforts.
The dataset, stored as tor_data.rda, contains internet traffic characteristics whose features help distinguish TOR from non-TOR connections. Through exploratory data analysis (EDA) and the application of machine learning models, this project evaluates how effectively Random Forests separate the two types of traffic. Hyperparameter tuning is then used to optimize the model's performance.
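To make "effectiveness" concrete, the sketch below shows one common way to score a binary TOR / non-TOR classifier: accuracy together with precision, recall, and F1 for the TOR class. It is a minimal illustration rather than the project's evaluation code. The label strings "TOR" and "nonTOR", the helper name binary_metrics, and the example labels are all assumptions made up for this sketch; they are not taken from tor_data.rda and do not represent any result of the project.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="TOR"):
    """Score predictions for one positive class (hypothetical helper).

    Returns accuracy, precision, recall and F1 as a dict of floats.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted TOR: correct if the connection really was TOR.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted non-TOR: a miss if the connection really was TOR.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only (assumed label names, not real data).
y_true = ["TOR", "nonTOR", "nonTOR", "TOR", "nonTOR", "TOR", "nonTOR", "nonTOR"]
y_pred = ["TOR", "nonTOR", "TOR", "TOR", "nonTOR", "nonTOR", "nonTOR", "nonTOR"]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy is a deliberate choice here: if TOR connections turn out to be a small share of the traffic, a model that labels everything as non-TOR could still reach high accuracy while missing every TOR connection.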
