Skip to main content Skip to local navigation

DNS over HTTPS ( BCCC-CIRA-CIC-DoHBrw-2020 )

The 'BCCC-CIRA-CIC-DoHBrw-2020' dataset was created to address the imbalance in the 'CIRA-CIC-DoBre-2020' dataset. Unlike the 'CIRA-CIC-DoHBrw-2020' dataset, which is skewed with about 90% malicious and only 10% benign Domain over HTTPS (DoH) network traffic, the 'BCCC-CIRA-CIC-DoHBrw-2020' dataset offers a more balanced composition. It includes equal numbers of malicious and benign DoH network traffic instances, with 249,836 instances in each category. This balance was achieved using the Synthetic Minority Over-sampling Technique (SMOTE). The 'BCCC-CIRA-CIC-DoHBrw-2020' dataset comprises three CSV files: one for malicious DoH traffic, one for benign DoH traffic, and a third that combines both types.

The full research paper outlining the details of the dataset and its underlying principles:

https://link.springer.com/article/10.1007/s12083-023-01597-4“Unveiling DoH Tunnel: Toward Generating a Balanced DoH EncryptedTraffic Dataset and Profiling malicious Behaviour using InherentlyInterpretable Machine Learning“, Sepideh Niktabe, Arash Habibi Lashkari, Arousha Haghighian Roudsari, Peer-to-Peer Networking and Applications, Vol. 17, 2023

Download Dataset: