Large-Scale Intrusion Detection Dataset (BCCC-CSE-CIC-IDS2018)

The BCCC-CSE-CIC-IDS2018 dataset is an enhanced version of CSE-CIC-IDS2018 with 46 million labeled records and 300 features. It addresses key issues in the original dataset to improve data quality and reliability for behavioral profiling in IDS research. Labeling inconsistencies, particularly for DoS attacks, were corrected by aligning attack labels with attacker IPs instead of timestamps. NTLFlowLyzer, a new network traffic analyzer, was developed to resolve anomalies in the extracted features and to refine how those features are implemented. Protocol issues were fixed by removing UDP-based attacks that had previously been misclassified because the analysis was TCP-specific. Attacks with insufficient flow counts were retained in the dataset but excluded from analysis and profiling. The expanded feature set is intended to improve detection of evolving cyber threats, making the dataset a robust benchmark for AI-driven IDS/IPS research.
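The difference between timestamp-based and attacker-IP-based labeling can be illustrated with a minimal Python sketch. This is not the authors' procedure, which is described in the paper. The `Flow` fields, the `ATTACK` record, the addresses (taken from documentation-only ranges), and the time window are all hypothetical, and matching on either endpoint of the flow is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Flow:
    """A simplified flow record; field names are hypothetical."""
    src_ip: str
    dst_ip: str
    start: datetime


# Hypothetical attack record: label, attacker address, and window are made up.
ATTACK = {
    "label": "DoS",
    "attacker_ips": {"203.0.113.10"},
    "window": (datetime(2018, 1, 1, 9, 0), datetime(2018, 1, 1, 10, 0)),
}


def label_by_timestamp(flow, attack=ATTACK):
    """Label every flow that starts inside the attack window as an attack."""
    begin, end = attack["window"]
    return attack["label"] if begin <= flow.start <= end else "Benign"


def label_by_attacker_ip(flow, attack=ATTACK):
    """Label a flow as an attack only if one endpoint is a known attacker IP.

    Checking both endpoints is an assumption of this sketch.
    """
    endpoints = {flow.src_ip, flow.dst_ip}
    return attack["label"] if endpoints & attack["attacker_ips"] else "Benign"


flows = [
    # Attacker traffic inside the window.
    Flow("203.0.113.10", "192.0.2.5", datetime(2018, 1, 1, 9, 30)),
    # Unrelated traffic that happens to fall inside the same window.
    Flow("198.51.100.7", "192.0.2.5", datetime(2018, 1, 1, 9, 31)),
]

for flow in flows:
    print(flow.src_ip, label_by_timestamp(flow), label_by_attacker_ip(flow))
```

With these made-up flows, the timestamp rule labels both flows as DoS, while the attacker-IP rule labels only the first one as DoS and leaves the unrelated flow as Benign.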

The dataset and its underlying principles are described in full in the following research paper:

"Toward Generating a Large Scale Intrusion Detection Dataset and Intruders Behavioral Profiling Using Network and Transportation Layers Traffic Flow Analyzer (NTLFlowLyzer)", MohammadMoein Shafi, Arash Habibi Lashkari & Arousha Haghighian Roudsari, Journal of Network and Systems Management, Vol. 33, Article 44, 2025

Download Dataset: