Dataset

Source Code Authorship Attribution (YU-SCAA-2022)

Source Code Authorship Attribution (SCAA) is a technique for identifying the actual author of a piece of source code within a corpus. Although it poses a privacy threat to open-source programmers, it is highly useful for forensic applications such as ghostwriting detection, copyright dispute settlement, identifying the authors of malicious applications from their source code, and other code […]
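
As a rough illustration of what "identifying the author within a corpus" means, the sketch below attributes an unknown snippet by comparing character trigram profiles using cosine similarity. This is a generic stylometry baseline chosen for illustration only, not the method used to build or evaluate YU-SCAA-2022; the functions `char_ngrams`, `cosine`, and `attribute` and the toy corpus are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(code: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a source-code string."""
    return Counter(code[i:i + n] for i in range(len(code) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(unknown: str, corpus: dict[str, list[str]], n: int = 3) -> str:
    """Return the author whose combined n-gram profile is closest to `unknown`."""
    query = char_ngrams(unknown, n)
    best_author, best_score = None, -1.0
    for author, samples in corpus.items():
        # Merge all of an author's samples into one stylistic profile.
        profile = Counter()
        for sample in samples:
            profile.update(char_ngrams(sample, n))
        score = cosine(query, profile)
        if score > best_score:
            best_author, best_score = author, score
    return best_author


if __name__ == "__main__":
    # Hypothetical toy corpus: two authors with different formatting habits.
    corpus = {
        "alice": ["for (int i = 0; i < n; i++) {\n    sum += a[i];\n}"],
        "bob": ["for(int i=0;i<n;++i){s+=a[i];}"],
    }
    print(attribute("for(int j=0;j<m;++j){t+=b[j];}", corpus))  # bob
```

The compact, space-free query shares more trigrams with bob's style than with alice's, so it is attributed to bob. Real SCAA systems typically add lexical, syntactic (AST-based), and layout features on top of raw n-grams.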

Smart Contracts Vulnerabilities (BCCC-SCsVuls-2024)

The BCCC-SCsVuls-2024 dataset is a comprehensive resource for analyzing and detecting vulnerabilities in Solidity-based smart contracts, featuring 111,897 meticulously labeled samples covering 11 vulnerability types, such as Re-entrancy (17,698), IntegerUO (16,740), and DenialOfService (12,394), along with Secure contracts (26,914). The dataset was curated from reputable sources such as Smart Bugs, Ethereum SCs, and SmartScan-Dataset, ensuring diverse and representative vulnerability […]

Intrusion Detection Dataset (BCCC-CIC-IDS2017)

Using NTLFlowLyzer, we generated the “BCCC-CIC-IDS2017” dataset by extracting key flows from the raw network traffic data of CIC-IDS2017, producing CSV files that combine essential network- and transport-layer features. This new dataset offers a structured basis for intrusion detection analysis, grouping diverse traffic types into multiple sub-categories. The “BCCC-CIC-IDS2017” dataset enriches the depth and […]
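
To show the general idea of turning raw packets into flow records in CSV form, here is a minimal sketch that groups packets into bidirectional flows keyed on the 5-tuple and computes a few summary features. It does not use or reproduce NTLFlowLyzer; the `Packet` record, the helper names, and the chosen features are hypothetical assumptions for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical parsed packet record (not an NTLFlowLyzer type)."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    length: int


def flow_key(p: Packet) -> tuple:
    """Order the two endpoints so both directions map to the same flow."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo, hi, p.protocol)


def extract_flows(packets: list[Packet]) -> list[dict]:
    """Aggregate packets into bidirectional flows with a few summary features."""
    flows: dict[tuple, dict] = {}
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        key = flow_key(p)
        f = flows.get(key)
        if f is None:
            # The first packet seen defines the flow's forward direction.
            f = flows[key] = {
                "src_ip": p.src_ip, "dst_ip": p.dst_ip,
                "src_port": p.src_port, "dst_port": p.dst_port,
                "protocol": p.protocol,
                "start": p.timestamp, "end": p.timestamp,
                "packets": 0, "bytes": 0,
            }
        f["end"] = p.timestamp
        f["packets"] += 1
        f["bytes"] += p.length
    for f in flows.values():
        f["duration"] = f["end"] - f["start"]
    return list(flows.values())


def flows_to_csv(rows: list[dict]) -> str:
    """Serialize flow rows to CSV text, one flow per line."""
    fields = ["src_ip", "dst_ip", "src_port", "dst_port", "protocol",
              "start", "end", "duration", "packets", "bytes"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A real flow meter also handles idle and active timeouts, TCP FIN/RST termination, and many per-direction statistics (inter-arrival times, flag counts, packet-size distributions); this sketch ignores all of that.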

Large-Scale Intrusion Detection Dataset (BCCC-CSE-CIC-IDS2018)

The BCCC-CSE-CIC-IDS2018 dataset is an enhanced version of CSE-CIC-IDS2018 with 46 million labeled records and 300 features, addressing key issues to improve data quality and reliability for behavioral profiling in IDS research. Labeling inconsistencies, particularly for DoS attacks, were corrected by aligning attack labels with attacker IPs instead of timestamps. NTLFlowLyzer, a new network traffic […]
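
The difference between timestamp-based and IP-based labeling can be sketched in a few lines. Under timestamp-only labeling, every flow that falls inside the attack window is marked as malicious, including unrelated benign flows; IP-based labeling marks only flows involving a known attacker address. The `Flow` record, the helper names, and the documentation-range IP addresses are hypothetical, and this is not the dataset's actual relabeling code.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Minimal flow record; field names are illustrative assumptions."""
    src_ip: str
    dst_ip: str
    start: float


def label_by_time_window(flows, window, attack_name):
    """Timestamp-only labeling: every flow inside the attack window is marked."""
    start, end = window
    return [attack_name if start <= f.start <= end else "Benign" for f in flows]


def label_by_attacker_ip(flows, attacker_ips, attack_name):
    """IP-based labeling: a flow is an attack only if an endpoint is an attacker."""
    return [
        attack_name if f.src_ip in attacker_ips or f.dst_ip in attacker_ips
        else "Benign"
        for f in flows
    ]


if __name__ == "__main__":
    flows = [
        Flow("203.0.113.7", "192.0.2.10", 10.0),   # attacker -> victim
        Flow("192.0.2.20", "192.0.2.10", 12.0),    # benign, same time window
        Flow("192.0.2.21", "198.51.100.5", 50.0),  # benign, outside window
    ]
    print(label_by_time_window(flows, (0.0, 20.0), "DoS"))
    print(label_by_attacker_ip(flows, {"203.0.113.7"}, "DoS"))
```

In practice the two criteria are often combined (attacker IP and attack time window), so that traffic from an attacker host outside the scheduled attack is not mislabeled either.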

Large-Scale Multisources Malware Analysis Dataset using Network Traffic and Memory (BCCC-Mal-NetMem-2025)

The BCCC-Mal-NetMem-2025 dataset comprises several million labeled records from controlled experiments involving 15 malware categories and 32 individual malware samples. These categories include ransomware, Trojan downloaders, coin miners, remote access tools (RATs), spyware, backdoors, and worms. The data was collected by executing each malware sample in isolated Windows environments equipped with real-time network and memory […]

Encrypted Traffic Dataset (BCCC-DarkNet-2025)

BCCC-DarkNet-2025 is an augmented, research-driven dataset that supports encrypted traffic analysis and threat detection across anonymized communication networks. It integrates and extends two benchmark datasets, CIC-Darknet2020 and Darknet-Dataset-2020, selected for their robust coverage of encryption protocols and darknet-specific traffic behaviors. The dataset includes diverse encrypted traffic types like VPN, Tor, I2P, Freenet, and ZeroNet, with multi-class labeling and protocol-specific annotations. These […]

Cloud DDoS Attacks (BCCC-cPacket-Cloud-DDoS-2024)

Distributed denial-of-service (DDoS) attacks pose a significant threat to network security, and the effectiveness of new detection methods depends heavily on well-constructed datasets. After an in-depth analysis of 16 publicly available datasets identified their shortcomings across various dimensions, the 'BCCC-cPacket-Cloud-DDoS-2024' dataset was meticulously created on a cloud infrastructure to address the challenges found in those earlier datasets. […]

DNS over HTTPS (BCCC-CIRA-CIC-DoHBrw-2020)

The 'BCCC-CIRA-CIC-DoHBrw-2020' dataset was created to address the class imbalance in the 'CIRA-CIC-DoHBrw-2020' dataset. Unlike 'CIRA-CIC-DoHBrw-2020', which is skewed toward roughly 90% malicious and only 10% benign DNS over HTTPS (DoH) network traffic, the 'BCCC-CIRA-CIC-DoHBrw-2020' dataset offers a more balanced composition. It includes equal numbers of malicious and benign DoH network traffic instances, with […]
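
One simple way to turn a 90/10 split into equal class counts is random undersampling of the majority class. The sketch below shows that technique on generic records; it is an illustrative assumption, not necessarily how BCCC-CIRA-CIC-DoHBrw-2020 was constructed (the excerpt does not say), and the function name is hypothetical.

```python
import random
from collections import defaultdict


def balance_by_undersampling(records, label_of, seed=0):
    """Randomly undersample every class down to the size of the smallest class."""
    by_label = defaultdict(list)
    for r in records:
        by_label[label_of(r)].append(r)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical 90% malicious / 10% benign toy data.
    records = [("malicious", i) for i in range(90)] + [("benign", i) for i in range(10)]
    out = balance_by_undersampling(records, label_of=lambda r: r[0])
    print(len(out))
```

Undersampling discards majority-class data; collecting additional minority-class traffic, where feasible, balances the classes without throwing information away.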