Cybersecurity Data Analyzers

As part of our Understanding Cybersecurity Series (UCS) knowledge mobilization program, we design, develop, and release cybersecurity data analyzers as open-source packages. Our contribution to open-source projects stems from our belief in the open-source culture, which we consider a driving force for accessible development and a better world. We are convinced that open-source projects promote innovation, collaboration, and healthy competition, making them valuable to the community.

9. Network and Transportation Layers Flow Analyzer (NTLFlowLyzer)

The NTLFlowLyzer generates bidirectional flows from the Network and Transportation Layers of network traffic, where the first packet determines the forward (source to destination) and backward (destination to source) directions. Hence, the statistical time-related features can be calculated separately in the forward and backward directions. Additional functionalities include selecting features from the list of existing features, adding new features, and controlling the duration of flow timeout.
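
The direction rule and flow timeout described above can be illustrated with a minimal Python sketch. It is not NTLFlowLyzer's actual code: the `Flow` class, `build_flows`, `mean_iat`, the packet tuple layout, and the 120-second default timeout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """One bidirectional flow; direction is fixed by the first packet seen."""
    src: tuple  # (ip, port) of the first packet's sender
    dst: tuple  # (ip, port) of the first packet's receiver
    fwd_times: list = field(default_factory=list)
    bwd_times: list = field(default_factory=list)

    @property
    def last_seen(self):
        return max(self.fwd_times + self.bwd_times)


def build_flows(packets, timeout=120.0):
    """Group (timestamp, src, dst, protocol) packets into bidirectional flows.

    `timeout` is a hypothetical idle limit in seconds: a packet arriving
    later than that after the flow's last packet starts a new flow.
    """
    active, finished = {}, []
    for ts, src, dst, proto in sorted(packets):
        key = (frozenset((src, dst)), proto)  # same key for both directions
        flow = active.get(key)
        if flow is not None and ts - flow.last_seen > timeout:
            finished.append(active.pop(key))
            flow = None
        if flow is None:
            flow = active[key] = Flow(src, dst)
        # Packets sent by the flow's first sender are "forward", the rest "backward".
        (flow.fwd_times if src == flow.src else flow.bwd_times).append(ts)
    return finished + list(active.values())


def mean_iat(times):
    """Mean inter-arrival time of one direction, or 0.0 with fewer than two packets."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0
```

Because each flow keeps separate forward and backward timestamp lists, a time-related statistic such as `mean_iat` can be computed once per direction.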

Related published papers:

MohammadMoein Shafi, Arash Habibi Lashkari, Vicente Rodriguez, and Ron Nevo, “Toward Generating a New Realistic Cloud-based Distributed Denial of Service (DDoS) Dataset and Intrusion Traffic Characterization”, Information, Vol. 15, 2024

More Information & Download Source Code:

8. Benign User Profiler (BUP)

The BUP is responsible for profiling the abstract behavior of human interactions and generating naturalistic, benign background traffic. Profiles can be applied to a diverse range of network protocols with different topologies because they represent the abstract properties of human and attack behavior. Once a benign profile is derived from users, an agent or human operator can generate realistic benign events on the network. Organizations and researchers can use this approach to generate realistic benign data easily; therefore, there is no need to anonymize data sets.

Related published papers:

MohammadMoein Shafi, Arash Habibi Lashkari, Vicente Rodriguez, and Ron Nevo, “Toward Generating a New Realistic Cloud-based Distributed Denial of Service (DDoS) Dataset and Intrusion Traffic Characterization”, Information, Vol. 15, 2024

More Information & Download Source Code:

7. Smart Contracts Vulnerability Analyzer (SCsVulLyzer-V1.0)

SCsVulLyzer is a Python-based tool designed to analyze and extract key metrics from Ethereum smart contracts written in Solidity. It uses a suite of functions to dissect a contract's source code, compiling it to obtain its abstract syntax tree (AST), bytecode, and opcodes. The analyzer calculates the entropy of the bytecode to assess its randomness and security, determines the frequency of certain opcodes to gauge the contract's complexity, and measures the usage of key Solidity keywords to identify coding patterns. This modular and extensible tool provides a snapshot of a smart contract's structure and behavior, helping developers and auditors optimize and secure Ethereum blockchain applications.
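
Two of the metrics named above, bytecode entropy and opcode frequency, can be sketched in a few lines of Python. This is an illustrative sketch and not SCsVulLyzer's own implementation: the function names are hypothetical, entropy is assumed to mean Shannon entropy over bytes, and the opcode listing is assumed to be a space-separated string of mnemonics with `0x`-prefixed operands.

```python
import math
from collections import Counter


def bytecode_entropy(bytecode_hex):
    """Shannon entropy, in bits per byte, of a hex-encoded bytecode string."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    data = bytes.fromhex(bytecode_hex)
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def opcode_frequency(opcodes):
    """Count opcode mnemonics in a space-separated listing, skipping 0x operands."""
    return Counter(tok for tok in opcodes.split() if not tok.startswith("0x"))


# Usage with a tiny made-up listing:
freq = opcode_frequency("PUSH1 0x80 PUSH1 0x40 MSTORE")
print(freq.most_common())
```

Under this definition the entropy ranges from 0 bits per byte (every byte identical) to 8 bits per byte (all 256 byte values equally likely).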

Related published papers:

Sepideh Hajihosseinkhani, Arash Habibi Lashkari, Ali Mizani Oskui, “Unveiling Vulnerable Smart Contracts: Toward Profiling Vulnerable Smart Contracts using Genetic Algorithm and Generating Benchmark Dataset”, Blockchain: Research and Applications, Vol. 4, December 2023

More Information & Download Source Code:

6. Authorship Attribution Analyzer (AuthAttLyzer)

The source code of a program often contains attributes and peculiarities that can be used to identify its author, because they reflect individual coding styles, much as a writer has identifiable handwriting. These stylistic patterns range from basic artifacts in the code layout and comments to subtle habits in the program's control flow or the syntax used. The challenging task of identifying the author of source code based on these attributes is called Source Code Authorship Attribution (SCAA). AuthAttLyzer is a source code analyzer that can extract several features, including N-grams, word-based embeddings, and Abstract Syntax Tree (AST) features.
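
As a small illustration of one of the feature families mentioned above, the sketch below counts character N-grams in a source-code string. It is a simplified example, not AuthAttLyzer's extractor: the function names, the choice of character-level (rather than token-level) N-grams, and the defaults `n=3` and `top=5` are assumptions.

```python
from collections import Counter


def char_ngrams(source, n=3):
    """Count overlapping character n-grams in a source-code string."""
    return Counter(source[i:i + n] for i in range(len(source) - n + 1))


def ngram_profile(source, n=3, top=5):
    """Relative frequencies of the `top` most common n-grams."""
    counts = char_ngrams(source, n)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(top)} if total else {}


# Usage: layout habits such as spacing around operators show up in the profile.
print(ngram_profile("for(i=0;i<n;i++){x+=i;}", n=2, top=3))
```

Profiles like these can be compared across files, since habits such as brace placement or operator spacing shift which N-grams dominate.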

Related published papers:

Abhishek Chopra, Nikhill Vombatkere, Arash Habibi Lashkari, “AuthAttLyzer: A Robust defensive distillation-based Authorship Attribution framework”, The 12th International Conference on Communication and Network Security (ICCNS), China, 2022

More Information & Download Source Code:

5. PDF Malware Analyzer (PDFMalLyzer)

Over the years, PDF has been the most widely used document format due to its portability and reliability. Unfortunately, the popularity of PDF and its advanced features have allowed attackers to exploit it in numerous ways. An attacker can misuse various critical PDF features to deliver a malicious payload. This program extracts 31 different features from a set of PDF files specified by the user and writes them to a CSV file. The resulting CSV file can be studied further for a variety of purposes, most importantly for detecting malicious PDF files.
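
The extract-then-write-CSV workflow described above might look roughly like the following Python sketch. It does not reproduce PDFMalLyzer's 31 features, which are not listed here: the `KEYWORDS` list, the column names, and both functions are hypothetical, and the features are simple raw keyword counts.

```python
import csv
import io

# Hypothetical keyword list; the real tool's 31 features are not enumerated here.
KEYWORDS = [b"/JavaScript", b"/OpenAction", b"/EmbeddedFile", b"/ObjStm"]


def extract_features(name, raw):
    """Return one feature row for a PDF given as raw bytes."""
    row = {"file": name, "size_bytes": len(raw)}
    for kw in KEYWORDS:
        row[kw.decode().lstrip("/")] = raw.count(kw)
    return row


def write_csv(rows, out):
    """Write feature rows to a text stream, one line per PDF."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Usage with an in-memory sample instead of real files:
sample = b"%PDF-1.7 /OpenAction 1 0 R /JavaScript (app.alert(1))"
buffer = io.StringIO()
write_csv([extract_features("sample.pdf", sample)], buffer)
print(buffer.getvalue())
```

Counting keywords in raw bytes is only a rough proxy: PDF names can be hex-escaped or hidden inside compressed streams, so a practical extractor would need to parse the file structure.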

Related published papers:

Maryam Issakhani, Princy Victor, Ali Tekeoglu, and Arash Habibi Lashkari, “PDF Malware Detection Based on Stacking Learning”, The International Conference on Information Systems Security and Privacy, February 2022

More Information & Download Source Code:

4. IMAP Bot AnaLyzer (IMAPBotLyzer)

Credential stuffing is an attack that uses stolen account credentials, usually sourced from data breaches, to gain access to other accounts. It exploits the fact that many people use the same username and password for multiple accounts. Credential stuffing has become a major concern for the Internet Message Access Protocol (IMAP), a popular method for accessing electronic mail and news messages maintained on a remote server. A significant vulnerability in IMAP and other legacy email protocols is that they cannot support multi-factor authentication (MFA) and depend on only a username and password for authentication, leaving them susceptible to credential stuffing. As bots generally carry out credential stuffing attacks, a promising countermeasure is to identify and block them before they can log in. Our objective is to use two types of behavioral biometrics, mouse dynamics and keystroke dynamics, to profile humans and bots and distinguish between them. In this project, we introduced a supervised learning bot detection system using mouse and keystroke dynamics and compared the classification performance of the Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) machine learning algorithms to identify which model achieves the best overall result.
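
To make the idea of keystroke dynamics concrete, the sketch below computes two timing features that are common in the keystroke-dynamics literature: dwell time and flight time. Whether IMAPBotLyzer uses exactly these features is an assumption; the event format and the `keystroke_features` function are hypothetical.

```python
from statistics import mean


def keystroke_features(events):
    """Summarize (key, press_time, release_time) events, times in seconds.

    Dwell time is how long a key is held; flight time is the gap between
    releasing one key and pressing the next.
    """
    dwell = [release - press for _, press, release in events]
    flight = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    return {
        "mean_dwell": mean(dwell) if dwell else 0.0,
        "mean_flight": mean(flight) if flight else 0.0,
    }


# Usage: each login attempt yields one feature vector for a classifier.
typed = [("p", 0.00, 0.09), ("a", 0.21, 0.28), ("s", 0.40, 0.51)]
print(keystroke_features(typed))
```

Feature vectors of this kind, labeled as human or bot, are the sort of input a supervised classifier such as RF, DT, SVM, or KNN could be trained on.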

Related published papers:

Ashley Barkworth, Rehnuma Tabassum, and Arash Habibi Lashkari, “Detecting IMAP Credential Stuffing Bots Using Behavioural Biometrics”, The 12th International Conference on Communication and Network Security (ICCNS), China, 2022

More Information & Download Source Code:

3. Volatility Memory Analyzer (VolMemLyzer)

Memory forensics is a fundamental step that inspects malicious activities during live malware infection. Memory analysis not only captures malware footprints but also collects several essential features that may be used to extract hidden original code from obfuscated malware. Significant effort has gone into analyzing volatile memory using several tools and approaches. These approaches fetch relevant information from the kernel and user space of the operating system to investigate running malware. However, the process is faster when the most dominant features required for malware classification are readily available. Volatility Memory Analyzer (VolMemLyzer) is a Python tool that extracts more than 36 features for analyzing malicious activities in a memory snapshot using the Volatility tool.

Related published papers:

Arash Habibi Lashkari, Beiqi Li, Tristan Lucas Carrier, Gurdip Kaur, "VolMemLyzer: Volatile Memory Analyzer for Malware Classification using Feature Engineering", Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS), IEEE 978-1-7281-6937-8/20, Canada, ON, McMaster University, 2021

More Information & Download Source Code:

2. DNS over HTTPS (DoH) Analyzer (DoHLyzer)

A set of tools to capture HTTPS traffic, extract statistical and time-series features from it, and analyze them, with a focus on detecting and characterizing DoH (DNS-over-HTTPS) traffic.

Related published papers:

Mohammadreza MontazeriShatoori, Logan Davidson, Gurdip Kaur and Arash Habibi Lashkari, "Detection of DoH Tunnels using Time-series Classification of Encrypted Traffic", The 5th Cyber Science and Technology Congress (2020) (CyberSciTech 2020), Vancouver, Canada, August 2020

More Information & Download Source Code:

1. Static and Dynamic Android App Analyzer (AndroidApplyzer)

This research focuses on classifying Android samples using static and dynamic analysis. The first version of this package covers data collection and static feature extraction. The second version focuses on developing an AI-based classification model for static features. The third version adds a dynamic analysis module and related features to improve the classifier.

Related published papers:

Abir Rahali, Arash Habibi Lashkari, Gurdip Kaur, Laya Taheri, Francois Gagnon, and Frédéric Massicotte, "DIDroid: Android Malware Classification and Characterization Using Deep Image Learning", 10th International Conference on Communication and Network Security, Tokyo, Japan, November 2020, https://doi.org/10.1145/3442520.3442521

David Sean Keyes, Beiqi Li, Gurdip Kaur, Arash Habibi Lashkari, Francois Gagnon, Frédéric Massicotte, "EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics", Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS), IEEE 978-1-7281-6937-8/20, Canada, ON, McMaster University, 2021

More Information & Download Source Code: