geobour98/malware-classification

Malware Classification

The goal of this diploma thesis is to apply Artificial Intelligence and Machine Learning algorithms to detect malware in TLS traffic. TLS is used extensively for client-server communication over the Web; it encrypts packets in order to protect messages, transactions and other activities. The novelty of this work is that packets are not decrypted to check whether they contain malware. Instead, Machine Learning (ML) algorithms are trained on selected features, certificates and metadata to predict with high accuracy whether a flow is malicious or benign. As a result, we do not expose users' personal data, such as passwords, since packets are not decrypted, and we also minimize delays in client-server communication, which decrypting each packet separately would introduce.

Credits

Demo

Merge benign datasets:

geobour98@archlinux:~/thesis-malware-ml/benign_dataset$ mergecap -F pcap -w ben_output.pcap *.pcap

Merge malware datasets:

geobour98@archlinux:~/thesis-malware-ml/malware_dataset$ mergecap -F pcap -w mal_output.pcap *.pcap

Extract TLS metadata from benign datasets:

geobour98@archlinux:~/thesis-malware-ml/benign_dataset$ mercury -r ben_output.pcap --select=tls --metadata --certs-json > ben_output.json

Extract TLS metadata from malware datasets:

geobour98@archlinux:~/thesis-malware-ml/malware_dataset$ mercury -r mal_output.pcap --select=tls --metadata --certs-json > mal_output.json
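
The merge and extraction commands above are identical for the two datasets apart from the directory and the file prefix. As a rough sketch (not part of the repository), they could be wrapped in a small Python helper. The function names `merge_command`, `mercury_command` and `preprocess` are hypothetical; the command-line flags are exactly the ones shown above, and `mergecap` and `mercury` are assumed to be on `PATH`.

```python
import subprocess
from pathlib import Path


def merge_command(dataset_dir, output_name):
    """Build the mergecap command for every capture in dataset_dir.

    The merged output file is excluded from the inputs so that a re-run
    does not feed a previous result back into mergecap.
    """
    inputs = sorted(
        str(p) for p in Path(dataset_dir).glob("*.pcap") if p.name != output_name
    )
    output = str(Path(dataset_dir) / output_name)
    return ["mergecap", "-F", "pcap", "-w", output, *inputs]


def mercury_command(pcap_path):
    """Build the mercury command shown above; its JSON goes to stdout."""
    return ["mercury", "-r", str(pcap_path), "--select=tls", "--metadata", "--certs-json"]


def preprocess(dataset_dir, prefix):
    """Merge the captures in dataset_dir, then extract TLS metadata.

    Writes <prefix>_output.pcap and <prefix>_output.json into dataset_dir.
    """
    directory = Path(dataset_dir)
    pcap_name = f"{prefix}_output.pcap"
    subprocess.run(merge_command(directory, pcap_name), check=True)
    with open(directory / f"{prefix}_output.json", "w") as out:
        # Equivalent of the shell redirection "> <prefix>_output.json".
        subprocess.run(mercury_command(directory / pcap_name), stdout=out, check=True)
```

With this in place, `preprocess("benign_dataset", "ben")` and `preprocess("malware_dataset", "mal")` would reproduce the four commands above. Note that the plain `*.pcap` glob in the shell version also matches a previously written `ben_output.pcap` or `mal_output.pcap`, which is why the sketch filters the output name out of the inputs.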

Modify benign datasets to valid JSON:

geobour98@archlinux:~/thesis-malware-ml/benign_dataset$ python3 final_json.py

Modify malware datasets to valid JSON:

geobour98@archlinux:~/thesis-malware-ml/malware_dataset$ python3 final_json.py
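
The contents of `final_json.py` are not shown here. Assuming the extracted file holds one JSON object per line (so the file as a whole is not a single valid JSON document), a minimal conversion into a JSON array might look like the sketch below. The helper names `records_from_lines` and `convert` are hypothetical and are not taken from the repository.

```python
import json


def records_from_lines(lines):
    """Parse one JSON object per non-empty line.

    Lines that are not valid JSON (for example a truncated record)
    are skipped rather than aborting the whole conversion.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return records


def convert(src_path, dst_path):
    """Rewrite a line-delimited JSON file as a single JSON array.

    Returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as src:
        records = records_from_lines(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst)
    return len(records)
```

A call such as `convert("ben_output.json", "ben_valid.json")` would then produce a file that `json.load` can read in one step; the output file name here is only an illustration.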

Perform the classification:

geobour98@archlinux:~/thesis-malware-ml$ python3 classify.py

Number of Benign flows:  56691
Number of Malware flows:  30122

Random Forests: 

Precision: 1.000
Recall: 1.000
F1-score: 1.000
Accuracy: 1.000 

KNN: 

Precision: 0.990
Recall: 0.968
F1-score: 0.979
Accuracy: 0.983 

SVM: 

Precision: 0.835
Recall: 0.927
F1-score: 0.879
Accuracy: 0.895
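
For reference, the four reported metrics follow from the confusion-matrix counts of a binary classifier. The sketch below is not the code in `classify.py`; it assumes one class (presumably malware) is treated as the positive class, and the function names are hypothetical. It also checks that the reported F1-scores are consistent with the reported precision and recall, and computes the share of the majority class from the flow counts above as a point of comparison for accuracy.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the figures reported above.
print(f"KNN F1: {f1_from(0.990, 0.968):.3f}")  # 0.979
print(f"SVM F1: {f1_from(0.835, 0.927):.3f}")  # 0.879

# Share of the majority class (benign) in the full dataset: the accuracy
# obtained by always predicting "benign", if the evaluation split has
# the same class proportions (an assumption).
benign, malware = 56691, 30122
print(f"Majority-class baseline: {benign / (benign + malware):.3f}")  # 0.653
```

Because the classes are imbalanced (roughly 65% benign), precision, recall and F1 are more informative here than accuracy alone.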

Warning

The code needs several improvements, mostly because of hardcoded values and repeated code. There are no plans to update the code base.