Dynamic Malware Analysis

By Christopher Simaan, Cameron Monast, Arnav Vora, Teong Seng, Yvana Mouawad, Andy Huang, Mark Epstein, Salma Alandary

Mar 11, 2024

Tags: winter-2024, cyber-lab

Dynamic Malware Analysis using PANDA

In this lab, we delved into the world of Dynamic Program Analysis, working in small groups of 2-3 to create "plugins" to analyze and detect different subsets of malware recorded on a Windows Virtual Machine.

Introduction to Dynamic Analysis

PANDA-RE

"PANDA is an open-source Platform for Architecture-Neutral Dynamic Analysis. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. PANDA can be controlled from the command line, through our Python package, or even a Jupyter notebook." -- PANDA website

Plugins with PANDA

Besides being scriptable from Python, PANDA supports custom plugins written in C/C++ that run alongside its built-in ones. This makes PANDA very versatile, since we can write our own plugins to detect different types of malware. A more in-depth guide to writing plugins is available in the PANDA documentation.

Ransomware Detection

In developing our ransomware detection plugin, we focused on a simple but effective strategy: closely monitoring the number of write operations carried out by each process in the system. Ransomware typically encrypts files as quickly as it can, which produces an unusually high number of write operations. This distinctive behavior is a clear indicator that can aid in early detection.

To use this indicator, we first established a baseline for normal write activity. The baseline acts as a reference point that lets our plugin distinguish regular system operations from the irregular patterns associated with ransomware. Any process whose write count exceeds the resulting threshold is flagged as potentially harmful.

Because the monitoring is done per process, the plugin can pinpoint which process is responsible for activity that deviates from the established norm.
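The thresholding idea can be sketched in a few lines of Python. This is a minimal illustration rather than our plugin's code: the `WriteEvent` record, the process names, and the threshold value are all hypothetical, and in a real plugin the events would come from a hook on the guest's file-write system call instead of a hand-built list.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class WriteEvent(NamedTuple):
    """One observed file write (hypothetical record shape)."""
    process: str    # process name (an ASID would work equally well)
    num_bytes: int  # bytes written by this call


def flag_heavy_writers(events: Iterable[WriteEvent], threshold: int) -> List[str]:
    """Return the processes whose write count exceeds the threshold, sorted by name."""
    counts = Counter(event.process for event in events)
    return sorted(name for name, count in counts.items() if count > threshold)


if __name__ == "__main__":
    # Synthetic trace: an editor saves twice, another process writes 500 times.
    trace = [WriteEvent("notepad.exe", 4096)] * 2
    trace += [WriteEvent("locker.exe", 65536)] * 500
    print(flag_heavy_writers(trace, threshold=100))
```

Counting calls rather than bytes keeps the sketch close to the description above. A byte-based or rate-based threshold would be a natural variation.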

Malware Replication Detection

Synopsis

The goal of our plugin is to detect when malware is replicating. In short, the process is to taint a suspicious file that may try to copy itself, find the program counter (PC) values where the taint resurfaces, and use them to find the ASID of the process responsible.

Steps

The following steps are more of a proof of concept, but they illustrate how the idea put forth in the synopsis may be carried out. In the example pictures, we used a particularly simple recording that involved copying (using the cp command) a file 10 characters long.

We used the file_taint plugin to apply taint labels to the bytes of the file. The labels can then be queried with the tainted_instr plugin, which reports the instructions (specifically, their program counter values) that operate on tainted data.

We used the pc_search plugin to find the guest instruction counts at which those program counters were executed during the replay.

Finally, we ran asidstory on the replay to get the range of guest instruction counts covered by each process. We then mapped the instruction counts we got from pc_search to specific processes by finding the ranges that contained them, which told us which processes those instructions belonged to. This lets us determine which process handled the tainted bytes and therefore whether the file is being copied, that is, whether the malware is replicating.
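The final mapping step can be sketched as a simple range lookup. The sketch below assumes the per-process ranges have already been read out of the asidstory output into a dictionary. The dictionary layout, process names, and numbers are illustrative assumptions, not real plugin output.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape: process name -> (first, last) guest instruction count.
ProcessRanges = Dict[str, Tuple[int, int]]


def owning_processes(instr_count: int, ranges: ProcessRanges) -> List[str]:
    """Return every process whose range contains the given instruction count."""
    return sorted(
        name
        for name, (first, last) in ranges.items()
        if first <= instr_count <= last
    )


def attribute_taint(
    instr_counts: Iterable[int], ranges: ProcessRanges
) -> Dict[int, List[str]]:
    """Map each tainted instruction count to its candidate processes."""
    return {count: owning_processes(count, ranges) for count in instr_counts}


if __name__ == "__main__":
    # Illustrative values only.
    ranges = {"bash": (0, 1500), "cp": (1000, 2000)}
    for count, names in attribute_taint([1200, 1800, 5000], ranges).items():
        print(count, names)
```

The lookup returns a list rather than a single name because processes are interleaved by the scheduler, so their first-to-last ranges can overlap. An instruction count that falls in more than one range needs a closer look before it is attributed to a process.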

Malicious IP Detection

For this plugin, we wanted to look through network traffic and identify telltale signs of malware. Specifically, we planned to inspect the destination IPs of outgoing packets and then flag potentially malicious ones using the VirusTotal API. Another goal was to experiment with PyPANDA, a Python interface for interacting with PANDA.
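The flagging step can be sketched as follows. This is not our script: it assumes the VirusTotal v3 IP address endpoint and its `last_analysis_stats` field, and the helper names and the `min_hits` cutoff are hypothetical. The API key must be supplied by the caller.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed VirusTotal v3 endpoint for IP address reports.
VT_IP_URL = "https://www.virustotal.com/api/v3/ip_addresses/{ip}"


def build_request(ip: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the report request for one IP address."""
    return urllib.request.Request(
        VT_IP_URL.format(ip=ip), headers={"x-apikey": api_key}
    )


def is_suspicious(report: Dict[str, Any], min_hits: int = 1) -> bool:
    """Flag a report if enough engines rated the IP malicious or suspicious."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    hits = stats.get("malicious", 0) + stats.get("suspicious", 0)
    return hits >= min_hits


def check_ip(ip: str, api_key: str) -> bool:
    """Fetch the report for one IP over the network and apply the cutoff."""
    with urllib.request.urlopen(build_request(ip, api_key), timeout=30) as response:
        return is_suspicious(json.load(response))
```

Keeping the verdict logic in `is_suspicious` separate from the network call makes the cutoff easy to adjust and to exercise against saved reports without spending API quota.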

After building the C++ binaries required for PANDA, we installed the pandare Python package (which contains PyPANDA) and the other PyPANDA dependencies. The verification objective for our plugin was simple: send a ping to 8.8.8.8 (Google's public DNS server) and confirm that it is not flagged as a suspicious IP.

Initially, we prepared a recording that performs this ping, believing that we could analyze it with PyPANDA as long as we specified the architecture (i386) and the memory allocated to the system captured in the recording. To use PyPANDA, we instantiated a Panda object that takes the recording's system information as parameters. Oddly enough, we got memory-size mismatch errors despite specifying the memory in several formats and values. When we first ran the PyPANDA script, it triggered the installation of a specific i386 image, so PyPANDA may only recognize that particular system configuration.

A quick look through the PyPANDA plugin examples revealed that most of them make a recording and analyze it within the same session. As a quick remedy, we followed the same record-and-replay convention and extended our script to perform the ping itself and then run the analysis.

We loaded PANDA's C++ network plugin (loading existing plugins is one of PyPANDA's many abilities) to produce a PCAP file containing all the network traffic seen in the recording. We then replayed the 8.8.8.8 ping recording with this plugin and fed the resulting PCAP file to another script that extracts IP addresses using pyshark (a module for parsing packets in Python with Wireshark dissectors). Finally, we passed the list of extracted IP addresses to the VirusTotal API, which returned the expected result and completed our verification. Although the results were as expected, we hope to look into the Panda object instantiation more deeply in the future and potentially debug the issue with the system specification.
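To illustrate what the extraction step does, here is a minimal sketch that pulls IPv4 destination addresses out of a capture using only the Python standard library. It is a swapped-in technique, not our pyshark script, and it assumes a classic pcap file (not pcapng) with microsecond timestamps and an Ethernet link type. Whether the file written by the network plugin matches those assumptions is something to check before reusing it.

```python
import ipaddress
import struct
from typing import List

ETHERTYPE_IPV4 = 0x0800
LINKTYPE_ETHERNET = 1
ETH_HEADER_LEN = 14
IPV4_MIN_HEADER_LEN = 20


def destination_ips(pcap_bytes: bytes) -> List[str]:
    """Return unique IPv4 destination addresses in first-seen order."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    # The link type is the last field of the 24-byte global header.
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("only Ethernet captures are handled")

    seen: List[str] = []
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", pcap_bytes, offset)
        frame = pcap_bytes[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
            continue  # too short to hold an IPv4 header
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if ethertype != ETHERTYPE_IPV4:
            continue  # skip ARP, IPv6, VLAN-tagged frames, etc.
        # The destination address occupies bytes 16..19 of the IPv4 header.
        dst_start = ETH_HEADER_LEN + 16
        dst = str(ipaddress.IPv4Address(frame[dst_start : dst_start + 4]))
        if dst not in seen:
            seen.append(dst)
    return seen
```

A call such as `destination_ips(open("capture.pcap", "rb").read())`, where the file name is a placeholder, would yield the list of addresses to submit for reputation checks. A dissector-based tool like pyshark handles many more encapsulations than this sketch does.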

Performing the record and replay with network plugin

Parsing outputted PCAP and returning IP verification

Conclusion

Overall, we managed to get quite a lot of interesting detection capability out of PANDA, both by using its existing plugins and by creating our own. Dynamic malware analysis has real benefits for automating detection, but our plugins rely on specific assumptions holding true, and context-aware malware can often bypass them. Whatever the pros and cons of dynamic malware analysis, this project provides an interesting proof of concept for using PANDA to carry it out.