From Assistant to Analyst: The Power of Gemini 1.5 Pro for Malware Analysis
Bernardo Quintero
Executive Summary
- A growing amount of malware has naturally increased workloads for defenders and particularly malware analysts, creating a need for improved automation and approaches to dealing with this classic threat.
- With the recent rise in generative AI tools, we decided to put our own Gemini 1.5 Pro to the test to see how it performed at analyzing malware. By providing code and using a simple prompt, we asked Gemini 1.5 Pro to determine if the file was malicious, and also to provide a list of activities and indicators of compromise.
- We did this for multiple malware files, testing with both decompiled and disassembled code, and Gemini 1.5 Pro was notably accurate each time, generating summary reports in human-readable language. Gemini 1.5 Pro was even able to make an accurate determination of code that — at the time — was receiving zero detections on VirusTotal.
- In our testing with other similar gen AI tools, we were required to divide the code into chunks, which led to vague and non-specific outcomes, and affected the overall analysis. Gemini 1.5 Pro, however, processed the entire code in a single pass, and often in about 30 to 40 seconds.
Introduction
The explosive growth of malware continues to challenge traditional, manual analysis methods, underscoring the urgent need for improved automation and innovative approaches. Generative AI models have become invaluable in some aspects of malware analysis, yet their effectiveness in handling large and complex malware samples has been limited. The introduction of Gemini 1.5 Pro, capable of processing up to 1 million tokens, marks a significant breakthrough. This advancement not only empowers AI to function as a powerful assistant in automating the malware analysis workflow but also significantly scales up the automation of code analysis. By substantially increasing the processing capacity, Gemini 1.5 Pro paves the way for a more adaptive and robust approach to cybersecurity, helping analysts manage the asymmetric volume of threats more effectively and efficiently.
Traditional Techniques for Automated Malware Analysis
The foundation of automated malware analysis is built on a combination of static and dynamic analysis techniques, both of which play crucial roles in dissecting and understanding malware behavior. Static analysis involves examining the malware without executing it, providing insights into its code structure and unobfuscated logic. Dynamic analysis, on the other hand, involves observing the execution of the malware in a controlled environment to monitor its behavior, regardless of obfuscation. Together, these techniques are leveraged to gain a comprehensive understanding of malware.
Parallel to these techniques, AI and machine learning (ML) have increasingly been employed to classify and cluster malware based on behavioral patterns, signatures, and anomalies. These methodologies have ranged from supervised learning, where models are trained on labeled datasets, to unsupervised learning for clustering, which identifies patterns without predefined labels to group similar malware.
Despite technological advancements, the increasing complexity and volume of malware present substantial challenges. While ML enhances the detection of malware variants, it remains inadequate against completely new threats. This detection gap allows advanced attacks to slip through cybersecurity defenses, compromising system protection.
Generative AI as Malware Analysis Assistant
Code Insight, unveiled at the RSA Conference 2023, marked a significant step forward in leveraging generative AI (gen AI) for malware analysis. This novel feature of Google's VirusTotal platform specializes in analyzing code snippets and generating reports in natural language, effectively emulating the approach of a malware analyst. Initially supporting PowerShell scripts, Code Insight later expanded to other scripting languages and file formats, including Batch, Shell, VBScript, and Office documents.
By processing the code and generating summary reports, Code Insight assists analysts in understanding the behavior of the code and identifying attack techniques. This includes uncovering hidden functionalities, malicious intent, and potential attack vectors that might be missed by traditional detection methods.
However, due to the inherent constraints of large language models (LLMs) and their limited token input capacity, the size of files that Code Insight could handle was restricted. Although there have been continuous improvements to increase the maximum file size limit and support more formats, analyzing binaries and executables still poses a significant challenge. When these files are disassembled or decompiled, their code size typically surpasses the processing capabilities of the LLMs available at the time. Consequently, gen AI models have functioned primarily as assistants to human analysts, enabling the analysis of specific code fragments from binaries rather than processing the entire code, which is often too voluminous for these models.
Reverse Engineering: The Human Face of Malware Analysis
Reverse engineering is arguably the most advanced malware analysis technique available to cybersecurity professionals. This process involves disassembling the binaries of malicious software and carrying out a meticulous examination of the code. Through reverse engineering, analysts can uncover the exact functionality of malware and understand its execution flow. However, this method is not without its challenges. It requires an immense amount of time, a deep level of expertise, and an analytical mindset to interpret each instruction, data structure, and function call to reconstruct the malware's logic and uncover its secrets.
Furthermore, scaling reverse engineering efforts poses a significant challenge. The scarcity of specialized talent in this field exacerbates the difficulty of conducting these analyses at scale. Given the intricate and time-consuming nature of reverse engineering, the cybersecurity community has long sought ways to augment this process, making it more efficient and accessible.
Gemini 1.5 Pro: Scalable Reverse Engineering for Malware Analysis
The ability to process prompts of up to 1 million tokens enables a qualitative leap in malware analysis, particularly in the realm of reverse engineering. This advancement finally brings the power of gen AI to the analysis of binaries and executables, a task previously reserved for highly skilled human analysts due to its complexity.
How does Gemini 1.5 Pro achieve this?
- Increased capacity: With its expanded token limit, Gemini 1.5 Pro can entirely analyze some disassembled or decompiled executables in a single pass, eliminating the need to break down code into smaller fragments. This is crucial because fragmenting code can lead to a loss of context and important correlations between different parts of the program. When analyzing only small snippets, it is difficult to understand the overall functionality and behavior of the malware, potentially missing key insights into its purpose and operation. By analyzing the entire code at once, Gemini 1.5 Pro gains a holistic understanding of the malware, allowing for more accurate and comprehensive analysis.
- Code interpretation: Gemini 1.5 Pro can interpret the intent and purpose of the code, not just identify patterns or similarities. This is possible due to its training on a massive dataset of code, encompassing assembly language from various architectures, high-level languages like C, and pseudo-code produced by decompilers. This extensive knowledge base, combined with its understanding of operating systems, networking, and cybersecurity principles, allows Gemini 1.5 Pro to effectively emulate the reasoning and judgment of a malware analyst. As a result, it can predict the malware's actions and provide valuable insights even for never-seen-before threats. For more information on this, see the zero day case study section later in this post.
- Detailed analysis: Gemini 1.5 Pro can generate summary reports in human-readable language, making the analysis process more accessible and efficient. This goes far beyond the simple verdicts typically provided by traditional machine learning algorithms for classification and clustering. Gemini 1.5 Pro's reports can include detailed information about the malware's functionality, behavior, and potential attack vectors, as well as indicators of compromise (IOCs) that can be used to feed other security systems and improve threat detection and prevention capabilities.
Let's explore a practical case study to examine how Gemini 1.5 Pro performs in analyzing decompiled code with a representative malware sample. We processed two WannaCry binaries automatically using the Hex-Rays decompiler, without adding any annotations or additional context. This approach resulted in two C code files, one 268 KB and the other 231 KB in size, which together amount to more than 280,000 tokens for processing by the LLM.
In our testing with other similar gen AI tools, we faced the necessity of dividing the code into chunks. This fragmentation often compromised the comprehensiveness of the analysis, resulting in vague and non-specific outcomes. These limitations highlight the challenges of using such tools with complex code bases.
Gemini 1.5 Pro, however, marks a significant departure from these constraints. It processes the entire decompiled code in a single pass, taking just 34 seconds to deliver its analysis. The initial summary provided by Gemini 1.5 Pro is notably accurate, showcasing its ability to handle large and complex datasets seamlessly and effectively:
- Issues a malicious verdict associated with ransomware
- Identifies some files as IOCs (c.wnry and tasksche.exe)
- Acknowledges the use of an algorithm to generate IP addresses and perform network scans to find targets on port 445/SMB to spread to other computers
- Identifies URL/domain (WannaCry's "killswitch") and relevant registry key and mutex
While it might seem that Gemini 1.5 Pro's report of WannaCry is based on pre-trained knowledge of this specific malware, this isn't the case. The analysis comes from the model's ability to independently interpret the code. This will become even clearer as we look at the upcoming examples where Gemini 1.5 Pro analyzes unfamiliar malware samples, demonstrating its wide-ranging capabilities.
LLM on Code: Disassembled vs. Decompiled
In the previous example showcasing WannaCry analysis, there was a crucial step before feeding the code to the LLM: decompilation. This process, which transforms binary code into a higher-level representation like C, is fully automated and mirrors the initial steps taken by malware analysts when manually dissecting malicious software. But what is the difference between disassembled and decompiled code, and how does it impact LLM analysis?
- Disassembly: This process converts binary code into assembly language, a low-level representation specific to the processor architecture. While human-readable, assembly code is still quite complex and requires significant expertise to understand. It is also much longer and more repetitive than the original source code.
- Decompilation: This process attempts to reconstruct the original source code from the binary. While not always perfect, decompilation can significantly improve readability and conciseness compared to disassembled code. It achieves this by identifying high-level constructs like functions, loops, and variables, making the code easier to understand for analysts.
Given these factors, when using LLMs for binary analysis, decompilation offers several advantages on efficiency and scalability. The shorter and more structured output from decompilation fits more readily within the processing constraints of LLMs, allowing for a more efficient analysis of large or complex binaries. In fact, the output from a decompiler is five to 10 times more concise than that produced by a disassembler.
Disassembly is necessary to perform accurate decompilation and remains an invaluable tool in certain scenarios where detailed, low-level analysis is crucial. Given the structured and higher-level nature of decompiled output, there are specific circumstances where disassembly provides insights that decompilation cannot match.
Fortunately, Gemini 1.5 Pro demonstrates equal capability in processing both high-level languages and assembly across various architectures. Thus, our implementation for automating binary analysis can utilize both strategies or adopt a hybrid approach, as suited to the specific circumstances of each case. This flexibility allows us to tailor our analysis method to the nature of the binary in question, optimizing for efficiency, depth of insight, and the specific objectives of the analysis, whether that means dissecting the logic and flow of the program or diving into the intricate details of its low-level operations.
Next, we'll examine a case where we directly employ disassembly for analysis. This time, we're working with a more recent and unknown binary; in fact, the executable submitted to VirusTotal is flagged as malicious by only four out of the 70 VirusTotal anti-malware engines, and only in a generic sense, without providing any details about the malware family that could offer further clues about its behavior.
After automatic preprocessing with HexRays/IDA Pro, the 306.50 KB executable binary produces a 1.5 MB assembly file that Gemini 1.5 Pro can process in a single pass within 46 seconds , thanks to its large token window in the prompt. This capability allows for an analysis of the entire assembly output, offering detailed insights into the binary's operations.
This case of the unknown binary showcases the remarkable capabilities of Gemini 1.5 Pro. Despite only four out of 70 anti-malware engines on VirusTotal flagging the file as malicious—using only generic signatures—Gemini 1.5 Pro identified the file as malicious, providing a detailed explanation for its verdict. The file is likely a game cheat designed to inject a game hack dynamic-link library (DLL) into the Grand Theft Auto video game process. The designation of "malicious" may depend on perspective: deemed malicious by the game's developers or their security team focused on anti-cheating measures, yet potentially desirable for some players. Nevertheless, this automated first-pass analysis is not only impressive but also illuminating regarding the nature and intent of the binary.
Unveiling the Unknown: A Case Study in Zero-Day Detection
The true test of any malware analysis tool lies in its ability to identify never-before-seen threats undetected by traditional methods and proactively protecting systems from zero-day attacks. Here, we examine a case where an executable file is undetected by any anti-virus or sandbox on VirusTotal.
The 833 KB file, medui.exe, was decompiled into 189,080 tokens and subsequently processed by Gemini 1.5 Pro in a mere 27 seconds to produce a complete malware analysis report in a single pass.
This analysis revealed suspicious functionalities, leading Gemini 1.5 Pro to issue a malicious verdict. Based on its observations, it concluded that the primary goal of this malware is to steal cryptocurrency by hijacking Bitcoin transactions and evading detection through the disabling of security software.
This showcases Gemini's ability to go beyond simple pattern matching or ML classification and leverage its deep understanding of code behavior to identify malicious intent, even in previously unseen threats. This is a significant advancement in the field of malware analysis, as it allows us to proactively detect and respond to new and emerging threats that traditional methods might miss.
From Assistant to Analyst
Gemini 1.5 Pro unlocks impressive capabilities, enabling the analysis of large volumes of decompiled and disassembled code. It has the potential to significantly change our approach to fighting malware by enhancing efficiency, accuracy, and our ability to scale in response to a growing number of threats.
However, it's important to remember that this is just the beginning. While Gemini 1.5 Pro represents a significant leap forward, the field of gen AI is still in its infancy. There are several challenges that need to be addressed to achieve truly robust and reliable automated malware analysis:
- Obfuscation and packing: Malware authors are constantly developing new techniques to obfuscate their code and evade detection. In response, there's a growing need to not only continuously improve gen AI models but also to enhance the preprocessing of binaries before analysis. Adopting dynamic approaches that utilize various preprocessing tools can more effectively unpack and deobfuscate malware. This preparatory step is crucial for enabling gen AI models to accurately analyze the underlying code, ensuring they keep pace with evolving obfuscation techniques and remain effective in detecting and understanding sophisticated malware threats.
- Increasing binary size: The complexity of modern software is mirrored in the growing size of its binaries. This trend presents a significant challenge, as the majority of gen AI models are constrained by much lower token window limits. In contrast, Gemini 1.5 Pro stands out by supporting up to 1 million tokens—currently the highest known capacity in the field. Nevertheless, even with this remarkable capability, Gemini 1.5 Pro may encounter limitations when handling exceptionally large binaries. This underscores the ongoing need for advancements in AI technology to accommodate the analysis of increasingly large files, ensuring comprehensive and effective malware analysis as software complexity continues to escalate.
- Evolving attack techniques: As attackers continuously innovate, crafting new methods to bypass security measures, the challenge for gen AI models extends beyond simple adaptability. These models must not only learn and recognize new threats but also evolve in conjunction with the efforts of researchers and developers. There's a need to devise new methods for automating the preprocessing of threat data, which would enrich the context provided to AI models. For instance, integrating additional data from static and dynamic analysis tools, such as sandbox reports, plus the decompiled and disassembled code, can significantly enhance the models' understanding and detection capabilities.
The journey towards scaling automated malware analysis is ongoing, but Gemini 1.5 Pro marks a significant milestone. Give Gemini 1.5 Pro a try; we look forward to seeing the innovative ways the community leverages it to enhance security operations.
At GSEC Malaga, we continue to research and develop ways to apply these models effectively in AI, pushing the boundaries of what's possible in cybersecurity and contributing to a safer digital future.
Malware Details
The following table contains details on the malware samples discussed in this post.
Prompt
The following is the exact prompt used in all the examples covered in the post. The only exception is the example where the word "disassembled" is used instead of "decompiled" because, as explained, we're working with disassembled code rather than decompiled code to show that Gemini 1.5 Pro can interpret both.