Chasing CnC Servers - False positives
Mandiant
Written by: Atif Mushtaq
In my last article, I discussed how tricky it can be to track botnets through their command and control (CnC) servers. That article focused on the false negative (missed detection) aspect of this approach. Today, I will discuss the false positive (incorrect alert/detection) issue in detail.
Tracking botnets through the CnC servers requires a few assumptions. One such assumption is that "every CnC is a bad resource or is at least distinguishable from the good ones". That's not completely true. Sounds strange? Let me explain.
There are many possible cases where a CnC server (IP and/or domain) is not owned by or completely under the control of the bot herder(s).
Here are some key scenarios:
1. Shared Server DNS
Bot herders often use compromised Web servers for hosting their CnCs. One directory under the document root might be used to keep botnet data and the rest of the server might be hosting the legitimate Web contents. This creates a very tricky situation. So, a DNS query for this server no longer means a bot is attempting to connect to its CnC server. Rather, an actual user may be legitimately browsing this Web server. Taking actions on such alerts would not only declare a clean machine as infected, but may also block the user from accessing a legitimate resource.
This is such a widespread case that Zeus/Zbot tracker maintains a list of those domains which were used as the CnC at some point in time and were found to be cleaned after the investigation.
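To make the shared-server problem concrete, here is a minimal sketch. The host name, the directory path and both function names are hypothetical, chosen only for illustration. It shows that a domain-level indicator flags a legitimate visitor and a bot alike, because the DNS query carries no information about which part of the site is being requested.

```python
from urllib.parse import urlsplit

# Hypothetical indicators: a compromised shop whose /images/cache/
# directory is assumed to hold the bot herder's files.
CNC_DOMAINS = {"shop.example.com"}
CNC_URL_PREFIXES = {("shop.example.com", "/images/cache/")}

def domain_match(url: str) -> bool:
    """Domain-level check: flags every request to a listed host."""
    host = (urlsplit(url).hostname or "").lower()
    return host in CNC_DOMAINS

def path_match(url: str) -> bool:
    """Path-level check: flags only requests under a listed directory."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return any(host == h and path.startswith(p) for h, p in CNC_URL_PREFIXES)

legit = "http://shop.example.com/catalog/index.html"   # a real customer
bot = "http://shop.example.com/images/cache/gate.php"  # assumed bot request

for url in (legit, bot):
    print(url, "domain:", domain_match(url), "path:", path_match(url))
```

The path-level check is included only to show what information the DNS query lacks. It is not a proposed fix, since it requires visibility into the full request and the listed directory can change at any time.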
2. Shared IPs
There are many botnets that don't use DNS queries to locate their CnC servers. Let's look at big botnets like Rustock and Pushdo. In these cases, bots identify their CnC server using the direct IP address assigned to these CnC servers. The CnC servers are typically purchased using stolen credit cards and are abandoned by the bot herders very quickly. Once bot herders abandon these CnC servers, these IP blocks can be reassigned to a legitimate party.
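The reassignment problem can be sketched as follows. The addresses, dates and the 30-day cutoff are all assumptions made up for illustration, not values from any real blacklist. Without an expiry, an IP that was abandoned months ago and handed to a legitimate party keeps generating alerts.

```python
from datetime import datetime, timedelta

# Hypothetical blacklist: IP -> last time it was confirmed serving CnC traffic.
BLACKLIST = {
    "203.0.113.10": datetime(2010, 6, 1),   # abandoned long ago
    "198.51.100.7": datetime(2010, 9, 20),  # recently active
}
MAX_AGE = timedelta(days=30)  # assumed cutoff, chosen arbitrarily

def is_flagged(ip, now, max_age=None):
    """Return True if the IP should raise an alert at time `now`.

    With max_age=None the entry never expires, which is the behaviour
    that keeps flagging reassigned addresses.
    """
    last_seen = BLACKLIST.get(ip)
    if last_seen is None:
        return False
    if max_age is None:
        return True
    return now - last_seen <= max_age

now = datetime(2010, 10, 1)
print(is_flagged("203.0.113.10", now))           # no expiry
print(is_flagged("203.0.113.10", now, MAX_AGE))  # with expiry
```

Expiry only trades one error for the other: a short cutoff drops stale entries sooner but also forgets CnC servers that are still in use.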
3. Public Web 2.0 sites
We are seeing a trend where many bot herders have started using public services, like Web 2.0 sites, for hosting their CnCs. This includes, but is not limited to, Twitter, Rapidshare, multimania.co.uk, lycos.co.uk, Amazon.com's cloud services and a huge list of IRC, FTP and SMTP servers. This type of CnC infrastructure makes CnC signature-based botnet detection highly prone to false positives and almost impossible from a practical standpoint, because all the malware is doing is talking to legitimate servers/domains.
The recent "Here you have" email worm is a prime example of this. Bot herders were hosting their files on members.lycos.uk and multimania.co.uk.
4. p2p botnets
p2p botnets like Storm, Nugache and Waledac don't have a central command and control mechanism. By not using centralized CnC servers, these botnets render CnC server based detection completely impossible.
5. No CnC
A lot of malware has no CnC mechanism at all. While most of it consists of viruses and worms that were developed just for fun, some modern malware attacks have also been seen without any CnC communication. For example, bot herders use "ransomware" to encrypt users' files and ask victims to contact them via email in order to pay and recover their files.
Some academic research shows that it's possible to detect CnC domains through statistical analysis of the domain. This is an attempt to solve the false negative issues associated with CnC based botnet detection. The problem is that, although this approach might solve some of the false negative issues, it makes false positive issues even worse. This approach relies heavily on historical NS and domain whois records. It might sometimes distinguish a bad domain from a legitimate domain, but the point is that not every bad domain is a CnC domain. Every day, bad guys register hundreds of spam/phishing domains to evade spam filters. The occurrence of a bad domain is not always an indication of a machine hosting bots. For instance, many spam filters scan through email contents and sometimes look up the domains inside email bodies. This scanning generates lots of DNS queries for bad domains. These anomaly detectors may take the DNS queries generated by these filters as suspicious behavior and declare the source IP an infected machine. A funny situation, isn't it?
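The spam filter scenario can be sketched with a deliberately naive detector. Everything here is hypothetical: the domain names, the internal addresses, the threshold and the `flag_sources` function are illustrative assumptions, not a description of any real product. The detector counts lookups of "bad" domains per source and flags anything above a threshold.

```python
from collections import Counter

# Hypothetical "bad" domains produced by some statistical classifier.
BAD_DOMAINS = {"cheap-pills.example", "bank-login.example", "win-prize.example"}
THRESHOLD = 3  # assumed: flag a source after this many bad-domain lookups

def flag_sources(dns_log, threshold=THRESHOLD):
    """Return the set of source IPs with at least `threshold` bad lookups."""
    hits = Counter(src for src, domain in dns_log if domain in BAD_DOMAINS)
    return {src for src, count in hits.items() if count >= threshold}

# (source IP, queried domain) pairs. 10.0.0.25 is assumed to be the mail
# filter resolving domains found in spam bodies; 10.0.0.77 is a workstation.
dns_log = [
    ("10.0.0.25", "cheap-pills.example"),
    ("10.0.0.25", "bank-login.example"),
    ("10.0.0.25", "win-prize.example"),
    ("10.0.0.25", "cheap-pills.example"),
    ("10.0.0.77", "bank-login.example"),
    ("10.0.0.77", "intranet.example"),
]

flagged = flag_sources(dns_log)
print(flagged)
```

In this toy log the only host flagged is the mail filter, the one machine whose job is to look at spam domains.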
Another widespread case occurs when a user comes across a rogue, spam or exploit domain due to a pop-up or Web page redirection. In the case of a social engineering attack, the user may simply ignore it, or the exploit may not work due to system incompatibility (Windows is not the only OS in town). The resultant DNS queries would mislead such anomaly detectors. To work around this fundamental problem, these types of anomaly detectors often introduce the concept of an "under observation" machine. These sensors initially mark a particular system as suspicious after seeing the first so-called "bad domain" and assume that if it is a botted machine, it will repeat these DNS queries periodically. Keeping a machine under observation for an extended period of time means that you are letting CnC communication go out until you are reasonably sure. How long does it take for password stealers to upload stolen data? In a lab run of several Zbot samples, I noticed that on average it takes Zbot less than a minute to upload cached passwords. What's the use of a detection after crooks have emptied out your bank accounts?
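The timing problem with "under observation" can be put into numbers with a small sketch. The 300-second lookup interval and the three-hit confirmation rule are assumptions chosen for illustration; only the one-minute upload figure comes from the lab observation above.

```python
def confirmation_time(lookup_times, required_hits):
    """Return the time (in seconds) at which a host is confirmed infected.

    The host is confirmed once `required_hits` bad-domain lookups have
    been seen. Returns None if that never happens.
    """
    if len(lookup_times) < required_hits:
        return None
    return sorted(lookup_times)[required_hits - 1]

# Assumed behaviour: an infected host looks up its CnC every 300 seconds,
# starting at t=0. A curious user hits a bad domain once and moves on.
bot_lookups = [0, 300, 600, 900]
one_off_visitor = [0]

EXFIL_SECONDS = 60  # upper bound taken from "less than a minute" above

confirmed_at = confirmation_time(bot_lookups, required_hits=3)
print("bot confirmed at t =", confirmed_at, "seconds")
print("data already uploaded:", confirmed_at > EXFIL_SECONDS)
print("one-off visitor confirmed:", confirmation_time(one_off_visitor, 3))
```

Under these assumptions the observation window does suppress the one-off visitor, but the real infection is confirmed long after the stolen data has left the network.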
With this type of approach one would end up analyzing terabytes of unique Command and Control (CnC) data and still miss lots of detections. The sheer volume of data points to a race condition: the security guys keep collecting new CnC data while, in the meantime, the bad guys move to new CnCs.
Antivirus vendors face a similar problem. They collect millions of unique samples on a daily basis and still miss a lot. The theory that if the bad guys want to break in, they will, and stopping the CnC communication can prevent all the damage is like not opting for a vaccination but rather waiting for the infections to come and then fighting the disease with antibiotics. We all know that sometimes antibiotics cause more damage than the actual disease. A successful defense should be a multi-layered approach.
This is how I define the ideal multi-layered approach to defending today's networks:
1. Address the core problems via better policies, such as patching out-of-date software (to stop old exploits) and ongoing user training (to mitigate social engineering). Limiting these types of vulnerabilities as much as you can is a good first step.
2. However, threats like zero-day attacks will pass through this first line of defense. So, try to stop them at the network layer before they exploit endpoint software or human mistakes.
3. If both of the above approaches don't work and an infection sets in, then at least block the malware from communicating back to its CnCs and exfiltrating data. This is yet another mitigation step to stop malware from propagating and/or stealing user credentials.
Please note here that blocking CnC communication can't prevent other damage caused by modern malware, such as system resource consumption or corrupting and/or hijacking users' files. Malware like Bancos edits the Windows hosts file to redirect users to fake banking Web sites. So, even if the CnC communication is stopped, the end user's hosts file still needs to be cleaned. Otherwise, the user would keep on visiting the phishing Web sites and losing his/her valuable information and money. Moreover, modern malware comes in the form of wolf packs. The inability to stop even one CnC communication could result in reinfection.
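As a rough illustration of the hosts file cleanup step, the sketch below scans hosts file text for entries that pin a watched banking domain to an address. The watched domains, the sample file contents and the function name are hypothetical, and real cleanup involves more than this check.

```python
# Hypothetical banking domains that should never appear in a hosts file.
WATCHED = {"www.bank.example", "online.bank.example"}

def suspicious_hosts_entries(hosts_text, watched=WATCHED):
    """Return (ip, hostname) pairs that map a watched domain to an address."""
    findings = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            if name.lower() in watched:
                findings.append((ip, name.lower()))
    return findings

# Made-up hosts file contents for demonstration.
sample = """\
# sample hosts file
127.0.0.1 localhost
203.0.113.66 www.bank.example   # planted redirect
"""

print(suspicious_hosts_entries(sample))
```

On Windows the file usually lives at C:\Windows\System32\drivers\etc\hosts, so in practice the text would be read from there rather than from a string.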
In order to back up my findings, I will finish my article with some real world malware examples.
"Here you have" email worm communicating to multimania.co.uk.
IRC.Mechbot bot connecting to DAL IRC network
It forms its own communication channel to send/receive CnC commands. You can never blacklist this CnC server without causing lots of false positive alarms.
Waledac, a P2P botnet, connecting to its peers
No central server (DNS and/or IP)
Pushdo and Cutwail
Neither of these use DNS to locate their CnCs. The absence of DNS requests makes DNS based statistical analysis useless.
Abuse of Perfect Keylogger software
Many people are not aware of malware that communicates over SMTP. Here is an example: malware based on the Perfect Keylogger software package uploads stolen information directly to AOL's email service. Does that make AOL a CnC server?
These are just a few examples; time doesn't permit me to discuss each and every case. There are hundreds of such cases where malware communicates with CnC servers that are almost impossible to blacklist. The time has come to leave this outdated blacklist approach and find new ways to detect modern malware.
Atif Mushtaq @ FireEye Malware Intelligence Lab
Detailed Questions/Comments : research SHIFT-2 fireeye DOT COM