
Fuzzing Image Parsing in Windows, Part Four: More HEIF

July 5, 2022
Mandiant

Written by: Dhanesh Kizhakkinan


Continuing our discussion of image parsing vulnerabilities in the Windows HEIF codec, we take a look at analyzing a new crash, reconstructing function symbols, and the root cause analysis of the vulnerability, CVE-2022-24457. This vulnerability is present on a default install of Windows 10 and 11, and triggering it only requires browsing to a folder containing the malicious image file: the flaw is hit when Windows attempts to automatically generate a thumbnail for the image. All vulnerabilities have been remediated by Microsoft following the disclosure by Mandiant.

The Crash - CVE-2022-24457

https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-Fuzzing-Blog-Figure-1_lwhd.max-900x900.jpg

Figure 1: Crash details

The crash in Figure 1 is an out-of-bounds memory write within an AVX2 instruction; note that the crashing function is considerably large and contains a long series of AVX2 instructions. After a quick look at the decompilation, the operations performed, and the internal calls to memcpy functions, we can deduce that this is an AVX2-optimized version of memcpy. An out-of-bounds write inside memcpy is a good candidate for further analysis.

Identifying other functions

With the crash function identified as memcpy, we next try to identify other functions in the binary. For this, we go one frame up the call stack and look at the decompilation in Figure 2. We can see that this decompilation contains a call to sub_18017dd88, which appears to be a logging function that takes the caller's function name as its second parameter.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image-fuzzing-figure-2.max-800x800.png

Figure 2: Decompilation showing debug logging call

Some software ships with logging capabilities, which can be helpful for analyzing crashes and performance issues in a production environment. In this case, the logging calls help us reconstruct many function names and understand what those functions do, which in turn makes it easier to determine the root cause of vulnerabilities. Looking at the cross references, we can see over 5000 calls to this logging function (see Figure 3).

https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-Fuzzing-Figure-3_pwcn.max-1000x1000.jpg

Figure 3: Cross references to logging function

Given the large number of logging calls, we write a script which recovers the function names from the second argument and renames the caller functions.

The IDA Script

An IDAPython script was written to automate the process. It works according to the following algorithm.

  1. Get list of cross references (xrefs) to the logging function
  2. Get unique caller functions from the list of xrefs
  3. Decompile each caller
  4. Find the logging function call and retrieve the second argument as the function name
  5. Rename the caller function with the retrieved function name

I decided to write a generic script which can be reused in other projects. For that, I used IDA’s decompiler API to avoid processor and calling convention specific code. The script also makes use of our decompiler wrapper FIDL. The script is provided in Table 1.

 

from idc import *
from idaapi import *
from idautils import *
import FIDL.decompiler_utils as du

# f_name: The logging function name
# indx: 0 based argument index to retrieve
def rename(f_name, indx):
    f_ea = get_name_ea_simple(f_name)
    if f_ea == BADADDR:
        print("Failed to resolve address for {}".format(f_name))
        return

    callers = set()
    # Get a set of unique callers
    for ref in XrefsTo(f_ea, True):
        if not ref.iscode:
            continue
        
        f = get_func(ref.frm)
        if f is None:
            continue
            
        # Record the caller's start address without shadowing f_ea
        callers.add(f.start_ea)
    
    for caller_ea in callers:
        current_fname = get_func_name(caller_ea)
        # Rename only if the function name starts with sub_
        if current_fname.startswith("sub_"):
            c = du.find_all_calls_to_within(f_name, caller_ea) 
            try:
                # Validate the logging function and arguments
                if len(c) > 0 and len(c[0].args) > indx:
                    f_name_str = c[0].args[indx].val
                    set_name(caller_ea, "{}".format(f_name_str), SN_FORCE)
                else:
                    print("Failed in {}\n".format(current_fname))
            except Exception as e:
                # Report the failure instead of silently swallowing it
                print("Exception in {}: {}\n".format(current_fname, e))

rename('sub_18017DD88', 1)
Table 1: Function renamer IDAPython script

With thousands of functions renamed in IDA, it gets considerably easier to do a full root-cause analysis. Even though we use IDA for static analysis, WinDBG + Time Travel Debugging (TTD) is regularly used for most of the dynamic analysis. We port our renamed symbols into WinDBG by using an IDA plugin: FakePDB. FakePDB creates a PDB from the IDA database, which can be loaded in WinDBG to enhance our debugging/tracing capabilities. An example is shown in Figure 4.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image-fuzzing-figure-4.max-800x800.png

Figure 4: Ported symbols in WinDBG

Root cause analysis of the bug

The relevant code from the function msheif_store!CHEIFStreamReader::ReadItemData is presented in Table 2.

 

/*
    length calculation from CHEIFItemInfoEntry::GetDataSize
    0x309 + 0xee7        => 0x11f0
    0x11f0 + 0xfffff200    => 0x1000003f0
*/
QWORD currentOffset = 0;
QWORD length = 0x1000003f0; // CHEIFItemInfoEntry::GetDataSize
status = MFCreateMemoryBuffer(length, allocBuff);
if (status  < 0)
{
    // bail
}
while (1)
{
    ...
    // 0xee7 + 0x0 >= 0x1000003f0
    if (currentSize + currentOffset >= length)
    {
        // bail
    }
    // crash
    OptimizedMemcpy(currentOffset + allocBuff, srcBuff, currentSize);
    currentOffset += currentSize;
    ...
}
Table 2: Vulnerable code in function ReadItemData

Astute readers will quickly point at the if condition for a possible integer overflow scenario. But in this case, such a scenario is mitigated while calculating the length in CHEIFItemInfoEntry::GetDataSize. The vulnerability is only visible when we look closely at the MFCreateMemoryBuffer function and its parameters. Figure 5 shows the function’s documentation from MSDN.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-Fuzzing-Figure-5_fjhe.max-1000x1000.jpg

Figure 5: MFCreateMemoryBuffer function documentation

The MFCreateMemoryBuffer function accepts a 32-bit DWORD as the length parameter and returns an allocated buffer. But if we look at Table 2, we can see that the length passed to the function is a 64-bit QWORD. In such a case, the compiler silently truncates the QWORD to a DWORD, so the length 0x1000003f0 becomes the much smaller 0x3f0. The function therefore allocates a 0x3f0-byte buffer, while the bounds check in the copy loop still compares against the full 64-bit length; larger data gets copied into the buffer, causing the out-of-bounds write.
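The effect of the truncation can be reproduced with plain integer arithmetic. The following is a minimal sketch in Python, not the decoder's actual code: the helper names are hypothetical, and the mask stands in for the implicit QWORD-to-DWORD conversion at the call to MFCreateMemoryBuffer.

```python
DWORD_MASK = 0xFFFFFFFF  # MFCreateMemoryBuffer takes a 32-bit length


def truncate_to_dword(value):
    """Model the implicit 64-bit to 32-bit conversion at the call site."""
    return value & DWORD_MASK


def first_copy_overflows(extent_lengths, first_copy_size):
    """Return (passes_check, allocated, overflow_bytes) for the first copy.

    Hypothetical helper: it mirrors the check in Table 2, which compares
    against the 64-bit total rather than the truncated allocation size.
    """
    total = sum(extent_lengths)           # 64-bit sum, as in GetDataSize
    allocated = truncate_to_dword(total)  # size the buffer really has
    current_offset = 0
    passes_check = not (first_copy_size + current_offset >= total)
    overflow = max(0, current_offset + first_copy_size - allocated)
    return passes_check, allocated, overflow


passes, allocated, overflow = first_copy_overflows(
    [0x309, 0xEE7, 0xFFFFF200], 0xEE7)
print(passes, hex(allocated), hex(overflow))  # True 0x3f0 0xaf7
```

With the values from the PoC, the first 0xee7-byte copy passes the bounds check yet runs 0xaf7 bytes past the end of the 0x3f0-byte allocation.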

From the function name CHEIFStreamReader::ReadItemData, we guess that the vulnerability occurs while trying to parse the item box. Backtracing the calls further, we see that the lengths are read by the function CItemLocationAtom::ParseAtom, which points to the iloc box shown in Figure 6.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image-fuzzing-figure-6.max-800x800.png

Figure 6: iloc box

Looking at the box content, we can see all three length values (0x309, 0xEE7 and 0xFFFFF200) specified in the box. Now we can look at the iloc specification to figure out the exact details for those lengths. HEIF is based on the ISO Base Media File Format (ISOBMFF), but the specifications that describe the box parsing algorithms tend to be complicated or paywalled.

Another approach we can try is to look at open-source implementations of HEIF image parsers such as libheif or nokiatech-heif. Running our PoC file through their decoding routines gives us the exact details of the lengths from the iloc box, as shown in Figure 7.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-Fuzzing-Figure-7_ahum.max-900x900.jpg

Figure 7: iloc parsing in a debugger

The three lengths we see in our PoC file are called extent lengths. Microsoft’s HEIF implementation reads all the extent lengths and adds them together before the resulting 64-bit length is used to allocate memory through the API function MFCreateMemoryBuffer. Because this API takes a DWORD, the length is truncated and a smaller buffer is allocated, causing the out-of-bounds write.
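To make the role of the extent lengths concrete, here is a hedged sketch of an iloc reader in Python. It follows the ItemLocationBox field layout as we understand it from ISOBMFF (ISO/IEC 14496-12); it is not Microsoft's parser, does no bounds checking, and the sample box is synthetic. Only the three extent lengths come from the PoC; the box version, field sizes, item ID, and extent offsets are assumptions chosen for illustration.

```python
import struct


def parse_iloc_payload(payload):
    """Parse an iloc payload (the bytes after the 8-byte box header).

    Returns {item_id: [extent_length, ...]}. Sketch only: assumes a
    well-formed box.
    """
    pos = 0

    def read(nbytes):
        # Read a big-endian unsigned integer; zero-width fields yield 0.
        nonlocal pos
        value = int.from_bytes(payload[pos:pos + nbytes], "big")
        pos += nbytes
        return value

    version = read(1)
    read(3)                                  # flags, unused here
    sizes = read(2)                          # four packed 4-bit sizes
    offset_size = (sizes >> 12) & 0xF
    length_size = (sizes >> 8) & 0xF
    base_offset_size = (sizes >> 4) & 0xF
    index_size = (sizes & 0xF) if version in (1, 2) else 0
    item_count = read(4) if version == 2 else read(2)

    items = {}
    for _item in range(item_count):
        item_id = read(4) if version == 2 else read(2)
        if version in (1, 2):
            read(2)                          # reserved + construction_method
        read(2)                              # data_reference_index
        read(base_offset_size)               # base_offset
        extent_count = read(2)
        lengths = []
        for _extent in range(extent_count):
            read(index_size)                 # extent_index (v1/v2 only)
            read(offset_size)                # extent_offset
            lengths.append(read(length_size))
        items[item_id] = lengths
    return items


# Synthetic version-0 box: 4-byte offsets and lengths, one item, three extents.
sample = struct.pack(">B3sHH", 0, b"\x00\x00\x00", 0x4400, 1)
sample += struct.pack(">HHH", 1, 0, 3)       # item_ID, data_ref_index, extents
for offset, length in [(0x100, 0x309), (0x409, 0xEE7), (0x12F0, 0xFFFFF200)]:
    sample += struct.pack(">II", offset, length)

total = sum(parse_iloc_payload(sample)[1])
print(hex(total), hex(total & 0xFFFFFFFF))   # 0x1000003f0 0x3f0
```

Each extent length fits in 32 bits on its own; only their sum exceeds the DWORD range, which is why validating the individual fields is not enough.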

Patch

Microsoft patched this vulnerability in March 2022 by bailing out with an error if the total length is greater than 0xC8000000 (~3 GiB).
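In spirit, the fix amounts to validating the summed length before it reaches the allocator. The sketch below is an illustration only: the only detail known here is the 0xC8000000 limit, so the function name, the exception type, and where the check sits are assumptions.

```python
MAX_ITEM_DATA_SIZE = 0xC8000000  # limit reported for the March 2022 patch


def checked_data_size(extent_lengths):
    """Sum the extent lengths and reject totals above the limit.

    Hypothetical stand-in for the patched length validation; it raises
    instead of returning an error status.
    """
    total = sum(extent_lengths)
    if total > MAX_ITEM_DATA_SIZE:
        raise ValueError("item data size {:#x} exceeds limit".format(total))
    return total


print(hex(checked_data_size([0x309, 0xEE7])))  # 0x11f0
```

Since the limit is below 2^32, any total that passes the check also fits in a DWORD, so the later conversion can no longer change its value.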

Conclusion

Part four of this blog series presents a vulnerability in Microsoft’s HEIF decoder and shows how to reconstruct symbols to do a full root-cause analysis of the vulnerability. A list of the most recently reported vulnerabilities in the HEIF codec can be found in the following appendix, and they are also referenced in the Mandiant Vulnerability Disclosures.

Appendix

 

CVE ID          Submitted Date     Fixed Date        Vulnerability Type
CVE-2022-22007  22-April-2021      08-March-2022     Heap overflow
CVE-2022-21926  13-September-2021  08-February-2022  Heap overflow
CVE-2022-21917  17-September-2021  11-January-2022   Heap overflow
CVE-2022-21927  17-September-2021  08-February-2022  Heap overflow
CVE-2022-22006  17-September-2021  08-March-2022     Heap overflow
CVE-2022-21844  23-September-2021  08-February-2022  Heap overflow
CVE-2022-24453  19-October-2021    08-March-2022     Heap overflow
CVE-2022-24457  19-October-2021    08-March-2022     Heap overflow
CVE-2022-24456  14-November-2021   08-March-2022     Heap overflow
CVE-2022-24532  04-December-2021   12-April-2022     Heap overflow