[RTF] Analysis of a Document-Based Malware Sample
Sample Information
File type: Doc
Hash: ffc735ea518844e8fb5276905b5368a23f9953ee18c235c51aebf9553dc2974f
File size: 95,210 bytes
Overview
Although this sample uses the .doc extension, its internal data shows that it is actually an RTF (Rich Text Format) document.
It performs its malicious activity by exploiting a vulnerability in EQNEDT32.EXE, the Microsoft Equation Editor.
The vulnerability used here is CVE-2018-0798, a memory corruption flaw triggered by the way Equation Editor processes an object in memory.
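The mismatch between the .doc extension and the real format can be checked from the first bytes of the file. The following is a minimal sketch, assuming that a leading `{\rt` is enough to treat a buffer as RTF and that a legacy binary .doc would instead start with the OLE2 compound file signature; the function name identify_container is made up for illustration.

```python
# Signature bytes: RTF documents begin with "{\rt", OLE2 compound files
# (the container used by legacy binary .doc) begin with D0 CF 11 E0 A1 B1 1A E1.
RTF_MAGIC = b"{\\rt"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def identify_container(data: bytes) -> str:
    """Classify a buffer by its leading signature bytes (hypothetical helper)."""
    if data.startswith(RTF_MAGIC):
        return "rtf"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


# Example on in-memory data; for a real sample, pass the first bytes of the file.
kind = identify_container(b"{\\rtf1\\ansi")
```

Only the leading bytes matter here, so the file extension plays no part in the result.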
Static Analysis
To analyze the sample statically, I examined the file with rtfobj and rtfdump.

rtfobj did not reveal anything especially useful about this sample.
However, rtfdump.py [filename] provided more useful information.

To understand the malicious behavior in an RTF file, the objdata section has to be analyzed.
So I looked deeper into the internal data.

Using rtfdump.py [filename] -s 4 produced the unusual data shown below, which I then converted into ASCII.

Adding -H to the same command makes it possible to view the converted ASCII data directly.
Most of the data is not immediately recognizable, but one readable string stands out:
eQuaTIon.3
This string identifies an object type inside the RTF document.
equation.3 means that the file contains an Equation Editor object.
RTF files can embed OLE objects, and in that case the object is identified not by the file extension but by a class name like the one shown above.
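The conversion from the hex text in objdata to readable bytes can be sketched in a few lines. This is an illustration only: it assumes the hex text has already been isolated (for example, copied from the rtfdump output), and the helper names are hypothetical. Stripping every non-hex character is too naive for raw, obfuscated RTF, where control words also contain letters such as a to f.

```python
import re


def decode_objdata_hex(text: str) -> bytes:
    """Decode hex text into bytes, ignoring spaces and line breaks."""
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", text)
    if len(hex_digits) % 2:
        # Drop a dangling half-byte rather than fail on truncated input.
        hex_digits = hex_digits[:-1]
    return bytes.fromhex(hex_digits)


def has_equation_class(blob: bytes) -> bool:
    """Case-insensitive search for the Equation Editor class name."""
    return b"equation.3" in blob.lower()


decoded = decode_objdata_hex("65 51 75 61 54 49 4F 6E 2E 33")
found = has_equation_class(decoded)
```

The comparison is case-insensitive because the sample writes the class name in mixed case.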
After that, there is more unknown data. Microsoft documents the layout of this header (the OLE 1.0 object format in the MS-OLEDS specification), so the analysis can continue from that structure.
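3B B9 BD 5C (4 bytes): OLE version
02 00 00 00 (4 bytes): Format ID (0x00000002 = embedded object)
0B 00 00 00 (4 bytes): Class Name Length (0x0B = 11, which includes the terminating null byte)
65 51 75 61 54 49 4F 6E 2E 33 (N bytes): Class Name eQuaTIon.3
00 00 00 00 (4 bytes): Topic Name Length (0 = no topic name)
00 00 00 00 (4 bytes): Item Name Length (0 = no item name)
C6 05 00 00 (4 bytes): Data Size (little-endian 0x000005C6 = 1,478 bytes)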
To summarize the important parts:
Class Name Length determines the length of the class name. Data Size determines the size of the data that follows immediately after it.
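The header fields listed above can be read programmatically. The sketch below is a hedged illustration, not the tooling used in this analysis: it assumes all integers are little-endian, that each name is a length-prefixed ANSI string whose length includes the terminating null byte, and it appends that null byte after the ten class name bytes listed above (an assumption based on the length value 0x0B). All function and type names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class Ole1Header(NamedTuple):
    ole_version: int
    format_id: int
    class_name: bytes
    topic_name: bytes
    item_name: bytes
    native_data_size: int
    native_data_offset: int  # offset of the first byte after the header


def _read_u32(buf: bytes, offset: int) -> Tuple[int, int]:
    """Read a little-endian 32-bit integer and return (value, next offset)."""
    (value,) = struct.unpack_from("<I", buf, offset)
    return value, offset + 4


def _read_string(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a length-prefixed string and strip its trailing null byte."""
    length, offset = _read_u32(buf, offset)
    raw = buf[offset:offset + length]
    if len(raw) != length:
        raise ValueError("string runs past the end of the buffer")
    return raw.rstrip(b"\x00"), offset + length


def parse_ole1_header(buf: bytes) -> Ole1Header:
    ole_version, offset = _read_u32(buf, 0)
    format_id, offset = _read_u32(buf, offset)
    class_name, offset = _read_string(buf, offset)
    topic_name, offset = _read_string(buf, offset)
    item_name, offset = _read_string(buf, offset)
    data_size, offset = _read_u32(buf, offset)
    return Ole1Header(ole_version, format_id, class_name,
                      topic_name, item_name, data_size, offset)


# Header bytes from the listing above, plus the assumed null terminator.
sample = bytes.fromhex(
    "3BB9BD5C" "02000000" "0B000000"
    "6551756154494F6E2E33" "00"
    "00000000" "00000000" "C6050000"
)
header = parse_ole1_header(sample)
```

With these bytes, header.native_data_size is 1478 (0x5C6), and header.native_data_offset marks where the unknown blob begins.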
At this point, the following can be concluded:
1. The sample uses EQNEDT32.EXE
2. It contains an unknown data blob afterward
3. When the RTF file is opened, EQNEDT32 loads that unknown data
That still does not tell us much.
The next step is to execute the sample and find out what that unknown data really is.
Dynamic Analysis
Using the information gathered so far, I opened the document and started dynamic analysis.
When the document is opened, EQNEDT32 is launched at the same time.
The goal was to determine what kind of data is being loaded during that process.

After this function returns, a change in the stack can be observed.
Execution then jumps through ret to an unknown address.
I examined what the function was doing internally.
Inside the function, another helper routine was modifying the stack, and it became clear that this was the function responsible for triggering the vulnerability.

That function uses a loop to write data onto the stack.
The internal loop has the following structure: on each iteration it makes a call that writes one byte of data.

Before the loop, the stack looked clean as shown below.

After the loop, memory had clearly been overwritten.

The important point is not only that memory was overwritten, but that the return address changed when the function returned.
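The effect of such a byte-by-byte copy without a length check can be modeled with a toy stack frame. This is a conceptual sketch only, not a reconstruction of the EQNEDT32.EXE code: the buffer size, the frame layout, and every address below are made-up values chosen for illustration.

```python
BUF_SIZE = 8                   # hypothetical size of the local buffer
SAVED_EBP = 0x0012FF00         # hypothetical saved frame pointer
ORIGINAL_RETURN = 0x00401000   # hypothetical legitimate return address


def simulate_overflow(payload: bytes) -> int:
    """Copy payload into a toy frame one byte at a time and return the
    return address the function would use afterwards."""
    # Frame layout from low to high addresses:
    # [local buffer][saved EBP][return address]
    frame = bytearray(BUF_SIZE)
    frame += SAVED_EBP.to_bytes(4, "little")
    frame += ORIGINAL_RETURN.to_bytes(4, "little")

    # The loop never compares the index against BUF_SIZE, so a long payload
    # keeps writing past the buffer. The payload must fit in the 16-byte toy
    # frame, otherwise Python raises IndexError.
    for index, value in enumerate(payload):
        frame[index] = value

    return int.from_bytes(frame[-4:], "little")


safe = simulate_overflow(b"A" * BUF_SIZE)
hijacked = simulate_overflow(b"A" * 12 + (0x0018F350).to_bytes(4, "little"))
```

A payload that fits the buffer leaves the return address intact, while a 16-byte payload replaces it with a value chosen by whoever wrote the payload, which is the behavior observed at the ret.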
After stepping out of the function and following the ret, execution moved to another unknown address.
Continuing further eventually led to the address shown below.

That address turned out to point to the OLE object data embedded in the RTF file.
Looking at the instruction bytes there, the data appeared very familiar.

It was the same unknown blob observed earlier during static analysis, and it became clear that the data was shellcode.
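Instead of comparing the bytes by eye, the match between memory and the static blob can be checked mechanically. The following is a minimal sketch with a hypothetical helper name; the byte values in the example are placeholders, not bytes taken from this sample.

```python
def find_dump_in_blob(native_data: bytes, dump_hex: str) -> int:
    """Return the offset at which bytes copied from the debugger (as hex text)
    occur inside the statically extracted object data, or -1 if absent."""
    memory_bytes = bytes.fromhex(dump_hex)
    if not memory_bytes:
        raise ValueError("no bytes to search for")
    return native_data.find(memory_bytes)


# Placeholder data: a fake blob and a fake run of instruction bytes.
fake_blob = b"\x00" * 16 + bytes.fromhex("90 90 EB 10 CC") + b"\x00" * 4
offset = find_dump_in_blob(fake_blob, "90 90 EB 10")
```

A non-negative offset means the instructions seen in the debugger are present in the object data extracted during static analysis.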
Conclusion
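This malware embeds shellcode inside the OLE object data of an RTF document.
EQNEDT32.EXE loads data from the OLE object, and while processing it, a byte-by-byte copy onto the stack overwrites the return address, so that execution is redirected into the embedded shellcode.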