VMProtect reversing(2)

Part 2: Introduction to Optimization
Author: RolfRolles

Basically, VMProtect bytecode and the IR differ from x86 assembly language in four ways:

1) It's a stack machine;
2) The IR contains "temporary variables";
3) It contains what I've called a "scratch" area, upon which computations are performed
rather than in the registers;
4) In the case of VMProtect Ultra, it's obfuscated (or rather, "de-optimized") in certain
ways.

It turns out that removing these four aspects from the IR is sufficient preparation for
compilation into sensible x86 code. We accomplish this via standard compiler optimizations
applied locally to each basic block. In general, there are a few main compiler optimizations
used in this process. The first one is "constant propagation". Consider the following C code.

int x = 1;
function(x);

Clearly x will always be 1 when function is invoked: it is defined on the first line,
and is not re-defined before it is used in the following line (the definition in line one
"reaches" the use in line two; alternatively, the path between the two is "definition-clear"
with respect to x). Thus, the code can be safely transformed into "function(1)".
If the first line is the only definition of the variable x, then we can replace all uses of
x with the integer 1. If the second line is the only use of the variable x, then we can
eliminate the variable.

The next is "constant folding". Consider the following C code.

int x = 1024;
function(x*1024);

By the above remarks, we know we can transform the second line into "function(1024*1024);".
It would be silly to generate code that actually performed this multiplication at
run-time: the value is known at compile-time, and should be computed then.
We can replace the second line with "function(1048576);", and in general we can replace any
binary operation performed upon constant values with the computed result of that operation.

Similar to constant propagation is "copy propagation", as in the below.

void f(int i)
{
int j = i;
g(j);
}

The variable j is merely a copy of the variable i, and so the variable i can be substituted
in for j until the point where either is redefined. Lacking redefinitions,
j can be eliminated entirely.

The next optimization is "dead-store elimination". Consider the following C code.

int y = 2;
y = 1;

The definition of the variable y on line one is immediately un-done by the one on line two.
Therefore, there is no reason to actually assign the value 2 to the variable;
the first store to y is a "dead store", and can be eliminated by the standard liveness-based
compiler optimization known as "dead-store elimination", or more generally
"dead-code elimination". Here's an example from the VMProtect IR.

ecx = DWORD Scratch:[Dword(44)]
ecx = DWORD Scratch:[Dword(20)]

After dead-store-eliminating the first instruction, it turns out no other instructions use
Scratch:[Dword(44)], and so its previous definition can be eliminated as well.

Part 3: Optimizing and Compiling
Author: RolfRolles

The reader should inspect this unoptimized IR listing before continuing.  In an attempt to
keep this entry from becoming unnecessarily long, the example snippets will be small, but
for completeness a more thorough running example is linked throughout the text.

We begin by removing the stack machine features of the IR.  Since VMProtect operates on
disassembled x86 code, and x86 itself is not a stack machine, this aspect of the protection
is unnatural and easily removed.  Here is a 15-line fragment of VMProtect IR.

push Dword(-88)
push esp
push Dword(4)
pop t3
pop t4
t5 = t3 + t4
push t5
push flags t5
pop DWORD Scratch:[Dword(52)]
pop t6
pop t7
t8 = t6 + t7
push t8
push flags t8
pop DWORD Scratch:[Dword(12)]
pop esp

All but two instructions are pushes or pops, and the pushes can be easily matched up with
the pops.  Tracking the stack pointer, we see that, for example, t3 = Dword(4).
A simple analysis allows us to "optimize away" the push/pop pairs into assignment statements.
Simply iterate through each instruction in a basic block and keep a stack describing the
source of each push.  For every pop, ensure that the sizes match and record the location
of the corresponding push.  We wish to replace the pop with an assignment to the popped
expression from the pushed expression, as in

t3 = Dword(4)
t4 = esp
t7 = Dword(-88)

With the stack aspects removed, we are left with a more conventional listing containing many
assignment statements.  This optimization substantially reduces the number of instructions
in a given basic block (~40% for the linked example) and opens the door for other optimizations.
The newly optimized code is eight lines, roughly half of the original:

t3 = Dword(4)
t4 = esp
t5 = t3 + t4
DWORD Scratch:[Dword(52)] = flags t5
t6 = t5
t7 = Dword(-88)
t8 = t6 + t7
DWORD Scratch:[Dword(12)] = flags t8
esp = t8

A complete listing of the unoptimized IR versus the one with the stack machine features removed
is here, which should be perused before proceeding.

Now we turn our attention to the temporary variables and the scratch area.  Recall that the
former were not part of the pre-protected x86 code, nor the VMProtect bytecode -- they were
introduced in order to ease the IR translation.  The latter is part of the VMProtect bytecode,
but was not part of the original pre-protected x86 code.  Since these are not part of the
languages we are modelling, we shall eliminate them wholesale.  On a high level, we treat
each temporary variable, each byte of the scratch space, and each register as being a variable
defined within a basic block, and then eliminate the former two via the compiler optimizations
previously discussed.

Looking again at the last snippet of IR, we can see several areas for improvement.  First,
consider the variable t6.  It is clearly just a copy of t5, neither of which are redefined
before the next use in the assignment to t8.  Copy propagation will replace variable t6
with t5 and eliminate the former.  More generally, t3, t4, and t7 contain either constants
or values that are not modified between their uses.  Constant and copy propagation will
substitute the assignments to these variables in for their uses and eliminate them.

The newly optimized code is a slender three lines compared to the original 15; we have removed
80% of the IR for the running example.

DWORD Scratch:[Dword(52)] = flags Dword(4) + esp
esp = Dword(4) + esp + Dword(-88)
DWORD Scratch:[Dword(12)] = flags Dword(4) + esp + Dword(-88)

The side-by-side comparison can be found here.

The IR now looks closer to x86, with the exception that the results of computations are being
stored in the scratch area, not into registers.  As before, we apply dead-store elimination,
copy and constant propagation to the scratch area, removing dependence upon it entirely in the
process.  See here for a comparison with the last phase.

Here is a comparison of the final, optimized code against the original x86:

push ebp                               push    ebp
ebp = esp                              mov     ebp, esp
push Dword(-1)                         push    0FFFFFFFFh
push Dword(4525664)                    push    450E60h
push Dword(4362952)                    push    offset sub_4292C8
eax = DWORD FS:[Dword(0)]              mov     eax, large fs:0
push eax                               push    eax
DWORD FS:[Dword(0)] = esp              mov     large fs:0, esp
eflags = flags esp + Dword(-88)
esp = esp + Dword(-88)                 add     esp, 0FFFFFFA8h
push ebx                               push    ebx
push esi                               push    esi
push edi                               push    edi
DWORD SS:[Dword(-24) + ebp] = esp      mov     [ebp-18h], esp
call DWORD [Dword(4590300)]            call    dword ptr ds:unk_460ADC
vmreturn Dword(0) + Dword(4638392)

Code generation is an afterthought.

1楼减肥产品排行榜回复该评论

发表于2010-8-9 11:24:42

你是专业学代码的吗

Eddy 于 2010-8-9 11:50:43 回复
不是的只是业余爱好

VMProtect reversing(2)

Eddy 发布于2010-8-9 11:9:43 分类: 加密解密已浏览loading次网友评论1条 我要评论

关于 VMProtect reversing 逆向分析的相关文章

最新评论及回复

大家最关注

大家最喜欢

文章归档

VMProtect reversing(2)

Eddy 发布于2010-8-9 11:9:43 分类: 加密解密 已浏览loadingstrBatchCount += "spn501=501,"次 网友评论1条 我要评论

关于 VMProtect reversing 逆向分析 的相关文章

最新评论及回复

大家最关注

大家最喜欢

文章归档

Eddy 发布于2010-8-9 11:9:43 分类: 加密解密已浏览loading次网友评论1条我要评论

关于 VMProtect reversing 逆向分析的相关文章