Improving Disassembly and Decompilation
or
Moderately Advanced Ghidra Usage
Table of Contents
Improving Disassembly
Troubleshooting Decompilation
Intro and Setup
Introduction
Like any SRE tool, Ghidra makes assumptions which
sometimes need to be adjusted by reverse engineers.
These slides describe techniques for recognizing problematic
situations and steps you can take to improve Ghidra’s analysis.
These slides assume basic familiarity with Ghidra.
Note: the materials for the “Beginner” and “Intermediate”
Ghidra classes are included with the Ghidra distribution.
Setup
First, create a new project for the example files used by these
slides.
Next, import the files. They are located in
<ghidra dir>/docs/GhidraClass/ExerciseFiles/Advanced
The easiest way to do this is to use the Batch Importer
(File → Batch Import... from the Project Window).
Improving Disassembly
Contents
Evaluating Analysis: The Entropy and Overview Windows
Non-Returning Functions
Function Start Patterns
Evaluating Analysis: The Entropy and Overview Windows
Evaluation
Use the entropy and overview sidebars to get a quick sense of
how well a binary has been analyzed/disassembled.
For instance, the entropy sidebar can tell you whether your
binary has regions which are likely encrypted or compressed.
To activate these sidebars, use the dropdown menu in the
Listing (immediately to the right of the camera icon).
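As a rough illustration of what an entropy view summarizes, the Python sketch below computes Shannon entropy over fixed-size windows of a byte string. The window size, function names, and sample data are assumptions made for this example; it is not Ghidra's implementation.

```python
import math
import os
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Return the Shannon entropy of chunk in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def window_entropies(data: bytes, window: int = 256) -> list[float]:
    """Entropy of each consecutive window-sized chunk of data."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


if __name__ == "__main__":
    padding = bytes(256)           # all zero bytes: lowest possible entropy
    uniform = bytes(range(256))    # every byte value exactly once: highest possible
    random_like = os.urandom(256)  # stands in for encrypted or compressed data
    for name, blob in [("padding", padding), ("uniform", uniform), ("random", random_like)]:
        print(name, round(shannon_entropy(blob), 2))
```

Windows whose entropy is close to 8 bits per byte are candidates for encrypted or compressed regions, while long runs near 0 usually indicate padding.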
Non-Returning Functions
Some functions, like exit or abort, are non-returning
functions. Such functions do not return to the caller after
executing. Instead, they do drastic things like halting the
execution of the program.
Suppose panic is a function that does not return. The
compiler is free to put whatever it wants (e.g., data) after
calls to panic.
If Ghidra does not know that panic is non-returning, it will
assume that bytes after calls to panic are instructions and
attempt to disassemble them.
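The effect can be sketched with a toy disassembler in Python. The three-opcode instruction set, the byte layout, and the address of panic are all invented for this example and have nothing to do with a real processor or with Ghidra's internals.

```python
CALL, RET, NOP = 0x01, 0x02, 0x03  # hypothetical opcodes of a toy instruction set


def disassemble(code: bytes, entry: int, noreturn: set[int]) -> list[tuple[int, str]]:
    """Follow fall-through flow from entry and return (offset, text) pairs.

    CALL is two bytes (opcode, target); RET and NOP are one byte each.
    Flow continues after a CALL unless its target is listed in noreturn.
    """
    listing = []
    pc = entry
    while pc < len(code):
        op = code[pc]
        if op == CALL and pc + 1 < len(code):
            target = code[pc + 1]
            listing.append((pc, f"CALL {target:#04x}"))
            if target in noreturn:
                break  # known non-returning callee: no fall-through
            pc += 2
        elif op == RET:
            listing.append((pc, "RET"))
            break
        elif op == NOP:
            listing.append((pc, "NOP"))
            pc += 1
        else:
            listing.append((pc, f"?? {op:#04x}"))  # bad instruction
            break
    return listing


# NOP; CALL panic (at 0x10); then two bytes of data placed after the call.
code = bytes([NOP, CALL, 0x10, 0x03, 0xFF])

print(disassemble(code, 0, noreturn=set()))   # data bytes are disassembled
print(disassemble(code, 0, noreturn={0x10}))  # disassembly stops at the call
```

In the first run the data byte 0x03 is decoded as a NOP and the byte 0xFF as a bad instruction. In the second run the listing ends at the call to panic.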
The Non-Returning Functions - Known analyzer recognizes
a number of standard non-returning functions by name and
automatically handles them correctly.
The Non-Returning Functions - Discovered analyzer
attempts to discover non-returning functions by gathering
evidence during disassembly.
If a non-returning function manages to slip by these analyzers,
it can wreak havoc on analysis. Fortunately, there are ways to
recognize and fix this situation.
Finding Functions
Ghidra uses many techniques to find bytes to disassemble and
to group instructions together into function bodies.
One such technique is to search for function start patterns.
These are patterns of bits (with wildcards allowed) that
indicate that a particular address is likely the start of a
function.
These patterns are based on two facts:
1. Functions often start in similar ways (e.g., setting up the stack
pointer, saving callee-saved registers)
2. Similar things occur immediately before a function start
(return of previous function, padding bytes,...)
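A minimal Python sketch of bit-pattern matching with wildcards follows. The pattern syntax ('0', '1', and '.' for a wildcard bit), the function names, and the sample bytes are assumptions for this example; Ghidra's own pattern files use a different format. The pattern combines both facts above: a RET byte (0xC3) before the function start, followed by an x86-64 PUSH of one of the first eight registers (0x50 to 0x57).

```python
def compile_pattern(bits: str) -> list[tuple[int, int]]:
    """Turn space-separated 8-character bit strings into (mask, value) pairs.

    Each character is '0', '1', or '.' (wildcard bit).
    """
    pairs = []
    for token in bits.split():
        if len(token) != 8 or set(token) - set("01."):
            raise ValueError(f"bad pattern byte: {token!r}")
        mask = int("".join("0" if c == "." else "1" for c in token), 2)
        value = int("".join("1" if c == "1" else "0" for c in token), 2)
        pairs.append((mask, value))
    return pairs


def find_matches(data: bytes, pattern: list[tuple[int, int]]) -> list[int]:
    """Offsets in data where every pattern byte matches under its mask."""
    return [
        i
        for i in range(len(data) - len(pattern) + 1)
        if all((data[i + j] & mask) == value for j, (mask, value) in enumerate(pattern))
    ]


# RET (0xC3) followed by PUSH r64 (0x50..0x57); the low three bits are wildcards.
pattern = compile_pattern("11000011 01010...")

# NOP, RET, PUSH RBP, MOV RBP,RSP, NOP, RET, PUSH RBX
data = bytes.fromhex("90 c3 55 48 89 e5 90 c3 53")

hits = find_matches(data, pattern)
print(hits)                   # offsets of the RET bytes
print([h + 1 for h in hits])  # candidate function starts
```

Each match offset points at the RET of the previous function, so the candidate function start is one byte later.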
Function Start Patterns
Ghidra has an experimental plugin for exploring how functions
already found in a program begin and using that information
to find additional functions.
To enable it from the Code Browser: File → Configure...,
click on the (upper right) plug icon, and select the Function
Bit Patterns Explorer plugin.
Then select Tools → Explore Function Bit Patterns from
the Code Browser.
Hovering over something in the tool and pressing F1 will bring
up the Ghidra help (this works for most parts of Ghidra).
The general strategy is to explore the instruction trees and
byte sequences, select/combine/mine for interesting patterns,
then send them to the Pattern Clipboard for evaluation. See
the help for details.
Another useful feature is the Disassembled View (accessed
through the Window menu of the Code Browser). This
allows you to see what the bytes at the current address would
disassemble to without actually disassembling them.
Improving Decompilation: Data Types
Defining Structures
Defining Classes
If a variable is known to be a this parameter, right-clicking on
it will yield a menu with the option Auto Fill in Class
Structure instead of Auto Fill in Structure.
Decompiling Virtual Function Calls
Improving Decompilation: Function Calls
Introduction
Function Signatures: Listing vs. Decompiler
Decompiler Parameter ID
The Decompiler Parameter ID Analyzer (Analysis → One
Shot → Decompiler Parameter ID) uses the decompiler
and an exploration of the call tree to determine parameter,
return type, and calling convention information about
functions in a program. This analyzer can be quite useful
when you have some rich type information, such as known
types from library calls. However, if you run this analyzer too
early or before fixing problems, you can end up propagating
bad information all over the program.
Note: this analyzer will commit the signature of each function.
Overriding a Signature at a Call Site
Overriding Signatures
It is possible to override a function’s signature at a particular
call site.
This is generally needed only for variadic functions
(functions which take a variable number of arguments) or to
adjust the arguments of indirect calls. In other cases you
should edit the signature of the called function directly.
To override a signature, right-click on the function call in the
decompiler and select Override Signature.
To remove an override, right-click and select Remove
Signature Override.
Custom Calling Conventions
Multiple Storage Locations
Inlining Functions
Some special functions have side effects that the decompiler
needs to know about for correct decompilation. You can
handle this situation by marking them as inline.
If foo is marked as inline, calls to foo will be replaced by the
body of foo during decompilation.
To mark foo as inline, edit foo’s signature and check the
In Line function attribute.
Inlining a function is related to the notion of a call fixup,
where calls to certain functions are replaced with snippets of
Pcode.
These functions are recognized by name and have the call
fixup applied automatically.
Examples include functions related to structured exception
handling in Windows.
You can also select from pre-defined call fixups when editing a
function signature.
Note: there are no fixups defined for x86_64 binaries compiled
with gcc, so the Call Fixup selector is greyed out for the
exercise files.
Improving Decompilation: Control Flow
Fixing Switch Statements
Shared Returns
If a function callerOne ends with a call to callee, compilers will
sometimes perform an optimization which replaces that final
call with a jump.
If callerOne and callerTwo both end with calls to callee,
this optimization will result in callerOne and callerTwo
ending with jumps to callee.
The Shared Return Analyzer detects this situation and
modifies the flow of the jump instruction to have type
CALL_RETURN. This will change how the functions are
displayed in the decompiler.
You can also do this manually, in case the analyzer missed
something (for example, if only one of the functions sharing a
final call/jump has been found and disassembled).
Opaque Predicates
One anti-disassembly technique is to create an if-else
statement with a condition that always evaluates to the same
value, but is complicated enough that this is difficult to
determine statically.
This is an example of an opaque predicate.
The branch that is never taken can contain byte sequences
intended to thwart static analysis, such as sequences which
disassemble to jumps to invalid targets.
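A small Python sketch of the idea, using a textbook predicate that is not taken from the exercise files: the product of two consecutive integers is always even, so the condition below is always true even though that is not obvious from the expression alone. The function names are hypothetical.

```python
def opaque_predicate(x: int) -> bool:
    """Always True: one of x and x + 1 is even, so their product is even."""
    return (x * (x + 1)) % 2 == 0


def guarded(x: int) -> str:
    if opaque_predicate(x):
        return "real code path"
    # Never reached. In an obfuscated binary this branch would hold
    # byte sequences chosen to mislead static analysis.
    return "junk branch"


# Spot check over a range of inputs (a check, not a proof).
print(all(guarded(x) == "real code path" for x in range(-1000, 1000)))
```

A tool that cannot prove the condition constant must treat both branches as reachable, which is what makes the junk branch effective.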
Control Flow Oddities
Improving Decompilation: Data Mutability
Changing Data Mutability
Data Mutability
Data Mutability refers to the assumptions Ghidra makes
regarding whether a particular data element can change.
There are three data mutability settings:
1. normal
2. constant
3. volatile
There are two ways to change data mutability:
1. Right-click on the (defined) data in the Listing and select
Settings...
2. Set the mutability of an entire block of memory through the
Memory Map (Window → Memory Map from the Code
Browser).
Constant Data
The decompiler will display the contents of a memory location
if the contents are marked as constant.
Otherwise it will display a pointer to the location.
Volatile Data
Marking a data element as volatile tells the decompiler to
assume that the value of a variable could change at any time.
This can prevent certain simplifications.
Improving Decompilation: Setting Register Values
Troubleshooting Decompilation
Contents
Identifying Problems in the Decompiled Code
Potential Causes
Potential Fixes
Compiler vs. Decompiler
Identifying Problems in the Decompiled Code
Potential Causes
1. The decompiler has a function signature wrong (either the
signature of the function being decompiled or one of its
callees).
2. A common situation is some kind of size mismatch. For
example, the decompiler thinks that a call returns a 32-bit
value but sees all of RAX being used, which leaves open where
the high 32 bits came from.
3. There’s a register that actually contains a global parameter or
is set as the side effect of a called function.
Potential Fixes
To fix these issues, the first step is to try to determine if the
decompiler is making an assumption that’s false.
Oftentimes, you can correct such errors by:
- correcting function signatures
- correcting sizes of data types
- marking functions as inline
For example, if you see in_RAX in the decompiled view, you
should check if there’s a call to a function whose return type
is mistakenly marked as void.
Useful Tools
- Script FindPotentialDecompilerProblems.java: decompiles all
functions in a program, looks for problems, and displays them
in a navigable table.
- Script CompareFunctionSizesScript.java: decompiles all
functions in a program and displays a table which contains the
size of each function (in instructions) and the size of each
decompiled function (in Pcode operations). If a function has
many instructions but the decompiled version is small, there
could be an incorrect assumption regarding the return value.
- Tool option: from the Code Browser, Edit → Tool Options... →
Decompiler → Analysis → uncheck Eliminate unreachable
code. This might help diagnose issues.
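The size comparison can be turned into a simple filter once the table has been exported. The Python sketch below is independent of Ghidra; the record layout, the thresholds, and the sample rows are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class FunctionSizes:
    name: str
    instructions: int  # size of the function in the Listing
    pcode_ops: int     # size of the decompiled function


def suspiciously_small(rows, min_instructions=40, max_ratio=0.5):
    """Names of large functions whose decompiled form is small relative to the Listing."""
    flagged = []
    for row in rows:
        if row.instructions < min_instructions:
            continue  # too small for the ratio to mean much
        if row.pcode_ops / row.instructions <= max_ratio:
            flagged.append(row.name)
    return flagged


# Hypothetical table contents, invented for this example.
rows = [
    FunctionSizes("FUN_00101000", instructions=120, pcode_ops=310),
    FunctionSizes("FUN_00101200", instructions=200, pcode_ops=12),
    FunctionSizes("FUN_00101400", instructions=8, pcode_ops=2),
]
print(suspiciously_small(rows))
```

Only the second row is flagged: it has many instructions but very little decompiled code, which is the situation worth inspecting by hand.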
Compiler vs. Decompiler
Exercise
1. Open and analyze the file compilerVsDecompiler.
2. The functions calls_memcmp and calls_memcmp_fixed_len
implement memcmp using the CMPSB.REPE instruction.
3. Compare the decompiled view of these two functions. What
differences do you see?
4. What accounts for these differences? (hint: examine the
assembly code)
5. Note: To compare two functions side-by-side, bring up the
Functions window (Window → Functions from the Code
Browser), select the two functions, right-click and select
Compare Functions. Use the tabs to switch between the
Listing and Decompiler views.
Solution
1. calls_memcmp_fixed_len contains in_ZF and in_CF in the
decompiled code, whereas calls_memcmp does not.
2. In calls_memcmp_fixed_len, the compiler knows that the
loop will be executed at least once (RCX is set to 8).
3. However, in calls_memcmp, the loop might be executed 0
times (RCX is set to param_3).
4. This means that the compiler must initialize the flags ZF and
CF in calls_memcmp, but does not have to in
calls_memcmp_fixed_len, since the loop is guaranteed to
execute at least once and that comparison will set the flags.
5. This is the purpose of the CMP RDX,RDX instruction in
calls_memcmp (which does not occur in
calls_memcmp_fixed_len).
6. The decompiler doesn’t do the analysis to prove that a loop
must execute at least once.
7. So in the decompiler’s view, the values in ZF and CF at the
beginning of calls_memcmp_fixed_len might contribute to
the return value (in the “case” when the loop body does not
execute).
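The role of the flags can be seen in a simplified Python model of a repeated byte compare. Only ZF and CF are modeled, the function name and test strings are hypothetical, and the model is a sketch of the instruction's documented behavior rather than code from the exercise.

```python
def repe_cmpsb(src: bytes, dst: bytes, count: int, zf: bool, cf: bool) -> tuple[bool, bool]:
    """Model REPE CMPSB: compare up to count byte pairs, stopping at the first mismatch.

    zf and cf are the flag values on entry. They are returned unchanged
    when count is 0, because no comparison is performed.
    """
    i = 0
    while count != 0:
        a, b = src[i], dst[i]
        zf = a == b  # ZF is set when the bytes are equal
        cf = a < b   # CF is set when the unsigned subtraction a - b borrows
        i += 1
        count -= 1
        if not zf:
            break    # REPE stops repeating once ZF is clear
    return zf, cf


# Fixed length of 8: the loop runs, so the entry flags are overwritten.
print(repe_cmpsb(b"12345678", b"12345678", 8, zf=False, cf=True))

# Length 0 with leftover flags: the entry values pass straight through.
print(repe_cmpsb(b"", b"", 0, zf=False, cf=True))

# Length 0 after CMP RDX,RDX, which sets ZF and clears CF.
print(repe_cmpsb(b"", b"", 0, zf=True, cf=False))
```

The second call shows the path the decompiler cannot rule out for calls_memcmp_fixed_len: with a count of zero, the result depends only on the flags at entry.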