Improving Disassembly and Decompilation

The document discusses techniques for improving Ghidra's disassembly and decompilation of binary files. It describes how to evaluate analysis quality using entropy and overview windows, handle non-returning functions, find additional functions using start patterns, define data structures to clean up decompiled code, and decompose virtual function calls. The exercises guide applying these techniques, such as auto-creating structures from variables.

Uploaded by

Andrea Oliveri

Improving Disassembly and Decompilation

or
Moderately Advanced Ghidra Usage

Table of Contents

Intro and Setup

Improving Disassembly

Improving Decompilation: Data Types

Improving Decompilation: Function Calls

Improving Decompilation: Control Flow

Improving Decompilation: Data Mutability

Improving Decompilation: Setting Register Values

Troubleshooting Decompilation

Intro and Setup

Contents

Intro and Setup

Introduction
Setup

Introduction

Intro
Like any SRE tool, Ghidra makes assumptions which
sometimes need to be adjusted by reverse engineers.
These slides describe techniques for recognizing problematic
situations and steps you can take to improve Ghidra’s analysis.
These slides assume basic familiarity with Ghidra.
Note: the materials for the “Beginner” and “Intermediate”
Ghidra classes are included with the Ghidra distribution.

Setup
First, create a new project for the example files used by these
slides.
Next, import the files. They are located in
<ghidra dir>/docs/GhidraClass/ExerciseFiles/Advanced
The easiest way to do this is to use the Batch Importer
(File → Batch Import... from the Project Window).

Improving Disassembly

Contents

Improving Disassembly
Evaluating Analysis: The Entropy and Overview Windows
Non-Returning Functions
Function Start Patterns

Evaluating Analysis: The Entropy and Overview Windows

Evaluation
Use the entropy and overview sidebars to get a quick sense of
how well a binary has been analyzed/disassembled.
For instance, the entropy sidebar can tell you whether your
binary has regions which are likely encrypted or compressed.
To activate these sidebars, use the dropdown menu in the
Listing (immediately to the right of the camera icon).

Non-Returning Functions

Non-returning Functions
Some functions, like exit or abort, are non-returning
functions. Such functions do not return to the caller after
executing. Instead, they do drastic things like halting the
execution of the program.
Suppose panic is a function that does not return. The
compiler is free to put whatever it wants (e.g., data) after
calls to panic.
If Ghidra does not know that panic is non-returning, it will
assume that bytes after calls to panic are instructions and
attempt to disassemble them.
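
To make the scenario concrete, here is a minimal C sketch of a caller of a non-returning function. The body of panic and the helper checked_div are hypothetical (the slides only name panic); _Noreturn is the standard C11 function specifier. Because the compiler knows that panic never returns, it does not have to emit meaningful code after the call, which is why the bytes that follow may not be valid instructions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the `panic` function named in the slides.
 * _Noreturn (C11) tells the compiler that control never comes back. */
_Noreturn void panic(const char *message) {
    fprintf(stderr, "panic: %s\n", message);
    abort();
}

/* Hypothetical caller: a zero divisor is treated as a fatal error. */
int checked_div(int numerator, int denominator) {
    if (denominator == 0) {
        panic("division by zero");
        /* Not reached. The compiler may emit nothing here, or place
         * padding or data directly after the call instruction. */
    }
    return numerator / denominator;
}
```

If Ghidra treats panic as an ordinary function, it will try to disassemble whatever follows the call inside checked_div.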

The Non-Returning Functions - Known analyzer recognizes
a number of standard non-returning functions by name and
automatically handles them correctly.
The Non-Returning Functions - Discovered analyzer
attempts to discover non-returning functions by gathering
evidence during disassembly.
If a non-returning function manages to slip by these analyzers,
it can wreak havoc on analysis. Fortunately, there are ways to
recognize and fix this situation.

Exercise: Non-returning Functions

1. Open and analyze the file noReturn. Note: for all exercises,
use the default analyzers unless otherwise specified.
2. Open the Bookmarks window and examine the Error
bookmarks. There should be two errors.
3. These errors are due to one non-returning function that
Ghidra doesn’t know about. Identify this function and mark it
as non-returning (right-click on the name of the function in
the decompiler, select Edit Function Signature and select
the No Return box).
4. Verify that the errors are corrected after marking the function
as non-returning.

Solution: Non-returning Functions

The function loopForever is non-returning.
Note: You can configure how much evidence the
Non-Returning Functions - Discovered analyzer requires
before deciding that a function is non-returning via Analysis →
Auto Analyze... from the Code Browser. If you lower the
evidence threshold, this analyzer will mark loopForever as
non-returning.
Also, the script FixupNoReturnFunctions.java will analyze a
program and present a list of potentially non-returning
functions. It will also allow you to mark a function as
non-returning and repair any damage.
Function Start Patterns

Finding Functions
Ghidra uses many techniques to find bytes to disassemble and
to group instructions together into function bodies.
One such technique is to search for function start patterns.
These are patterns of bits (with wildcards allowed) that
indicate that a particular address is likely the start of a
function.
These patterns are based on two facts:
1. Functions often start in similar ways (e.g., setting up the stack
pointer, saving callee-saved registers)
2. Similar things occur immediately before a function start
(return of previous function, padding bytes,...)
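
The idea of a bit pattern with wildcards can be sketched in C as follows. This is only an illustration of masked matching, not Ghidra's pattern format or implementation; the type pattern_byte, the function pattern_matches, and the pattern FRAME_SETUP are names invented for this example. Each pattern byte carries a mask, and a 0 bit in the mask is a wildcard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of a pattern: `value` holds the required bits and `mask`
 * selects which bits are checked (a 0 bit in `mask` is a wildcard). */
typedef struct {
    uint8_t value;
    uint8_t mask;
} pattern_byte;

/* Returns true if the first `pattern_len` bytes match the pattern. */
bool pattern_matches(const uint8_t *bytes, size_t len,
                     const pattern_byte *pattern, size_t pattern_len) {
    if (len < pattern_len) {
        return false;
    }
    for (size_t i = 0; i < pattern_len; i++) {
        uint8_t mask = pattern[i].mask;
        if ((bytes[i] & mask) != (pattern[i].value & mask)) {
            return false;
        }
    }
    return true;
}

/* Illustrative x86-64 start pattern: a one-byte PUSH of a register
 * (0x50-0x57) followed by MOV RBP, RSP (48 89 e5). */
static const pattern_byte FRAME_SETUP[] = {
    {0x50, 0xf8}, /* 01010... : the low three bits are wildcards */
    {0x48, 0xff},
    {0x89, 0xff},
    {0xe5, 0xff},
};
```

The common prologue 55 48 89 e5 (PUSH RBP; MOV RBP, RSP) matches FRAME_SETUP, while a sequence starting with c3 (RET) does not.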

Ghidra has an experimental plugin for exploring how functions
already found in a program begin and using that information
to find additional functions.
To enable it from the Code Browser: File → Configure...,
click on the (upper right) plug icon, and select the Function
Bit Patterns Explorer plugin.
Then select Tools → Explore Function Bit Patterns from
the Code Browser.
Hovering over something in the tool and pressing F1 will bring
up the Ghidra help (this works for most parts of Ghidra).

The general strategy is to explore the instruction trees and
byte sequences, select/combine/mine for interesting patterns,
then send them to the Pattern Clipboard for evaluation. See
the help for details.
Another useful feature is the Disassembled View (accessed
through the Window menu of the Code Browser). This
allows you to see what the bytes at the current address would
disassemble to without actually disassembling them.

Improving Decompilation: Data Types

Contents

Improving Decompilation: Data Types

Defining Structures
Defining Classes
Decompiling Virtual Function Calls

Defining Structures

Defining Data Types

One of the best ways to clean up the decompiled code is to
define data structures.
You can do this manually through the Data Type Manager.
You can also have Ghidra help you by right-clicking on a
variable in the decompiler view and selecting
- Auto Create (Class) Structure, or
- Auto Fill in (Class) Structure.
Note: If you happen to have a C header file, you can parse
data types from it by selecting File → Parse C Source...
from the Code Browser (doesn’t support C++ header files
yet).

Exercise: Auto-creating Structures

1. Open and analyze the file createStructure.
This file contains two functions of interest: setFirstAndThird
and setSecondAndFourth.
The first parameter to each of these two functions has type
exampleStruct *, where exampleStruct is defined as follows:
typedef struct {
    long a;
    int b;
    char *c;
    short d;
} exampleStruct;

2. Navigate to setFirstAndThird.
3. In the decompiler view, change the type of the second
parameter to long and the third parameter to char *
4. In the decompiler view, right-click on param1 and select
Auto Create Structure.
5. Right-click on the default structure name (astruct) in the
decompiler and select Edit Data Type...
6. Change the name of the structure to exampleStruct and the
names of the defined fields to a and c.
7. Note that these are not all of the fields in the structure, just
the ones that were used in this function.
8. Now navigate to setSecondAndFourth.
9. Change the type of the first parameter to exampleStruct *,
the type of the second to int, and the type of the third to
short.
10. Right-click on the first parameter and select Auto Fill in
Structure.
11. Edit the structure again to add the names from the structure
definition for the new fields (you can also select each field in
the decompiler and press L).
12. Revel in how much better the decompilation of the two
functions looks!
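
For comparison with the cleaned-up decompilation, here is a sketch of what the source of the two functions might look like. The function bodies and parameter names are assumptions; the slides give only the function names, the parameter types, and the structure definition. On a typical x86_64 Linux target the fields would usually sit at offsets 0, 8, 16, and 24, which is what the decompiler shows as raw pointer arithmetic before the structure is defined.

```c
typedef struct {
    long a;
    int b;
    char *c;
    short d;
} exampleStruct;

/* Hypothetical body: writes the first and third fields. */
void setFirstAndThird(exampleStruct *s, long first, char *third) {
    s->a = first;
    s->c = third;
}

/* Hypothetical body: writes the second and fourth fields. */
void setSecondAndFourth(exampleStruct *s, int second, short fourth) {
    s->b = second;
    s->d = fourth;
}
```

Each function touches only two fields, which is why Auto Create Structure on the first function recovers only a and c.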

Defining Classes
If a variable is known to be a this parameter, right-clicking on
it will yield a menu with the option Auto Fill in Class
Structure instead of Auto Fill in Structure.

Exercise: Defining Classes

1. Open and analyze the file animals.
2. In the Listing, press G (goto). In the resulting pop-up, enter
getAnimalAge.
3. This will bring up the Go To... dialog, where you can select
between the two functions with the name getAnimalAge (the
functions are in different namespaces).
Note: There are other windows, such as the Functions
window, in which there is no default namespace column. You
can add a namespace column by right-clicking on any column
name and selecting Add/Remove Columns... You can also
configure the display of certain columns by right-clicking on
the column name.
4. Select Dog::getAnimalAge in the pop-up. This will cause
the Code Browser to navigate to Dog::getAnimalAge().
Note: Alternatively, you can quickly navigate to the functions
in a class using the Classes folder of the Symbol Tree.
5. Verify that in the decompiler view, right-clicking on the token
Dog yields a menu with Auto Fill in Class Structure as an
option. Note that Ghidra has already created an empty
structure named Dog.

Decompiling Virtual Function Calls

Exercise: Virtual Function Tables

1. Here is what the end of main looks like in the source code:
Animal *a;
...
a->printInfo(); //non-virtual
a->printSound(); //virtual
a->printSpecificFact(); //virtual
int animalAge = a->getAnimalAge(); //virtual
delete(a);
return animalAge;
Navigate to the function main and examine Ghidra’s
decompilation.
2. The task is to get the names of the virtual functions to show
up in the decompiler. At a high level, the steps are:
- For each virtual function foo of the class Animal, create a
  function definition, which is a data type representing the
  signature of foo.
- Create a data type for the vftable of Animal. This data type
  will be a structure whose fields are the function signature data
  types (in order).
- Change the first field of the Animal data type to be a pointer
  to the vftable data type.
3. First, create a function definition for each of the virtual
functions
- void printSound(void)
- void printSpecificFact(void)
- int getAnimalAge(void)
by right-clicking on animals in the Data Type Manager and
selecting New → Function Definition...
For each function, enter the signature and select thiscall for
the calling convention.

4. Now, right-click on animals in the Data Type Manager and
select New → Structure...
5. Give the new structure the name Animal_vftable.
6. Fill in the structure with the data types corresponding to the
virtual functions of the class Animal. You can do this by
double-clicking on an entry in the DataType column and
entering the name of a virtual function.
Notes:
- The order of the functions in the vftable is the same as the
  order they are called in the source code snippet.
- Be sure to give each field in the vftable structure a name (use
  the name of the corresponding virtual function).
7. Alternatively:
- Find the vftable for Animal (from the Code Browser,
  Search → For Address Tables...) and look for the table
  consisting of pointers to __cxa_pure_virtual.
- Apply the three function definition data types to the pointers
  in the table in the appropriate order.
- Select the table in the Listing, right-click, and select Data → Create
  Structure.
8. In main, re-type the variable passed to printInfo to have type
Animal * and re-name it to a.
9. Right-click on a and select Auto Fill in Structure (note that
this does not say Auto Create Structure since Ghidra
automatically created a default empty Animal structure).

10. Finally, edit the Animal structure itself so that the first field
is an element of type Animal_vftable * with name
Animal_vftable.
11. Verify that the virtual function names appear in the
decompilation of main.
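
The data types built in this exercise can be pictured as the following C declarations. This is a sketch of the layout only: the age member, the dog_* functions, DOG_VFTABLE, and call_getAnimalAge are hypothetical, the table pointer is named vftable here for readability, and the explicit self parameter stands in for the this pointer that thiscall passes implicitly.

```c
/* Forward declaration so the function-pointer types can name Animal. */
typedef struct Animal Animal;

/* One field per virtual function, in vftable order. */
typedef struct {
    void (*printSound)(Animal *self);
    void (*printSpecificFact)(Animal *self);
    int (*getAnimalAge)(Animal *self);
} Animal_vftable;

struct Animal {
    const Animal_vftable *vftable; /* first field: pointer to the table */
    int age;                       /* hypothetical data member */
};

/* Hypothetical implementations for one concrete class. */
static void dog_printSound(Animal *self) { (void)self; }
static void dog_printSpecificFact(Animal *self) { (void)self; }
static int dog_getAnimalAge(Animal *self) { return self->age; }

static const Animal_vftable DOG_VFTABLE = {
    dog_printSound, dog_printSpecificFact, dog_getAnimalAge,
};

/* What a->getAnimalAge() amounts to: load the table pointer from the
 * object, load the function pointer from the table, then call it. */
int call_getAnimalAge(Animal *a) {
    return a->vftable->getAnimalAge(a);
}
```

Without these types, the decompiler can only show the double dereference in call_getAnimalAge as an indirect call through a computed address.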

Improving Decompilation: Function Calls

Contents

Improving Decompilation: Function Calls

Introduction
Function Signatures: Listing vs. Decompiler
The Decompiler Parameter ID Analyzer
Overriding a Signature at a Call Site
Custom Calling Conventions
Multiple Storage Locations
Inlining Functions

Introduction

Function Signatures and Calls

In this section, we focus on issues involving function
signatures and function calls.

Function Signatures: Listing vs. Decompiler

Refresher on Function Signatures in Ghidra:

Sometimes the signature of a function shown in the Listing
(or in the Functions window) will not match the signature
shown in the decompiler.
This happens because the decompiler performs its own
analysis to determine the function’s signature.
The decompiler re-analyzes the function each time it is
decompiled.
The signature shown in the Listing is created when the
function is (re-)created. This is the signature that is stored in
the Ghidra program database.

To transfer the decompiler’s signature to the Listing,
right-click on the function in the decompiler and select
Commit Params/Return. The transferred signature will be
saved to the program database.
The situation is the same for the local variables of a function:
right-click on the function in the decompiler and select
Commit Locals.
Note: Usually it’s better not to commit locals and instead to
let the decompiler assign types to them automatically.
Committing locals can interfere with type propagation.
Editing a function’s signature manually, from either the
Listing or the decompiler, commits the new signature to the
program database.
The Decompiler Parameter ID Analyzer

Decompiler Parameter ID
The Decompiler Parameter ID Analyzer (Analysis → One
Shot → Decompiler Parameter ID) uses the decompiler
and an exploration of the call tree to determine parameter,
return type, and calling convention information about
functions in a program. This analyzer can be quite useful
when you have some rich type information, such as known
types from library calls. However, if you run this analyzer too
early or before fixing problems, you can end up propagating
bad information all over the program.
Note: this analyzer will commit the signature of each function.

Overriding a Signature at a Call Site

Overriding Signatures
It is possible to override a function’s signature at a particular
call site.
This is basically only ever needed for variadic functions
(functions which take a variable number of arguments), or to
adjust the arguments of indirect calls. In other cases you
should edit the signature of the called function directly.
To override a signature, right-click on the function call in the
decompiler and select Override Signature.
To remove an override, right-click and select Remove
Signature Override.

Aside: The System V AMD64 ABI

For reference when doing the exercises, here is the calling
convention used by Linux on x86_64:
- First 6 integer/pointer args are passed in RDI, RSI, RDX,
  RCX, R8, R9.
- First 8 floating point args are passed in XMM0-XMM7.
- Additional args are passed on the stack.
- For variadic functions, the number of floating point args
  passed in the XMM registers is passed in AL.
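
As an illustration of why variadic calls need per-call-site signatures, here is a small C sketch. The functions format_message and demo_call are hypothetical; only the standard stdarg.h and vsnprintf facilities are used. The register placement in the comment follows the rules listed above and is what a compiler would typically do, not something read out of the exercise binary.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical printf-style helper that formats into a caller buffer.
 * Its declared signature says nothing about the trailing arguments. */
int format_message(char *out, size_t cap, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(out, cap, fmt, args);
    va_end(args);
    return written;
}

/* One call site with an int, a long, a double, and a char *.
 * Under the System V AMD64 convention the arguments would typically
 * be placed as: out -> RDI, cap -> RSI, fmt -> RDX, 1 -> ECX,
 * 2L -> R8, "four" -> R9, 3.5 -> XMM0, with AL set to 1. */
int demo_call(char *out, size_t cap) {
    return format_message(out, cap, "%d %ld %.1f %s", 1, 2L, 3.5, "four");
}
```

The number and types of the trailing arguments are determined only by the format string at each call, so the override has to be applied per call site rather than to the function itself.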

Exercise: Overriding Signatures

1. Open and analyze the file override.so, then navigate to the
function overrideSignature. Override the signature of the
call to printf, if necessary, using the format string to
determine number and types of the parameters to the call.
Some of the parameters to printf are global variables;
determine and apply their types.

Solution: Overriding Signatures

Signature:
printf(char *,int,long,double,char *,int,int,int,int)
Types:
a: int
b: long
c: double
d: char *

Custom Calling Conventions

Sometimes a function will use a non-standard calling
convention.
In such a case, you can set the calling convention manually.
To do this, right-click on the function in the decompiler and
select Edit Function Signature.
In the resulting window, select Use Custom Storage under
Function Attributes.

Exercise: Custom Calling Conventions

1. Open and analyze the file custom, then navigate to the
function main.
2. main calls the functions sum and diff, which have custom
calling conventions.
3. Examine the bodies and call sites of sum and diff to
determine their signatures and custom calling conventions.
4. Edit each of the two functions and select Use Custom
Storage.
5. Type the correct signature into the text window and press
enter.
6. Click on the entries in the Storage column to set the storage
for each parameter/return value.
7. In the resulting Storage Address Editor window, click Add
to add storage, then click on each table entry to modify.
8. You might find it helpful to remove some of the variable
references Ghidra adds in the Listing, particularly to stack
variables. To do this, select Edit → Tool Options → Listing
Fields → Operands Field from the Code Browser.

Solution: Custom Calling Conventions

long sum(long, long): return in RAX, args in R14, R15.
long diff(long, long): return in RBX, args in
[RSP + 0x8], [RSP + 0x10]

Multiple Storage Locations

You may have noticed that you can add multiple storage
locations for one parameter when editing a function signature.
This is used (for example) for functions which return register
pairs.

Exercise: Multiple Storage Locations

1. Open and analyze the file ldiv, then navigate to the function
main.
2. In the decompiler, right-click on the call to ldiv and select
Edit Function Signature. How does ldiv use multiple
storage locations for a function variable?

Solution: Multiple Storage Locations

The result of ldiv is returned in the register pair RDX:RAX
(RAX contains the quotient, RDX contains the remainder).
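
For reference, this is how ldiv is used from C. The wrapper minutes_and_seconds is a hypothetical example; ldiv and ldiv_t come from the standard header stdlib.h. The ldiv_t result holds two long values, which is why a single return value ends up spread over two registers.

```c
#include <stdlib.h>

/* Splits a duration in seconds into whole minutes (returned) and
 * leftover seconds (stored through the pointer). ldiv computes the
 * quotient and the remainder in one call and returns both in an
 * ldiv_t, a structure of two long values. */
long minutes_and_seconds(long total_seconds, long *leftover_seconds) {
    ldiv_t result = ldiv(total_seconds, 60L);
    *leftover_seconds = result.rem;
    return result.quot;
}
```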

Inlining Functions
Some special functions have side effects that the decompiler
needs to know about for correct decompilation. You can
handle this situation by marking them as inline.
If foo is marked as inline, calls to foo will be replaced by the
body of foo during decompilation.
To mark foo as inline, edit foo’s signature and check the
In Line function attribute.

Inlining a function is related to the notion of a call fixup,
where calls to certain functions are replaced with snippets of
Pcode.
These functions are recognized by name and have the call
fixup applied automatically.
Examples include functions related to structured exception
handling in Windows.
You can also select from pre-defined call fixups when editing a
function signature.
Note: there are no fixups defined for x86_64 binaries compiled
with gcc, so the Call Fixup selector is greyed out for the
exercise files.

Exercise: Inlining Functions

1. Open and analyze the file inline, then navigate to the
function main.
2. When provided with the correct number of command line
arguments, this function should parse argv[1] and argv[2]
into unsigned long values and print their sum. The task is to
get the decompiler to show this.
3. First, ensure that main has the correct signature
(int main(int argc, char **argv)).
4. Next, override the signature of the call to printf if necessary,
so that it agrees with the format string.
5. The decompilation will still be incorrect. Marking
adjustStack and restoreStack as inline yields correct
decompilation. Why?

adjustStack decreases the stack pointer by 16, which violates
the calling convention. Since the default behavior of the
decompiler is to assume that a function follows the calling
convention, it assumes that the call to adjustStack does not
change the value of the stack pointer. This assumption leads
to incorrect analysis. If you mark adjustStack and
restoreStack as inline, their bodies will be incorporated into
main during decompilation and the changes to the stack
pointer will be tracked.

Improving Decompilation: Control Flow

Contents

Improving Decompilation: Control Flow

Fixing Switch Statements
Shared Returns
Control Flow Oddities

Fixing Switch Statements

Sometimes you will see warnings in the decompiler view
stating that there are too many branches to recover a
jumptable.
One reason for this is that there actually is a jump table, but
the decompiler can’t determine bounds on the switch variable.
In such cases, you can add the jump targets manually and
then run the script SwitchOverride.java.
Note: To find such locations in a program, run the script
FindUnrecoveredSwitchesScript.java.
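
As a reminder of where jump tables come from, here is a hypothetical C function with a dense switch. Whether a compiler actually emits a jump table for it depends on the compiler and the optimization level; the function apply_op is invented for illustration and is not taken from the exercise file.

```c
/* Hypothetical example: eight dense case labels with different bodies.
 * A compiler may lower this to a bounds check on `op` followed by an
 * indirect jump through a table of eight code addresses. */
long apply_op(unsigned op, long x) {
    switch (op) {
    case 0: return x + 1;
    case 1: return x - 1;
    case 2: return x * 2;
    case 3: return x / 2;
    case 4: return -x;
    case 5: return x * x;
    case 6: return x % 10;
    case 7: return x & 0xff;
    default: return x; /* taken when the bounds check fails */
    }
}
```

The bounds check that guards the indirect jump is what the decompiler needs to find in order to know how many table entries to read.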

Exercise: Fixing Switch Statements

1. Open and analyze the file switch, then navigate to the
function main. The decompiler view should contain a warning
about an unrecovered jumptable.
2. The global variable array is the jumptable.
3. Navigate to array in the Listing and press p to define the first
element to be a pointer. Note: this will clear any data type
information that Ghidra assigned to array automatically.
4. Now press [ to define an array. Enter 10 for the number of
elements.
5. This will trigger disassembly at each of the addresses in the
jumptable, but these addresses are not yet part of the function
main.
6. Navigate to the JMP instruction which jumps to array + an
offset.
7. Press R to bring up the References Editor and click on the
mnemonic (JMP).
8. You can use the green plus to add a COMPUTED_JUMP
reference to each address stored in the jumptable one at a
time.
9. Alternatively:
- Select the JMP instruction.
- Select → Forward Refs from the Code Browser.
- Select → Forward Refs again.
- Drag the selection onto the References Editor Dialog.

10. Right-click on the label main in the Listing, then select
Function → Re-create Function.
11. The jump targets are now part of main, which you can verify
by examining the Function Graph.
12. Finally, navigate back to the JMP instruction and use the
Script Manager to run SwitchOverride.java.

Shared Returns
If a function callerOne ends with a call to callee, compilers will
sometimes perform an optimization which replaces that final
call with a jump.
If callerOne and callerTwo both end with calls to callee,
this optimization will result in callerOne and callerTwo
ending with jumps to callee.
The Shared Return Analyzer detects this situation and
modifies the flow of the jump instruction to have type
CALL_RETURN. This will change how the functions are
displayed in the decompiler.
You can also do this manually, in case the analyzer missed
something (for example, if only one of the functions sharing a
final call/jump has been found and disassembled).
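
In source form, the situation looks like the sketch below. The function names follow the slide, but the bodies are hypothetical. Whether the final calls become jumps depends on the compiler and its optimization settings.

```c
/* Hypothetical shared callee. */
long callee(long value) {
    return value * 2 + 1;
}

/* The call to callee is the last thing each caller does, so an
 * optimizing compiler may replace CALL callee; RET with JMP callee. */
long callerOne(long value) {
    return callee(value + 1);
}

long callerTwo(long value) {
    return callee(value - 1);
}
```

If the jump is not treated as a call followed by a return, the body of callee can appear to be part of the calling function.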

Exercise: Shared Returns

1. Uncheck the Shared Return Calls analyzer before analyzing
sharedReturn.
2. This file has been stripped of symbols. To find main, navigate
to entry and look for the call to __libc_start_main. The first
argument to this call corresponds to the main method in the
source code.
3. main contains two calls to non-library functions. Each callee
contains a JMP instruction corresponding to what was a
function call in the source code.
4. Find these JMP instructions, right-click, select Modify
Instruction Flow..., and change the flow to
CALL_RETURN. Verify that a new function call appears in
the decompilation.
Control Flow Oddities

Opaque Predicates
One anti-disassembly technique is to create an if-else
statement with a condition that always evaluates to the same
value, but complicated enough for this to be difficult to
determine statically.
This is an example of an opaque predicate.
The branch that is never taken can contain byte sequences
intended to thwart static analysis, such as sequences which
disassemble to jumps to invalid targets.

Exercise: Opaque Predicates

1. Open and analyze the file opaque, then navigate to the
function main.
2. main contains an opaque predicate. Find it and fix it with the
instruction patcher by changing a conditional jump to an
unconditional jump.
3. To patch an instruction, right-click on it in the Listing and
select Patch Instruction.
4. Hint: The opaque predicate is based on the fact that if you
square an integer and reduce mod 4, you can only ever get 0
or 1. Look for a multiplication, modular reduction (optimized
to a bitmask), and comparison in the assembly.
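
The predicate described in the hint can be written in C as below. This is a sketch of the arithmetic only; the functions opaque_predicate and guarded are hypothetical and are not the code in the exercise file. Unsigned arithmetic is used so that overflow of the multiplication is well defined.

```c
#include <stdbool.h>

/* Always returns true: a square is congruent to 0 or 1 modulo 4.
 * Unsigned multiplication wraps modulo a power of two that is a
 * multiple of 4, so wrapping does not change the result mod 4. */
bool opaque_predicate(unsigned x) {
    unsigned square = x * x;
    return (square & 3u) < 2u; /* `& 3` is the optimized form of `% 4` */
}

/* Hypothetical guarded function: the second return is never reached. */
int guarded(unsigned x) {
    if (opaque_predicate(x)) {
        return 1;  /* the real code path */
    }
    return -1;     /* dead branch, where junk bytes could be hidden */
}
```

Even x gives a square divisible by 4, and odd x gives a square of the form 4k + 1, so the values 2 and 3 never occur.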

Jumps Within Instructions

The decompiler can repeatedly disassemble the same byte as
part of different instructions as it follows flow.
The listing can’t do this: each byte has to be assigned to one
instruction.
One consequence is that the decompilation can be correct
even if the listing shows a disassembly error.
This can happen when encountering certain anti-disassembly
techniques.

Exercise: Jumps Within Instructions

1. Open and analyze the file jumpWithinInstruction, then
navigate to the function main.
2. You should see an error in the disassembly but correct
decompilation (with a warning). What’s going on?

Solution: Jumps Within Instructions

eb ff is JMP .+1. After this instruction executes, ff c0 are
the bytes of the next instruction to execute. Clear the
instruction corresponding to eb ff and then disassemble
starting at ff to reveal the instructions that execute after
JMP .+1.
Note: After clearing and disassembling, right-click on the
SUB instruction and select Fallthrough → Auto Override,
which will set the fallthrough address to be the address of the
next instruction after SUB (skipping data). You should verify
that setting this override makes the function graph look
better.
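
The target of eb ff can be checked with a little arithmetic. The helper below is a hypothetical illustration of how a two-byte x86 short jump computes its destination; the address 0x1000 in the comment is made up for the example.

```c
#include <stdint.h>

/* Target of a two-byte x86 short jump (opcode EB followed by a signed
 * 8-bit displacement). The displacement is relative to the address of
 * the next instruction, which is the jump's own address plus 2. */
uint64_t short_jmp_target(uint64_t jmp_address, uint8_t displacement) {
    /* Sign-extend the displacement byte: 0xff becomes -1. */
    int64_t rel = displacement >= 0x80 ? (int64_t)displacement - 0x100
                                       : (int64_t)displacement;
    /* Unsigned addition wraps, so adding (uint64_t)-1 subtracts 1.
     * Example: a jump at 0x1000 with displacement 0xff lands on
     * 0x1001, the second byte of the jump instruction itself. */
    return jmp_address + 2u + (uint64_t)rel;
}
```

This is why the byte ff belongs to two instructions: it is the displacement of the jump and also the first byte of the instruction that executes next.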

Improving Decompilation: Data Mutability

Contents

Improving Decompilation: Data Mutability

Changing Data Mutability
Constant Data
Volatile Data

Changing Data Mutability

Data Mutability
Data Mutability refers to the assumptions Ghidra makes
regarding whether a particular data element can change.
There are three data mutability settings:
1. normal
2. constant
3. volatile
There are two ways to change data mutability:
1. Right-click on the (defined) data in the Listing and select
Settings...
2. Set the mutability of an entire block of memory through the
Memory Map (Window → Memory Map from the Code
Browser).

Constant Data
The decompiler will display the contents of a memory location
if the contents are marked as constant.
Otherwise it will display a pointer to the location.

Exercise: Constant Data

1. Open and analyze the file dataMutability, then navigate to
the function main.
2. Change the settings of the target of the pointer variable
writeable to constant by right-clicking and selecting Data →
Settings... in the Listing. Verify that the changes are
reflected in the decompiler.
3. Restore the data mutability and change it again by modifying
the permissions of the appropriate block in the Memory Map.

Volatile Data
Marking a data element as volatile tells the decompiler to
assume that the value of a variable could change at any time.
This can prevent certain simplifications.
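
The same idea exists at the source level in C. The sketch below is a hypothetical illustration: the variable status and the function check_status are invented here and are not the code of the exercise file.

```c
/* Hypothetical flag that hardware, a signal handler, or another
 * thread could modify behind the program's back. */
volatile int status;

/* Without `volatile`, an optimizer (or the decompiler) may conclude
 * that the test is always false right after the store and drop the
 * branch. With `volatile`, the read has to be performed. */
int check_status(void) {
    status = 0;
    if (status != 0) {
        return 1; /* reachable only if something else changed status */
    }
    return 0;
}
```

A store followed immediately by a test of the same location, as in check_status, is the kind of code that looks redundant unless the data is volatile.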

Exercise: Volatile Data

1. Note that the decompiler prints warning comments at the top
of main indicating that unreachable code blocks have been
removed.
2. You can prevent this by selecting Edit → Tool Options →
Decompiler → Analysis and unchecking Eliminate
unreachable code.
3. After doing this, you will see the global variable status appear
in the decompilation. Note that it is set to zero and then
tested. This is a hint that status might be volatile.

4. Mark the data element labelled status as volatile and verify
that additional code appears in the decompilation of the
function main (make sure to re-enable unreachable code
elimination in the decompiler if you’ve disabled it).
5. Note: You might have to override the signature on the call to
printf to get all of its arguments to appear in the
decompilation.

Improving Decompilation: Setting Register Values

Contents

Improving Decompilation: Setting Register Values


Setting Register Values

Setting a context register (for example, to select ARM or
Thumb mode) is a common reason to set register values in
Ghidra.
Additionally, if you set a register value at the beginning of a
function, the value will be sent to the decompiler.
To set a register value, right-click on an address in the Listing
and select Set Register Values...
This can be helpful if a register is used to store a global
variable. Additionally, it can sometimes be helpful to set
register values when trying to understand a function. The
decompiler will perform additional transformations, which may
yield a simplified view of how the function behaves in
restricted cases.
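
To see why a known register value can simplify the output, consider the following Python sketch of constant folding over a tiny expression tree. It is only an illustration under simplifying assumptions: the tuple-based expression format and the simplify function are hypothetical, and the register names RDI and RSI are used purely as examples.

```python
def simplify(expr, known):
    """Fold a tiny expression tree using known register values.

    expr is one of:
      ("const", n)                    a constant
      ("reg", name)                   the value of a register on entry
      ("add", a, b)                   a + b
      ("select", cond, then, other)   then if cond != 0, else other
    known maps register names to their assumed values.
    """
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "reg":
        name = expr[1]
        return ("const", known[name]) if name in known else expr
    if kind == "add":
        a, b = simplify(expr[1], known), simplify(expr[2], known)
        if a[0] == "const" and b[0] == "const":
            return ("const", a[1] + b[1])
        return ("add", a, b)
    if kind == "select":
        cond = simplify(expr[1], known)
        if cond[0] == "const":
            # The condition is decided, so only one branch survives.
            chosen = expr[2] if cond[1] != 0 else expr[3]
            return simplify(chosen, known)
        return ("select", cond, simplify(expr[2], known), simplify(expr[3], known))
    raise ValueError(f"unknown expression kind: {kind}")


body = ("select", ("reg", "RDI"), ("add", ("reg", "RSI"), ("const", 1)), ("const", 0))
print(simplify(body, {}))            # nothing known: the tree is unchanged
print(simplify(body, {"RDI": 0}))    # prints ('const', 0)
print(simplify(body, {"RDI": 1}))    # prints ('add', ('reg', 'RSI'), ('const', 1))
```

Each assumed value yields a view that is valid only for that restricted case.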

Exercise: Global Variables

1. Open and analyze the file globalRegVars.so, then navigate to
the function initRegisterPointerVar.
2. This function stores the address of a global variable into a
register. Determine the address and the register.
3. Set the value of the register to be the address at the
beginning of the functions setRegisterPointerVar and
getRegisterPointerVar. If you do it correctly,
getRegisterPointerVar should decompile to
{
return c;
}


Exercise: Simplifying Transformations

1. Open and analyze the file setRegister, then navigate to the
function switchFunc. Set the switch variable (in RDI) to a
few different values and observe the effect on the decompiled
code.

Troubleshooting Decompilation

Contents

Troubleshooting Decompilation
Identifying Problems in the Decompiled Code
Potential Causes
Potential Fixes
Compiler vs. Decompiler


in_, unaff_, and extraout_

Occasionally, you may see variables in the decompiler view
whose names begin with in_, unaff_, or extraout_.
in_ or unaff_: this typically indicates that a register is read
before it is written (and it does not contain a parameter
passed to the function).
Variables that begin with extraout_ can occur when the
decompiler thinks that a value is being used that should have
been killed by a call.


Pcode in the Decompiler View

Occasionally, you might see Pcode operations in the
decompiler code.
Examples: ZEXT, SEXT, SUB, CONCAT,...
See the “Decompiler” section in the Ghidra help.
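
As a quick reference, the Python sketch below models what these four operations compute. It assumes the naming scheme in which the digits after the operation name give operand sizes in bytes (for example, ZEXT14 extends a 1-byte value to 4 bytes, and SUB41 keeps 1 byte of a 4-byte value). The helper names and the explicit size parameters are hypothetical; consult the Ghidra help for the authoritative definitions.

```python
def _mask(size: int) -> int:
    """Bit mask covering `size` bytes."""
    return (1 << (8 * size)) - 1


def zext(value: int, in_size: int, out_size: int) -> int:
    """Zero-extend an in_size-byte value to out_size bytes."""
    return value & _mask(in_size) & _mask(out_size)


def sext(value: int, in_size: int, out_size: int) -> int:
    """Sign-extend an in_size-byte value to out_size bytes."""
    bits = 8 * in_size
    value &= _mask(in_size)
    if value >> (bits - 1):        # sign bit set: interpret as negative
        value -= 1 << bits
    return value & _mask(out_size)


def sub(value: int, in_size: int, out_size: int, offset: int) -> int:
    """Truncate: drop `offset` least significant bytes, keep out_size bytes."""
    return ((value & _mask(in_size)) >> (8 * offset)) & _mask(out_size)


def concat(high: int, high_size: int, low: int, low_size: int) -> int:
    """Concatenate: `high` forms the most significant part of the result."""
    return ((high & _mask(high_size)) << (8 * low_size)) | (low & _mask(low_size))


print(hex(zext(0xFF, 1, 4)))               # ZEXT14:   prints 0xff
print(hex(sext(0xFF, 1, 4)))               # SEXT14:   prints 0xffffffff
print(hex(sub(0x11223344, 4, 1, 0)))       # SUB41:    prints 0x44
print(hex(concat(0x112233, 3, 0x44, 1)))   # CONCAT31: prints 0x11223344
```

Seeing one of these in the output often points at a size mismatch between a variable's declared type and the way the code actually uses it.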


Potential Causes
1. The decompiler has a function signature wrong (either the
signature of the function being decompiled or one of its
callees).
2. A common situation is some kind of size mismatch, for
example, the decompiler thinks that a call returns a 32-bit
value but sees all of RAX being used. But then where did the
high 32 bits come from?
3. There’s a register that actually contains a global parameter or
is set as the side effect of a called function.


Potential Fixes
To fix these issues, the first step is to try to determine if the
decompiler is making an assumption that’s false.
Oftentimes, you can correct such errors by:
- correcting function signatures
- correcting sizes of data types
- marking functions as inline
For example, if you see in_RAX in the decompiled view, you
should check if there’s a call to a function whose return type
is mistakenly marked as void.


Useful Tools
Script: FindPotentialDecompilerProblems.java:
Decompiles all functions in a program, looks for problems, and
displays them in a navigable table.
Script: CompareFunctionSizesScript.java: Decompiles all
functions in a program and displays a table which contains the
size of each function (in instructions) and the size of each
decompiled function (in Pcode operations). If a function has
many instructions but the decompiled version is small, there
could be an incorrect assumption regarding the return value.
From the Code Browser, Edit → Tool Options... →
Decompiler → Analysis → uncheck Eliminate unreachable
code: might help diagnose issues.
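
The size-comparison heuristic can be sketched in a few lines of Python. This is not the script's implementation: the function flag_suspicious, its thresholds, and the sample data are hypothetical, and in practice the counts would come from the table the script produces.

```python
def flag_suspicious(functions, min_instructions=20, max_ratio=0.25):
    """Return names of functions whose decompiled form looks too small.

    functions: iterable of (name, instruction_count, pcode_op_count).
    min_instructions and max_ratio are arbitrary illustrative thresholds.
    """
    flagged = []
    for name, n_instr, n_pcode in functions:
        if n_instr >= min_instructions and n_pcode <= max_ratio * n_instr:
            flagged.append(name)
    return flagged


sample = [
    ("parse_header", 100, 5),     # many instructions, almost no decompiled ops
    ("checksum", 100, 300),       # sizes are in proportion
    ("tiny_getter", 5, 1),        # too small to be interesting
]
print(flag_suspicious(sample))    # prints ['parse_header']
```

A flagged function is only a candidate for inspection; the usual follow-up is to check its return type and the signatures of its callees.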


Compiler vs. Decompiler

Sometimes compilers can prove certain facts about special
cases and use these facts to emit optimized code.
This can have consequences for the decompiled code.
This isn’t an error, just something to keep in mind.


Exercise
1. Open and analyze the file compilerVsDecompiler.
2. The functions calls_memcmp and calls_memcmp_fixed_len
implement memcmp using the CMPSB.REPE instruction.
3. Compare the decompiled view of these two functions. What
differences do you see?
4. What accounts for these differences? (hint: examine the
assembly code)
5. Note: To compare two functions side-by-side, bring up the
Functions window (Window → Functions from the Code
Browser), select the two functions, right click and select
Compare Functions. Use the tabs to switch between the
Listing and Decompiler views.


Solution
1. calls_memcmp_fixed_len contains in_ZF and in_CF in the
decompiled code, whereas calls_memcmp does not.
2. In calls_memcmp_fixed_len, the compiler knows that the
loop will be executed at least once (RCX is set to 8).
3. However, in calls_memcmp, the loop might be executed 0
times (RCX is set to param_3).
4. This means that the compiler must initialize the flags ZF and
CF in calls_memcmp, but does not have to in
calls_memcmp_fixed_len, since the loop is guaranteed to
execute at least once and that comparison will set the flags.
5. This is the purpose of the CMP RDX,RDX instruction in
calls_memcmp (which does not occur in
calls_memcmp_fixed_len).
6. The decompiler doesn’t do the analysis to prove that a loop
must execute at least once.
7. So in the decompiler’s view, the values in ZF and CF at the
beginning of calls_memcmp_fixed_len might contribute to
the return value (in the “case” when the loop body does not
execute).
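
The flag behavior behind this solution can be reproduced with a toy Python model of a repeated byte compare. The function repe_cmpsb is a hypothetical simplification (it tracks only ZF and CF and ignores the direction flag and pointer updates), intended only to show why the flags on entry matter when the count is zero.

```python
def repe_cmpsb(src: bytes, dst: bytes, count: int, zf: bool, cf: bool):
    """Toy model of a repeat-while-equal byte compare.

    count plays the role of RCX; zf and cf are the flag values on entry.
    Returns the (zf, cf) pair after the instruction completes.
    """
    i = 0
    while count != 0:
        zf = src[i] == dst[i]    # ZF is set when the two bytes are equal
        cf = src[i] < dst[i]     # CF is set on an unsigned borrow
        i += 1
        count -= 1
        if not zf:               # stop at the first mismatch
            break
    return zf, cf


# Count of 8: the loop runs, so the flags on entry are always overwritten.
print(repe_cmpsb(b"abcdefgh", b"abcdefgh", 8, False, True))   # prints (True, False)

# Count of 0: the loop never runs, so the flags on entry leak through.
print(repe_cmpsb(b"", b"", 0, False, True))                   # prints (False, True)

# Comparing a register with itself first forces ZF=1 and CF=0 on entry.
print(repe_cmpsb(b"", b"", 0, True, False))                   # prints (True, False)
```

The second call is the case the decompiler cannot rule out for the fixed-length function, which is why the flags appear there as inputs.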
