CODEMAP: Semantic run-trace visualization for binary analysis.



Codemap introduction (Korean version)

Codemap introduction (English version)

## What is Codemap?

Codemap is a binary analysis tool for "run-trace visualization", provided as an IDA plugin. Unlike DBI (Dynamic Binary Instrumentation) based tools such as Intel PIN or QEMU, Codemap uses breakpoints to trace the program. When the program hits a breakpoint, Codemap's breakpoint handler is invoked as a callback function, the appropriate trace action is taken, and the program continues. This might sound like a slow and inefficient approach to execution tracing. However, tracing the binary in this manner has two major advantages.


1. Selective Tracing

When you trace a binary with Codemap, you can "selectively set trace-points (using conventional break-points) only on the instructions you are interested in". These selective trace-points avoid unnecessary tracing of the parts of the program that are outside your interest. In most cases when you reverse-engineer software, you want to analyze a very specific portion of the binary. For example, if you are analyzing the cause of a program crash, you probably want to extract the execution log only around that particular crashing point. The "selective tracing" capability of Codemap is well suited to such cases.
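The break-point callback and the selective trace-points described above can be illustrated with a toy model. This is only a sketch, not Codemap's code: the "program" is a made-up list of instructions, and `run_with_tracepoints` and `handler` are hypothetical names introduced for illustration.

```python
# Toy model of break-point driven tracing; this is NOT Codemap's code.
# The "program" is a list of (address, mnemonic) pairs executed in order.

def run_with_tracepoints(program, tracepoints, handler):
    """Execute `program`, calling `handler` only at addresses in `tracepoints`."""
    for address, mnemonic in program:
        if address in tracepoints:
            # A real debugger would suspend the target here, run the
            # callback, and then resume execution.
            handler(address, mnemonic)


program = [
    (0x8048100, "push ebp"),
    (0x8048101, "mov ebp, esp"),
    (0x8048103, "call 0x8048200"),
    (0x8048108, "leave"),
    (0x8048109, "ret"),
]

trace_log = []
run_with_tracepoints(
    program,
    tracepoints={0x8048103, 0x8048109},
    handler=lambda address, mnemonic: trace_log.append(address),
)

print([hex(a) for a in trace_log])
```

Only the two selected addresses end up in `trace_log`; every other instruction runs without invoking the handler, which is the point of selective tracing.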


2. General Register Tracing

Note that the 'program execution trace' is not the only thing that Codemap can trace. In fact, Codemap can trace the entire register context. This enables Codemap to generate very concrete and flexible trace results. For example, to trace the "size of allocated heap chunks", you can set a trace-point on an instruction that references a general register containing the heap chunk size. A likely candidate is an instruction near the start of malloc(). As another example, to trace "heap chunk addresses", you can set a break-point at the end of malloc() and trace the EAX register value (which holds the chunk location at that moment) to visualize the overall location of heap chunks. As long as you understand the 'semantics of the register context' while a specific instruction is being executed, you can effectively trace and visualize the information flow of these semantics, which can be very helpful for understanding the behaviour of the binary. You can also write your own SQL statement to specify how these results are visualized.


## Download

https://github.com/c0demap/codemap


## Requirements (essential)

- IDA Pro 6.5 or later (6.6 is recommended; do not use a cracked version, as Codemap might not work with it)
- Python 2.x


## Requirements (recommended)

- Chrome web browser
- Large screen or dual-monitor environment


## How to install

- Run `python install.py` from the Codemap home directory.



## How to use

Basically, Codemap hooks IDA and installs its own break-point event handler so that break-points act as trace-points. Every time the program hits a break-point (trace-point), Codemap saves the register/memory information at that moment into a DB. Later, Codemap visualizes this trace information in the web browser using an SQL query. Codemap has 5 commands.
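The save-then-query flow above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the one-column-per-register schema and the names `open_trace_db` and `on_tracepoint_hit` are hypothetical, and Codemap's actual database layout may differ. Only the table name `trace` and the register column names come from the queries shown in this document.

```python
import sqlite3

# Hypothetical schema: one INTEGER column per 32-bit register.
# Codemap's real database layout may differ.
REGISTERS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp", "eip")


def open_trace_db(path=":memory:"):
    """Open (or create) a database containing an empty `trace` table."""
    conn = sqlite3.connect(path)
    columns = ", ".join("%s INTEGER" % name for name in REGISTERS)
    conn.execute("CREATE TABLE IF NOT EXISTS trace (%s)" % columns)
    return conn


def on_tracepoint_hit(conn, context):
    """Store one register snapshot; `context` maps register name to value."""
    placeholders = ", ".join("?" for _ in REGISTERS)
    conn.execute(
        "INSERT INTO trace VALUES (%s)" % placeholders,
        [context[name] for name in REGISTERS],
    )


conn = open_trace_db()
# Two made-up snapshots standing in for real break-point hits.
for eip in (0x8048103, 0x8048109):
    snapshot = dict.fromkeys(REGISTERS, 0)
    snapshot["eip"] = eip
    on_tracepoint_hit(conn, snapshot)
conn.commit()

rows = conn.execute("select eip from trace").fetchall()
print(rows)
```

Each break-point hit becomes one row, so any later question about the run can be phrased as an SQL query over the `trace` table.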



### ALT-1 : Start(resume)/Stop(pause) Codemap

This button opens a web-browser window and starts Codemap tracing. Press it while the program is paused, and do not continue the program manually with IDA's continue button. Before pressing it, make sure you have set break-points wherever you want to trace the binary. Pressing the button again pauses tracing; pressing it once more resumes.
Codemap must be PAUSED to get fast and accurate results in the web browser.



### ALT-2 : Set Function BP

This button sets break-points on every instruction inside the function that your cursor is currently in. Note that the function must be recognized by IDA. To force IDA to disassemble a byte stream, put the cursor on the byte that you believe is the start of the instructions and press 'C'. To force IDA to recognize a function, put the cursor on the instruction that you believe is the start of the function and press 'P'. For example, put your cursor inside an IDA-recognized function and press Alt-2. You should then see something like the following.

[Figure: image29.png]



### ALT-3 : Set Range BP

This button asks for the address range in which break-points will be set. With it, you can set break-points on the instructions in a range such as 0x8048100 ~ 0x8048200. For example, if you press Alt-3, a popup window asks for the start address.

[Figure: image26.png]


After you enter the start address, Codemap asks for the end address.

You should then see the following result.


[Figure: image37.png]



### ALT-4 : Create/Setup Module BP

This button helps you set break-points on the start of every function inside a module (.dll or .so). This takes two steps. First, open the module (.dll or .so) file with IDA and let IDA finish its static analysis.

[Figure: image34.png]

Once IDA has finished the static analysis of the module, press this button (Alt-4). A prompt will pop up and ask for a file name. Enter a name that identifies this module and press Enter.

The 'module break-point' information for this module is now saved as a file, and we can proceed to the second step.


While debugging an application that uses the module (.dll or .so), put your cursor inside the memory area of the loaded module and press this button again. Codemap will then ask for the name of the file in which you stored the break-point information for this module.


You should now have break-points on all the function start points inside the module.
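One plausible way to carry function start points from the static analysis over to the loaded module is to store them as offsets and rebase them at debug time. The sketch below illustrates that idea only. The JSON file format, the rebasing step, and the names `save_module_breakpoints` and `load_module_breakpoints` are all assumptions; this document only states that the information is saved to a file and later applied to the loaded module.

```python
import json
import os
import tempfile


def save_module_breakpoints(path, image_base, function_starts):
    """Store function start addresses as offsets from the static image base."""
    offsets = [address - image_base for address in function_starts]
    with open(path, "w") as handle:
        json.dump(offsets, handle)


def load_module_breakpoints(path, loaded_base):
    """Rebase the stored offsets onto the address where the module is loaded."""
    with open(path) as handle:
        offsets = json.load(handle)
    return [loaded_base + offset for offset in offsets]


# Step 1 (static analysis): made-up image base and function start addresses.
bp_file = os.path.join(tempfile.mkdtemp(), "mymodule.bp")
save_module_breakpoints(bp_file, 0x10000000, [0x10001000, 0x10001230])

# Step 2 (debugging): the module happens to be loaded at a different base.
breakpoints = load_module_breakpoints(bp_file, 0x6F000000)
print([hex(a) for a in breakpoints])
```

Storing offsets rather than absolute addresses keeps the saved file valid even when the loader places the module at a different base address on each run.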




### ALT-5 : Connect Codemap Graph with IDA


This button connects IDA with the Codemap graph browser. If you press this button and then refresh the Codemap graph browser, the browser and IDA will be connected. Once connected, the IDA view follows your cursor in the graph browser.





## Tips for SQL queries and Codemap

Codemap takes an SQL statement to generate the run-trace result. The most representative query is ‘select eip from trace’, which generates the execution trace of the eip register. You can specify any register name, or memory (adding the 'm_' prefix to a register name gives a string containing the memory dump pointed to by that register), and write any SQL query to extract various information from the run trace. For example, if you set a break-point at the end of malloc() and trace the eax register, you can visualize the heap layout.


[select eax from trace]



If you set a break-point at the beginning of malloc() and trace a register that holds the requested memory size, you can track the sizes of heap objects.

[select esi from trace]



You can also write an SQL query that traces multiple register values, and add query conditions (where, order by, limit, etc.) to extract the information you are interested in.


[select eax, ebx from trace]



[select eax, ebx from trace where eax < ebx]



[select eax, ebx from trace where eax < ebx limit 100]
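To see how the conditions in the queries above narrow the result, they can be tried against a small stand-in table using Python's built-in `sqlite3` module. This is only a sketch: the two-column table and its values are made up for illustration, and the limit is reduced to 1 so that its effect is visible on four rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column table; the values are made up for illustration.
conn.execute("CREATE TABLE trace (eax INTEGER, ebx INTEGER)")
conn.executemany(
    "INSERT INTO trace VALUES (?, ?)",
    [(0x10, 0x20), (0x30, 0x20), (0x05, 0x40), (0x50, 0x50)],
)

# Every recorded (eax, ebx) pair.
all_rows = conn.execute("select eax, ebx from trace").fetchall()

# Only the hits where eax was below ebx.
filtered = conn.execute(
    "select eax, ebx from trace where eax < ebx"
).fetchall()

# The same condition, capped at a single row.
limited = conn.execute(
    "select eax, ebx from trace where eax < ebx limit 1"
).fetchall()

print(len(all_rows), filtered, limited)
```

The `where` clause drops the rows in which eax is not smaller than ebx, and `limit` caps how many of the remaining rows are returned.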



## Disclaimer

Codemap is free, open-source software for non-commercial users. However, redistributing this tool for commercial purposes is forbidden.



## Contacts

Contact information of developers:

- daehee - daehee87@kaist.ac.kr
- zzoru - zzoru@kaist.ac.kr
- dinggul - dinggul@kaist.ac.kr

## Academic Reference

KAIST CysecLab (Graduate School of Information Security, School of Computing), Advisor: Prof. B. Kang