Friday, March 06 2015
Wednesday, March 04 2015
Yesterday I wrote about load time DLL injection, and of course somebody (Jurriaan Bremer of Cuckoo Sandbox) pointed out that a tool for this already exists: his inject tool, which is part of Cuckoo Sandbox. The tool uses
QueueUserAPC to execute code within the process that calls LoadLibrary.
updated: March 6th 2015
I couldn't figure out what the exact difference is between QueueUserAPC and CreateRemoteThread in terms of when the injected code is executed. A remote thread only executes while the program is running, and once the program is running the thread executes its code. I don't see an obvious difference for QueueUserAPC. I will read up on this, but for now I will just continue with my actual project.
For my project I need the guarantee that I can run my code first. That is why I went through all this hassle.
The important information from the QueueUserAPC documentation is:
If an application queues an APC before the thread begins running, the thread begins by calling the APC function. After the thread calls an APC function, it calls the APC functions for all APCs in its APC queue.
This means: if you start the process in suspended mode and call QueueUserAPC on its thread before resuming the process, the APC function will be called before the thread starts executing.
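To make the sequence concrete, here is a minimal sketch of the suspended-process variant using Python's ctypes. It is not taken from the Cuckoo inject tool. The function name `inject_via_apc` is hypothetical, and the sketch assumes that the injector and the target have the same bitness, so that the address of LoadLibraryW in kernel32.dll is the same in both processes.

```python
import ctypes
import sys

CREATE_SUSPENDED = 0x00000004
MEM_COMMIT_RESERVE = 0x3000  # MEM_COMMIT | MEM_RESERVE
PAGE_READWRITE = 0x04

VP, U32, SZ, WSTR = ctypes.c_void_p, ctypes.c_uint32, ctypes.c_size_t, ctypes.c_wchar_p


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [("cb", U32), ("lpReserved", WSTR), ("lpDesktop", WSTR), ("lpTitle", WSTR),
                ("dwX", U32), ("dwY", U32), ("dwXSize", U32), ("dwYSize", U32),
                ("dwXCountChars", U32), ("dwYCountChars", U32),
                ("dwFillAttribute", U32), ("dwFlags", U32),
                ("wShowWindow", ctypes.c_uint16), ("cbReserved2", ctypes.c_uint16),
                ("lpReserved2", VP), ("hStdInput", VP), ("hStdOutput", VP), ("hStdError", VP)]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [("hProcess", VP), ("hThread", VP), ("dwProcessId", U32), ("dwThreadId", U32)]


def _check(result):
    """Raise the last Windows error if an API call returned 0/NULL."""
    if not result:
        raise ctypes.WinError(ctypes.get_last_error())
    return result


def inject_via_apc(exe_path, dll_path):
    """Hypothetical helper: start exe_path suspended and queue LoadLibraryW(dll_path)."""
    if sys.platform != "win32":
        raise OSError("this sketch only works on Windows")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare signatures so that 64-bit handles and pointers are not truncated.
    k32.CreateProcessW.argtypes = [WSTR, WSTR, VP, VP, ctypes.c_int, U32, VP, WSTR,
                                   ctypes.POINTER(STARTUPINFOW),
                                   ctypes.POINTER(PROCESS_INFORMATION)]
    k32.GetModuleHandleW.argtypes, k32.GetModuleHandleW.restype = [WSTR], VP
    k32.GetProcAddress.argtypes, k32.GetProcAddress.restype = [VP, ctypes.c_char_p], VP
    k32.VirtualAllocEx.argtypes, k32.VirtualAllocEx.restype = [VP, VP, SZ, U32, U32], VP
    k32.WriteProcessMemory.argtypes = [VP, VP, VP, SZ, VP]
    k32.QueueUserAPC.argtypes = [VP, VP, SZ]
    k32.ResumeThread.argtypes, k32.ResumeThread.restype = [VP], U32
    k32.CloseHandle.argtypes = [VP]

    si, pi = STARTUPINFOW(), PROCESS_INFORMATION()
    si.cb = ctypes.sizeof(si)
    _check(k32.CreateProcessW(exe_path, None, None, None, False, CREATE_SUSPENDED,
                              None, None, ctypes.byref(si), ctypes.byref(pi)))
    try:
        # Copy the DLL path (UTF-16, including the terminator) into the target.
        path = ctypes.create_unicode_buffer(dll_path)
        remote = _check(k32.VirtualAllocEx(pi.hProcess, None, ctypes.sizeof(path),
                                           MEM_COMMIT_RESERVE, PAGE_READWRITE))
        _check(k32.WriteProcessMemory(pi.hProcess, remote, path, ctypes.sizeof(path), None))
        kernel32_base = _check(k32.GetModuleHandleW("kernel32.dll"))
        load_library = _check(k32.GetProcAddress(kernel32_base, b"LoadLibraryW"))
        # Queue the APC while the main thread has not run yet, then resume it.
        _check(k32.QueueUserAPC(load_library, pi.hThread, remote))
        if k32.ResumeThread(pi.hThread) == 0xFFFFFFFF:
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        k32.CloseHandle(pi.hThread)
        k32.CloseHandle(pi.hProcess)
    return pi.dwProcessId
```

A call would look like `inject_via_apc(r"C:\path\to\target.exe", r"C:\path\to\hook.dll")`, where both paths are placeholders. Error handling is deliberately thin: if a step fails after the process was created, the suspended process is left behind.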
In my recent adventures into MS Windows land I needed to inject a DLL into a process at load time.
The DLL should hook the program's entrypoint so that it can take control over certain aspects of the process before
the actual program executes any instruction.
I thought that this must be a long solved problem and searched the web for an answer. I found 1001 ways to
implement DLL injection, but most of them do not support load time injection and none of them supported both load time injection and
hooking the entrypoint.
One solution that is very close to what I need is the AppInit_DLLs mechanism. Although various sources on the Internet claim that AppInit_DLLs is unstable, I didn't have any issues with it in the last couple of months. The issue with AppInit_DLLs is that it relies on User32.dll being used by the particular application. Most applications use it, but if User32.dll is not in the import list of the application's PE file and the application instead loads it manually using LoadLibraryX, the AppInit_DLLs injection happens too late.
When I started looking into load time DLL injection I had a hard time finding anything useful. The most useful
information I found was this blog post on Injecting DLL into process on load. Their technique worked by overwriting the program's entrypoint
with an endless loop (JMP $-2) to get the process running without executing any code. While the process is looping they
attach a remote thread that calls LoadLibrary to inject their DLL.
The problem with their approach is that the injected code can't take control over the entrypoint itself. Simply overwriting the endless loop with a jump to DLL code is possible but creates a race condition that mostly leads to NOT being able to hijack the entrypoint from the injected DLL.
The second problem is ASLR. Their code didn't support randomized processes.
The solution I came up with uses pydbg to load the process and carry out the injection.
I also use an endless loop that I place at the program's entrypoint. But my endless loop has a defined
exit: it checks if a register value is non-zero and then jumps to the address in the register. The
injected library's DLL main function just needs to write the address of its entrypoint hook to the specific
memory address to overwrite the zero in the load register instruction (mov eax, 0x00000000).
mov eax, 0x00000000;
cmp eax, 0x00000000;
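The listing above only shows the first two instructions. As an illustration, the following sketch builds one possible x86 encoding of such a loop that happens to fit into 12 bytes. The exact bytes used by injection.py may differ, and the names `build_loop_stub`, `arm_stub` and `HOOK_SLOT_OFFSET` are hypothetical.

```python
import struct

# The imm32 operand of "mov eax, imm32" starts right after the one-byte opcode.
HOOK_SLOT_OFFSET = 1


def build_loop_stub():
    """Return a 12-byte loop that spins until the mov operand becomes non-zero."""
    stub = b"".join([
        b"\xB8" + struct.pack("<I", 0),  # mov eax, 0x00000000
        b"\x83\xF8\x00",                 # cmp eax, 0 (short imm8 form, an assumption)
        b"\x74\xF6",                     # je -10, back to the mov at offset 0
        b"\xFF\xE0",                     # jmp eax, taken once eax is non-zero
    ])
    assert len(stub) == 12
    return stub


def arm_stub(stub, hook_address):
    """Patch the mov operand, as the injected DLL would do in process memory."""
    patched = bytearray(stub)
    struct.pack_into("<I", patched, HOOK_SLOT_OFFSET, hook_address)
    return bytes(patched)
```

Because the `mov` is re-executed on every iteration, the loop picks up the new operand on its next pass after the DLL has written the hook address and then leaves through `jmp eax`.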
The second novel part is the solution to the ASLR problem. I solve it by adding a small feature to pydbg that
allows setting a callback for the initial breakpoint on application load. The tiny patch for pydbg is here: pydbg.patch. That breakpoint is late enough that
we can call enumerate_modules() to determine the load address of our executable.
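Once the load address is known, the entrypoint is the module base plus the AddressOfEntryPoint RVA from the PE optional header. The sketch below shows that arithmetic on the raw bytes of the executable file. The function names are hypothetical, and `module_base` stands for the value reported by enumerate_modules().

```python
import struct


def entry_point_rva(pe_bytes):
    """Read AddressOfEntryPoint (an RVA) from the headers of a PE file."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = e_lfanew + 4 + 20
    # AddressOfEntryPoint sits at offset 16 in both PE32 and PE32+ optional headers.
    (rva,) = struct.unpack_from("<I", pe_bytes, optional_header + 16)
    return rva


def runtime_entry_point(module_base, pe_bytes):
    """Entrypoint address in the running process, valid with or without ASLR."""
    return module_base + entry_point_rva(pe_bytes)
```

With a module loaded at 0x00A30000 and an entrypoint RVA of 0x1234, the entrypoint to patch would be 0x00A31234, regardless of the preferred image base stored in the file.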
The actual steps are listed below:
- load executable (pydbg)
- register initial breakpoint callback (pydbg)
- when initial break happens
  - retrieve the base address of the executable module to calculate entrypoint (needed if ASLR is present)
  - save entrypoint code to disk (12 bytes)
  - write endless loop to entrypoint (12 bytes)
  - set breakpoint on entrypoint
- *let process continue*
- entrypoint breakpoint is reached
  - register "user callback"
- *let process continue* (process starts looping on entrypoint)
- user callback is executed
  - create remote thread to inject DLL
  - detach from process
- dllmain from injected DLL is called
  - write address of entrypoint hook into loop code at entrypoint (see dllexample.c)
- *let process continue*
- endless loop at entrypoint breaks and entrypoint hook of injected DLL is called
  - ... some entrypoint hook action ...
  - restore entrypoint code from file (see dllexample.c)
  - jump to entrypoint and let the process finally run
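The save and restore steps from the list can be illustrated with a small simulation. This is only a sketch: a `bytearray` stands in for the target's memory, whereas the real tool writes through the debugger and the restore happens in C inside the DLL. The function names and the backup file path are hypothetical.

```python
def hook_entrypoint(memory, entry, stub, backup_path):
    """Save the original entrypoint bytes to disk, then overwrite them with the stub."""
    original = bytes(memory[entry:entry + len(stub)])
    with open(backup_path, "wb") as fh:
        fh.write(original)
    memory[entry:entry + len(stub)] = stub


def restore_entrypoint(memory, entry, backup_path):
    """Put the saved bytes back so the program can start from its real entrypoint."""
    with open(backup_path, "rb") as fh:
        original = fh.read()
    memory[entry:entry + len(original)] = original
```

Keeping the original bytes in a file lets the debugger detach early: the DLL does not need anything from the injector besides the file location to undo the patch.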
The injection tool and the example DLL that takes care of hooking and un-hooking the entrypoint are available here, together with the tiny patch for pydbg.
Files: injection.py dllexample.c pydbg.patch.
I hope I didn't just miss something and did all this work for nothing.