In my recent adventures into MS Windows land I needed to inject a DLL into a process at load time. The DLL should hook the program's entrypoint so that it can take control over certain aspects of the process before the actual program executes any instruction.
I thought that this must be a long-solved problem and searched the web for an answer. I found 1001 ways to implement DLL injection, but most of them do not support load time injection, and none of them supported both load time injection and hooking the entrypoint.
One solution that comes very close to what I need is the AppInit_DLL mechanism. Although various sources on the Internet claim that AppInit_DLL is unstable, I haven't had any issues with it in the last couple of months. The problem with AppInit_DLL is that it relies on the application using User32.dll. Most applications do, but if User32.dll is not in the import list of the application's PE file and the application instead loads it manually using LoadLibraryX, the AppInit_DLL injection happens too late.
When I started looking into load time DLL injection I had a hard time finding anything useful. The most useful information I found was a blog post titled "Injecting DLL into process on load". Their technique works by overwriting the program's entrypoint with an endless loop (JMP $-2) to get the process running without executing any of the program's code. While the process is looping, they attach a remote thread that calls LoadLibrary to inject their DLL.
The problem with their approach is that the injected code can't take control over the entrypoint itself. Simply overwriting the endless loop with a jump to DLL code is possible but creates a race condition that mostly leads to NOT being able to hijack the entrypoint from the injected DLL.
The second problem is ASLR. Their code didn't support randomized processes.
The solution I came up with uses pydbg to load the process and carry out the injection. I also place an endless loop at the program's entrypoint, but my loop has a defined exit: it checks whether a register value is non-zero and then jumps to the address in the register. The injected library's DllMain function just needs to write the address of its entrypoint hook over the zero in the load-register instruction (mov eax, 0x00000000).
loop:
mov eax, 0x00000000;
cmp eax, 0x00000000;
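To illustrate how such a loop can fit into exactly 12 bytes, here is a minimal Python sketch that assembles one possible 32-bit x86 encoding by hand. The listing above only shows the first two instructions, so the trailing `je loop` and `jmp eax`, the short `cmp` encoding, and the helper names `build_loop_stub` and `release_loop` are assumptions made for illustration, not necessarily what injection.py or dllexample.c contain.

```python
import struct

# Assumed 32-bit x86 encoding of the loop (12 bytes in total):
#   loop: mov eax, imm32   ; B8 <imm32>   (5 bytes)
#         cmp eax, 0       ; 83 F8 00     (3 bytes)
#         je  loop         ; 74 F6        (2 bytes, rel8 = -10)
#         jmp eax          ; FF E0        (2 bytes)
IMM32_OFFSET = 1  # the immediate starts right after the B8 opcode


def build_loop_stub(hook_address=0):
    """Return the 12-byte stub; it spins for as long as the immediate is zero."""
    return (b"\xB8" + struct.pack("<I", hook_address)
            + b"\x83\xF8\x00"
            + b"\x74\xF6"
            + b"\xFF\xE0")


def release_loop(stub, hook_address):
    """Patch the immediate of `mov eax, imm32`, as the DLL would do in memory."""
    patched = bytearray(stub)
    struct.pack_into("<I", patched, IMM32_OFFSET, hook_address)
    return bytes(patched)


stub = build_loop_stub()
print(len(stub), stub.hex())
print(release_loop(stub, 0x10001000).hex())
```

Only the four immediate bytes at offset 1 ever change, so the injected DLL needs to know nothing about the stub beyond the entrypoint address.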
The second novel part resolves the ASLR problem. I do that by adding a small feature to pydbg that allows setting a callback for the initial breakpoint on application load. The tiny patch for pydbg is here: pydbg.patch. That breakpoint is late enough that we can call enumerate_modules() to determine the load address of our executable.
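Once the load address is known, the entrypoint is the module base plus the AddressOfEntryPoint RVA stored in the PE optional header. The sketch below shows that calculation using only the standard library. The function names are hypothetical, the header is a synthetic one built just for the example, and in the real tool the base would come from enumerate_modules().

```python
import struct


def entry_point_rva(pe_bytes):
    """Read AddressOfEntryPoint (an RVA) from raw PE header bytes."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Signature (4 bytes) + COFF file header (20 bytes) = optional header start;
    # AddressOfEntryPoint sits 16 bytes into the optional header.
    (rva,) = struct.unpack_from("<I", pe_bytes, pe_offset + 4 + 20 + 16)
    return rva


def entry_point_address(load_base, pe_bytes):
    """Entrypoint in the running process: actual (possibly randomized) base + RVA."""
    return load_base + entry_point_rva(pe_bytes)


# Synthetic header for demonstration only: e_lfanew = 0x80, entry RVA = 0x1234.
header = bytearray(0x100)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\0\0"
struct.pack_into("<I", header, 0x80 + 4 + 20 + 16, 0x1234)

print(hex(entry_point_address(0x00A50000, bytes(header))))
```

Because the RVA is read from the file and the base is read from the live process, the result stays correct no matter where ASLR places the image.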
The actual steps are listed below:
- load executable (pydbg)
- register initial breakpoint callback (pydbg)
- when initial break happens
  - retrieve the base address of the executable module to calculate entrypoint (needed if ASLR is present)
  - save entrypoint code to disk (12 bytes)
  - write endless loop to entrypoint (12 bytes)
  - set breakpoint on entrypoint
  - *let process continue*
- entrypoint breakpoint is reached
  - register "user callback"
  - *let process continue* (process starts looping on entrypoint)
- user callback is executed
  - create remote thread to inject DLL
  - detach from process
- dllmain from injected DLL is called
  - write address of entrypoint hook into loop code at entrypoint (see dllexample.c)
  - *let process continue*
- endless loop at entrypoint breaks and entrypoint hook of injected DLL is called
  - ... some entrypoint hook action ...
  - restore entrypoint code from file (see dllexample.c)
  - jump to entrypoint and let the process finally run
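The memory side of these steps (save, patch, release, restore) can be sketched without a debugger by modelling the target's memory as a byte array. This is a simulation under stated assumptions: `FakeProcess` and the three helper functions are hypothetical stand-ins rather than pydbg calls, the stub bytes are one assumed encoding of the loop, and the saved bytes are kept in a variable where the real tool writes them to a file.

```python
import struct

STUB_SIZE = 12
# Assumed stub bytes: mov eax, 0 / cmp eax, 0 / je loop / jmp eax
LOOP_STUB = bytes.fromhex("b80000000083f80074f6ffe0")


class FakeProcess:
    """Hypothetical stand-in for the target's memory (not a pydbg class)."""

    def __init__(self, image, base):
        self.memory = bytearray(image)
        self.base = base

    def read(self, address, size):
        off = address - self.base
        return bytes(self.memory[off:off + size])

    def write(self, address, data):
        off = address - self.base
        self.memory[off:off + len(data)] = data


def injector_patch(proc, entry):
    """Debugger side: save the original bytes, then install the loop."""
    saved = proc.read(entry, STUB_SIZE)
    proc.write(entry, LOOP_STUB)
    return saved


def dll_release(proc, entry, hook_address):
    """DllMain side: overwrite the zero immediate with the hook address."""
    proc.write(entry + 1, struct.pack("<I", hook_address))


def hook_restore(proc, entry, saved):
    """Hook side: put the original entrypoint bytes back before jumping there."""
    proc.write(entry, saved)


original = bytes(range(0x40))          # dummy image contents
proc = FakeProcess(original, base=0x400000)
entry = 0x400010                       # dummy entrypoint address

saved = injector_patch(proc, entry)
dll_release(proc, entry, 0x10001000)
patched = proc.read(entry, STUB_SIZE)
hook_restore(proc, entry, saved)

print(patched.hex())
print(bytes(proc.memory) == original)
```

After the final step the image is byte-for-byte identical to the original, which is what allows the hook to jump to the entrypoint as if nothing had happened.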
The injection tool and the example DLL that takes care of hooking and un-hooking the entrypoint are available here, together with the tiny patch for pydbg. Files: injection.py dllexample.c pydbg.patch.
I hope I didn't just miss something and did all this work for nothing.