Thursday, October 15, 2015

Some Tips to Analyze PatchGuard

I published a new tool called meow that disables PatchGuard on Windows 8.1 on the fly. Although meow has some interesting technical details I could explain, such as support for ARM (Windows RT) and detection of the end of a function for installing an epilogue hook, in this entry I am going to explain some techniques that help researchers analyze PatchGuard on their own, rather than how this specific exploitation works.

Those techniques are worth sharing because you have to be able to analyze PatchGuard yourself if you hope to do anything with it: it is a moving target, meow is not going to work forever as the implementation of PatchGuard is updated, and meow may not be perfect even at the time this article is published.

Summary

As with regular reverse engineering work, you can analyze PatchGuard by both static and dynamic means, but there are some hurdles specific to PatchGuard analysis on both sides, for example:

  • PatchGuard related functions do not have descriptive names, or do not have names at all, unlike other functions in the kernel
  • Most function calls in PatchGuard functions are indirect calls, as in C++ code
  • Kernel debugging is not an option in some situations
  • Code is copied to random locations and stored in an encrypted form, and you cannot easily spot where to monitor at run time

Those are significant difficulties you face at the initial stage of analysis, but also ones you can easily overcome if you know some tricks I describe here. The tricks are as follows:

  • Identifying PatchGuard functions
    • Locating an initialization function and checking cross-references
    • Naming functions in a consistent manner
    • Checking the existence of SEH
  • Analyzing 0x109 Crash Dump for Re-constructing the PatchGuard context
    • Dissecting bug check parameters
    • Applying the format of the context to IDA
  • Discovering Threads Executing PatchGuard Code
    • Finding system threads on memory 

Let us go through them one by one.

Identifying PatchGuard functions

Firstly, you can easily find the initialization function of PatchGuard by sorting the function list by length. The largest function in ntoskrnl.exe is the initialization function, which is executed at system initialization and sets up a large structure, the so-called PatchGuard context(s), in non-pageable memory (I am going to describe the structure of the context later). I call this function Pg_xInitializePatchGuard() in this article.
Image 1: The largest functions on x64 
Image 2: The largest functions on ARM
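
The sorting step can also be reproduced outside IDA. The following is a minimal sketch that assumes you have already exported each function's name, start address and end address from your disassembler; the tuple layout, the helper name largest_functions() and the sample addresses are all made up for illustration.

```python
from typing import Iterable, List, Tuple

# (name, start_address, end_address); the end address is exclusive.
# This export format is an assumption, not something IDA produces as-is.
Function = Tuple[str, int, int]


def largest_functions(functions: Iterable[Function], top: int = 5) -> List[Tuple[str, int]]:
    """Return the `top` functions sorted by length in bytes, largest first."""
    sized = [(name, end - start) for name, start, end in functions]
    return sorted(sized, key=lambda item: item[1], reverse=True)[:top]


# Made-up sample data; real sizes come from your disassembler.
sample = [
    ("sub_140001000", 0x140001000, 0x140001040),
    ("sub_140720000", 0x140720000, 0x14072F000),
    ("KiFilterFiberContext", 0x140733000, 0x140733200),
]
candidates = largest_functions(sample, top=2)
for name, size in candidates:
    print(f"{name}: {size:#x} bytes")
```

The function at the top of such a list is the first candidate to inspect as the initialization function.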

Secondly, you can identify other PatchGuard related functions by following cross-references between function calls. If a function is referenced only from other PatchGuard related functions, it is safe to assume that the function is dedicated to PatchGuard and needs to be analyzed. As an example, let us take a look at a caller of Pg_xInitializePatchGuard(), KiFilterFiberContext(). You see that this function is referenced from Pg_xInitializePatchGuard() and another unnamed function, sub_1407339C3(), which is not called from anywhere. At this stage, it is safe to say that KiFilterFiberContext() and sub_1407339C3() are only used for PatchGuard.
Image 3: Callers of Pg_xInitializePatchGuard()

Image 4: Callers of  sub_1407339C3()


For ease of analysis with IDA, it is worth naming functions in a consistent manner since the number of functions to be analyzed is going to be large. I usually name PatchGuard functions with the prefix Pg_ for ones with symbol names and Pg_x for ones without symbol names. In this case, I rename KiFilterFiberContext() to Pg_KiFilterFiberContext(), and sub_1407339C3() to Pg_xKiFilterFiberContextCaller().
Image 5: Filtering functions with the prefix


You may also want to use parse_x64_SEH.py to discover code flow that relies on SEH. With this script, you find that Pg_xKiFilterFiberContextCaller() is an __except expression and that the corresponding __try is in KeInitAmd64SpecificState(). At this point, you may rename Pg_xKiFilterFiberContextCaller() to Pg_xKeInitAmd64SpecificStateExceptionHandler() and KeInitAmd64SpecificState() to Pg_KeInitAmd64SpecificState().
Image 6: Reflected SEH information

Image 7: Where the corresponding __try is


Similarly, you can repeat the same process for all functions and global variables referenced from each Pg_*() function using the Proximity browser of IDA. This gives you a fairly comprehensive list of Pg_ functions, which can be discouraging enough for most casual reverse engineers ;)
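
The rule "a function referenced only from PatchGuard functions is itself PatchGuard dedicated" can be applied repeatedly until nothing new is found. Here is a minimal sketch of that iteration, assuming the cross-references have been exported as a mapping from each function to the set of its callers. The helper name propagate_pg(), the mapping format and the function names helper_a, helper_b and SomeDriverRoutine are hypothetical; it does not replace manual review.

```python
from typing import Dict, Set


def propagate_pg(callers: Dict[str, Set[str]], seeds: Set[str]) -> Set[str]:
    """Grow `seeds` with every function whose callers are all already in the set."""
    known = set(seeds)
    changed = True
    while changed:
        changed = False
        for func, refs in callers.items():
            # Functions with no recorded callers are skipped; nothing can be
            # inferred about them from cross-references alone.
            if func not in known and refs and refs <= known:
                known.add(func)
                changed = True
    return known


# Hypothetical cross-reference data: function -> set of functions that call it.
callers = {
    "helper_a": {"Pg_xInitializePatchGuard"},
    "helper_b": {"helper_a", "Pg_xInitializePatchGuard"},
    "ExAllocatePoolWithTag": {"helper_a", "SomeDriverRoutine"},
}
pg_functions = propagate_pg(callers, {"Pg_xInitializePatchGuard"})
print(sorted(pg_functions))
```

A general-purpose routine such as ExAllocatePoolWithTag() stays out of the result because it also has a caller outside the set.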

Analyzing 0x109 Crash Dump for Re-constructing the PatchGuard Context

As soon as you start to read Pg_*() functions, you discover that there are countless indirect calls through specific registers. Those are accesses to the PatchGuard context, and to understand the internals of PatchGuard it is essential to know what is stored there and how it is used.
Image 8: References to the PatchGuard context
The most precise way to accomplish this is to read the initialization function (i.e., Pg_xInitializePatchGuard()) for function pointers and the main validation routine (i.e., Pg_FsRtlMdlReadCompleteDevEx()) for variables. Besides static analysis, it is also wise to perform dynamic analysis to get the big picture quickly, especially at the initial stage of analysis.

There are some difficulties in performing effective run-time analysis, however.

First of all, you do not know where to monitor at the beginning of analysis since most of the core code is copied to random memory locations and stored in an encoded form except while it is executing. In addition, setting breakpoints or installing hooks in the kernel causes bug check 0x109 unless you know how the integrity check is carried out. Moreover, you may not be able to attach a kernel debugger to systems running on some non-PC devices such as Windows RT and Windows Phone.

That may sound pretty bad, but the good news is that we can still uncover the contents of the PatchGuard context by analyzing a crash dump. Specifically, you can interpret each 'reserved' bug check parameter in the following ways on x64:

  • Arg1 - 0xA3A03F5891C8B4E8 = An address of the PatchGuard context
  • Arg2 - 0xB3B74BDEE4453415 = An address of a validation structure that detected corruption
  • Arg3 = An address of corrupted data (in most cases)

    NB: You can easily spot those magic values in Pg_FsRtlMdlReadCompleteDevEx() before a call to  Pg_SdbpCheckDll() as well as code setting bug check parameters.
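
The subtraction is done in 64-bit arithmetic, so the result wraps around. The following sketch uses the two constants from the list above and, as input, Arg1 and Arg2 from the Windows 10 bug check shown in this article; the helper name decode_0x109_args() is made up.

```python
MASK64 = (1 << 64) - 1

# Constants quoted in the list above (x64).
ARG1_MAGIC = 0xA3A03F5891C8B4E8
ARG2_MAGIC = 0xB3B74BDEE4453415


def decode_0x109_args(arg1: int, arg2: int) -> tuple:
    """Return (context_address, validation_structure_address) as unsigned 64-bit values."""
    context = (arg1 - ARG1_MAGIC) & MASK64
    validation = (arg2 - ARG2_MAGIC) & MASK64
    return context, validation


# Arg1 and Arg2 taken from the Windows 10 bug check shown in this article.
context, validation = decode_0x109_args(0xA3A01F597768B4F0, 0xB3B72BDFC9E65CC3)
print(f"{context:016x} {validation:016x}")
# prints: ffffe000e5a00008 ffffe000e5a128ae
```

The masking with MASK64 is what the debugger does implicitly when it shows a negative result as an ffff... address.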


Let us take a look at an example on Windows 10. This is what you get on bug check 0x109:
----
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

CRITICAL_STRUCTURE_CORRUPTION (109)

...
Arguments:
Arg1: a3a01f597768b4f0, Reserved
Arg2: b3b72bdfc9e65cc3, Reserved
Arg3: fffff80100af8074, Failure type dependent information
Arg4: 0000000000000001, Type of corrupted region, can be
...
----
Then, check the first parameter:
----
kd> ? a3a01f597768b4f0 - 0xA3A03F5891C8B4E8
Evaluate expression: -35180519620600 = ffffe000`e5a00008
kd> dps ffffe000`e5a00008 l200
ffffe000`e5a00008  70047266`b0b8a753
...
ffffe000`e5a000e0  00000000`00000000
ffffe000`e5a000e8  fffff801`00453b80 nt!ExAcquireResourceSharedLite
ffffe000`e5a000f0  fffff801`004537f0 nt!ExAcquireResourceExclusiveLite
ffffe000`e5a000f8  fffff801`00688930 nt!ExAllocatePoolWithTag
ffffe000`e5a00100  fffff801`006896d0 nt!ExFreePool
...
ffffe000`e5a004b8  fffff801`00b850b0 nt!HandleTableListLock
ffffe000`e5a004c0  ffffc001`0c614000 nt!ObpKernelHandleTable
ffffe000`e5a004c8  fffff780`00000000 nt!KiUserSharedData
ffffe000`e5a004d0  ff73c402`76affdcd ; a copy of nt!KiWaitNever
ffffe000`e5a004d8  fffff801`00b292c0 nt!SeProtectedMapping
...
----
In this example, ffffe000`e5a00008 is an address of the PatchGuard context starting with random-looking bytes followed by a bunch of function pointers and variables. Although you may not tell what some variables are at a glance, defining the PatchGuard structure in IDA with this result is fundamental to uncover how PatchGuard works.
Image 9: Defining the structure in IDA

Image 10: Applied the structure definition

The second parameter is an address to the validation structure that detected corruption. There are multiple structures and each corresponds to a type of corrupted region (Arg4). Their formats vary but are mostly made up of at least: type of corrupted region, address(es) to verify, checksum(s) to be expected as valid value(s).

The following is a dump of the structure in this example (I commented with some guesswork):
----
kd> ? b3b72bdfc9e65cc3 - 0xB3B74BDEE4453415 
Evaluate expression: -35180519544658 = ffffe000`e5a128ae
kd> dps ffffe000`e5a128ae
ffffe000`e5a128ae  00000000`00000001   ; type of corrupted region
ffffe000`e5a128b6  fffff801`00789000 nt!BcpCursor <PERF> (nt+0x36d000)
                                       ; an address of .pdata 
ffffe000`e5a128be  244e1425`0004a9e8   ; checksum?, a virtual size of .pdata
ffffe000`e5a128c6  fffff801`00789000 nt!BcpCursor <PERF> (nt+0x36d000)
                                       ; an address of .pdata 
ffffe000`e5a128ce  fffff801`0041c000 nt!WerLiveKernelInitSystem <PERF> (nt+0x0)
                                       ; an address of nt image base
ffffe000`e5a128d6  0004a9e8`00842000   ; a virtual size of .pdata, a size of nt image
ffffe000`e5a128de  39e90701`406ebd95   ; checksums?
ffffe000`e5a128e6  78ca89f0`62a1f735
...
----

Those structures are stored at the end of the PatchGuard context as a variable-length array, following other structures and the code that recovers corruption for a reliable bug check. They are referenced through fields of the context that contain an offset and a count for the array.
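
Because dps prints 8 bytes at a time, two 32-bit fields that share a quadword appear glued together, with the field at the lower address on the right. The sketch below rebuilds the bytes from the dump above and splits them according to my guesswork comments. The field names and the layout string are assumptions for illustration, not a confirmed definition of the structure.

```python
import struct

# The eight quadwords printed by dps above, in order.
QWORDS = [
    0x0000000000000001,
    0xFFFFF80100789000,
    0x244E14250004A9E8,
    0xFFFFF80100789000,
    0xFFFFF8010041C000,
    0x0004A9E800842000,
    0x39E90701406EBD95,
    0x78CA89F062A1F735,
]
raw = struct.pack("<8Q", *QWORDS)  # rebuild the little-endian bytes as they sit in memory

# Guessed layout: Q = 8-byte field, I = 4-byte field, "<" = little-endian, no padding.
LAYOUT = "<QQIIQQII"
NAMES = ("region_type", "pdata_address", "pdata_size", "checksum_guess",
         "pdata_address_again", "image_base", "image_size", "pdata_size_again")

fields = dict(zip(NAMES, struct.unpack_from(LAYOUT, raw, 0)))
pdata_rva = fields["pdata_address"] - fields["image_base"]

for name in NAMES:
    print(f"{name:20} {fields[name]:#x}")
print(f"{'pdata_rva':20} {pdata_rva:#x}")
```

The computed RVA matches the nt+0x36d000 annotation that the debugger printed next to the .pdata address, which supports the reading of those two fields.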

Discovering Threads Executing PatchGuard Code

Another trick for run-time analysis is discovering threads that are executing code in dynamically allocated memory and setting breakpoints there. It is possible only when you are able to attach a kernel debugger to the system.

As I mentioned earlier, PatchGuard contexts, including their code, are placed in allocated memory, which is either executable NonPagedPool or independent pages allocated by MmAllocateIndependentPages(), and this shows up as unusual output in the thread stack trace.
----
kd> !process 4 
...
        THREAD ffffe00137df7040  Cid 0004.0064  Teb: ...
...
        Win32 Start Address nt!ExpWorkerThread (0xfffff803d16ac3f0)
...
        Child-SP          RetAddr           Call Site
        ffffd001`043ccdb0 fffff803`d1658ab9 nt!KiSwapContext+0x76
        ffffd001`043ccef0 fffff803`d1657fb8 nt!KiSwapThread+0x689
        ffffd001`043ccfb0 fffff803`d1621d0c nt!KiCommitThreadWait+0x148
        ffffd001`043cd040 ffffe001`37ede587 nt!KeDelayExecutionThread+0x1dc
        ffffd001`043cd0b0 4c91448e`dcd4c0fd 0xffffe001`37ede587
        ffffd001`043cd0b8 00000000`00000000 0x4c91448e`dcd4c0fd
...
----
From this output, you can see that thread 0x64 is calling KeDelayExecutionThread() from somewhere outside any image. Obviously, this is not common unless you have malware on your system, especially considering that the thread is a worker thread and not even a dedicated thread.
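
If you save the output of !process to a text file, frames like this can be picked out mechanically. The following sketch flags call sites that the debugger printed as a raw address instead of module!symbol and that lie in the kernel half of the x64 address space. The regular expression and the helper name frames_outside_images() are my own assumptions about the text format shown above.

```python
import re
from typing import List

# Child-SP, RetAddr, Call Site. Backticks are removed before conversion.
FRAME = re.compile(r"^\s*([0-9a-f`]+)\s+([0-9a-f`]+)\s+(\S+)\s*$", re.IGNORECASE)
KERNEL_SPACE_START = 0xFFFF800000000000  # lowest canonical kernel address on x64


def frames_outside_images(stack_text: str) -> List[int]:
    """Return kernel call-site addresses that were not resolved to module!symbol."""
    hits = []
    for line in stack_text.splitlines():
        match = FRAME.match(line)
        if not match:
            continue  # headers, "..." and other non-frame lines
        call_site = match.group(3)
        if "!" in call_site or not call_site.lower().startswith("0x"):
            continue  # resolved to a symbol, so it is inside a loaded image
        address = int(call_site.replace("`", ""), 16)
        if address >= KERNEL_SPACE_START:  # drops garbage frames such as 0x4c91...
            hits.append(address)
    return hits


stack = """
        Child-SP          RetAddr           Call Site
        ffffd001`043ccdb0 fffff803`d1658ab9 nt!KiSwapContext+0x76
        ffffd001`043cd040 ffffe001`37ede587 nt!KeDelayExecutionThread+0x1dc
        ffffd001`043cd0b0 4c91448e`dcd4c0fd 0xffffe001`37ede587
        ffffd001`043cd0b8 00000000`00000000 0x4c91448e`dcd4c0fd
"""
suspicious = frames_outside_images(stack)
print([hex(address) for address in suspicious])
```

Any address reported this way is only a candidate; you still need to confirm it with the debugger.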

Once you find a thread like this, you are free to set a break point at the return address and get control with the debugger.
----
kd> u 0xffffe001`37ede587
ffffe001`37ede587 jmp     ffffe001`37ede5b5
ffffe001`37ede589 lea     rax,[rbp+1A8h]
ffffe001`37ede590 xor     r9d,r9d
ffffe001`37ede593 xor     r8d,r8d
ffffe001`37ede596 mov     qword ptr [rsp+20h],rax
ffffe001`37ede59b mov     rcx,r13
ffffe001`37ede59e call    qword ptr [rbp+68h]
ffffe001`37ede5a1 test    eax,eax
kd> bp 0xffffe001`37ede587
----
Image 11: Woohoo! Enjoy debugging.
This trick does not always work because PatchGuard sometimes skips the sleep functions (KeDelayExecutionThread() or KeWaitForSingleObject()), so you do not catch the moment when a thread is executing code in allocated memory, and PatchGuard sometimes runs inside ntoskrnl.exe rather than on pool. But it is worth rebooting a few times and checking whether those threads exist.

Note that if you want to read code around the return address with IDA, you can search the byte sequence at the return address with [Alt-B].
----
kd> db 0xffffe001`37ede587 l10
ffffe001`37ede587  eb 2c 48 8d 85 a8 01 00-00 45 33 c9 45 33 c0 48  .,H......E3.E3.H
----

Image 12: Finding where the PatchGuard context is running in IDA 
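
To avoid retyping the bytes, the db output can be converted into a search pattern. This is a small sketch assuming a full 16-byte db line as shown above; the helper name db_line_to_bytes() is hypothetical, and the `image` buffer is a stand-in for the contents of the kernel file.

```python
def db_line_to_bytes(line: str) -> bytes:
    """Convert one full (16-byte) line of WinDbg `db` output into bytes."""
    # Drop the address column; the "-" between byte 8 and 9 is just a separator.
    hex_part = line.split(None, 1)[1].replace("-", " ")
    # Keep the 16 byte columns and ignore the trailing ASCII column.
    tokens = hex_part.split()[:16]
    return bytes(int(token, 16) for token in tokens)


pattern = db_line_to_bytes(
    "ffffe001`37ede587  eb 2c 48 8d 85 a8 01 00-00 45 33 c9 45 33 c0 48  .,H......E3.E3.H"
)

# Text form that can be pasted into a hex search dialog.
ida_query = " ".join(f"{byte:02X}" for byte in pattern)
print(ida_query)

# Stand-in for the file contents; a real search would read ntoskrnl.exe from disk.
image = b"\x90" * 32 + pattern + b"\xcc" * 8
offset = image.find(pattern)
print(f"found at file offset {offset:#x}")
```

A hit gives you a file offset, which you then have to translate to an address in the IDB yourself.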

Another option is using a hypervisor to monitor and detect PatchGuard threads based on execution of some uncommon instructions if the system is running on the Intel platform. See my PoC Sushi as an example.

Conclusion

We have seen how to locate functions, how to read 109 bug check parameters and how to discover threads running PatchGuard code. That is pretty much everything you need to know to get started. By now, you are ready for analyzing PatchGuard on Windows 10 where no one has ever succeeded in exploitation (at the time I wrote this article). All you have to do is just read code, name fields and functions, and test if your analysis is correct. That would not be anything special to us.

Special Thanks

Thank you very much @Myriachan for providing me many details about Windows RT and an opportunity to work on this fun project.

Saturday, August 8, 2015

Writing a Hypervisor for Kernel Mode Code Analysis and Fun

In this entry, I am going to share some tips for developing your own hypervisor using VMware Workstation and briefly introduce a sample ring-1 monitoring tool based on a homemade hypervisor. This entry may not be for you if you already have your own hypervisor that you can update at will or are not interested in development at all.

Motivation

You know that hypervisors are helpful for dynamic analysis, and most of you use them in some form. However, I guess not many researchers have written one themselves, perhaps because it sounds challenging, even though it can be quite handy and fun to have a ring-1 monitoring tool you can update however you want. You may, for example, want to detect and monitor PatchGuard contexts, a piece of kernel mode code that relocates itself to a random memory location and performs some uncommon operations such as disabling write protection by modifying CR0, clearing hardware breakpoints by resetting DR7, and accessing IDTR using the SIDT and LIDT instructions. Without a hypervisor, you are not able to detect any of those operations as they are just instructions, and you do not know where to set breakpoints due to the periodic relocation. A hypervisor, however, can help you accomplish it.

Now, you may wonder why you want to write your own hypervisors even though there are quite a few open source projects you could re-purpose. The reason is "it sounds better if you say that you wrote your own hypervisor from scratch." ;) Besides, reading full-fledged hypervisors may not be as enjoyable as writing your own code. Let us see how you can do it.

What You Need

You need VMware Workstation as a test environment. It supports nested virtualization (emulation of VT-x technology) and lets you debug your hypervisor code from the host Windows through Windbg in exactly the same manner as regular kernel debugging.

You may also be able to use other VMware products that support nested virtualization, like Fusion, but you will have to configure kernel debugging between two VMs, which is not the most straightforward way. Also, having Windows as a host lets you use VirtualKD, which makes communication between a debuggee and a debugger very fast. VirtualBox, unfortunately, does not support nested virtualization and does not allow you to execute VMX operations in it.

If you are paranoid and do not trust software-emulated VT-x, you could use a real box with a serial port (which means it is not going to be a laptop) as a debuggee. You might already know that kernel debugging through USB is possible. DO NOT GO THERE unless you already have hardware that has been confirmed to support USB debugging. There are some subtle requirements, and you cannot tell whether a debuggee device meets them just by looking at the specs online. Besides that, being able to take snapshots and memory dumps from a hung machine using VMware drastically speeds up your development.
Image 1: USB2 debug-cable. Not recommended.

Configuring Virtual Machines

You have to make some changes to the debuggee virtual machine. Firstly, you have to check the following options:
  • Virtualize Intel VT-x/EPT or AMD-V/RVI
  • Virtualize CPU performance counters
Image 2: Virtual Machine Config

Secondly, you should add the following lines to the corresponding VMX file [1]; otherwise you will end up getting mysterious, random-looking NMI_HARDWARE_FAILURE bug checks.
----
hypervisor.cpuid.v0 = "FALSE"
mce.enable = "TRUE"
vhv.enable = "TRUE"
----

EDIT (Nov 1, 2015): Removed entries with apic.xapic since I confirmed that it still worked without them.

Gotchas

Once you have configured the VM, the rest is only a matter of programming, but there are some gotchas I would like to share to keep you sane:
  • Try to avoid using APIs inside a VMExit handler (in VMX root mode). Since the handler can be executed from any context, including exception handlers or code running at a very high IRQL, it is tough to conclude that calling an API is 100% safe when you do not know *exactly* what it does.
  • For the same reason, avoid calling DbgPrint() from the VMExit handler. It usually works fine but sometimes causes mysterious errors like a triple fault when you log a lot. Instead, store log text in pre-allocated non-paged pool and print it out later from a safe context.
  • Do not step into vmlaunch and vmresume instructions. The debugger will never return control to you, and the debuggee will hang.
  • Do not put software breakpoints everywhere in the VMExit handler. Although it seems to be fine in most cases, in some situations the debugger does not get control from the debuggee, and the system just freezes when int 3 is executed.

Getting a Memory Dump From a Hung Debuggee System

One of the biggest advantages of using VMware is that you can take a memory dump (.dmp file) from a hung debuggee and give it to Windbg just like in normal crash dump analysis.

To take a dump file, first suspend the virtual machine while it is frozen and take a snapshot.
Image 3: Suspend a hanged virtual machine

Then, navigate to where snapshot files are stored and run the vmss2core command under the VMware Workstation directory with names of the latest vmsn and vmem files. For instance, commands look like this:
----
> cd "C:\Users\user\Documents\Virtual Machines\Windows 8 x64"

> "C:\Program Files (x86)\VMware\VMware Workstation\vmss2core-win.exe" -W8 "Windows 8 x64-Snapshot45.vmsn" "Windows 8 x64-Snapshot45.vmem"

vmss2core version 2452889 Copyright (C) 1998-2015 VMware, Inc. All rights reserved.
scanning pa=0 len=0x10000000
Cannot translate linear address 7ff7d12b1b00.
Cannot read context LA from PRCB.
...
... 2020 MBs written.
... 2030 MBs written.
... 2040 MBs written.
Finished writing core.
----
Note that the vmss2core that comes with VMware Workstation by default (version 2780323) does not seem to work (it always generates empty, 0-byte files). If that is the case for you too, download version 2452889 from VMware's website.

Now, you have gotten the memory.dmp and can give it to the debugger.
----
> windbg -z memory.dmp
----
Image 4: Memory dump analysis



References

There are some open source hypervisor projects you can refer to for implementation. Those are small enough to read quickly and written for Windows hosts.

A Sample Monitoring Tool Based on a Hypervisor

With those tips, you should be able to develop your own hypervisor fairly smoothly and utilize it for your research. I, for example, wrote a proof-of-concept hypervisor, Sushi, monitoring use of some uncommon instructions from non-image kernel space and stopping a thread when write protection in CR0 is modified.
Image 5: Demo hypervisor "Sushi" detecting interesting stuff

This is less than 4000 lines of code yet gives me an ability to investigate some run-time behaviour of the kernel I was unable to monitor. It is pretty awesome, and above all, playing with low-level stuff like this is quite fun.

In short, if you are interested in developing hypervisors for whatever reasons, you can do it without buying any extra hardware, and then, you can also make it one of your analysis tools like this demo hypervisor.

Thanks

Thank you @brucedang for letting me know that nested virtualization of VMware is reliable enough to write and test hypervisors.

Sunday, June 7, 2015

Reverse Engineering Windbg Commands for Profit

In this article, I will introduce the benefits of reverse engineering Windbg for understanding the Windows kernel by looking at an undocumented command, fixing an issue in it and re-implementing the same functionality in a device driver.


Windbg is a powerful resource not only because it shows you thorough run-time information even if you do not know how to obtain it manually, but also because you can learn how Windbg does that by reverse engineering it. The implementation of the !timer command is, for example, what you should examine if you want to know how to enumerate all scheduled timer callbacks.

But what if you cannot find a command that works with the internals you are interested in? In my case, I was looking for a way to list items inserted into the work queues with ExQueueWorkItem(), and the Windbg documentation did not tell me what command I could use.

Finding an Undocumented Command

There are many undocumented commands in Windbg. By running strings.exe against the DLL files under the Windbg folder, you will get hints about them and/or where to look. Here is the result of a search for "workqueue":

Debuggers\x64>strings -n 5 -s *.dll | findstr /i workqueue
...
Debuggers\x64\winxp\kdexts.dll: **** NUMA Node %i RealTime WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** NUMA Node %i HyperCritical WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** NUMA Node %i SuperCritical WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** NUMA Node %i Critical WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** NUMA Node %i Delayed WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** NUMA Node %i Normal WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** NUMA Node %i Background WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** Critical WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** Delayed WorkQueue
Debuggers\x64\winxp\kdexts.dll: **** HyperCritical WorkQueue
Debuggers\x64\winxp\kdexts.dll: ExWorkQueue
...
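
If strings.exe is not at hand, the same kind of search is easy to approximate. The sketch below extracts printable ASCII runs of at least five characters and filters them case-insensitively, which mirrors the -n 5 option and findstr /i above. It only handles ASCII (not UTF-16 strings), the helper names are made up, and the `blob` value is a synthetic stand-in for the contents of a DLL.

```python
import re
from typing import List


def ascii_strings(data: bytes, min_len: int = 5) -> List[str]:
    """Extract printable ASCII runs of at least `min_len` characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def grep_strings(data: bytes, needle: str, min_len: int = 5) -> List[str]:
    """Case-insensitive filter over the extracted strings."""
    needle = needle.lower()
    return [text for text in ascii_strings(data, min_len) if needle in text.lower()]


# Synthetic stand-in for file contents; a real run would read the DLL from disk.
blob = b"\x00\x01**** Critical WorkQueue\x00abc\x00ExWorkQueue\x00\xff!timer usage\x00"
hits = grep_strings(blob, "workqueue")
print(hits)
```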

After firing up IDA, I easily found that there was an undocumented (*1) command named !exqueue, and it was exactly what I wanted.

kd> !kdexts.help
...
exqueue [flags]              - Dump the ExWorkerQueues
...

Fixing an Issue

The issue was that this command did not work against Windows 8.1 and 10 targets.

kd> !exqueue
GetGlobalValue: unable to get NT!KeNumberNodes type size

So I started to debug and reverse engineer kdexts.dll a bit more and noticed that Windbg was failing to read nt!KeNumberNodes, which is always 1 unless you use NUMA. The solution was just patching the code to set eax to 1.

.text:00000001801161DA   lea rcx, aNtKenumbernode ; "NT!KeNumberNodes"
.text:00000001801161E1   call read_global_variable ; original instruction (patched out)
.text:00000001801161E1   mov eax, 1                ; replacement written over the call
.text:00000001801161E6   mov rbx, cs:qword_1801A8500
.text:00000001801161ED   mov rcx, rbx
.text:00000001801161F0   mov rbp, rax

Here is an output:

kd> !exqueue

**** NUMA Node 0 - ( Threads: 7/4096 ) ****

...
 -> Priority 12 - ( Concurrency: 0/2 )
...
    ExWorkItem (ffffe00084a91160) 
      Routine ListWorkItems!<lambda_bae...> (fffff800746e5290) 
      Parameter (fffff800746e7220)
    ExWorkItem (ffffe0008105ff10) 
      Routine dxgkrnl!DxgkpProcessTerminationListThread (...) 
      Parameter (ffffe0008105fbb0)
    WdfWorkItem (ffffe00082b035a0) 
      Routine cdrom!IoctlWorkItemRoutine (fffff80072c6e900)
    ExWorkItem (fffff8006b8ff0c0) 
      Routine nt!CmpDelayDerefKCBWorker (fffff8006ba5d008) 
      Parameter (0000000000000000)
...
 -> Priority 31 - ( Concurrency: 0/2 )

 -> Associated Threads

THREAD ffffe000828fe040  Cid 0004.0170  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
...


Re-implementing the Command


My goal was, however, not to use the command; the goal was to know how it works, so I reverse engineered !exqueue more to learn the details of the work queues and how to enumerate items in them by hand.

Analyzing Windbg commands is often a lot easier than analyzing the kernel file because it contains a lot of strings that help you know what structures are dealt with.
Image1: Strings in code of !exqueue
This time was no exception. Basically, for Windows 8 and later, the command first gets a NUMA node (nt!_ENODE) structure containing a reference to the work queues by looking at nt!KeNumberNodes and nt!KeNodeBlock. Here, I follow the same procedure as !exqueue using basic commands.

There is only one NUMA node block as I do not configure NUMA.

kd> dw nt!KeNumberNodes
fffff803`f5774008  0001

kd> dps nt!KeNodeBlock
fffff803`f576c800  fffff803`f56c3240 nt!ExNode0
fffff803`f576c808  00000000`00000000
fffff803`f576c810  00000000`00000000

Then, the command refers to the _ENODE.ExWorkQueue.WorkPriQueue.EntryListHead field, which is an array of the prioritized work queues.

kd> dt nt!_ENODE ExWorkQueue.WorkPriQueue. fffff803`f56c3240
   +0x0c0 ExWorkQueues              : [8]
   +0x100 ExWorkQueue               :
      +0x000 WorkPriQueue              :
         +0x000 Header                    : _DISPATCHER_HEADER
         +0x018 EntryListHead             : [32] _LIST_ENTRY [
                                               0xfffff803`f56c3358 -
                                               0xfffff803`f56c3358 ]
         +0x218 CurrentCount              : [32] 0n0
         +0x298 MaximumCount              : 4
         +0x2a0 ThreadListHead            : _LIST_ENTRY [
                                               0xffffe000`eb78f248 -
                                               0xffffe000`ecd89a88 ]

Each element of EntryListHead is a list of work items (nt!_WORK_QUEUE_ITEM), and the array index represents the priority of each list; in other words, there are 32 work queues, which manage items with priorities 0 (the lowest) to 31 (the highest) respectively.
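
The address of each list head follows directly from the offsets in the dt output above. A minimal sketch of the arithmetic, assuming those offsets (they are specific to the build shown here) and the nt!ExNode0 address from the dps output; the helper name entry_list_head() is made up.

```python
# Offsets read from the `dt nt!_ENODE` output above; they are build specific.
ENODE_EXWORKQUEUE = 0x100       # _ENODE.ExWorkQueue (WorkPriQueue is at +0x000 of it)
WORKPRIQUEUE_ENTRYLIST = 0x018  # WorkPriQueue.EntryListHead
LIST_ENTRY_SIZE = 0x10          # two 8-byte pointers on x64
PRIORITY_COUNT = 32


def entry_list_head(enode: int, priority: int) -> int:
    """Return the address of EntryListHead[priority] inside the given _ENODE."""
    if not 0 <= priority < PRIORITY_COUNT:
        raise ValueError("priority must be in 0..31")
    return enode + ENODE_EXWORKQUEUE + WORKPRIQUEUE_ENTRYLIST + priority * LIST_ENTRY_SIZE


ex_node0 = 0xFFFFF803F56C3240  # nt!ExNode0 from the dps output above
heads = [entry_list_head(ex_node0, priority) for priority in range(PRIORITY_COUNT)]
print(f"priority 0 head:  {heads[0]:#x}")
print(f"priority 12 head: {heads[12]:#x}")
```

The head for priority 0 comes out as 0xfffff803f56c3358, the same value dt printed for the first (empty, self-referencing) list entry.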

kd> dt nt!_ENODE -a ExWorkQueue.WorkPriQueue.EntryListHead fffff803`f56c3240
          ...
          [12] _LIST_ENTRY [ 0xffffe000`846f62f0 - 0xffffe000`82bea2c0 ]
          ...

kd> dt nt!_WORK_QUEUE_ITEM 0xffffe000`846f62f0
   +0x000 List             : _LIST_ENTRY [ 0xffffe000`8305d130 -
                                           0xfffff800`6b8ca418 ]
   +0x010 WorkerRoutine    : 0xfffff800`746dd290     
                             void  ListWorkItems!<lambda_7d382b...>+0
   +0x018 Parameter        : 0xfffff800`746df220 Void

The !exqueue command shows the contents of the lists as well as the associated worker threads referenced by the ThreadListHead field.
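
Enumerating one queue is an ordinary walk over a circular doubly linked list: start at the head, follow Flink, and stop when you are back at the head. The following sketch models that with a dictionary standing in for kernel memory; the addresses and the helper name walk_list() are hypothetical, and it is not the code of my driver.

```python
from typing import Dict, List, Tuple

# Simulated kernel memory: address of a LIST_ENTRY -> (Flink, Blink).
Memory = Dict[int, Tuple[int, int]]


def walk_list(memory: Memory, head: int, limit: int = 1024) -> List[int]:
    """Follow Flink pointers from `head` until the list wraps around to it."""
    items = []
    current = memory[head][0]
    while current != head:
        if len(items) >= limit:
            raise RuntimeError("list does not terminate; memory may be corrupted")
        # _WORK_QUEUE_ITEM.List sits at +0x000, so the address of the list
        # entry is also the address of the work item itself.
        items.append(current)
        current = memory[current][0]
    return items


# Hypothetical addresses: one queue with two queued items and one empty queue.
HEAD, ITEM_A, ITEM_B, EMPTY = 0x1000, 0x2000, 0x3000, 0x4000
memory = {
    HEAD: (ITEM_A, ITEM_B),
    ITEM_A: (ITEM_B, HEAD),
    ITEM_B: (HEAD, ITEM_A),
    EMPTY: (EMPTY, EMPTY),  # an empty list points to itself
}
queued = walk_list(memory, HEAD)
print([hex(item) for item in queued])
```

Repeating this for all 32 list heads gives the per-priority listing that !exqueue prints.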

All you can do, I can do :) I wrote a driver that dumps all items in each work queue to confirm that the above analysis was correct. Note that because this driver is just a PoC, it works only on the 64-bit versions of Win8.1 and Win10.

https://github.com/tandasat/ListWorkItems

Conclusion

Analyzing Windbg commands often gives you a good understanding of the Windows kernel with less effort than analyzing the kernel file, and even if you do not find a helpful command at a glance, there may be an undocumented command which you can reverse engineer to unveil the Windows internals.

Side Notes

  1. It seems that !exqueue used to be documented and was then abandoned for some reason. I found a description of it in a help file that came with Windbg version 6.11.001.404.
  • The ExWorkQueues field in the _ENODE also holds pointers to the WorkPriQueue structures. It is likely that the structures are allocated for each processor but only the one associated with processor 0 is really used.
  • Apparently, an item does not go to the queue if any of the worker threads is in a wait state (i.e., waiting for an item) when the item is being queued. I guess that in this case the item is handed to a thread directly, without being stored in the queue, for performance. For this reason, the command does not show the first item when multiple items are queued at once.

Saturday, March 7, 2015

Section Based Code Injection and Its Detection

Summary

I wrote a small tool to detect a possible code injection even if it is done by only section APIs.

----

A few weeks ago, I had an opportunity to analyze ransomware referred to as Urausy. At the very initial stage of analysis, its behaviour did not seem surprising to me; it injected code into explorer.exe, and the injected code spawned svchost.exe hosting malicious code and initiated the main ransom activities (more detailed analysis can be found on the avast! blog).

Image 1: Process Tree 
I expected that the sample was injecting code using VirtualAllocEx() and CreateRemoteThread(), or relevant APIs such as NtWriteVirtualMemory() and NtCreateThread/Ex(). But through analysis, I noticed that it was using none of them for the injection and was using section APIs instead. The malware replaced the existing ntdll image in explorer.exe with a newly created section containing an inline hook on NtClose() and code responsible for starting svchost.exe. The output of VMMap indicates that the image of ntdll no longer exists and has been replaced with a shared Executable/Readable/Writable section after this injection.

Image 2: Memory Map of Explorer.exe (Before Infection)
Image 3: Memory Map of Explorer.exe (After Infection) 
Then I started to wonder: what if explorer.exe had not done anything obvious that I could easily spot, and the sample had done many other bad things in meaningful ways (i.e., non-junk operations) besides the injection? I could have missed the injected code.

The same is true of traditional code injection with VirtualAllocEx() and CreateRemoteThread(), but we are less likely to overlook it because we always expect these APIs to be used for injection and have tools or systems that report typical thread injection.

So I wrote a driver, RemoteWriteMonitor, that monitors inter-process memory modification by hooking NtWriteVirtualMemory() and ZwMapViewOfSection() to help analysts find this section-based injection. This tool should report all possible code injections because, if you want to execute your own code in another process from user mode, you need to either (1) write something into the other process using those APIs (as far as I can think of), or (2) use a DLL file in conjunction with SetWindowsHookEx() or another type of injection mechanism, which is very easy to find because it is preceded by a disk write operation.

Let us see what it does with another Urausy sample I found on Malwr. If you install the driver and run the sample, you see that the sample maps sections into explorer.exe using ZwMapViewOfSection().

Image 4: Output on DebugView




This tool also saves the contents of memory being written as <SHA1>.bin so that you can examine what it is later. In this example, written data was code and a PE image respectively.

Image 5: File Contents (Code upside and PE downside)
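
Naming each dump after the SHA1 of its contents means identical payloads end up in a single file. The driver itself is not written in Python; the sketch below only illustrates that naming scheme, and the helper names dump_name() and save_dump() are made up.

```python
import hashlib
from pathlib import Path


def dump_name(written: bytes) -> str:
    """Build a <SHA1>.bin file name from the bytes that were written."""
    return hashlib.sha1(written).hexdigest() + ".bin"


def save_dump(directory: Path, written: bytes) -> Path:
    """Store the written bytes once; identical payloads share one file."""
    path = directory / dump_name(written)
    if not path.exists():
        path.write_bytes(written)
    return path


name = dump_name(b"abc")
print(name)
# prints: a9993e364706816aba3e25717850c26c9cd0d89d.bin
```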
This tool is more of a PoC and does not have rich functionality, but I hope it helps you understand this uncommon injection method and its detection.

Monday, January 26, 2015

ARM Exception Handling and an IDAPython Script

Windows RT differs in several respects, and the implementation of SEH is one of them. To sort out my understanding of ARM exception handling, I wrote an IDAPython script that interprets SEH information in a Windows RT PE file and applies it to an IDB. Here is an example of how this script helps you (I use one of the PatchGuard routines that uses SEH to obfuscate its code flow):
Image1: Before Use (plain output of IDA)
Image2: After Use
In Image2, the comments show that there is a __try/__except block around a call to __rt_sdiv().
Image3: Exception Filter
If you look at the exception filter, you will find that it calls another interesting-looking function, which is actually part of the authentic PatchGuard code flow. You could miss this path if you were just looking at the plain output of IDA, as in Image1. This script helps you notice the existence of SEH handlers.

I do not explain the internals of ARM exception handling here, as the explanation on MSDN[1] is detailed enough to understand them, but in short, it is fairly similar to the one on x64. For instance, each function in a file is described by a RUNTIME_FUNCTION structure located in the .pdata section, and the structure points to an .xdata record that contains a SCOPE_TABLE structure and an array of its entries describing the ranges of __try blocks and the addresses of except filters and body blocks (or finally blocks). This is essentially the same design as on x64.
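
To make the layout concrete, here is a minimal Python sketch (plain struct parsing, not my IDAPython script) of how a .pdata entry and a scope table could be decoded from raw bytes. It rests on assumptions taken from my reading of the MSDN description[1] and the x64 design: a .pdata entry is two 32-bit words, the low two bits of the second word are a flag where 0 means the word is the RVA of an .xdata record, and the scope table is a 32-bit count followed by records of four 32-bit RVAs (begin, end, handler, jump target), where a zero jump target denotes a finally block. The function names are hypothetical, and locating the scope table inside an .xdata record is left out.

```python
import struct


def parse_pdata_entry(entry: bytes) -> dict:
    # Assumed layout: <function start RVA, unwind word>.
    begin_rva, unwind = struct.unpack("<II", entry)
    # Assumption: bit 0 of the start RVA marks Thumb code; mask it off.
    begin_rva &= ~1
    if unwind & 0x3 == 0:
        # Flag == 0: the word is the RVA of an .xdata record.
        return {"begin": begin_rva, "packed": False, "xdata_rva": unwind}
    # Otherwise unwind data is packed in place; bits 2..12 hold length / 2.
    length = ((unwind >> 2) & 0x7FF) * 2
    return {"begin": begin_rva, "packed": True, "length": length}


def parse_scope_table(blob: bytes, offset: int = 0) -> list:
    # Assumed layout: Count, then Count records of four RVAs each.
    (count,) = struct.unpack_from("<I", blob, offset)
    records = []
    for i in range(count):
        begin, end, handler, target = struct.unpack_from(
            "<4I", blob, offset + 4 + 16 * i)
        records.append({
            "try_begin": begin,
            "try_end": end,
            "handler": handler,      # except filter or finally handler
            "jump_target": target,   # except body; 0 for a finally block
            "kind": "finally" if target == 0 else "except",
        })
    return records


if __name__ == "__main__":
    # Synthetic data for demonstration; not taken from a real binary.
    print(parse_pdata_entry(struct.pack("<II", 0x1001, 0x2000)))
    table = struct.pack("<I4I4I", 2, 0x10, 0x20, 0x30, 0x40,
                        0x50, 0x60, 0x70, 0)
    for record in parse_scope_table(table):
        print(record)
```

In an IDA script, the decoded try ranges and handler addresses would then be written into the IDB as comments at the corresponding addresses.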

As a note, I have listed some references below that may complement your understanding of ARM exception handling[2][3][4][5]. I hope you enjoy them and my script too.

  1. ARM Exception Handling
  2. Exceptional behavior: the Windows 8.1 X64 SEH Implementation
    The references listed at the top of the article are all exceptionally good, as is the article itself.
  3. RtlLookupFunctionEntry function
    Returns a corresponding .pdata entry for a given address.
  4. .fnent (Display Function Data)
    You can dump .pdata/.xdata information with it.
  5. Improving IDA Analysis of x64 Exception Handling
    An x64 version of my script. Very handy.

Wednesday, January 21, 2015

A List of PatchGuard v8.1 Related Functions on x64 and ARM

I was working on analyzing PatchGuard on Windows RT 8.1 (which runs on ARM) for the last two months and finished that work recently. The analysis turned out to be a lot easier than I expected, mostly because PatchGuard's code was written in C and had almost the same structure on both x64, which I had already analyzed, and ARM.

In order to look at PatchGuard on Windows RT 8.1, almost all I had to do was identify PatchGuard-related functions and map them to the corresponding functions on x64.

Here is a table showing that mapping (functions that have different names on the two platforms are highlighted).

x64                                 ARM
CcAdjustBcbDepth                    CcUnmapBehindLazyReader
CcBcbProfiler                       CcDelayedFlushTimer
CcInitializeBcbProfiler             CcPrepareDelayedFlushTimers
CmpAppendDllSection                 ExpWnfAcquireNameInstanceShared
CmpEnableLazyFlushDpcRoutine        CmpEnableLazyFlushDpcRoutine
CmpLazyFlushDpcRoutine              CmpLazyFlushDpcRoutine
DeferredRoutine                     <NoSymbol>
ExInitSystemPhase2                  ExInitSystemPhase2
ExpCenturyDpcRoutine                ExpCenturyDpcRoutine
ExpTimerDpcRoutine                  ExpTimerDpcRoutine
ExpTimeRefreshDpcRoutine            ExpTimeRefreshDpcRoutine
ExpTimeZoneDpcRoutine               ExpTimeZoneDpcRoutine
FsRtlMdlReadCompleteDevEx           RtlpExecuteHandlerForUnwind_xdata_compact
FsRtlUninitializeSmallMcb           ExpPrefetchPushLock
IopTimerDispatch                    IopTimerDispatch
KeCompactServiceTable               KeCompactServiceTable
KeInitAmd64SpecificState            KeArmDiscoverCacheTopology
KiBalanceSetManagerDeferredRoutine  KiBalanceSetManagerDeferredRoutine
KiDispatchCallout                   CcDelayedFlushTimer
KiDpcDispatch                       <NoSymbol>
KiFastGetCallersAddress             KiFastGetCallersAddress
KiFatalExceptionFilter              KiFatalExceptionFilter
KiFilterFiberContext                KiArmDiscoverCacheTopology
KiGetGdtIdt                         <NoSymbol>
KiLockExtendedServiceTable          KiLockExtendedServiceTable
KiLockServiceTable                  KiLockServiceTable
KiMcaDeferredRecoveryService        KiInitializeExternalCacheController
KiScbQueueScanWorker                PopPdcSampleIdleTimeouts
KiServiceTablesLocked               KiServiceTablesLocked
KiTimerDispatch                     <NoSymbol>
PopPoCoalescinCallback              PopPoCoalescinCallback
PopThermalZoneDpc                   PopThermalZoneDpc
PsQueryThreadTerminationPort        PspGetReaperLink
RtlLookupFunctionEntryEx            CmpFlushLockedHives
SdbpCheckDll                        PspInitDeferredResourceReservation
<NoSymbol>                          CmpDelayFreeTMWorker
<NoSymbol>                          FsRtlPrivateResetHighestLockOffset
<NoSymbol>                          FsRtlReInitializeTunnelCache
<NoSymbol>                          FsRtlRemovePerStreamContextEx
<NoSymbol>                          KiCheckForDivideOverflow
<NoSymbol>                          KiRundownScbQueue
<NoSymbol>                          RtlInsertSmallIndex

These functions were taken from ntoskrnl.exe version 6.3.9600.17476 and are either used only by PatchGuard or have some importance from the point of view of analysis. For example, IopTimerDispatch() is not a PatchGuard-dedicated function but can be used as one of its DPC routines, while KeInitAmd64SpecificState() and KeArmDiscoverCacheTopology() are dedicated and only used to initiate PatchGuard.

It seems that some more functions have been added for PatchGuard in Windows 10, but most, if not all, of these functions still keep the same names, so although this list is unlikely to be perfect, it should help you start your own analysis on both x64 and ARM.