On some systems, both physical and virtual, 64-bit Windows 8.1 as well as Server 2012 R2 consistently crashes with error code (bug check) 0xC4; 64-bit Windows 8 may run on these same systems without trouble. On physical systems, the BSOD is typically accompanied by the machine rebooting so fast that it is very difficult to read the error message at all. If the system keeps attempting to boot Windows 8.1, this results in a nice reboot loop.
In VirtualBox, users at least have a chance to read the error message before the VM terminates with a fatal error:
Not that the error message is in any way helpful. According to MSDN, bug check 0xC4 means DRIVER_VERIFIER_DETECTED_VIOLATION and the first argument of 0x91 is “reserved”. But wait, a driver verifier violation usually means buggy software, so how can that happen with a Windows 8.1 installation DVD? Windows 8.1 couldn’t be buggy, could it?
Short answer… yes, it could be. The long answer is a bit more complicated, but the Windows kernel debugger provides all the clues.
It doesn’t help that Microsoft helpfully disabled the F8 start-up key on Windows 8.1, as that makes debugging the installation disc itself just about impossible. Fortunately the F8 key survived in Windows Server 2012 R2, and the two systems are very, very similar.
Hitting F8 early enough in the start-up sequence (the first “loading files” progress bar) brings up the following menu:
The “Debugging Mode” selection allows WinDbg to attach to the system/VM. And soon enough, WinDbg hits a bug check… but wait—it’s not bug check 0xC4, it’s 0x5D or UNSUPPORTED_PROCESSOR. MSDN is again unhelpful. Sadly, Geoff Chappell has not yet updated his excellent bug check 0x5D documentation to cover this issue either.
But instead of perhaps BSODing with the 0x5D bug check, Windows hits a 0xC4 bug check if it is allowed to continue executing. According to WinDbg, that bug check is caused by incorrect stack switching. Whether that is really the case is difficult to tell. At any rate, WinDbg classifies it as a software bug (in Windows itself, since there’s nothing else installed yet).
Affected Systems
The reason for the 0x5D bug check on 64-bit Windows 8.1/Server 2012 R2 is typically the lack of a CMPXCHG16B instruction. One might think that a company with the resources of Microsoft would adequately document the exact processor requirements for a server OS, but that does not appear to be the case. But fear not: for those willing to make the leap of faith that Windows 8.1 and Server 2012 R2 are the same thing, the documentation does exist: “To install a 64-bit OS on a 64-bit PC, your processor needs to support CMPXCHG16b, PrefetchW, and LAHF/SAHF”. And no, this is not exactly news.
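The three requirements quoted above each correspond to a single CPUID feature bit, so checking for them comes down to testing bits in two registers. The following is a minimal sketch of that decoding, not a reconstruction of what Windows setup actually does. The bit positions are the ones given in the Intel and AMD CPUID documentation; the function name and the sample register values are made up for illustration and were not read from real hardware.

```python
# Bit positions as documented in the Intel and AMD CPUID references.
CMPXCHG16B_BIT = 13  # CPUID leaf 0x00000001, ECX
LAHF_SAHF_BIT = 0    # CPUID leaf 0x80000001, ECX (LAHF/SAHF in 64-bit mode)
PREFETCHW_BIT = 8    # CPUID leaf 0x80000001, ECX


def missing_features(leaf1_ecx, ext_leaf1_ecx):
    """Return the names of required features absent from the raw ECX values."""
    checks = [
        ("CMPXCHG16B", leaf1_ecx, CMPXCHG16B_BIT),
        ("LAHF/SAHF", ext_leaf1_ecx, LAHF_SAHF_BIT),
        ("PREFETCHW", ext_leaf1_ecx, PREFETCHW_BIT),
    ]
    return [name for name, reg, bit in checks if not (reg >> bit) & 1]


# Hypothetical register values, for illustration only.
print(missing_features(0x00000000, 0x00000101))  # CMPXCHG16B bit clear
print(missing_features(0x00002000, 0x00000101))  # all three bits set
```

On a Linux system the same three bits show up in /proc/cpuinfo as the flags cx16, lahf_lm and 3dnowprefetch, which is often the quickest way to check a given machine.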
The new Windows 8.1/Server 2012 R2 requirements only affect fairly old processors, most likely only CPUs produced in 2005 and earlier. There were a few Intel CPUs based on the Pentium 4 architecture which didn’t support LAHF/SAHF, but those are probably on the scrap heap where they always belonged.
However, the entire first generation of AMD Opteron and Athlon 64 processors is affected because those CPUs did not support the CMPXCHG16B instruction. Indeed the problem described here was easy to reproduce on a system equipped with an Athlon 64 3200+ CPU.
For those without historic hardware at hand, a VirtualBox VM will do, as long as the guest OS type is set to something other than Windows 8.1 or Server 2012 R2. VirtualBox will then not enable the CMPXCHG16B instruction and Windows 8.1 will crash as described above.
The Bug
The exact cause is unclear but the problem may be caused by the 0x5D bug check occurring before the NT kernel is prepared to deal with bug checks. The OS is not fully initialized and hits the 0xC4 bug check. That will soon invoke nt!KiBugCheckDebugBreak and then nt!DbgBreakPointWithStatus.
The latter is basically just an INT3 instruction and then things go horribly wrong. The IDT is either not fully initialized or corrupted, and the system can’t handle exceptions. A triple fault ensues (the CPU’s way of saying “I’m utterly lost and can’t go on anymore”), which on physical systems usually causes a reboot of the system. In virtualized environments it typically terminates the VM.
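The escalation from a single breakpoint to a triple fault can be illustrated with a toy model. This is a simplified sketch of the x86 rules, assuming that every failed delivery through the IDT is reported as a general protection fault (real hardware may raise other faults, such as segment-not-present, depending on what exactly is wrong with the descriptor). It is not a trace of the actual Windows 8.1 failure, and the function and constant names are invented for the example.

```python
BP, DF, GP = 3, 8, 13  # vector numbers: breakpoint, double fault, general protection
NAMES = {BP: "#BP", DF: "#DF", GP: "#GP"}

# Faults whose failed delivery escalates straight to a double fault
# (the contributory exceptions plus the page fault).
ESCALATES_TO_DF = {0, 10, 11, 12, 13, 14}


def deliver(vector, usable_vectors):
    """Return the chain of events when `vector` is raised.

    `usable_vectors` is the set of IDT entries that can actually be used.
    """
    chain = []
    current = vector
    while True:
        chain.append(NAMES.get(current, f"#{current}"))
        if current in usable_vectors:
            chain.append("handled")
            return chain
        if current == DF:
            # A fault while delivering the double fault shuts the CPU down.
            chain.append("triple fault")
            return chain
        # Delivery failed: benign exceptions such as #BP are followed by a #GP,
        # while a failure during a contributory fault becomes a double fault.
        current = DF if current in ESCALATES_TO_DF else GP


print(deliver(BP, usable_vectors={BP, DF, GP}))  # ['#BP', 'handled']
print(deliver(BP, usable_vectors=set()))         # ['#BP', '#GP', '#DF', 'triple fault']
```

With a working IDT the INT3 simply lands in its handler. With an unusable IDT every attempt to report the problem fails in turn, and the only thing left for the CPU is to shut down.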
The Fix
For physical systems which hit this error, there’s nothing that can be done other than upgrading the hardware or downgrading the OS. They’re just not supported by 64-bit Windows 8.1.
For VirtualBox VMs, the situation is a bit different. There do not appear to be any CPUs that would support hardware virtualization and not support CMPXCHG16B (although conclusively proving this is impossible). Because VirtualBox requires hardware virtualization for 64-bit VMs, it’s more or less certain that the CPU does support CMPXCHG16B, and it just needs to be enabled for the VM. For recent VirtualBox releases, simply setting the guest OS type to Windows 8.1 (64-bit) or Windows Server 2012 R2 will do the trick.
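For those who prefer the command line, the guest OS type can also be changed with VBoxManage. The sketch below only builds the command and prints it. The VM name is hypothetical, and the OS type IDs Windows81_64 and Windows2012_64 are assumptions that should be checked against the output of "VBoxManage list ostypes" for the installed release.

```python
import subprocess


def ostype_command(vm_name, os_type="Windows81_64"):
    """Build the VBoxManage argument list that changes a VM's guest OS type."""
    return ["VBoxManage", "modifyvm", vm_name, "--ostype", os_type]


def set_ostype(vm_name, os_type="Windows81_64"):
    """Run the command; the VM must be powered off and VBoxManage on the PATH."""
    subprocess.run(ostype_command(vm_name, os_type), check=True)


# "Win81Test" is a made-up VM name; nothing is executed here.
print(" ".join(ostype_command("Win81Test")))
```

Calling set_ostype("Win81Test") would apply the change; for a Server 2012 R2 guest, pass os_type="Windows2012_64" instead.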
It’s clear that although 64-bit Windows 8.1/Server 2012 R2 enforces the presence of required CPU features, Microsoft never adequately tested the failure scenario. This causes the users of affected systems to needlessly suffer because they do not get any useful diagnostic information, and may not even be able to read the error message (such as it is) at all. While obscure, such a bug is hardly something that would inspire confidence.
“There were a few Intel CPUs based on the Pentium 4 architecture which didn’t support LAHF/SAHF, but those are probably on the scrap heap where they always belonged.”
But the PREFETCHW requirement AFAIK excludes all 90nm P4 CPUs, making the 65nm P4s the minimum requirement. For desktop P4s this is an easy drop-in replacement, but for Xeon DP that is not true.
Funny that I had been looking at the same issues recently. Some motherboard chipsets like P35 with 2007 BIOS won’t let the instructions run even if the CPU supports it. Sometimes, there is a BIOS update from 2009 that permits Win 8.1 installs.
http://answers.microsoft.com/en-us/windows/forum/windows8_1-windows_install/cannot-update-win-8-64-bit-to-win-81-64-bit/47f34921-87ec-407c-9de8-d51d90598287?page=2
This is because of a CPU erratum that involves some of the CPUID bits not being set properly, for which Intel had to release a microcode update.
This part of the bug (“… the problem may be caused by the 0x5D bug check occurring before the NT kernel is prepared to deal with bug checks”) reminded me of a similar problem I recently discovered when debugging OS/2 1.0 in PCjs, which I’ve since documented in the source code on GitHub. Here’s an excerpt, another example of a kernel not getting all its ducks in a row early enough:
“Interestingly, if we force a GP_FAULT to occur at a sufficiently early point in the OS/2 1.0 initialization code, OS/2 does a nice job of displaying the GP fault and then shutting down:
0090:067B FB STI
0090:067C EBFD JMP 067B
but it may not have yet reprogrammed the master PIC to re-vector hardware interrupts to IDT entries 0x50-0x57, so when the next timer interrupt (IRQ 0) occurs, it vectors through IDT entry 0x08, which is the DF_FAULT vector. A spurious double-fault is generated, and a clean shutdown turns into a messy crash.
Of course, that all could have been avoided if IBM had heeded Intel’s advice and not used Intel-reserved IDT entries for PC interrupts.”
In retrospect, I wonder if it may have been feasible to change the PIC base and re-vector the interrupts like IBM did for IRQ 13/INT 2. But yeah, ignoring the “reserved” advice was clearly a bad idea. Lots of trouble later on.
Have you run into any situation where the spurious OS/2 1.x #DF may have been reproducible on contemporary systems? The thing about the Windows 8.1 bug check is that it could/should have been caught during testing.
No, the bogus OS/2 #DF has only been seen “in the lab”, not in the wild.
FYI, MS could trivially put the emulation of LAHF/SAHF and 3DNow! PREFETCH back, but it is probably not worth the effort given that the 65nm drop-in replacement reduces power consumption and seems to be pretty cheap on eBay. CMPXCHG16B would be more difficult, of course, and it affects all pre-DDR2 K8 CPUs AFAIK.
Yeah, all three-digit Opteron models (pre-2006) are affected due to lack of CMPXCHG16B.
I suspect the whole thing would also be much less of a problem if it wasn’t a free update suddenly bricking certain computers. It seems Microsoft couldn’t make up its mind as to whether Windows 8.1 should be “just an update” (in which case it should not have changed the HW requirements) or a new OS version.
Wrote http://yuhongbao.blogspot.ca/2015/06/why-your-core-2-processor-appear-to-not.html
Does 8.1\Server 2k12R2 actually need CMPXCHG16B once installed, or does it only check for it during installation to make sure it’s running on a CPU newer than a certain point?
If the former, a) what does it use it for, and b) could the Windows kernel be patched to emulate CMPXCHG16B in software if it isn’t available in hardware?
Yes, Windows 8.1 really uses CMPXCHG16B. It’s used for implementing synchronization primitives, because it’s the only instruction that can read/write two 64-bit quantities at once. It’s basically the 64-bit equivalent of CMPXCHG8B (added in the Pentium). The kernel could be patched, but patching the Windows 8.1 kernel is not trivial because the OS takes measures to prevent modifications.
>Yes, Windows 8.1 really uses CMPXCHG16B. It’s used for implementing synchronization primitives, because it’s the only instruction that can read/write two 64-bit quantities at once. It’s basically the 64-bit equivalent of CMPXCHG8B (added in the Pentium).
Why couldn’t Windows 8.1 check to see if the processor supports CMPXCHG16B, and, if it doesn’t support it, use some sort of multi-operation 64-bit-quantity writing rather than simply crashing?
>The kernel could be patched, but patching the Windows 8.1 kernel is not trivial because the OS takes measures to prevent modifications.
Kernel Patch Protection is mainly geared towards preventing the kernel from being patched on-the-fly; patching the necessary files (and KPP itself, if necessary) on the installation media would presumably be much easier to get to work than trying to do the patching once Windows (including KPP) is already running.
The files are all signed so they really shouldn’t be easy to modify on disk. Yes, Windows 8.1 could use an alternative code path, like the previous versions did, but that’s bad for performance and Microsoft decided that CPUs without CMPXCHG16B aren’t important enough anymore. The crashing part is clearly a bug, it’s supposed to report some kind of identifiable error.
Ah, got it. (Although wouldn’t running a bit slower than optimally still be better than not running at all?)
The reason why CMPXCHG16B is required is that the address space is extended to 48-bit instead of 44-bit, and it is used in the SList routines.
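To make the discussion above more concrete, here is a toy model of what a 16-byte compare-and-exchange buys a lock-free list. It is a sketch under stated assumptions, not the actual Windows SList layout or code: the list head is assumed to be a 64-bit pointer paired with a 64-bit sequence counter, a Python lock stands in for the atomicity the hardware instruction provides, and all names (Cell128, push and so on) are invented for the example.

```python
import threading

MASK64 = (1 << 64) - 1


class Cell128:
    """Two adjacent 64-bit values that are always updated as one unit."""

    def __init__(self, low=0, high=0):
        self.low, self.high = low, high
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        """Return a consistent (low, high) snapshot."""
        with self._lock:
            return (self.low, self.high)

    def cmpxchg16b(self, expected, new):
        """If (low, high) == expected, store new; return (success, observed)."""
        with self._lock:
            observed = (self.low, self.high)
            if observed == expected:
                self.low, self.high = new[0] & MASK64, new[1] & MASK64
                return True, observed
            return False, observed


def push(head, node_address):
    """Retry loop: install a new first-node address and bump the sequence."""
    while True:
        old = head.load()                 # (pointer, sequence) snapshot
        new = (node_address, old[1] + 1)
        ok, _ = head.cmpxchg16b(old, new)
        if ok:
            return old[0]                 # address of the previous first node


head = Cell128()
push(head, 0x1000)
push(head, 0x2000)
print(hex(head.low), head.high)                    # 0x2000 2

# An update based on a stale snapshot is rejected instead of corrupting the list.
stale = (0x1000, 1)
print(head.cmpxchg16b(stale, (0x3000, 2))[0])      # False
```

Because the pointer and the counter are compared and replaced together, a thread working from an outdated view of the list head cannot overwrite it. Without a 16-byte atomic operation, the two halves would have to be squeezed into 64 bits or protected by a lock, which is the slower alternative path mentioned above.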