People noticed a long time ago that SCO UNIX 3.2v4.0 won’t boot on anything resembling modern hardware, and it won’t boot in a VM either. In a VM the results may be inconsistent across implementations, but on a physical machine the system reboots right after the kernel is started (i.e. right after loading the N1 floppy). Most, if not all, older and newer 32-bit SCO XENIX/UNIX releases have no such problem.
It is not unreasonable to assume that the OS worked on 386s and 486s back when it was released (1992), but why won’t it work anymore? Intel processors are supposed to be backwards compatible…
Why it won’t work now
Analyzing why SCO UNIX 3.2v4.0 causes a triple fault (and a subsequent reboot) is not particularly difficult. When the kernel initializes, it first switches into 32-bit protected mode without paging, initializes the paging structures, and only then enables paging. This is slightly unusual as the recommended sequence is to enable protected mode and paging at the same time, but there’s nothing fundamentally wrong with this approach. The crashing code in question looks as follows:
mov eax, 000002000h
mov cr3, eax ; load PDBR
mov eax, cr0
or eax, 080000000h ; set PG bit
mov cr0, eax ; enable paging
jmp ecx ; jump to the following instruction
mov eax, 0f0000000h
This code is loaded somewhere near the top of physical memory and its address will depend on the system’s memory size. In an 8MB system, the above sequence might start at 0075f00bh. Once paging is enabled, the code will be mapped near the top of the 4GB address space, at f001000bh (this address does not change with memory size).
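The two addresses share nothing but their page offset. A small Python sketch makes this visible by splitting a 32-bit address into the 10-bit page directory index, the 10-bit page table index, and the 12-bit offset used by 386-style paging with 4KB pages. This is an illustration, not SCO code. The helper name split_linear is made up, and the two addresses are the ones from the 8MB example.

```python
def split_linear(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit address the way 386-style paging with 4KB pages does.

    Returns (page directory index, page table index, offset within the page).
    """
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    pde_index = (addr >> 22) & 0x3FF  # bits 31..22 select the page directory entry
    pte_index = (addr >> 12) & 0x3FF  # bits 21..12 select the page table entry
    offset = addr & 0xFFF             # bits 11..0 pass through unchanged
    return pde_index, pte_index, offset


for label, addr in (("physical, 8MB system", 0x0075F00B),
                    ("linear, paging on", 0xF001000B)):
    pde, pte, off = split_linear(addr)
    print(f"{label:20} {addr:08x}h -> PDE {pde:03x}h, PTE {pte:03x}h, offset {off:03x}h")
```

Only the offset (00bh) is common. The physical address would need page directory slot 1, while the kernel's linear address uses slot 3c0h, so page tables can map the latter without saying anything about the former.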
The code is obviously not identity-mapped, i.e. its physical address does not equal its linear address once paging is enabled. What’s worse, the physical address is not mapped by the page tables at all. As soon as the CR0.PG bit is set, the immediately following instruction (jmp ecx) suddenly doesn’t exist anymore. The attempt to fetch it causes a page fault, but the IDT isn’t mapped either. That makes exception delivery rather difficult: delivering the page fault faults again, which escalates to a double fault, and delivering the double fault fails the same way. The CPU has no recourse but to triple fault and shut down. It signals this with a shutdown bus cycle, which on a typical PC triggers a hard reboot.
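The failure chain can be mimicked with a deliberately simplified Python model. Page tables are a dictionary from linear page number to physical page number, and exception escalation is reduced to “page fault, then double fault, then shutdown”. The function names, the single-entry mapping and the IDT base of 00001000h are hypothetical. Real exception delivery also needs the handler code and stack to be mapped, which the model ignores.

```python
class PageFault(Exception):
    pass


def translate(page_map: dict[int, int], linear: int) -> int:
    """Map a linear address through a toy one-level page map."""
    page = linear >> 12
    if page not in page_map:
        raise PageFault(f"{linear:08x}h")
    return (page_map[page] << 12) | (linear & 0xFFF)


def fetch_with_escalation(page_map: dict[int, int], fetch_addr: int, idt_base: int) -> str:
    """Fetch one instruction; on a page fault, try to deliver #PF, then #DF."""
    try:
        translate(page_map, fetch_addr)
        return "instruction fetched"
    except PageFault:
        pass
    # Both gates live in the same IDT (8-byte gates; #PF is vector 14, #DF is 8).
    for name, vector in (("#PF", 14), ("#DF", 8)):
        try:
            translate(page_map, idt_base + 8 * vector)
            return f"{name} gate reached"
        except PageFault:
            continue  # a fault while delivering escalates to the next level
    return "shutdown (triple fault)"


# Only the high kernel mapping exists (linear page f0010h -> physical page 75fh).
page_map = {0xF0010: 0x0075F}
print(fetch_with_escalation(page_map, 0x0075F00B, idt_base=0x00001000))
print(fetch_with_escalation(page_map, 0xF001000B, idt_base=0x00001000))
```

Fetching from the physical address ends in shutdown, while the same code reached through its high linear address is found without trouble.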
Why it worked before
The semantics of instruction prefetching and TLB (Translation Lookaside Buffer) management slightly changed between the 386 and later CPUs, and not just once.
Indeed, reading Intel’s own 386 programming manual, it seems that SCO just followed the CPU vendor’s recommendations. Here’s an excerpt from the Intel 80386 Programmer’s Reference Manual (1986), section 10.4.4, Page Tables:
The initialization procedure should adopt one of the following strategies to ensure consistent addressing before and after paging is enabled:
- The page that is currently being executed should map to the same physical addresses both before and after PG is set.
- A JMP instruction should immediately follow the setting of PG.
SCO simply chose the latter option, a JMP immediately following the enabling of paging.
On a 386, reloading CR3 causes a TLB flush, which means that the TLB must be reloaded after CR3 is loaded. At that point, the processor is still executing with paging disabled, hence reloading the TLB causes no problem. On the other hand, setting the PG bit in CR0 does not cause a TLB flush. The TLB is still valid and the JMP instruction can be fetched without referring to the paging structures. The JMP will point to an address that is not currently in the TLB, but it can be resolved by looking it up in the page tables. Changing CR0 also doesn’t flush the prefetch queue, hence there is a high likelihood that the JMP instruction is already fetched and decoded anyway.
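The prefetch half of that argument can be illustrated with a toy Python model. Each queued instruction remembers the PG setting in effect when it was fetched. A write to CR0 either leaves the queue alone (the 386 behavior described above) or flushes it (serializing behavior). The queue depth is counted in instructions rather than bytes. The whole model, including the function name run, is an assumption for illustration only, and it ignores the TLB entirely.

```python
from collections import deque

PROGRAM = ["mov cr3, eax", "mov eax, cr0", "or eax, 80000000h",
           "mov cr0, eax", "jmp ecx"]


def run(program: list[str], queue_depth: int, flush_on_cr0_write: bool) -> list[tuple[str, bool]]:
    """Execute the sequence, logging (instruction, PG value it was fetched with)."""
    queue = deque()   # entries: (program index, PG value at fetch time)
    fetch_pc = 0      # next instruction the prefetcher will fetch
    pg = False
    log = []
    while True:
        # The prefetcher runs ahead of execution, tagging fetches with current PG.
        while len(queue) < queue_depth and fetch_pc < len(program):
            queue.append((fetch_pc, pg))
            fetch_pc += 1
        if not queue:
            break
        idx, fetched_with_pg = queue.popleft()
        instr = program[idx]
        log.append((instr, fetched_with_pg))
        if instr == "mov cr0, eax":
            pg = True
            if flush_on_cr0_write:
                queue.clear()          # serializing: discard prefetched work
                fetch_pc = idx + 1     # refetch from the next instruction
        elif instr.startswith("jmp"):
            break  # the jump target would be fetched with the current PG
    return log


for depth, flush in ((4, False), (4, True), (1, False)):
    print(depth, flush, run(PROGRAM, depth, flush)[-1])
```

With a four-entry queue and no flush, the jmp was fetched while paging was still off. With a flush, or with a queue too shallow to hold the jmp, it is fetched with paging on, which on real hardware means fetching from an unmapped address.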
What changed
The behavior of current IA-32 CPUs is subtly different. The SCO page switching code in UNIX 3.2v4.0 clearly violates the current (2012) Intel IA-32 programming manual, which states: “Processors need not implement any TLBs. Processors that do implement TLBs may invalidate any TLB entry at any time. Software should not rely on the existence of TLBs or on the retention of TLB entries.” SCO’s code implicitly relies on the existence of TLBs and on the CR0 update not flushing them. Alternatively, a prefetch queue with certain semantics would also produce the behavior SCO assumed.
The current Intel manual (Volume 3, section 4.10.2.1) further states that MOV to CR0 invalidates TLBs if it changes the value of CR0.PG from 1 to 0. That’s not what SCO is doing, but the Intel manual also says that the processor is always free to invalidate additional entries in the TLBs and paging-structure caches, and specifically that MOV to CR0 may invalidate TLB entries even if CR0.PG is not changing. Here CR0.PG is in fact changing, from 0 to 1.
Note that this may be sloppy editing on Intel’s part, as the case where CR0.PG changes from 0 to 1 is not explicitly documented. AMD’s documentation clearly states that modifying the CR0.PG bit flushes the TLBs. The upshot is that on current processors, enabling paging in CR0 pulls the rug out from under the SCO UNIX 3.2v4.0 kernel, as the following instruction is no longer mapped. The CPU is in a state where it can’t do anything sensible about it either.
When did it change?
The obvious follow-up question is when the behavior actually changed. Since SCO UNIX 3.2v4.0 was released in 1992, it must have supported 386s and 486s, but not necessarily Pentium CPUs (released in 1993, after unexpected delays).
Interestingly, the Intel 486 documentation had already changed in 1990, even though the CPU behavior apparently had not. Intel’s 486 manual stated (section 10.5.2, on paging initialization): “As with the PE bit, setting the PG bit must be followed immediately with a JMP instruction. Also, the code which sets the PG bit must come from a page which has the same physical address after paging is enabled.”
Instead of the 386’s “either use identity mapping or jump right after enabling paging”, both identity mapping and a jump instruction are now required. SCO’s code therefore already violated the 1990 reference manual for the 486, although it is hard to blame the programmers, who clearly followed the earlier edition and did not run into any trouble on actual 486 hardware.
The Pentium Processor Family Developer’s Manual from 1995 offers an explanation in Volume 3, section 16.5.3 (paging initialization):
The following guidelines for setting the PG bit (as with the PE bit) should be adhered to maintain both upwards and downwards compatibility:
- The instruction setting the PG bit should be followed immediately with a JMP instruction. A JMP instruction immediately after the MOV CR0 instruction changes the flow of execution, so it has the effect of emptying the Intel386 and Intel486 processor of instructions which have been fetched or decoded. The Pentium processor, however, uses a branch target buffer (BTB) for branch prediction, eliminating the need for branch instructions to flush the prefetch queue. For more information on the BTB, see the Pentium® Processor Family Developer’s Manual, Volume 1, order number 241428.
- The code from the instruction which sets the PG bit through the JMP instruction must come from a page which is identity mapped (i.e., the linear address before the jump is the same as the physical address after paging is enabled).
The 32-bit Intel architectures have different requirements for enabling paging and switching to protected mode. The Intel386 processor requires following steps 1 or 2 above. The Intel486 processor requires following both steps 1 and 2 above. The Pentium processor requires only step 2 but for upwards and downwards code compatibility with the Intel386 and Intel486 processors, it is recommended both steps 1 and 2 be taken.
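The three rules quoted above are easy to misread, so here they are as Python predicates. Here step1 stands for “MOV CR0 immediately followed by a JMP” and step2 for “identity-mapped code”. The dictionary and function names are made up for this sketch, which encodes only what the 1995 manual says, not how real chips behaved.

```python
# step1: MOV CR0 is immediately followed by a JMP
# step2: the code from MOV CR0 through the JMP is identity mapped
RULES = {
    "Intel386": lambda step1, step2: step1 or step2,   # "steps 1 or 2"
    "Intel486": lambda step1, step2: step1 and step2,  # "both steps 1 and 2"
    "Pentium": lambda step1, step2: step2,             # "only step 2"
}


def meets_documented_rules(step1: bool, step2: bool) -> dict[str, bool]:
    """Report, per processor, whether code meets the 1995 manual's stated rule."""
    return {cpu: rule(step1, step2) for cpu, rule in RULES.items()}


# SCO UNIX 3.2v4.0 jumps right after MOV CR0 but is not identity mapped.
print(meets_documented_rules(step1=True, step2=False))
# {'Intel386': True, 'Intel486': False, 'Pentium': False}
```

On paper, SCO’s code satisfies only the 386 rule, while code doing both steps satisfies all three, which is exactly why Intel recommends it.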
This is a little tricky to reconcile with the reasonable assumption that SCO UNIX 3.2v4.0 must have worked on 486s in 1992, unless it only broke on post-1992 486 models. Then again, the 1997 Intel Architecture Software Developer’s Manual, Volume 3, section 9.7 (Self-Modifying Code) says: “For Intel486 processors, a write to an instruction in the cache will modify it in both the cache and memory, but if the instruction was prefetched before the write, the old version of the instruction could be the one executed. To prevent the old instruction from being executed, flush the instruction prefetch unit by coding a jump instruction immediately after any write that modifies an instruction.” That fairly strongly hints that the 486 prefetch behavior was much the same as on the 386: by the time CR0 was modified, the jump instruction had already been prefetched and was ready to execute with paging enabled.
There was another change in the Pentium architecture with regard to prefetch buffers. Among other things, a MOV to CR0 is a serializing instruction in the Pentium, meaning that it causes the prefetch queue to be flushed. That wasn’t the case on the 386, and presumably also not on the 486.
But there is a notable exception, according to section 3.2.2 (Serializing Operations) of Intel’s Pentium Processor Family Developer’s Manual, Volume 1: “Whenever an instruction is executed to enable/disable paging (that is, change the PG bit of CR0), this instruction must be followed with a jump. The instruction at the target of the branch is fetched with the new value of PG (i.e., paging enabled/disabled), however, the jump instruction itself is fetched with the previous value of PG. Intel386, Intel486 and Pentium processors have slightly different requirements to enable and disable paging. In all other respects, an MOV to CR0 that changes PG is serializing. Any MOV to CR0 that does not change PG is completely serializing.”
In other words, the Pentium architecture implemented a small hack designed to work with page switching code written for previous x86 generations. If and only if a MOV to CR0 changes paging, the jump instruction expected to follow it is fetched with the previous setting of the CR0.PG bit. That effectively emulates the 386 behavior. The Pentium manual claims that page enabling/disabling code must be identity mapped, but the special prefetch semantics for the jump immediately following a CR0 change obviate that.
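The consequence can be checked with one more Python sketch. The only knob is whether the jmp following a PG-changing MOV CR0 is fetched with the old or the new PG value, and the jump target is always fetched with paging on. The page numbers come from the 8MB example. The offsets of the jmp (01eh) and of its target (020h) assume the shortest instruction encodings starting at 0075f00bh and are otherwise hypothetical, as are the function and model names.

```python
class PageFault(Exception):
    pass


CODE_PHYS = 0x0075F000    # physical page holding the sequence (8MB system)
CODE_LINEAR = 0xF0010000  # the same page once paging is on
PAGE_MAP = {CODE_LINEAR >> 12: CODE_PHYS >> 12}  # only the high mapping exists

JMP_OFFSET = 0x01E     # hypothetical: "jmp ecx", assuming shortest encodings
TARGET_OFFSET = 0x020  # hypothetical: the instruction right after the jmp


def fetch(addr: int, pg: bool) -> int:
    """Return the physical address an instruction fetch would use."""
    if not pg:
        return addr  # paging off: linear address == physical address
    page = addr >> 12
    if page not in PAGE_MAP:
        raise PageFault(f"page fault at {addr:08x}h")
    return (PAGE_MAP[page] << 12) | (addr & 0xFFF)


def enable_paging(jmp_fetched_with_old_pg: bool) -> str:
    try:
        # The jmp sits right after MOV CR0, at its physical address.
        fetch(CODE_PHYS + JMP_OFFSET, pg=not jmp_fetched_with_old_pg)
        # ECX holds the high linear address; the target always sees PG=1.
        fetch(CODE_LINEAR + TARGET_OFFSET, pg=True)
        return "kernel continues"
    except PageFault as fault:
        return f"{fault} -> triple fault"


MODELS = {
    "386/486, jmp already prefetched": True,
    "Pentium as documented": True,
    "jmp fetched with new PG": False,
}
for name, old_pg in MODELS.items():
    print(f"{name}: {enable_paging(old_pg)}")
```

As long as the jmp is fetched with the old PG value, identity mapping is unnecessary. Only a CPU that fetches the jmp with the new value trips over the unmapped physical address.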
The 1995 manual for the Pentium Pro, Volume 3, section 7.3 (Serializing Instructions), changed the wording somewhat: “When an instruction is executed that enables or disables paging (that is, changes the PG flag in control register CR0), the instruction should be followed by a jump instruction. The target instruction of the jump instruction is fetched with the new setting of the PG flag (that is, paging is enabled or disabled), but the jump instruction itself is fetched with the previous setting. The Pentium Pro processor does not require the jump operation following the move to register CR0 (because any use of the MOV instruction in the Pentium Pro processor to write to CR0 is completely serializing). However, to maintain backwards and forward compatibility with code written to run on other Intel architecture processors, it is recommended that the jump operation be performed.”
Reality
The behavior of actual CPUs is somewhat unclear. The OS/2 Museum confirmed that the SCO UNIX 3.2v4.0 boot disk causes a triple fault on Pentium II and Pentium M class processors. It is inconceivable that the disk would not boot on 386 and 486 machines available in 1992. The question then is how Pentium and Pentium Pro processors behave.
It would also appear that the current Intel processor documentation is incorrect when it implies that the old semantics of CR0 change followed by a jump are retained. If that were the case, SCO UNIX 3.2v4.0 should still boot on current systems.
Interested readers with access to old hardware may wish to perform their own experiment. The SCO UNIX 3.2v4.0 N1 boot floppy is still available from SCO’s FTP site. All it takes is to press Enter at the ‘boot:’ prompt, then ignore the prompt to insert the filesystem floppy and simply press Enter again.
Conclusion
From time to time, Intel changes the semantics of IA-32 compatible processors in ways that break existing commercially available software (we’re talking about off-the-shelf operating systems, not special-case code crafted to exploit differences between processor generations). Following the recommended programming practice is usually a good way to avoid such pitfalls, but not always, as sometimes even the recommended practice changes.
Intel clearly tries to strike a balance between evolving the IA-32 architecture and retaining backwards compatibility. Sometimes existing software falls on the “incompatible” side when a new CPU generation arrives.
In addition, trawling through old and new Intel reference manuals can be a fascinating exercise, especially when attempting to resolve vague and sometimes contradictory statements.