SCO UNIX 3.2v4.0 vs. IA-32 Semantics Changes

People noticed a long time ago that SCO UNIX 3.2v4.0 won’t boot on anything resembling modern hardware, and it won’t boot in a VM either. In a VM the results may be inconsistent across implementations, but on a physical machine the system reboots right after the kernel is started (i.e. right after loading the N1 floppy). Most, if not all, older and newer 32-bit SCO XENIX/UNIX releases have no such problem.

It is not unreasonable to assume that the OS worked on 386s and 486s back when it was released (1992), but why won’t it work anymore? Intel processors are supposed to be backwards compatible…

Why it won’t work now

Analyzing why SCO UNIX 3.2v4.0 causes a triple fault (and a subsequent reboot) is not particularly difficult. When the kernel initializes, it first switches into 32-bit protected mode without paging, initializes the paging structures, and only then enables paging. This is slightly unusual as the recommended sequence is to enable protected mode and paging at the same time, but there’s nothing fundamentally wrong with this approach. The crashing code in question looks as follows:

mov eax, 000002000h
mov cr3, eax ; load PDBR
mov eax, cr0
or eax, 080000000h ; set PG bit
mov cr0, eax ; enable paging
jmp ecx ; jump to the following instruction
mov eax, 0f0000000h

This code is loaded somewhere near the top of physical memory and its address will depend on the system’s memory size. In an 8MB system, the above sequence might start at 0075f00bh. Once paging is enabled, the code will be mapped near the top of the 4GB address space, at f001000bh (this address does not change with memory size).

The code is obviously not identity-mapped, i.e. its physical address does not equal its linear address once paging is enabled. What’s worse, the physical address is not mapped by the page tables at all. As soon as the CR0.PG bit is set, the immediately following instruction (jmp ecx) suddenly doesn’t exist anymore. The attempt to fetch the instruction causes a page fault, but the IDT isn’t mapped either. That makes exception delivery rather difficult, and the CPU does not have any recourse but to cause a triple fault which triggers a shutdown bus cycle, which typically hard reboots the system.
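
To make the failure concrete, here is a minimal Python sketch of the two-level 32-bit page walk applied to the example addresses above. The table contents are an assumption for illustration: the sketch maps only the single 4KB page at linear f0010000h to physical 0075f000h, whereas the real SCO tables naturally hold many more entries. The names split_linear, translate and PageFault are hypothetical helpers, not anything taken from the kernel.

```python
PRESENT = 0x1            # bit 0 of a page directory or page table entry
FRAME_MASK = 0xFFFFF000  # upper 20 bits hold the physical page frame


class PageFault(Exception):
    """The walk reached a missing or not-present entry."""


def split_linear(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def translate(page_dir, addr):
    """Translate a linear address through two-level tables modelled as nested dicts."""
    dir_idx, tbl_idx, offset = split_linear(addr)
    table = page_dir.get(dir_idx)  # None models a not-present directory entry
    if table is None:
        raise PageFault(f"no page table for {addr:08x}h")
    entry = table.get(tbl_idx, 0)
    if not entry & PRESENT:
        raise PageFault(f"page not present for {addr:08x}h")
    return (entry & FRAME_MASK) | offset


# Assumed tables: a single mapping, linear f0010000h -> physical 0075f000h.
page_dir = {0x3C0: {0x010: 0x0075F000 | PRESENT}}

# The kernel's new linear address resolves to the physical load address.
print(f"{translate(page_dir, 0xF001000B):08x}h")

# The old physical address, now interpreted as a linear address, does not.
try:
    translate(page_dir, 0x0075F00B)
except PageFault as fault:
    print("page fault:", fault)
```

The second lookup fails at the page directory already: 0075f00bh selects directory entry 1, which this model leaves empty. That is the situation the instruction fetch for jmp ecx ends up in once paging is in effect.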

Why it worked before

The semantics of instruction prefetching and TLB (Translation Lookaside Buffer) management slightly changed between the 386 and later CPUs, and not just once.

Indeed, reading Intel’s own 386 programming manual, it seems that SCO just followed the CPU vendor’s recommendations. Here’s an excerpt from the Intel 80386 Programmer’s Reference Manual (1986), section 10.4.4, Page Tables:

The initialization procedure should adopt one of the following strategies to ensure consistent addressing before and after paging is enabled:

  • The page that is currently being executed should map to the same physical addresses both before and after PG is set.
  • A JMP instruction should immediately follow the setting of PG.

SCO simply chose the latter option, a JMP immediately following the enabling of paging.

On a 386, reloading CR3 causes a TLB flush, which means that the TLB must be reloaded after CR3 is loaded. At that point, the processor is still executing with paging disabled, hence reloading the TLB causes no problem. On the other hand, setting the PG bit in CR0 does not cause a TLB flush. The TLB is still valid and the JMP instruction can be fetched without referring to the paging structures. The JMP will point to an address that is not currently in the TLB, but it can be resolved by looking it up in the page tables. Changing CR0 also doesn’t flush the prefetch queue, hence there is a high likelihood that the JMP instruction is already fetched and decoded anyway.

What changed

The behavior of current IA-32 CPUs is subtly different. The SCO page switching code in UNIX 3.2v4.0 clearly violates the current (2012) Intel IA-32 programming manual, which states:

Processors need not implement any TLBs. Processors that do implement TLBs may invalidate any TLB entry at any time. Software should not rely on the existence of TLBs or on the retention of TLB entries.

SCO’s code implicitly relies on the existence of TLBs and on the CR0 update not flushing said TLBs. Alternatively, a prefetch queue with certain semantics would also produce the behavior assumed by SCO.

The current Intel manual (Volume 3, section 4.10.2.1) further states that MOV to CR0 invalidates TLBs if it changes the value of CR0.PG from 1 to 0. That’s not what SCO is doing, but the Intel manual also says that the processor is always free to invalidate additional entries in the TLBs and paging-structure caches, and specifically MOV to CR0 may invalidate TLB entries even if CR0.PG is not changing. CR0.PG is in fact changing here.

Note that this may be sloppy editing on Intel’s part as the case where CR0.PG changes from 0 to 1 is not explicitly documented. AMD’s documentation clearly states that modifying the CR0.PG bit flushes TLBs. The upshot is that on current processors, enabling paging in CR0 pulls the rug from under the SCO UNIX 3.2v4.0 kernel as the following instruction is no longer mapped. The CPU is in a state where it can’t do anything sensible about it either.

When did it change?

The obvious follow-up question is when the behavior actually changed. Since SCO UNIX 3.2v4.0 was released in 1992, it must have supported 386s and 486s, but not necessarily Pentium CPUs (released in 1993, after unexpected delays).

Interestingly, the Intel 486 documentation had already changed in 1990, even though the CPU behavior apparently had not. Intel’s 486 manual stated (section 10.5.2, on paging initialization):

As with the PE bit, setting the PG bit must be followed immediately with a JMP instruction. Also, the code which sets the PG bit must come from a page which has the same physical address after paging is enabled.

Instead of the 386’s “either use identity paging or jump right after enabling paging”, both identity mapping and a jump instruction are now required. SCO’s code then already violated the 1990 reference manual for the 486, although it is hard to blame the programmers who clearly followed the earlier edition and did not run into any trouble on actual 486 hardware.

The Pentium Processor Family Developer’s Manual from 1995 offers an explanation in Volume 3, section 16.5.3 (paging initialization):

The following guidelines for setting the PG bit (as with the PE bit) should be adhered to maintain both upwards and downwards compatibility:

  1. The instruction setting the PG bit should be followed immediately with a JMP instruction. A JMP instruction immediately after the MOV CR0 instruction changes the flow of execution, so it has the effect of emptying the Intel386 and Intel486 processor of instructions which have been fetched or decoded. The Pentium processor, however, uses a branch target buffer (BTB) for branch prediction, eliminating the need for branch instructions to flush the prefetch queue. For more information on the BTB, see the Pentium® Processor Family Developer’s Manual, Volume 1, order number 241428.
  2. The code from the instruction which sets the PG bit through the JMP instruction must come from a page which is identity mapped (i.e., the linear address before the jump is the same as the physical address after paging is enabled).

The 32-bit Intel architectures have different requirements for enabling paging and switching to protected mode. The Intel386 processor requires following steps 1 or 2 above. The Intel486 processor requires following both steps 1 and 2 above. The Pentium processor requires only step 2 but for upwards and downwards code compatibility with the Intel386 and Intel486 processors, it is recommended both steps 1 and 2 be taken.

This is a little tricky to reconcile with the reasonable assumption that SCO UNIX 3.2v4.0 must have worked on 486s in 1992, unless it only broke on the post-1992 486 models. Then again, there’s the following in the 1997 Intel Architecture Software Developer’s Manual, Volume 3, section 9.7 (Self-Modifying Code):

For Intel486 processors, a write to an instruction in the cache will modify it in both the cache and memory, but if the instruction was prefetched before the write, the old version of the instruction could be the one executed. To prevent the old instruction from being executed, flush the instruction prefetch unit by coding a jump instruction immediately after any write that modifies an instruction.

That fairly strongly hints that the 486 prefetch behavior was much the same as on the 386; once CR0 was modified, the jump instruction had already been prefetched and was ready to execute with paging enabled.

There was another change in the Pentium architecture with regard to prefetch buffers. Among other things, a MOV to CR0 is a serializing instruction in the Pentium, meaning that it causes the prefetch queue to be flushed. That wasn’t the case on the 386, and presumably also not on the 486.

But there is a notable exception, according to section 3.2.2 (Serializing Operations) of Intel’s Pentium Processor Family Developer’s Manual, Volume 1:

Whenever an instruction is executed to enable/disable paging (that is, change the PG bit of CR0), this instruction must be followed with a jump. The instruction at the target of the branch is fetched with the new value of PG (i.e., paging enabled/disabled), however, the jump instruction itself is fetched with the previous value of PG. Intel386, Intel486 and Pentium processors have slightly different requirements to enable and disable paging. In all other respects, an MOV to CR0 that changes PG is serializing. Any MOV to CR0 that does not change PG is completely serializing.

In other words, the Pentium architecture implemented a small hack designed to work with page switching code written for previous x86 generations. If and only if a MOV to CR0 changes paging, the jump instruction expected to follow it is fetched with the previous setting of the CR0.PG bit. That effectively emulates the 386 behavior. The Pentium manual claims that page enabling/disabling code must be identity mapped, but the special prefetch semantics for the jump immediately following a CR0 change obviate that.
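
The three behaviors discussed so far can be compared in a small Python sketch. It is only a toy model built from the descriptions in the text, not a statement about any CPU's actual microarchitecture. The policy names, the function jmp_is_fetchable and the page_map dictionary (linear page number to physical page number) are all hypothetical, and the address used for the JMP is simply the start of the example sequence, standing in for any address on the same page.

```python
from enum import Enum


class FetchPolicy(Enum):
    # 386, and presumably 486: the JMP already sits in the prefetch queue.
    PREFETCHED = 1
    # Pentium: the JMP after a PG-changing MOV to CR0 is fetched with the old PG.
    OLD_PG_FOR_JMP = 2
    # Behavior matching the triple fault seen on Pentium II and Pentium M:
    # the very next fetch already goes through the new page tables.
    NEW_PG_AT_ONCE = 3


def jmp_is_fetchable(policy, jmp_address, page_map):
    """Return True if the JMP right after the MOV to CR0 can still be fetched.

    jmp_address is the physical address of the JMP, which is also its linear
    address for as long as paging is disabled. page_map models the new page
    tables as {linear page number: physical page number}.
    """
    if policy in (FetchPolicy.PREFETCHED, FetchPolicy.OLD_PG_FOR_JMP):
        # The fetch never consults the new page tables.
        return True
    # Paging is already in effect, so the code must be identity mapped.
    page = jmp_address >> 12
    return page_map.get(page) == page


# SCO-style tables: the code page is mapped high, not at its physical address.
sco_map = {0xF0010: 0x0075F}
# Tables that additionally identity-map the page holding the switch code.
identity_map = {0xF0010: 0x0075F, 0x0075F: 0x0075F}

for policy in FetchPolicy:
    print(policy.name,
          jmp_is_fetchable(policy, 0x0075F00B, sco_map),
          jmp_is_fetchable(policy, 0x0075F00B, identity_map))
```

In this model only the last policy distinguishes the two sets of tables, which is why identity mapping the switch code is the one approach that works regardless of how a given processor fetches the jump.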

The 1995 manual for the Pentium Pro, Volume 3, section 7.3 (Serializing Instructions) changed the wording somewhat:

When an instruction is executed that enables or disables paging (that is, changes the PG flag in control register CR0), the instruction should be followed by a jump instruction. The target instruction of the jump instruction is fetched with the new setting of the PG flag (that is, paging is enabled or disabled), but the jump instruction itself is fetched with the previous setting. The Pentium Pro processor does not require the jump operation following the move to register CR0 (because any use of the MOV instruction in the Pentium Pro processor to write to CR0 is completely serializing). However, to maintain backwards and forward compatibility with code written to run on other Intel architecture processors, it is recommended that the jump operation be performed.

Reality

The behavior of actual CPUs is somewhat unclear. The OS/2 Museum confirmed that the SCO UNIX 3.2v4.0 boot disk causes a triple fault on Pentium II and Pentium M class processors. It is inconceivable that the disk would not boot on 386 and 486 machines available in 1992. The question then is how Pentium and Pentium Pro processors behave.

It would also appear that the current Intel processor documentation is incorrect when it implies that the old semantics of CR0 change followed by a jump are retained. If that were the case, SCO UNIX 3.2v4.0 should still boot on current systems.

Interested readers with access to old hardware may wish to perform their own experiment. The SCO UNIX 3.2v4.0 N1 boot floppy is still available from SCO’s FTP server. All it takes is to ignore the prompt to insert the filesystem floppy and hit Enter (naturally after first hitting Enter at the ‘boot:’ prompt).

Conclusion

From time to time, Intel changes the semantics of IA-32 compatible processors in ways that break existing commercially available software (we’re talking about off-the-shelf operating systems, not special-case code crafted to exploit differences between processor generations). Following the recommended programming practice is usually a good way to avoid such pitfalls, but not always, as sometimes even the recommended practice changes.

Intel clearly tries to strike a balance between evolving the IA-32 architecture and retaining backwards compatibility. Sometimes existing software falls on the “incompatible” side when a new CPU generation arrives.

In addition, trawling through old and new Intel reference manuals can be a fascinating exercise, especially when attempting to resolve vague and sometimes contradictory statements.

The XENIX 2.2.3 Mystery, Continued

After pondering the strange TeleDisk images of SCO XENIX 386 2.2.3 (released in 1988) and not being able to make heads or tails of them, I decided it was time to simply restore the TeleDisk images onto actual floppies and boot those on a real system.

This was not entirely straightforward. The images were of 720K 3.5″ disks. On a typical system, DOS is unable to format a 1.44M 3.5″ disk as 720K; I know that from experience. TeleDisk did not fare any better. It would simply not write the 720K image onto a high-density 1.44M disk (and of course I don’t have any actual 720K disks anymore, or at least not ones I’d want to overwrite). That raised two questions: why, and what to do about it?

Book Review: Linkers & Loaders

A Few Decades Late Book Reviews

Linkers & Loaders, by John R. Levine
Morgan Kaufmann Publishers, October 1999; 256 pages, ISBN 1-55860-496-0; $60.95

Published in 1999, Linkers & Loaders is one of the more recent books reviewed in this series. Interestingly, after more than a decade, the work is both relevant and showing its age.

The XENIX 386 2.2.3 Mystery

On the Internets, one may find a package labeled as SCO XENIX 386 version 2.2.3 or similar, sometimes mislabeled as version 2.2.2. This is one of the very oldest operating systems designed for 386-based PC compatibles, released around June 1988 (years before 32-bit OS/2 and Windows NT, or Linux and 386BSD for that matter; note that the first release of XENIX 386 probably happened sometime around mid-1987). Based on AT&T’s System V Release 3.2, it’s also one of the hardest operating systems to get running, for several reasons.

The core operating system (the ‘N’ floppies) is version 2.2.3c, though the rest, i.e. the basic and extended utilities (the ‘B’ and ‘X’ floppies) is version 2.2.2c. That may explain the occasional mislabeling.

Master Builders of OS/2

The MS OS/2 videos exhibit has now been completed with the addition of two PDF documents. These are scans of two fat three-ring binders that were handed out to attendees of the Microsoft OS/2 Developer’s Conference in New York City on July 7-9, 1987.

These are 700 pages containing hardcopies of conference slides. More or less all the slides are shown in the videos, but obviously print has a bit better resolution than VHS tape. This should make it easier to follow the videotaped presentations.

Note that the binder covers included the phrase “Master Builders”, which does not appear to be repeated anywhere else in the text.

OS/2 for PowerPC Tidbits

In December 1994, IBM shipped the first beta version of OS/2 for the PowerPC to selected developers. This beta included the PowerPC operating system as well as Intel-based cross-development tools that ran on OS/2 2.11 or Warp.

The operating system naturally required a PowerPC system to run on. In late 1994, there was only a single machine that OS/2 for PowerPC supported: the IBM Personal Computer Power Series 440, also known as Model 6015 or Sandalfoot. This system was very similar to the RS/6000 Model 7020 (40P). The difference was that the Power Series used PReP firmware, rather than OpenFirmware.

Windows NT BSOD Aclock Port

Do you remember the famous Windows NT Blue Screen Of Death? For years it was a source of jokes and gave Windows a bad reputation for reliability. There even was a Blue Screen Saver!

Today we fortunately see much less of it, but it is still there, reminding us that the Windows kernel was developed in a text mode environment. The 1989 NT Design Workbook tells us that in the early days of development there was an ANSI terminal emulator and a bunch of command line utilities running in text mode. Sadly, all were removed in the retail version. The only true text mode application left around was autochk. Since the day Aclock was conceived I always wanted to run it on the NT text mode boot screen. In its twisted logic it actually makes perfect sense.

Microsoft’s 1987 OS/2 Videos

The site has finally been expanded to include videos of presentations given at Microsoft OS/2 developer conferences held in 1987, a quarter of a century ago. These videos are of historical interest as they show Microsoft’s product plans as they existed at the time (e.g. Steve Ballmer touting the advantages of OS/2). There is naturally a wealth of technical information as well, although that is less historically significant.

Windows buffs may also find some of the presentations interesting, as they show Windows 2.0 several months before the product became available. The reason why a conference targeted at OS/2 developers showed Windows at all was simple: In mid-1987, the upcoming Windows 2.0 was a prototype of the OS/2 GUI, called the Windows Presentation Manager at the time. While Windows 2.0 was more or less complete, OS/2 Presentation Manager was still more than half a year from the first beta.

In fact, the Microsoft OS/2 SDK included copies of Windows 1.x and 2.0 so that developers could start wrapping their heads around the new concepts of GUI programming, and do so more than a year before the OS/2 Presentation Manager shipped.

OS/2 1.0 Availability Announced 25 Years Ago

On November 3, 1987 IBM announced a few new products and provided more information on several previously announced packages. One of those was OS/2 1.0 Standard Edition. First announced on April 2, 1987, OS/2 1.0 SE ($325) had been completed and IBM would start shipping it to customers in December 1987, slightly ahead of the original schedule.

The announcements from November 3, 1987 covered OS/2 1.0 and 1.1 SE (letter 287-498), OS/2 1.0 and 1.1 EE (letter 287-499), a slew of development tools such as C/2, Macro Assembler/2, or FORTRAN/2 (letter 287-500), and IBM LAN Server 1.0 (letter 287-501).

IBM PC XENIX

In 1984, IBM briefly flirted with XENIX, Microsoft’s variant of UNIX licensed from AT&T.  Around 1983-1984, Microsoft and Intel worked on porting XENIX to the 286 processor; Intel shipped XENIX with a number of its development systems in the mid-1980s.

A good description of IBM’s flavor of XENIX may be found in the IBM Personal Computer Seminar Proceedings, Volume 2, Number 9, published in November 1984. The OS/2 Museum recently obtained a copy of this booklet, which is now being made available in PDF format.
