While working on an unrelated problem, I noticed strange behavior in one of my OS/2 VMs running OS/2 Warp 4.52. To cut a long story short, if an unhandled floating-point exception occurred in a DOS window (VDM, or Virtual DOS Machine) while executing real-mode code, the DOS box would crash because it tried to execute an invalid instruction. The IRQ13 (INT 75h) handler provided by the BIOS would run and then execute an INT 02h instruction (for compatibility with old PCs). But interrupt vector 2 pointed into the middle of the VDM’s conventional memory, to a location that wasn’t even allocated. The vector should have pointed into the BIOS, where harmless code would have executed.
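For the curious, the trigger can be reproduced from plain real-mode code. Below is a hedged sketch for a 16-bit DOS compiler with a Borland/Watcom-style <float.h> (_control87 and MCW_EM are assumed from those runtimes). Note that real C runtimes often hook the FP interrupts themselves, which is exactly why this bug is so hard to hit; the sketch assumes they stay out of the way.

    /* Minimal sketch: provoke an unmasked x87 exception so that the
     * FERR# -> IRQ13 -> INT 75h -> INT 02h chain described above runs.
     * Assumes a Borland/Watcom-style 16-bit DOS compiler. */
    #include <float.h>

    int main(void)
    {
        volatile double zero = 0.0, one = 1.0;

        _control87(0, MCW_EM);  /* unmask all x87 exceptions */
        zero = one / zero;      /* #Z is raised; the x87 reports it on the
                                   next FP instruction (the store), which
                                   asserts FERR#/IRQ13 on AT-class machines */
        return 0;
    }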
I quickly realized that only one of several very similarly configured VMs behaved this way. At first I suspected a problem with the way the DOS box in the troublesome VM was set up, but that wasn’t the case. The OS version wasn’t it, either. What’s more, the VDM was simply reflecting the contents of physical memory (the former real-mode IVT at physical address zero)… and the IVT was modified very early in the boot process, before the OS/2 boot logo or boot menu even appeared.
Finally realization dawned: The OS/2 kernel debugger was doing this, and that’s why most of my VMs didn’t have the problem. But why would this happen at all?
The behavior likely exists in every 32-bit OS/2 version; I verified that it is present in OS/2 2.11 (1993), OS/2 Warp 3 (1994), and the OS/2 Warp 4.52 refresh (2002). Let me be clear: this is a bug that is very difficult to trigger. The OS/2 kernel debugger overwrites the first six interrupt vectors. For the most part, the default BIOS handlers do nothing, which means that software likely to generate these interrupts installs its own handlers and avoids the issue entirely.
The affected interrupts include the division-by-zero, single-step, breakpoint, and overflow interrupts. The NMI (vector 2) is the odd one out because the BIOS actually does contain a handler for it. Then again, an OS/2 VDM won’t see any NMIs… except that the BIOS IRQ13/INT 75h handler invokes the NMI service routine in software (via INT 02h).
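To make the list concrete, here is a small real-mode sketch that dumps where the first six vectors point (MK_FP as in a Borland-style <dos.h>; the label for vector 5, which the BIOS also uses for print screen, is my addition, as the text does not name it):

    /* Dump the first six real-mode interrupt vectors, i.e. the ones the
     * OS/2 kernel debugger overwrites. Assumes a 16-bit DOS compiler. */
    #include <dos.h>
    #include <stdio.h>

    int main(void)
    {
        static const char *name[6] = {
            "divide error", "single step", "NMI",
            "breakpoint", "overflow", "BOUND/print screen"
        };
        unsigned long far *ivt = (unsigned long far *)MK_FP(0, 0);
        int i;

        for (i = 0; i < 6; i++)
            printf("INT %02X (%s) -> %04lX:%04lX\n", (unsigned)i, name[i],
                   ivt[i] >> 16, ivt[i] & 0xFFFFUL);
        return 0;
    }

On an affected system one would expect vector 2 to point somewhere into conventional memory rather than into the F000h BIOS segment.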
The problem illustrates a few things about how the OS/2 MVDM (Multiple Virtual DOS Machine) component works. DOS boxes in OS/2 partially get to execute the system’s actual BIOS, and the host system’s real IVT is used to initialize the IVT in each DOS box. The VDM environment isn’t entirely divorced from the underlying hardware and firmware, which has both advantages and disadvantages.
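Conceptually, the seeding step might look like the sketch below. The function and parameter names are hypothetical, not OS/2 internals, and whether the page is copied outright or initially mapped copy-on-write is an implementation detail I am not asserting here; a plain copy is shown.

    /* Conceptual sketch only: seed a new VDM's real-mode IVT from the
     * host's physical page zero. This is what makes stale vectors left
     * behind by the kernel debugger visible inside every DOS box. */
    #include <string.h>

    #define IVT_ENTRIES 256

    void seed_vdm_ivt(unsigned long *vdm_page0, const unsigned long *host_page0)
    {
        /* Copy all 256 seg:off vectors; any vector the host left pointing
         * at a defunct real-mode handler is copied along with the rest. */
        memcpy(vdm_page0, host_page0, IVT_ENTRIES * sizeof(unsigned long));
    }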
But back to the question of why this happens at all. In the days of OS/2 1.x, the OS would frequently switch between real and protected mode. The kernel debugger had both protected- and real-mode parts, and it took over several real-mode interrupts. While the DOS box was executing, the kernel debugger was in fact present the whole time.
In OS/2 2.0, the situation is different. The kernel debugger still initializes a real-mode part and installs handlers for the first six interrupt vectors. This is useful while the real-mode portion of the OS startup sequence is executing. But soon enough, the OS switches to protected mode and stays there—any real-mode code is then executed within a V86 task.
By now, the cause of the problem is probably obvious: The OS/2 kernel debugger takes over the first six real-mode interrupts, but the vectors are never restored. When a VDM starts, its real-mode IVT will contain several vectors pointing to where the real-mode kernel debugger used to be back when the OS was still in real mode, but the vectors are no longer valid. If the interrupt handlers are, against all odds, somehow invoked, the VDM will almost certainly crash very quickly.
This appears to be a very old and very obscure bug. It only shows up on OS/2 systems with the kernel debugger installed, and that is a tiny minority. Even on those systems, it is quite improbable that the bug will be triggered—most likely only as a consequence of another bug, when interrupts that shouldn’t be invoked somehow are. It’s no wonder that the bug went undetected. It would presumably be easy to fix, either by restoring the original IVT contents before the system switches to protected mode for good, or at least by restoring them in newly created VDMs.
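The suggested fix amounts to the usual hook/unhook discipline, sketched below in C for illustration (the actual kernel debugger code is assembly, and the function names here are hypothetical):

    /* Hypothetical sketch of the fix: remember the BIOS vectors before the
     * debugger hooks them, and put them back either before the final switch
     * to protected mode or when seeding each new VDM's IVT. */
    #include <string.h>

    #define DBG_VECS 6
    static unsigned long saved_vec[DBG_VECS];

    void dbg_hook_vectors(unsigned long *ivt)
    {
        memcpy(saved_vec, ivt, sizeof(saved_vec));  /* save BIOS vectors */
        /* ... install the debugger's real-mode handlers here ... */
    }

    void dbg_unhook_vectors(unsigned long *ivt)
    {
        memcpy(ivt, saved_vec, sizeof(saved_vec));  /* restore BIOS vectors */
    }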
Interesting problem, thanks for the detailed analysis!
Maybe I could use this as an opportunity to ask a question about how the DOS box really worked in OS/2 (and Windows, if it worked the same way)? I’m aware that they use the CPU’s V86 mode to execute the real-mode code. But how much of the system’s original “real mode” code is reflected in such a DOS box? Is the system’s native BIOS (i.e. the built-in BIOS on the motherboard) the one that executes inside the V86 task when the program or the DOS code itself calls BIOS services, with requests from that BIOS code to the hardware (I/O etc.) caught by the OS/2 VMM, which emulates the hardware in a suitable way? Or does the DOS box run with BIOS code that OS/2 provides, which implements those BIOS services by calling out of the V86 box? Do Win95 and OS/2 work the same in this respect?

If it’s the former scenario, how does that work out with the internal state the BIOS must maintain when you have multiple DOS boxes running? Is that state “cloned” using a copy-on-write strategy? To me the latter seems the easiest to implement. But I’m not sure, because at least in the Win95 scenario, the DOS boxes seem to reflect the state of DOS before Windows was launched (because in this OS, DOS is fully loaded before Windows launches), and any device drivers etc. that were loaded before Windows are still present (at least that’s how I remember it). In this scenario it seems somewhat difficult to “pull out” the BIOS and replace it with a Win95 “virtual BIOS”, because all the device drivers and TSRs loaded before Windows would see a new BIOS with different offsets etc. On the other hand, it must be hard to write a VMM that can emulate hardware well enough for the system’s native BIOS (which wasn’t written with virtualization in mind) to work, because the BIOS’s interface to the hardware is not (completely) well defined and it could access proprietary resources on the motherboard that the VMM wouldn’t know how to virtualize. It makes me think maybe these DOS boxes worked in a third way? 🙂
It would be very interesting if you could write a bit about this or provide some pointers.
The only short answer I can think of is “it’s complicated” 🙂 The VMM strategy in 32-bit OS/2 and Windows 3.x/9x was roughly similar; Windows 3.x/9x had the added complication that the OS itself needed DOS to function. The emulation strategy was ad hoc and specific to each device. In general the system’s BIOS was used (both system and video BIOS) and hardware devices were emulated, with the major exception of storage. The VMM intercepted many BIOS/device calls and emulated them at a relatively high level (mouse, CD-ROM, EMS, XMS). File access was handled at an even higher level (DOS), and that way networking could also be provided without loading network drivers in the VDM. Devices such as the interrupt controller, DMA controller, timer, and real-time clock were emulated. Some devices were passed through more or less directly to a VDM, with the OS handling mutual exclusion; that was the case with audio, for example. Video typically needed hardware-specific drivers for VDM support.
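A deliberately simplified sketch of that per-interrupt decision follows. The type and function names are hypothetical, and the set of intercepted vectors is illustrative, not an actual OS/2 or Windows dispatch table:

    /* Sketch: emulate some services at a high level, reflect the rest to
     * the (real) BIOS code running inside the V86 task. */
    typedef enum { VMM_EMULATE, VMM_REFLECT } vmm_action;

    vmm_action classify_interrupt(unsigned char vector)
    {
        switch (vector) {
        case 0x13:          /* disk services: storage handled at a high level */
        case 0x21:          /* DOS file access: routed to the host file system */
        case 0x33:          /* mouse services: emulated */
        case 0x67:          /* EMS: emulated */
            return VMM_EMULATE;
        case 0x10:          /* video: largely the real video BIOS, in V86 mode */
        default:
            return VMM_REFLECT;  /* let the VDM's own IVT handle it */
        }
    }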
BIOS state was maintained per VDM. The BIOS Data Area is fairly well defined, which helps. Another thing to keep in mind is that although the BIOS contains all sorts of highly system-specific and custom code that runs during POST, at run time there is not that much variation. What a BIOS does to read from a floppy, or how it scrolls a VGA text-mode screen, does not differ much between implementations. Obviously Windows and OS/2 in turn also influenced BIOS design, as BIOS writers had to make sure they were compatible with those OSes.
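As a sketch of what “per-VDM BIOS state” might mean in practice, assume each DOS box simply gets a private copy of the 256-byte BIOS Data Area that lives at 0040:0000; the names and types below are hypothetical:

    /* Per-VDM BIOS Data Area, seeded from the BIOS state captured at boot.
     * From then on each VDM's BIOS code updates only its own copy. */
    #include <string.h>

    #define BDA_SIZE 0x100   /* BIOS Data Area, segment 0040h */

    struct vdm_state {
        unsigned char bda[BDA_SIZE];   /* mapped at 0040:0000 in this VDM */
    };

    void init_vdm_bda(struct vdm_state *vdm, const unsigned char *boot_bda)
    {
        memcpy(vdm->bda, boot_bda, BDA_SIZE);
    }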
Windows NT (NTVDM) employed a different strategy and supplied its own system and video BIOS. The crucial difference between NT on the one hand and OS/2 and Windows 9x on the other was that NT ran on non-x86 platforms and had to completely emulate the execution environment. The x86 versions did not emulate the CPU but were otherwise very similar.
Unlike the others, OS/2 offered the option to boot a specific version of DOS from a floppy or floppy image. In that case, an unmodified version of DOS ran with emulated hardware, but again with the exception of storage. A special driver needed to be installed in the VDM to access drives controlled by OS/2 (otherwise the VDM simply wouldn’t see the hard disk).
The architecture of the OS/2 and Windows 3.x/9x VMMs was relatively complex and ad hoc because it had to provide acceptable performance on very slow hardware (a 16-20 MHz 386). That more or less dictated how much hardware had to be emulated, how much was passed through directly, and which OS and BIOS calls were intercepted.
The relevant literature that comes to mind right now includes DOS Internals and Undocumented DOS, although neither book is primarily concerned with VDMs.
“Windows NT (NTVDM) employed a different strategy and supplied its own system and video BIOS.”
I am not sure about that one.
@Yuhong Bao
In fact it does. Microsoft’s NTVDM is based on a well-known VM product, SoftPC from Insignia Solutions. The main difference between the retail and Microsoft versions was that for the x86 version, Microsoft built a vCPU “plugin” that allowed the SoftPC engine to use the CPU’s V86 mode, whereas on non-x86 RISC platforms it was a complete emulation.
Despite that, both versions use their own custom BIOSes that can BOP into the VM engine for things like interrupts. On x86 you can also use hardware full-screen mode in all NTVDM versions, even on Vista and Windows 7 if you’re using XPDM-compatible video drivers, but then it is the video driver calling into the system’s video BIOS INT 10h, not the NTVDM, which uses its own custom video BIOS.
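To illustrate the BOP mechanism the comment describes: NTVDM’s custom BIOS entry points are reported to be little more than a “BOP” (an otherwise-invalid opcode sequence, commonly given as C4 C4 plus an operation number) that traps into the VM engine, followed by an IRET. Treat the exact encoding below as an assumption, not a documented fact:

    /* Hypothetical NTVDM-style BIOS stub for INT 13h. */
    static const unsigned char int13_stub[] = {
        0xC4, 0xC4, 0x13,   /* BOP 13h: call out to the engine's disk code */
        0xCF                /* IRET back into the VDM */
    };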
Morten: you need to find a copy of Andrew Schulman’s “Unauthorized Windows 95”. He picks apart Win95 at a nearly atomic level, including how the DOS boxes and 32-bit tasks work (he even shows how to modify it so it loads the VMM and then only COMMAND.COM, giving you a preemptively multitasking text-mode DOS). Unfortunately it’s long out of print, but tracking down a copy in some manner is essential to answering the questions you have.
OS/2’s VDM was not restricted to just DOS versions. IBM’s standard showcase of the feature used CP/M-86 and someone even got Minix running that way.
For other DOS box methods, there were many magazine articles and whitepapers written between 1989 and 1995 comparing the various solutions and the tradeoffs between performance, memory consumption, stability and inter-process communication.
NTVDM maps all of the PC’s ROM BIOSes into its own address space, at least.
Just try enabling the boot ROM on an onboard network card 😉
I found this when I tried to run Master of Magic on a PC with the boot ROM enabled – the system could not find an appropriate hole for EMS memory.
> Michal Necasek says:
> August 31, 2014 at 10:00 pm
>
> …
I reckon you should flesh that reply out into a proper article. It would be very interesting. Yes, a few people (Raymond Chen springs to mind) could probably offer similarly deep analysis, but many (again, Ray springs to mind) would offer it with a very [my favourite platform]-centric viewpoint.
Your far more platform-agnostic insights, not to mention pretty deep understanding of how the 286 and 386 actually work, would be far more illuminating.
Just my two cents; thus ends ‘OS/2 Museum: By Request’ for this month. 🙂
Thanks for the suggestion, I will keep it in mind. Yes, something like this could be interesting. I know (or can find out) enough about Windows 3.x/9x, NT, OS/2, and plain DOS (EMM386). I know much less about stuff like DESQview, VP/ix or Merge/386.
Thanks for the detailed replies. I had always wondered about this. But one final thing: what would happen if one installed a DOS TSR (like donkey) prior to loading Windows… would it then be available in all the DOS boxes launched from within Windows 95? If yes, how do they clone the state of the TSR (copy-on-write?)? If not, how do they “excise” the TSR out of DOS?
Yes, “global” TSRs were available in all DOS VMs. Windows also distinguished between global data (shared across VMs) and instance data (most of the data, VM specific). There were also callbacks that allowed TSRs to deactivate themselves when Windows was starting and things like that.
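One of those callbacks deserves a sketch: when Windows starts, it broadcasts INT 2Fh with AX=1605h, and a TSR may return a startup information structure declaring which of its data is “instance data” (per-VM) rather than global. The layout below follows the Win386 startup info structure as documented in the Windows DDK; verify the field details against the DDK before relying on it (16-bit compiler assumed, with default byte packing):

    /* Instance data list entry: a region of the TSR's memory that should be
     * instanced per VM. The list is terminated by an entry with a 0 address. */
    struct instance_item {
        void far       *addr;
        unsigned short  size;
    };

    /* Win386 startup info structure returned via the INT 2Fh AX=1605h
     * broadcast; structures from multiple handlers are chained together. */
    struct win386_startup_info {
        unsigned char   version[2];            /* 3, 0 */
        void far       *next_startup_info;     /* chain from previous handler */
        char far       *virt_dev_file;         /* optional VxD to load, or 0 */
        void far       *reference_data;        /* passed to that VxD */
        struct instance_item far *instance_data;  /* per-VM regions, or 0 */
    };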
On x86 there is a registry key called UseRealRoms that controls the behavior, at least in newer versions of NTVDM.