Recently I had a need to test the behavior of floating-point exceptions (FPEs) in environments where traditional FPE reporting is used.
To briefly recap, in the original PC equipped with an 8088/8087 pair, floating-point exceptions, which are generally asynchronous events, were reported through an NMI. In the PC/AT (80286/80287), IRQ 13 was used instead, but the BIOS interrupt handler executed INT 02 for compatibility with the PC. This method stuck around in AT compatibles.
Using Open Watcom C/C++, I verified that FPE delivery works as expected in a 32-bit extended DOS executable using the default DOS/4GW extender. The run-time library installs a default handler on interrupt vector 2, and unless the user overrides it, a floating-point exception terminates the program.
The exact same executable behaved differently with the CauseWay and DOS/32A extenders: it was as if the exception never happened. Both extenders are supposed to be compatible with DOS/4GW, so clearly there are differences. But where exactly?
How Floating-Point Exceptions Work with DOS/4GW
The way FPEs work in protected mode with DOS/4GW is extremely similar to the way they work in real-mode DOS—after all, the environment isn’t called extended DOS for nothing.
The C run-time library hooks interrupt vector 2 using the familiar INT 21h/25h (Set Interrupt Vector) service. When the FPU signals an error, the error is delivered to the CPU as IRQ 13, typically corresponding to interrupt vector 75h. The IRQ 13 handler clears the error, sends an EOI, and performs INT 02h.
The default IRQ 13 handler is real-mode code, normally provided by the system’s BIOS, which means that the interrupt handler for vector 2 is also executed in real mode. Since interrupt vector 2 is not considered a hardware interrupt in DPMI, there’s no automatic pass-up to protected mode. For that reason, DOS/4GW installs its own protected-mode IRQ 13 handler, which then calls the protected-mode interrupt 2 handler.
Once the protected-mode handler for interrupt vector 2 is invoked, it eventually returns to the caller (the IRQ 13 handler) or perhaps terminates the program.
Pass-up Interrupts, Bi-Modal Interrupts, and Callbacks
So what exactly is a pass-up interrupt handler? DOS/4GW uses the terms pass-up and pass-down handlers, which may sound strange, but the concept is quite simple.
A pass-up interrupt handler is a protected-mode interrupt handler with a real-mode stub. Depending on the environment a DOS extender runs in (raw, VCPI, DPMI, etc.), the system may actually switch between real and protected mode, rather than executing real-mode code in V86 mode. Now a hardware interrupt may occur at any time. If the interrupt arrives while the CPU is in protected mode, the protected-mode interrupt handler will be invoked directly. If the CPU is in real mode when the interrupt arrives, the real-mode stub will “pass up” the interrupt to protected mode: switch to protected mode, run the protected-mode handler, and return to real mode.
A pass-down interrupt is the exact opposite: interrupts are reflected from protected mode back to real mode for handling.
The concept of bi-modal interrupt handlers deserves a brief mention. A bi-modal interrupt handler is simply a pair of handlers, one for protected mode and one for real mode. Whether the CPU is in real or protected mode, an interrupt can be processed without requiring a mode switch. Bi-modal handlers are a performance optimization, useful in situations where low latency is required.
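The three arrangements can be compared with a toy model. The sketch below is not extender code: `handler_set`, `service_interrupt`, and the fixed cost of two mode switches per round trip are hypothetical names and simplifying assumptions chosen for illustration (in a V86 environment, for instance, the transition is not a true switch to real mode).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: nothing here touches real CPU modes. */
typedef enum { REAL_MODE, PROTECTED_MODE } cpu_mode;

typedef struct {
    int has_pm_handler;   /* a protected-mode handler is installed  */
    int has_rm_handler;   /* a full real-mode handler is installed  */
} handler_set;

/*
 * Returns the number of mode switches needed to service one interrupt
 * that arrives while the CPU is in `current`, and stores the mode the
 * handler finally ran in. Returns -1 if no handler exists at all.
 */
int service_interrupt(handler_set h, cpu_mode current, cpu_mode *ran_in)
{
    assert(ran_in != NULL);

    /* A handler exists for the current mode: run it directly. */
    if (current == PROTECTED_MODE && h.has_pm_handler) {
        *ran_in = PROTECTED_MODE;
        return 0;
    }
    if (current == REAL_MODE && h.has_rm_handler) {
        *ran_in = REAL_MODE;
        return 0;
    }

    /* Pass-up: a real-mode stub switches to protected mode and back. */
    if (h.has_pm_handler) {
        *ran_in = PROTECTED_MODE;
        return 2;
    }

    /* Pass-down: the interrupt is reflected to real mode and back. */
    if (h.has_rm_handler) {
        *ran_in = REAL_MODE;
        return 2;
    }

    return -1;
}
```

In this model a pass-up handler is `{1, 0}`, a pass-down handler is `{0, 1}`, and a bi-modal handler is `{1, 1}`. Only the bi-modal pair services an interrupt with zero mode switches regardless of the mode the CPU happens to be in.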
“Callback” is a concept used in the DPMI specification and closely related to pass-up interrupt handlers. DPMI services 0303H and 0304H can be used to allocate and free real-mode callbacks which pass execution to protected-mode code. Such callbacks can be used with interrupts to construct pass-up interrupt handlers, but are also usable for general calls from real to protected mode.
What’s Wrong with CauseWay?
CauseWay naturally supports the DPMI callback functionality, but it also provides automatic callbacks for hardware interrupts, as well as software interrupts 1Ch (timer tick callback), 23h (Ctrl-C handler), and 24h (DOS critical error handler). As a consequence, if a protected-mode application installs a hardware interrupt handler (e.g. using INT 21h/25h), it will be invoked whether the interrupt occurred in real mode or not.
What CauseWay does not do is provide this handling for vector 02h, and it does not install its own IRQ 13 handler. Interrupt vector 02h is a bit of an odd duck because it is the NMI hardware interrupt handler, but it is also used as a software interrupt when processing floating-point errors. At any rate, under CauseWay it does not get special treatment. An FP exception triggers IRQ 13, which, in the absence of any other handler, is handled by the BIOS in real-mode code. The BIOS executes INT 02h, which invokes the real-mode interrupt 02h handler. Because the run-time library only installs a protected-mode interrupt 02h handler, the real-mode handler is the default one; it does nothing about the FPE, and execution continues as if the exception never happened.
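The difference between the two delivery paths can be traced with a small simulation. This is a sketch under stated assumptions, not a reproduction of any extender: the `machine` structure, the handler functions, and `fpe_reaches_rtl` are all hypothetical, and the model reduces "the CPU takes an interrupt" to a lookup in two tables of function pointers.

```c
#include <stddef.h>

#define VEC_INT02 0x02   /* NMI vector, reused as the software FPE interrupt */
#define VEC_IRQ13 0x75   /* typical vector for IRQ 13 */

typedef struct machine machine;
typedef void (*handler_fn)(machine *);

struct machine {
    handler_fn rm_vec[256];   /* real-mode vector table (model)        */
    handler_fn pm_vec[256];   /* protected-mode handlers, NULL = none  */
    int fpe_seen_by_rtl;      /* set when the run-time library is told */
};

/* Default real-mode INT 02h handler: does nothing about the FPE. */
static void rm_default_int02(machine *m) { (void)m; }

/* BIOS IRQ 13 handler: clear error and EOI omitted, then INT 02h
 * executed in real mode. */
static void bios_irq13(machine *m) { m->rm_vec[VEC_INT02](m); }

/* Run-time library's protected-mode INT 02h handler. */
static void rtl_pm_int02(machine *m) { m->fpe_seen_by_rtl = 1; }

/* Extender's protected-mode IRQ 13 handler: forwards to the
 * protected-mode INT 02h handler, if one is installed. */
static void extender_pm_irq13(machine *m)
{
    if (m->pm_vec[VEC_INT02] != NULL)
        m->pm_vec[VEC_INT02](m);
}

/* Simulates one FPE. Returns 1 if the run-time library's handler ran. */
int fpe_reaches_rtl(int extender_hooks_irq13)
{
    machine m;
    int i;

    for (i = 0; i < 256; i++) {
        m.rm_vec[i] = NULL;
        m.pm_vec[i] = NULL;
    }
    m.fpe_seen_by_rtl = 0;
    m.rm_vec[VEC_INT02] = rm_default_int02;
    m.rm_vec[VEC_IRQ13] = bios_irq13;
    m.pm_vec[VEC_INT02] = rtl_pm_int02;      /* hooked by the run-time library */
    if (extender_hooks_irq13)
        m.pm_vec[VEC_IRQ13] = extender_pm_irq13;

    /* The FPU raises IRQ 13: a protected-mode handler wins if present,
     * otherwise the interrupt is reflected to the real-mode handler. */
    if (m.pm_vec[VEC_IRQ13] != NULL)
        m.pm_vec[VEC_IRQ13](&m);
    else
        m.rm_vec[VEC_IRQ13](&m);

    return m.fpe_seen_by_rtl;
}
```

In this model `fpe_reaches_rtl(1)` returns 1 and `fpe_reaches_rtl(0)` returns 0. The only difference between the two runs is whether a protected-mode IRQ 13 handler exists; the run-time library's interrupt 2 hook is identical in both.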
DPMI Host Differences
As if things weren’t complicated enough, there are (predictably) differences between DPMI implementations. This in part stems from the fact that the DPMI specification says absolutely nothing about floating-point exception handling. In fact, the DPMI 0.9 specification never even mentions the FPU, coprocessor, or floating point. While DPMI 1.0 does define a few coprocessor-related services, it still says nothing about floating-point exceptions.
Under Windows NT (NTVDM), an FPE always calls the protected-mode interrupt 2 handler. The BIOS is completely bypassed. Under DOS, OS/2 MVDM, and Windows 3.1/9x VDM, the real-mode (BIOS) IRQ 13 handler is invoked if no protected-mode equivalent exists. This will in turn execute INT 02h in real mode and bypass protected-mode interrupt 2 handlers as well. In OS/2, this may have unexpected consequences.
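The host behavior described above boils down to a small decision table. The sketch below merely restates the paragraph in code; the enum names and `fpe_delivery` are hypothetical, and it assumes that a protected-mode IRQ 13 handler, when present, forwards to the protected-mode interrupt 2 handler the way DOS/4GW's does.

```c
typedef enum {
    HOST_DOS,         /* plain DOS, extender acting as its own host */
    HOST_OS2_MVDM,
    HOST_WIN3X_9X,
    HOST_NTVDM
} dpmi_host;

typedef enum {
    FPE_TO_PM_INT02,  /* protected-mode interrupt 2 handler runs  */
    FPE_TO_RM_INT02   /* BIOS IRQ 13 handler runs INT 02h in real */
                      /* mode; protected-mode handler is bypassed */
} fpe_path;

/* Where does a floating-point exception end up? */
fpe_path fpe_delivery(dpmi_host host, int has_pm_irq13_handler)
{
    /* NTVDM bypasses the BIOS and always calls the protected-mode
     * interrupt 2 handler. */
    if (host == HOST_NTVDM)
        return FPE_TO_PM_INT02;

    /* Everywhere else, the real-mode BIOS handler runs unless a
     * protected-mode IRQ 13 handler exists. */
    return has_pm_irq13_handler ? FPE_TO_PM_INT02 : FPE_TO_RM_INT02;
}
```

Read this way, the second argument is the only thing an extender controls, and it only matters on hosts other than NTVDM.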
The Fix
Initially, an attempt was made to modify CauseWay to automatically provide a callback for interrupt vector 02, just like it does for maskable hardware interrupts. There is some logic to this since vector 02 is also used for NMIs and those are (non-maskable) hardware interrupts. This works under DOS but unfortunately not under Windows 9x.
The only reliable solution is for CauseWay to install its own protected-mode IRQ 13 handler, just like DOS/4GW does. Because this is a hardware interrupt handler, it will be invoked even if IRQ 13, against all odds, occurs in real mode; ensuring that is the DPMI host’s responsibility.
With an IRQ 13 handler provided by CauseWay in place, floating-point exception handling works with this DOS extender the way the Watcom C/C++ run-time library expects, and it works under all tested environments: DOS with or without a memory manager, Windows 3.1, Windows 9x, OS/2, and NT (Windows XP).
Needless to say, this saga underscores just how rarely floating-point exceptions are used. 99.9% of the time, the FPU runs with exceptions masked and the extraordinarily complex legacy floating-point exception delivery mechanism does not matter.