Stack Checking on OS/2

A while ago I was involved in debugging a seemingly simple yet mysterious problem:

[Figure: That error message should not have been there]

A piece of code (a fairly simple interface DLL) built with the Open Watcom compiler was failing with a bogus stack overflow error. The mystery was that this failure only happened on OS/2 Warp Connect. It didn’t happen on OS/2 2.0 or Warp Server for e-Business (WSeB) or MCP2. And it also didn’t happen on Warp Connect updated to FixPack 40.

That’s weird, right? And getting to the bottom of the faulty stack check was a bit of a journey…

The Watcom run-time library checks the stack for overflow at function entry, unless stack checking is disabled with the -s switch. For executable modules, the logic is simple enough. The run-time is in control of initialization and of creating new threads, so it’s aware of which stack is where (with a caveat explained below).

For dynamically linked libraries (DLLs) it’s a bit more involved. The run-time linked into the DLL has no control over what threads already exist in the process when it’s loaded, or what additional threads might be created at runtime.

As it turns out, OS/2 has a handy mechanism that allows each thread to discover where its stack begins and ends. The TIB (Thread Information Block) contains two fields named tib_pstack and tib_pstacklimit, which store the addresses of the stack bottom and top, respectively (NB: The stack starts near the top and grows toward the bottom address). The DosGetInfoBlocks API allows each thread to obtain the address of the TIB.

The Watcom runtime tries to catch the situation where the stack pointer (ESP register) moves below the stack bottom before it happens. The runtime actually does not call DosGetInfoBlocks; instead, it takes advantage of the fact that on process startup, OS/2 loads the FS register with selector 150Bh, and that selector maps the TIB of the current thread. I am not entirely sure where this factoid is documented, but at minimum it is stated in the OS/2 Debugging Handbook – Volume IV System Diagnostic Reference.
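The entry check described above is easy to model. What follows is a minimal portable sketch, not the actual Watcom run-time code: the struct `tib_model` and the function `stack_would_overflow` are hypothetical names, and only the two field names are borrowed from the OS/2 TIB. It assumes the caller knows how many bytes the function is about to use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two TIB fields discussed above.
 * The real TIB has more members; this is only a model. */
struct tib_model {
    uintptr_t tib_pstack;      /* stack bottom (lowest valid address) */
    uintptr_t tib_pstacklimit; /* stack top (the stack grows down from here) */
};

/* Returns true if moving the stack pointer down by 'needed' bytes
 * would take it below the stack bottom. */
bool stack_would_overflow(const struct tib_model *tib,
                          uintptr_t sp, uintptr_t needed)
{
    assert(tib->tib_pstack <= tib->tib_pstacklimit);
    /* A stack pointer already outside the stack counts as an overflow. */
    if (sp < tib->tib_pstack || sp > tib->tib_pstacklimit)
        return true;
    /* Compare against the remaining room to avoid unsigned wrap-around. */
    return needed > sp - tib->tib_pstack;
}
```

The point of checking before the stack pointer moves is that the function can still report the error cleanly; once the frame has been written, whatever sits below the stack may already be damaged.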

When I started debugging the crashes, I could see that for example on Warp Server for e-Business, the TIB contained exactly the expected values for the stack base and limit (i.e. bottom and top). On Warp Connect, it did not. In fact on Warp Connect, the stack base address looked decidedly wrong. Problem solved?

Well… no. Because although the stack base looked wrong, it was lower than expected. That should have caused the opposite problem—instead of spurious stack overflow errors, a real stack overflow would not be detected. And in fact the stack base that the Watcom runtime stored during initialization was higher than the expected stack bottom. Which explained the bogus errors, but where did that value come from?

As it turns out, the stack base was determined right when the DLL was loaded, in the DLL initialization entry point. And that was exactly the problem.

For reasons that are not entirely clear, on some versions of OS/2 the DosLoadModule API switches to a different stack when running the DLL initializers, and that is reflected in the TIB. I could not quickly find this fact documented anywhere, but it certainly happens at least on Warp Connect.

The difference is that on Warp Connect GA, the DLL initialization stack is at a higher address than the regular thread stack, at least for the failing program. On WSeB and most other OS/2 versions, that is not the case.

This led to a realization that the Watcom DLL run-time cannot reliably obtain the thread stack base during initialization. Or rather it can… but it’s the wrong stack.

The obvious solution is to not store the stack base during initialization but rather get it from the TIB whenever it’s needed. Whether that actually works in practice is currently an open question.
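The difference between the two strategies can be shown with a small model. This is only a sketch under stated assumptions: `tib_model`, `dll_init_model`, `overflow_cached` and `overflow_live` are hypothetical names, and the real run-time would read the bounds through FS rather than through a pointer passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the per-thread stack bounds reported by the OS (hypothetical). */
struct tib_model {
    uintptr_t tib_pstack;      /* stack bottom */
    uintptr_t tib_pstacklimit; /* stack top */
};

/* Stack base captured once, as in the DLL initialization entry point. */
uintptr_t cached_stack_base;

void dll_init_model(const struct tib_model *tib)
{
    assert(tib->tib_pstack <= tib->tib_pstacklimit);
    cached_stack_base = tib->tib_pstack;
}

/* Strategy 1: compare against the value cached at initialization. */
bool overflow_cached(uintptr_t sp, uintptr_t needed)
{
    return sp < cached_stack_base || needed > sp - cached_stack_base;
}

/* Strategy 2: read the bounds from the (model) TIB on every check. */
bool overflow_live(const struct tib_model *tib,
                   uintptr_t sp, uintptr_t needed)
{
    return sp < tib->tib_pstack || needed > sp - tib->tib_pstack;
}
```

If `dll_init_model` runs while the TIB describes a temporary initialization stack located above the thread's real stack, `overflow_cached` flags every later call made on the real stack, which matches the bogus errors seen on Warp Connect. `overflow_live` does not have that problem, at the cost of a TIB access on every check.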

What Do Others Do?

While researching the problem, I thought I’d see what the IBM compilers do. Usually that is a reliable way to figure out the “right” way of doing things. The answer was rather unsatisfactory: nothing.

The IBM compiler can (and does by default) generate stack probes which ensure that the stack “guard page” is not skipped and stack memory is committed as needed. But there is no mechanism to check for stack overflows.
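To make the idea of a stack probe concrete, here is a rough model of which addresses a probe sequence would touch. It is not the code any IBM compiler emits; `probe_addresses` is a hypothetical helper, and the only assumption taken from the platform is the 4 KB x86 page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* x86 page size */

/* Fills 'out' with the addresses a probe loop would touch when a function
 * needs 'frame' bytes below stack pointer 'sp': one address in every page,
 * from the highest down, so the guard page is never skipped.
 * Returns the number of probes written (at most 'max'). */
size_t probe_addresses(uintptr_t sp, uintptr_t frame,
                       uintptr_t *out, size_t max)
{
    size_t n = 0;
    uintptr_t done = 0;

    assert(frame <= sp);
    while (frame - done >= PAGE_SIZE && n < max) {
        done += PAGE_SIZE;
        out[n++] = sp - done;      /* touch one location per page */
    }
    if (done < frame && n < max)
        out[n++] = sp - frame;     /* final touch at the new stack pointer */
    return n;
}
```

Probing this way only guarantees that memory gets committed in order. It says nothing about whether the stack pointer has left the stack, which is why probes alone are not a stack overflow check.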

Out of desperation I thought I’d check if the old Microsoft 32-bit compiler shipped with OS/2 2.0 pre-releases did anything useful. But it didn’t. Although the run-time does include a __chkstk routine, which performs stack checking in Microsoft’s 16-bit compilers, the 32-bit OS/2 variant only probes the stack in __chkstk and does not make any attempt to guard against stack overflows.

Version Differences, DLL Initialization Stack

As mentioned above, there are version specific behaviors related to loading DLLs. As far as I can establish, if a DLL is loaded at load time (that is, an executable directly imports from a DLL) then the DLL initialization is run using the stack of the main executable’s first (and at that point, only) thread.

If DosLoadModule is used to load DLLs at run-time, OS/2 2.0, 2.1, and even 2.11 SMP simply use the stack of the calling thread. On Warp 3 or Warp Connect GA, as well as Warp 4 GA, DosLoadModule runs the DLL initialization on some kind of custom stack. On WSeB, IBM reverted to the old behavior used in OS/2 2.0.

Not only that, but the old behavior was also restored in Warp 3 and Warp 4 FixPacks. On Warp 4, it changed in FP7 (June 1998). I have not established exactly when the change happened on Warp 3, but it is known that Warp 3 FP40 went back to the old behavior as well.

I could not quickly find any documentation of this change and its eventual reversal. I can only speculate that IBM implemented the change (switching to a separate stack for DosLoadModule) for a reason, but it is likely that the change had undesirable side effects and IBM reverted it again after 2 or 3 years.

Version Differences, Stack Limits

There is another, unrelated change in OS/2 behavior which affects the stack base (that is, the bottom of the stack) reported in the TIB for the main thread.

Note that any secondary threads are started through DosCreateThread and their stack size is explicitly specified as a parameter to the API call. For the main thread (i.e. thread 1) that is not the case.

The initial SS:ESP is specified in the executable header, as is the stack size. By convention, the stack is located in the data segment, past any statically allocated data.

On WSeB and later, the main thread’s TIB reflects the executable header in a very logical manner: The stack top corresponds to the initial SS:ESP, and the stack base is the stack top minus stack size.

However, on older versions (OS/2 2.x as well as Warp 3 and Warp 4 GA) that is not the case. The stack top in the TIB is reported identically, but the stack base is not. The older OS/2 versions appear to completely ignore the stack size in the executable header and set the stack base to the start of the segment that the stack is in.

That is rather undesirable for stack checking: Typically there is data below the stack, which means that using the information from the TIB will not catch overwriting the data segment… which is exactly what the stack checking tries to prevent!

LX Format Vagaries

When exploring this sub-optimal behavior, a likely reason quickly came to light: Old versions of IBM’s linker (LINK386.EXE) didn’t bother setting the stack size in the executable header at all!

That is, LINK386.EXE always had a /STACK switch which allowed the user to specify the stack size at link time. This influenced the initial ESP location, but the stack size was not recorded in the executable header at all. It was always set to zero.

Older OS/2 versions therefore didn’t even bother looking at the stack size in the executable LX header and always set the stack base in the TIB to the bottom of the segment the stack was in. This was more or less guaranteed to be wrong, but it was probably the most reasonable guess OS/2 could make.

The odd behavior can be traced to the tooling change from Microsoft to IBM and switching from the LE to LX format. Microsoft planned to change the LE format and OS/2 so that even the main thread’s stack could use the guard page mechanism, which required knowing exactly how big the stack was.

But this change probably never materialized. There are old versions of the LX format specification which do not show the stack size field (a DWORD at offset 0ACh) in the LX header at all. Newer versions of the LX specification show the stack size field and make no mention that it might not be there, but also clearly state that the stack size in the LX header may be zero (in which case the stack size cannot be determined).

Likewise the EXE386.H header in the OS/2 2.0 Toolkit (March 1992) has no stack size member in the e32_exe struct. In the OS/2 2.1 Toolkit (March 1993) the e32_stacksize member is present and the number of reserved bytes is reduced accordingly.
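For anyone who wants to check what a given linker wrote, the field is easy to read by hand. The sketch below assumes the buffer starts at the "LX" signature (not at the start of the file, which normally begins with a DOS stub) and that the header is little-endian; `lx_stack_size` is a hypothetical helper name, and the offset is the 0ACh mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LX_STACKSIZE_OFFSET 0xACu  /* DWORD at offset 0ACh, per the text above */

/* Returns the stack size recorded in an LX header, or 0 if the buffer
 * does not start with the "LX" signature or is too short to hold the field.
 * 'hdr' must point at the signature, not at the start of the file. */
uint32_t lx_stack_size(const unsigned char *hdr, size_t len)
{
    const unsigned char *p;

    assert(hdr != NULL);
    if (len < LX_STACKSIZE_OFFSET + 4u)
        return 0;
    if (hdr[0] != 'L' || hdr[1] != 'X')
        return 0;
    p = hdr + LX_STACKSIZE_OFFSET;
    /* Assemble the value byte by byte, least significant byte first. */
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Note that a return value of 0 is ambiguous here: it can mean a malformed buffer or an executable whose linker simply left the field at zero.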

The behavior of LINK386 changed as well, and at some point LINK386 started recording the stack size in the executable header. Changing OS/2 itself took longer, and happened between the Warp 4 (1996) and WSeB (1999) releases. The new behavior is arguably how things should have always been, except the original LX header definition and the early 32-bit linker didn’t allow it to work.

On OS/2 Warp 4, FP6 (March 1998) introduced the exact stack size reporting in the TIB that later appeared in WSeB. The loader looks at the reported stack size and uses it to calculate the exact stack base. If the executable reports zero stack size in the LX header, the loader reverts to the earlier behavior and uses the base of the segment holding the stack instead.
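The old and new rules for the main thread's stack base can be summarized in a few lines. This is a model of the observed behavior, not loader source code: `main_stack_base` and its parameters are hypothetical, and the check that the stack size fits inside the segment is an added safety assumption rather than something observed.

```c
#include <assert.h>
#include <stdint.h>

/* Model of how the loader picks the main thread's stack base for the TIB.
 * 'use_header_size' is 0 for the older behavior, which ignores the stack
 * size in the LX header entirely. */
uint32_t main_stack_base(uint32_t seg_start, uint32_t initial_esp,
                         uint32_t lx_stack_size, int use_header_size)
{
    assert(seg_start <= initial_esp);
    if (use_header_size && lx_stack_size != 0 &&
        lx_stack_size <= initial_esp - seg_start)
        return initial_esp - lx_stack_size;  /* exact: top minus size */
    return seg_start;                        /* fallback: start of segment */
}
```

With a segment starting at 20000h, an initial ESP of 30000h and a stack size of 8000h, the newer rule yields 28000h, while the older rule (or a zero stack size in the header) yields 20000h, leaving 32 KB of static data inside what the TIB claims is the stack.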

It should be noted that Watcom’s own wlink also didn’t set the stack size in the LX header for a long time, probably not until version 11.0c. That is unsurprising since Watcom developed OS/2 2.0 support quite early (before the stack size was defined in the LX format), and even though the stack size was present in the LX header since OS/2 2.1, it continued to be ignored by the OS until after the release of Warp 4.

Why Even Bother?

For secondary threads, running out of stack space is normally accompanied by crashes whose source may be more or less easily recognized. The problem child is the main, and often only, thread of a program.

By convention, the stack is located at the very top of the data segment, growing downwards. The problem is that at the bottom of the segment, there’s static initialized and uninitialized data, and it’s all writable memory. If the stack erroneously grows downwards beyond its preset limits, it will corrupt some innocent data. Such a problem may be extremely difficult to debug, especially if the stack does not grow uncontrollably (unlike cases of infinite recursion, when the program will crash soon enough) and only causes “minor” corruption, and only occasionally.

Sure, programmers can use a big stack, but no stack is big enough to guard against all possible programming errors. That is why stack checking is useful, because it can prevent a type of error that is otherwise often very difficult (and expensive) to diagnose.

But doing it correctly on OS/2 is clearly not trivial, especially when DLLs are involved!
