It was recently pointed out to me that a simple “hello world” style application built with Open Watcom C/C++ 1.9 does not run on Win32s version 1.30, even though the same executable runs just fine on Windows NT 3.51, Windows 95, or Windows 10.
More specifically, the program crashes rather early on Win32s. With the help of map files and source code, I established that the crash occurs in an internal function called __setenvp, which tries to dereference a null pointer stored in an internal variable _RWD_Envptr.
The _RWD_Envptr variable is filled in by the GetEnvironmentStrings API in the C runtime startup code. The GetEnvironmentStrings API call ends up importing GetEnvironmentStringsA from KERNEL32.DLL. And clearly GetEnvironmentStringsA is failing on Win32s, although it works just fine on NT and Win9x.
Further probing revealed that the GetEnvironmentStrings API has a curious history. On Windows NT 3.1, there was only GetEnvironmentStrings (no A or W suffix). On all later Win32 implementations, starting with NT 3.5, there’s GetEnvironmentStringsA and GetEnvironmentStringsW, as well as FreeEnvironmentStringsA and FreeEnvironmentStringsW.
On NT 3.1, there was no FreeEnvironmentStrings, presumably because GetEnvironmentStrings returned a pointer to existing memory that couldn’t be freed (and would be freed at process termination anyway). On NT 3.5, GetEnvironmentStringsA converts the strings provided by GetEnvironmentStringsW and allocates memory for the converted strings, so there is something to free.
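To make the copy-and-free discussion a little more concrete, here is a minimal sketch of how a caller might walk an ANSI environment block. It assumes the usual Win32 layout (a run of NUL-terminated strings ended by one extra NUL) and uses a plain buffer rather than a real API call; the function name count_env_entries is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the entries in an ANSI environment block.
 * Assumed layout: "NAME=value\0NAME=value\0\0", i.e. NUL-terminated
 * strings followed by an empty string that ends the block.
 * count_env_entries is a hypothetical helper, not a Win32 function.
 */
size_t count_env_entries(const char *block)
{
    size_t count = 0;

    assert(block != NULL);

    /* An empty string marks the end of the block. */
    while (*block != '\0') {
        block += strlen(block) + 1;  /* skip this entry and its NUL */
        count++;
    }
    return count;
}
```

On NT 3.5 and later, a caller that obtained the block from GetEnvironmentStrings would hand it to FreeEnvironmentStrings when done; on NT 3.1 there was nothing to call.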
A quick experiment with Microsoft Visual C++ 4.0 showed that a test application does run on Win32s; reading the MSVC 4.0 runtime source code also revealed that Microsoft calls GetEnvironmentStringsA and immediately terminates the process if GetEnvironmentStringsA fails. So… how can that work on Win32s?
Examining the EXE file produced by MSVC 4.0 revealed that it imports GetEnvironmentStrings and not GetEnvironmentStringsA. Changing the Open Watcom kernel32.lib import library to make GetEnvironmentStringsA an alias of GetEnvironmentStrings made the application work on Win32s. But why?
A closer look at W32SCOMB.DLL shipped with Win32s showed the cause of the odd Win32s behavior. Although W32SCOMB.DLL exports all of GetEnvironmentStrings, GetEnvironmentStringsA, and GetEnvironmentStringsW, the latter two are stubs which always fail, and only GetEnvironmentStrings with no suffix actually does something useful. That seems like a bug in Win32s: GetEnvironmentStringsA should have been an alias of GetEnvironmentStrings.
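Given stubs that always fail, one conceivable defensive approach for a runtime is to try the suffixed function first and fall back to the suffix-free one. The sketch below is hypothetical: it is neither what Open Watcom nor what Microsoft does, and every name in it is a stand-in rather than a real KERNEL32 export, so that it can be compiled anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical getter type standing in for a KERNEL32 export. */
typedef char *(*env_getter_fn)(void);

/* Stand-in for a stub that always fails, like the suffixed Win32s exports. */
char *stub_always_fails(void)
{
    return NULL;
}

/* Stand-in for a working suffix-free export. */
char *plain_getter_works(void)
{
    static char block[] = "PATH=C:\\WINDOWS\0TEMP=C:\\TEMP\0";
    return block;
}

/* Try the suffixed getter first; fall back to the suffix-free one. */
char *get_environment_block(env_getter_fn suffixed, env_getter_fn plain)
{
    char *block = (suffixed != NULL) ? suffixed() : NULL;

    if (block == NULL && plain != NULL)
        block = plain();
    return block;
}

/* Simulates the Win32s situation: the suffixed stub fails, the plain call works. */
void fallback_demo(void)
{
    char *block = get_environment_block(stub_always_fails, plain_getter_works);

    assert(block != NULL && block[0] == 'P');
}
```

On a real system the two pointers could be looked up with GetProcAddress. The caller would also have to remember which getter succeeded, since only a block obtained on NT 3.5 or later has a matching free function.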
The mess was most likely caused by a design defect in Windows NT 3.1. The plain GetEnvironmentStrings function probably should never have existed, only GetEnvironmentStringsA and GetEnvironmentStringsW, as is the case with other APIs. Windows NT 3.5 corrected the oversight, but its KERNEL32.DLL still had to export the suffix-free GetEnvironmentStrings; otherwise almost all existing applications would have been broken.
Win32s tracked the development of Windows NT, therefore it implemented GetEnvironmentStrings, and initially only that. Win32s version 1.20 (1994) added GetEnvironmentStringsA and GetEnvironmentStringsW, but only as dummies. As mentioned above, making GetEnvironmentStringsA always fail was arguably wrong… but wasn’t noticed because Microsoft’s programs did not use GetEnvironmentStringsA.
At least up to and including MSVCRT40.DLL, Microsoft’s runtime DLLs only imported GetEnvironmentStrings. That also illustrates why any reasonable Win32 implementation needs to provide the GetEnvironmentStrings import and not just GetEnvironmentStringsA; if it didn’t, quite a few older applications would break because they need the suffix-free GetEnvironmentStrings.
Win32 SDK Details
As mentioned above, tweaking the kernel32.lib import library is one way to work around the problem with GetEnvironmentStringsA on Win32s. But that’s not what Microsoft’s SDK does.
Here is how WINBASE.H in the NT 3.5 SDK defined the then-new FreeEnvironmentStrings API:
WINBASEAPI BOOL WINAPI FreeEnvironmentStringsA(LPSTR);
WINBASEAPI BOOL WINAPI FreeEnvironmentStringsW(LPWSTR);
#ifdef UNICODE
#define FreeEnvironmentStrings FreeEnvironmentStringsW
#else
#define FreeEnvironmentStrings FreeEnvironmentStringsA
#endif // !UNICODE
That’s the usual way of dealing with Unicode APIs. Function prototypes have ‘A’ and ‘W’ suffix, and a suffix-less macro is defined to map to one or the other.
But that’s not how GetEnvironmentStrings was dealt with:
WINBASEAPI LPSTR WINAPI GetEnvironmentStrings(VOID);
WINBASEAPI LPWSTR WINAPI GetEnvironmentStringsW(VOID);
#ifdef UNICODE
#define GetEnvironmentStrings GetEnvironmentStringsW
#else
#define GetEnvironmentStringsA GetEnvironmentStrings
#endif // !UNICODE
The non-Unicode function prototype has no suffix, and the macro is “backwards”, mapping the ‘A’ function to the suffix-less original. Thus when the Microsoft runtime calls GetEnvironmentStringsA, the compiler ends up generating a call to GetEnvironmentStrings instead. This oddity persists to the present day and even Windows 10 SDK headers handle GetEnvironmentStrings the same way.
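The effect of the “backwards” macro can be reproduced without any Windows headers. The sketch below assumes a non-Unicode build and uses two stand-in functions that merely return a label; they are not the real exports, and runtime_startup_call is a made-up name for code that is written against the ‘A’ spelling.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the two functions the header declares; illustration only. */
const char *GetEnvironmentStrings(void)  { return "suffix-free"; }
const char *GetEnvironmentStringsW(void) { return "W"; }

/* Same shape as the WINBASE.H fragment quoted above. */
#ifdef UNICODE
#define GetEnvironmentStrings  GetEnvironmentStringsW
#else
#define GetEnvironmentStringsA GetEnvironmentStrings
#endif

/* Written the way a runtime would be: it names the 'A' function. */
const char *runtime_startup_call(void)
{
    /* The preprocessor rewrites this into a call to GetEnvironmentStrings. */
    return GetEnvironmentStringsA();
}

/* In a non-Unicode build the suffix-free stand-in is the one that runs. */
void mapping_demo(void)
{
    assert(strcmp(runtime_startup_call(), "suffix-free") == 0);
}
```

Since the ‘A’ name never reaches the compiler proper, the resulting object file references only the suffix-free symbol, which matches the import seen in the MSVC 4.0 executable.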
Moral of the story? Changing operating system APIs is a messy business.
Raymond Chen discussed the root cause of this, the incorrect export of the ANSI version of GetEnvironmentStrings without an “A” suffix in NT 3.1 (https://devblogs.microsoft.com/oldnewthing/20130117-00/?p=5533). In another post he also discussed a similar problem with functions such as IsDialogMessage that originally lacked separate ANSI and Unicode versions because their dependency on the character set was not obvious (https://devblogs.microsoft.com/oldnewthing/20070103-15/?p=28523). In the case of GetEnvironmentStrings, however, it seems likely that the NT 3.1 developers recognized the dependency but made a mistake trying to address it.
But what kind of mistake? It may have been a typo or something equally mundane. But the mistake may also have been an artifact of an evolving design. It’s pure speculation on my part, but I wonder if GetEnvironmentStrings was (one of the) first functions for which this problem was recognized, and the eventual solution (separate versions, both with suffixes) was not yet fully formed or widely disseminated when the developers had to address it. Given that GetEnvironmentStrings is needed for process startup code, it seems likely it was implemented at an early stage. And once the problem was “solved” for this function, it’s easy to imagine it being overlooked later when the eventual solution was adopted.
Well, Chen is just plain wrong about NT 3.1. There simply is no GetEnvironmentStringsW in NT 3.1, there is only GetEnvironmentStrings. Go have a look, there’s no GetEnvironmentStringsW exported from KERNEL32.DLL or listed in WINBASE.H. The old (NT 3.1 level) Win32 documentation also reflects that, and GetEnvironmentStrings is not listed as a Unicode-enabled function.
I assume that the original design expected that the environment couldn’t be Unicode since it had to be shared with DOS and OS/2 applications. Then they changed their mind for NT 3.5. How that actually worked in practice I’m not entirely sure, but the API was different.
At any rate, GetEnvironmentStrings is different from most other APIs because KERNEL32.DLL exports all of GetEnvironmentStrings, GetEnvironmentStringsW, and GetEnvironmentStringsA. The headers are presumably done the way they are because Microsoft wanted apps importing GetEnvironmentStrings and not GetEnvironmentStringsA for compatibility with NT 3.1 and Win32s.
Also, the way Chen suggests he would have done things is the way Watcom did it, which fails on Win32s.
Also, I somewhat object to “GetEnvironmentStrings is needed for process startup code” that Hayden wrote above.
Environment strings are only needed for the startup code if a decision has been made to A) have environment strings at all and B) use environment strings as the primary way to give applications rudimentary information like paths to find stuff and whatnot.
From what I found using a quick google it seems like VMS only added environment strings to the versions for the Alpha CPU, which would have been several years after Dave Cutler & co left DEC for Microsoft. So it could have been the case that they thought of environment strings as a legacy they had to support for existing DOS and 16-bit Windows 3.x applications, but not something they would encourage 32-bit applications to use. And since we obviously didn’t get Unicode / wide string support for 16-bit Windows and DOS, there would from that point of view not be any need for a W version to get the environment variables.
Now you wonder how an operating system and its applications would work without environment variables. The way both VMS and AmigaOS solved this was to have aliases that point to one or more places in the file system. On VMS they are called logicals, on AmigaOS they are called assigns. Both of these operating systems use letters to distinguish different partitions/drives anyway, like DOS, but instead of single letters handed out mostly in alphabetical order they use multiple letters and digits, which also say something about the device the file system resides on. DKA100 would be a file system on a hard disk in VMS, while it would be DH0 on an Amiga.
For every case where an application needs to find a certain set of files, like say include files for a compiler, there would be a logical or assign that points to those files. So for the include files the compiler would simply look in INCLUDE: and either the installation script for the compiler or the user would have created that logical/assign before running the compiler. Since environment variables are mostly used to point to paths, this removes the need for all those environment strings while also relieving the application from having to interpret the content of environment strings.
As a bonus those logicals/assigns can also be more easily used from scripts, and at least in the case of AmigaOS the default behavior is to actually find the “destination” for the assign and get a lock for that directory. That way you can’t dismount a partition without removing the assigns, and the assigns will work correctly even if you rename anything in the path to what the assign is pointing to. (The startup script that creates the assigns would have to be updated though). (In later versions of AmigaOS it became possible to only actually look up the destination for an assign either when it was first used or every time it was used. The former of these two options could greatly speed up boot time).
The thing you can’t store as paths are miscellaneous configuration options, and those can be stored either locally in the current working directory or where the application you are running is stored. (Later versions of AmigaOS added something similar to assigns called PROGDIR: which always pointed to where the currently running executable is stored). (Also on AmigaOS the default way of storing various small configuration stuff for programs that you started from the GUI was so-called “tool types”, stored in a file with the same name as the application but with the extension .info added. If you ran things from the command line interface you were supposed to type in any special parameters or run the application through a script).
In hindsight we know that features of VMS were mostly used in the kernel of Windows NT but everything else more or less evolved from the Windows 3.x API and/or in a way that a Windows 3.x programmer would expect things to be.
NT 3.1 stores the environment block internally as 8-bit chars. I have no idea how this wasn’t caught before release. If a file path can have UTF16 characters, why can’t %PATH%? I don’t think this can be explained because of DOS limitations – after all, it’s completely valid to have a long file name in %PATH% which would be indescribable to DOS also.
That’s why it used GetEnvironmentStrings as a pointer to the block with no FreeEnvironmentStrings function. It’s why there was no Unicode form of the function. No Unicode form meant no point having a suffix.
When NT 3.5 came along to clean this up, what it was really doing is storing the environment block as UTF16, which meant that GetEnvironmentStrings has to exist for compatibility. It means it has to copy, and hence, has to free. So the “compatibility” comes with a giant caveat – old programs leak.
What I don’t know is why they bothered to create GetEnvironmentStringsA at all, given there had to be a GetEnvironmentStrings, it had to return 8 bit chars, and it had to allocate and copy. Presumably it was done for symmetry, but as noted here, the headers never used it in the way other APIs do.
Some of this likely reflects the US-centric nature of NT development. The developers tended not to deal with non-ASCII strings at all, let alone Unicode. You’re right that if you can have Unicode directory names, you need a Unicode PATH environment variable to represent that.
Yes, on NT 3.5 and later GetEnvironmentStrings leaks without FreeEnvironmentStrings, but probably by far the most common usage of GetEnvironmentStrings is to be called once at program startup and freed right before termination. In such a scenario, the memory leak is purely theoretical.
I can also only guess that GetEnvironmentStringsA was added for symmetry, even if it serves no real purpose and users of the standard SDK headers have to do extra work to even call it at all.
On NT (and UNIX and DOS and OS/2), environment variables are a fact of life. Some versions of the Microsoft C runtime use the environment internally to pass information to child processes even if the program itself does not use the environment at all (though I’m not sure if MS’s Win32 runtime used that). Given the DOS and OS/2 compatibility built into NT, not using the environment was not really an option.
GetEnvironmentStrings works the same way as the Win16 function GetDOSEnvironment so the design was geared to ease the transition from Win16 to Win32. It could not be designed around Unicode since NT 3.1 was in a recognizable state before Unicode started. Note there was a Win16 GetEnvironment function but it is part of GDI and absolutely nothing like the Win32 GetEnvironment* functions.
The knock I have on GetEnvironmentStrings is the use of a macro to redefine the function’s operation. With most of Windows, when a function was superseded, new functions were created and the old function’s code passed the parameters on to the new function with proper massaging of the results to match what the old function expected.
The Amiga’s “method” of aliases was also used on the Apple IIgs in GS/OS. The ORCA developer tools and GNO/ME relied on a system of standard prefixes to locate common file locations. Kinda stumbled across this when porting aclock to GNO/ME.
Michal, you are quite right about the origin of the problem. I hadn’t done my homework. Consulting the prerelease Microsoft Windows 32-bit API Reference (dated June 27 1992) that I have reveals that the return type of GetEnvironmentStrings was originally LPVOID. That documentation also indicates that the function was not originally intended to provide direct access to the variables themselves, but to return a pointer to an opaque buffer that could be manipulated with other functions. I suppose in principle that could work, but I guess in practice it did not (presumably because the buffer wasn’t really opaque at all, and everyone was accessing it directly).
You’re right that Chen’s post isn’t really consistent with this course of events (it is at best incomplete), so I’m sorry I even brought it up.
MiaM: With regard to my claim that accessing the environment is necessary for startup code, I didn’t mean to imply that it was logically necessary, just that it is necessary in a system that uses environments. I was thinking especially of C programs, which must have the environment, if any, available in a global variable. I have no great love for the concept of environments as such.
I’m sorry if this is beating a dead horse, but note that NT 3.1’s Unicode support was quite universal. Maybe development was US-centric, but NT 3.1 has UTF-16 for file names, registry keys, window titles, and the rest. Maybe Win16 does need 8 bit chars, but for the rest of the functions, it could use the ANSI wrappers. That was for everything except the environment block. That’s what makes this function odd – other functions had A and W variants from the start.
Ignoring GetEnvironmentStrings completely for a moment, note that NT 3.1 has SetEnvironmentVariableW and GetEnvironmentVariableW when its underlying representation is ANSI. Whereas for other functions A upconverts to UTF-16, here W downconverts to ANSI. Many people had to know that this situation was inherently broken.
One thing I noticed when looking this up is fixing it also brings in CREATE_UNICODE_ENVIRONMENT, because NT 3.1 would expect an ANSI buffer passed to CreateProcessW, so an extra identifier was needed to indicate that a parameter to a Unicode API is actually Unicode. That’s another legacy that lives on to this day.
Frankly, this is the worst defect I’ve found in NT 3.1. Most of the problems are because standards didn’t exist for the hardware it wanted to support, extensively covered in this blog (ATAPI CD-ROMs, MPS, large memory detection, etc.) Some features weren’t finished (DHCP.) But its environment handling is just totally wrong.
Actually Chen’s blog post is very interesting and I hadn’t found it myself. It shows that GetEnvironmentStrings/A/W has been causing confusion for some time and that even people at Microsoft don’t necessarily remember all the details.
Yes, a lot of the trouble with NT 3.1 is simply a side effect of it coming out in 1993. Other OSes from that time also don’t have DHCP or ATAPI or PCI support, simply because it wasn’t there yet.
You’re totally right that even in the Win32 API implementation on NT 3.1, GetEnvironmentStrings stood out. I can only guess that they didn’t fix the API because it a) wasn’t entirely trivial, and b) caused very few problems in practice.
What I meant about US-centric development was that sure, pathnames were all Unicode, but how many people at Redmond were likely to have directories with non-ASCII characters in their PATH?
Side track re early OSes not having support for then-emerging hardware things:
Apparently at one point in time Yggdrasil Plug’n’play Linux had a wrapper to use DOS drivers for CD-ROM. Kind of like NDISwrapper but way earlier.
Source: The excellent Youtuber Ncommander who did a video about this Linux distribution.
Novell NetWare did the same thing, it could install from CD-ROM using a DOS driver.
BTW I worked with Ncommander aka Michael Casadevall on reconstructing damaged XENIX disk images, see here: https://www.os2museum.com/wp/what-a-coincidence/
The Dec 1991 beta of NT only shows a single application-facing Unicode API: IsWindowUnicode. That would be the beta that matches with the Dec 1991 MSJ issue that announced the planned support of Unicode in NT. In Oct 1992, a second Unicode API is listed: ToUnicode. I haven’t checked all of the beta SDKs but what I remember was that it took a long time before applications could do anything with Unicode.
I’m not familiar with what was included with various betas, although I am familiar with when code changes occurred. Taking CreateFile, it was renamed to CreateFileA in April 1991, and CreateFileW was added in May 1991. This general pattern (bulk rename to A and backfill to W) was happening across the API surface throughout 1991. WideCharToMultiByte and friends arrived in August 1991. It looks like the underlying object manager completely moved to Unicode in Jan 1992. So I’m not familiar with when announcements, public releases or documentation happened, but by Dec 1991 the code was well positioned to support Unicode, although it makes sense that it would be hard to use in an end-to-end way until the conversion was essentially complete.
Note (I’m from file systems, so my bias here shows), NTFS is purely a UTF-16 creature, so a lot of this had to happen to make progress building it. The first commits for it start in May 1991. I can promise from personal experience that building file systems takes time, and a substantial amount of UTF-16 needed to be in place for a long time to enable NTFS to happen.
The transition of the back end to Unicode with implicit conversions to ANSI for user code caused a fair share of problems though a lot of those problems were created by MS’s insistence that applications do error handling. One can’t handle an error that one can’t see.
Hello old discussion 🙂
Looking at this:
#ifdef UNICODE
#define GetEnvironmentStrings GetEnvironmentStringsW
#else
#define GetEnvironmentStringsA GetEnvironmentStrings
#endif // !UNICODE
I think it was done to avoid drawing attention, and for anyone spotting this it might seem like a typo.
The correct way would have been this:
#define GetEnvironmentStringsA GetEnvironmentStrings
#ifdef UNICODE
#define GetEnvironmentStrings GetEnvironmentStringsW
#endif // UNICODE
but that would have stuck out like a sore thumb.
The question we’ll likely never get the answer to is whether this was done so that Microsoft’s customers wouldn’t notice, and/or whether it was (also) done so that others within Microsoft wouldn’t notice.
Nah, there was nothing nefarious. Microsoft didn’t really hide the change, it was documented well enough.
It was simply a design oversight; initially they didn’t realize that environment strings also needed to support Unicode. When they figured it out, it was too late to change NT 3.1.
Although I can’t come up with any reason to do so, the include file won’t allow an application to call GetEnvironmentStringsA when UNICODE is defined. That sure is a bug in the include file, which would be fixed if it looked like what I suggested.
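One thing worth checking about the layout suggested above is how it behaves once UNICODE is defined, because the C preprocessor rescans the result of a macro expansion for further macro names. The sketch below is only a test harness under that assumption: the two functions are stand-ins that return a label, not the real exports, and call_through_a_name is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the two functions the header declares; illustration only. */
const char *GetEnvironmentStrings(void)  { return "ANSI"; }
const char *GetEnvironmentStringsW(void) { return "wide"; }

#define UNICODE  /* simulate a Unicode build */

/* The layout proposed in the comment above. */
#define GetEnvironmentStringsA GetEnvironmentStrings
#ifdef UNICODE
#define GetEnvironmentStrings GetEnvironmentStringsW
#endif // UNICODE

const char *call_through_a_name(void)
{
    /*
     * GetEnvironmentStringsA expands to GetEnvironmentStrings, which is
     * rescanned and expands again to GetEnvironmentStringsW.
     */
    return GetEnvironmentStringsA();
}

/* With UNICODE defined, the 'A' spelling reaches the wide stand-in. */
void rescan_demo(void)
{
    assert(strcmp(call_through_a_name(), "wide") == 0);
}
```

So with this layout a Unicode build that spells out GetEnvironmentStringsA would compile, but it would end up at the W function. Getting the ANSI function in a Unicode build would need a different arrangement, such as a real GetEnvironmentStringsA prototype.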
If only someone’d come up with UTF-8 earlier, so that ANSI strings’d *also* be valid Unicode…