Learn Something Old Every Day, Part XI: DOS Directory Searches are Bizarre

A while ago I started playing with EMU2, a piece of software which calls itself “A simple text-mode x86 + DOS emulator”. It is indeed relatively simple, only emulating an 8086 (or maybe 80186, with little bits of 80286 here and there), but it’s in some ways quite capable, doing a remarkably good job of running at least some text-mode DOS programs (such as the Turbo Pascal IDE).

Spurred by the release of the MS-DOS 4.0 source code, I thought I’d see if I could use EMU2 to build MS-DOS 4.0 on a non-DOS system. It did not go well.

The MS-DOS 4.0 source code includes almost everything needed to build, with the exception of COMMAND.COM, which is required by NMAKE to execute certain commands. OK, I thought, let’s just grab an existing MS-DOS 4.0 COMMAND.COM.

Not even a single DIR command worked right. After a bit of tinkering, I got that working, only to find that TREE.COM was hopelessly broken. And I found other programs mysteriously failing, such as Watcom wmake (wmaker.exe to be exact, since the default 386-extended version has no chance of running under EMU2).

In the process I learned a lot about how DOS directory searches work, and what kinds of seemingly crazy things DOS itself does.

Let’s start with DOS 2.x directory searches, using INT 21h/4Eh (Find First) and INT 21h/4Fh (Find Next). Find First takes an ASCIIZ string as an input, optionally including a path; the file name may include wildcards (asterisks or question marks).

The results of the search are stored in the DTA, or Disk Transfer Area, which is located at an address set by INT 21h/1Ah. The DTA will contain the name, attributes, size, timestamp, etc. of the first search result, if any.
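To make the mechanics concrete, here is a minimal directory listing sketch. It assumes a Borland/Microsoft-style 16-bit DOS compiler and its dos.h (int86x, segread, FP_SEG/FP_OFF); the register usage is the documented one, but the wrapper names are mine.

    #include <dos.h>       /* int86x, segread, FP_SEG/FP_OFF (16-bit DOS compiler) */
    #include <stdio.h>

    unsigned char dta[128];                 /* our own Disk Transfer Area */

    /* INT 21h/1Ah: set the DTA address to DS:DX. */
    void set_dta(void *p)
    {
        union REGS r; struct SREGS s;
        segread(&s);
        r.h.ah = 0x1A;
        s.ds   = FP_SEG((void far *)p);
        r.x.dx = FP_OFF((void far *)p);
        int86x(0x21, &r, &r, &s);
    }

    /* INT 21h/4Eh: Find First. CX = attribute mask, DS:DX = ASCIIZ file spec. */
    int find_first(const char *spec, unsigned attrs)
    {
        union REGS r; struct SREGS s;
        segread(&s);
        r.h.ah = 0x4E;
        r.x.cx = attrs;
        s.ds   = FP_SEG((void far *)spec);
        r.x.dx = FP_OFF((void far *)spec);
        int86x(0x21, &r, &r, &s);
        return r.x.cflag;                   /* carry set = error / no (more) files */
    }

    /* INT 21h/4Fh: Find Next. No inputs; DOS continues from the current DTA. */
    int find_next(void)
    {
        union REGS r; struct SREGS s;
        segread(&s);
        r.h.ah = 0x4F;
        int86x(0x21, &r, &r, &s);
        return r.x.cflag;
    }

    int main(void)
    {
        set_dta(dta);
        if (find_first("*.*", 0x10) != 0)   /* 0x10 = include directories */
            return 0;
        do {
            printf("%s\n", (char *)&dta[0x1E]);   /* ASCIIZ name at offset 1Eh */
        } while (find_next() == 0);
        return 0;
    }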

The DTA notably starts with a 21-byte data area which is undocumented, and contains information required to continue the search. In essence, this area contains information about what the search is looking for (file name with optional wildcards, search attributes) and what the current position is in the directory that is being searched.

The structure of the information in the first 21 bytes of the DTA differs across DOS versions. It can be also used by emulated DOS environments (such as the OS/2 MVDM) to store information in a format completely different from DOS.
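In C terms, the search result in the DTA looks roughly like this (byte packing assumed); only the fields past the reserved area are documented:

    /* Layout of the DTA after a successful Find First/Find Next. The first
     * 21 bytes are DOS-internal continuation state and differ between DOS
     * versions (and between DOS and emulated or redirected environments). */
    #pragma pack(push, 1)
    struct dos_find {
        unsigned char  reserved[21];  /* 00h: undocumented continuation data    */
        unsigned char  attrib;        /* 15h: attribute of the file found       */
        unsigned short ftime;         /* 16h: file time, DOS packed format      */
        unsigned short fdate;         /* 18h: file date, DOS packed format      */
        unsigned long  fsize;         /* 1Ah: file size in bytes                */
        char           name[13];      /* 1Eh: ASCIIZ 8.3 name of the file found */
    };                                /* 2Bh (43) bytes total                   */
    #pragma pack(pop)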

Case One: Watcom wmake

My first “patient” was Watcom wmake. It had trouble finding any files at all under EMU2. I quickly established that it does something that works fine on DOS and in most if not all emulated DOS environments, but not in EMU2.

Namely, the Watcom run-time library sets the DTA address (using INT 21h/1Ah), and calls Find First (INT 21h/4Eh). Then it copies the first 21 bytes of the DTA to a different memory location, sets the DTA address to point at the new location, and calls Find Next (INT 21h/4Fh).
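The pattern boils down to something like the following sketch, reusing the wrappers from the listing above; the function name is mine, not Watcom’s:

    #include <string.h>

    /* set_dta()/find_first()/find_next() are the INT 21h wrappers from the
     * earlier listing. */
    extern void set_dta(void *p);
    extern int  find_first(const char *spec, unsigned attrs);
    extern int  find_next(void);

    static unsigned char dta_a[128], dta_b[128];

    /* Run Find First against one DTA, copy the 21-byte continuation area
     * into a second buffer, switch the DTA there, and keep calling Find
     * Next. Real DOS is fine with this, because everything it needs to
     * continue the search lives in those 21 bytes. */
    void copied_dta_search(void)
    {
        set_dta(dta_a);
        if (find_first("*.*", 0x10) != 0)
            return;                         /* nothing matched */

        memcpy(dta_b, dta_a, 21);           /* move the "search handle"... */
        set_dta(dta_b);                     /* ...to a different DTA...    */
        while (find_next() == 0) {
            /* ...and continue; results now land in dta_b (name at 1Eh). */
        }
    }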

This broke down in EMU2 because EMU2 ran the host side search during Find First, and associated the search with the DTA address. When the Find Next came, the DTA address was different, and EMU2 said, in so many words, oops, I have no idea what you want from me.

So I tweaked EMU2 to store a unique “cookie” in the DTA, and use that cookie when locating the corresponding host-side search. That was enough to make wmake work.
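The gist of that change, as a hypothetical host-side sketch (the names and data structures are mine, not EMU2’s actual code), is to key the search table off a cookie stored in the guest’s DTA rather than off the DTA address:

    #include <stdint.h>
    #include <string.h>

    #define MAX_SEARCHES 64

    struct host_search {
        int      in_use;
        uint16_t cookie;          /* matches the value stored in the guest DTA */
        /* ... host directory listing, current position, 8.3 names, etc. ... */
    };

    static struct host_search searches[MAX_SEARCHES];
    static uint16_t next_cookie = 1;

    /* Called while handling INT 21h/4Eh: reserve a slot and tag the guest DTA. */
    uint16_t search_open(uint8_t *guest_dta)
    {
        for (int i = 0; i < MAX_SEARCHES; i++) {
            if (!searches[i].in_use) {
                searches[i].in_use = 1;
                searches[i].cookie = next_cookie++;
                memcpy(guest_dta, &searches[i].cookie, sizeof(uint16_t));
                return searches[i].cookie;
            }
        }
        return 0;                 /* out of host-side search slots */
    }

    /* Called while handling INT 21h/4Fh: find the search by the cookie that
     * sits in whatever DTA the program is using *now*. */
    struct host_search *search_find(const uint8_t *guest_dta)
    {
        uint16_t cookie;
        memcpy(&cookie, guest_dta, sizeof(uint16_t));
        for (int i = 0; i < MAX_SEARCHES; i++)
            if (searches[i].in_use && searches[i].cookie == cookie)
                return &searches[i];
        return NULL;              /* stale or unknown search */
    }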

Case Two: Microsoft NMAKE

Then I started trying to build the MS-DOS 4.0 source code. As one might imagine, NMAKE runs quite a lot of directory searches. And EMU2 started failing because it kept running out of host-side directory searches (the maximum was defined as 64, and it seemed like supporting that many simultaneous searches should be more than enough).

This problem with searches is one that people have encountered in the past, likely many times. When DOS drives aren’t local FAT-formatted disks, searching gets tricky. Often the “server” (networked or not) needs to keep more information than fits in the 21-byte area, potentially lots more.

In the case of EMU2, the host-side search is completed in the Find First call, and results are returned on successive Find Next calls. If the directory is large, there may be quite a bit of data that needs to be kept in memory on the host side.

The problem is that although DOS has Find First and Find Next calls, it has no Find Close call. In other words, searches are started, continued, but not ended. Or not explicitly.

Solving this problem requires heuristics (especially in light of Case Three). If a DOS program issues Find First with no wildcards, there will be zero or one result. The search is effectively over by the time Find First finishes, because there will be no further results. Thus the host-side data can be freed right away.

When a DOS program uses wildcards, things get a lot more complicated. Assuming that the search found multiple results, the host-side information obviously needs to be kept while DOS repeatedly calls Find Next.

The host-side state can be more or less safely discarded when Find Next reaches the end of the search results. In most cases, wildcard searches will in fact keep calling Find Next until the results are exhausted. But not always.

A workable solution appears to be a heuristic which more or less intelligently decides when searches are complete, combined with some sort of LRU approach to discard old searches.
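A minimal sketch of such heuristics (hypothetical names, not EMU2’s actual code):

    #include <string.h>
    #include <time.h>

    struct search_slot {
        int    in_use;
        time_t last_used;        /* updated on every Find First/Find Next */
        int    entries_total;    /* number of host-side results           */
        int    position;         /* next result to hand out               */
    };

    /* 1. A spec without wildcards matches at most one entry, so the host
     *    state can be released as soon as Find First has produced its result. */
    int closes_after_find_first(const char *spec)
    {
        return strchr(spec, '*') == NULL && strchr(spec, '?') == NULL;
    }

    /* 2. Once Find Next runs off the end, the search is over; even a program
     *    that rewinds (like TREE in Case Three) will only ever see "no more
     *    files" again, which needs no host-side data to report. */
    int closes_after_find_next(const struct search_slot *s)
    {
        return s->position >= s->entries_total;
    }

    /* 3. If neither rule fired and the table is full, recycle the slot that
     *    has been idle the longest, on the theory that an abandoned search
     *    will never be continued anyway. */
    struct search_slot *pick_lru_victim(struct search_slot *table, int n)
    {
        struct search_slot *victim = NULL;
        for (int i = 0; i < n; i++)
            if (table[i].in_use &&
                (victim == NULL || table[i].last_used < victim->last_used))
                victim = &table[i];
        return victim;
    }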

Case Three: MS-DOS 4.0 TREE.COM

After improving the logic for discarding host-side search state, I was able to build most of MS-DOS 4.0 with EMU2. This included TREE.COM. When I tried running the freshly built TREE utility, bad things happened.

Fortunately I had the TREE.COM source code, and I was able to understand what was going on. What TREE does is… interesting. To recap, TREE prints a “graphical” directory tree that looks like this:

   ...
├───DOS
├───H
├───INC
├───LIB
├───MAPPER
├───MEMM
│   ├───EMM
│   └───MEMM
├───MESSAGES
├───SELECT
└───TOOLS
    └───BLD
        ├───INC
        │   └───SYS
        └───LIB

When displaying a line of text with another directory item, TREE needs to know whether it should print an I-shape or an L-shape character. An L-shape is printed for the last entry in a directory, and an I-shape is printed for all other entries.

And here’s where TREE is very tricky, and does something that one might not expect to work at all. For example, when it prints the line for DOS (at the top of the diagram above), it already called Find Next and found the DOS directory. TREE then saves the DTA contents (those 21 bytes) on its internal directory stack, and calls Find Next again. In this case, there is another result, the H directory, so TREE prints an I-shape and goes into the DOS directory to find out if there are any sub-directories.

When TREE gets back to where it was (in this case quickly, since no sub-directories exist), it restores the saved DTA contents and runs Find Next again. And as before, Find Next has to return the H directory as the next result.

In other words, TREE saves and restores the DTA contents in order to rewind the search to a previous position. Depending on the directory tree depth, there may be a number of simultaneously active searches, each for one level of the directory hierarchy.
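Stripped of the actual drawing, the traversal looks roughly like this. It is a sketch reusing the wrappers from the Case One listing, plus a Borland-style chdir(); the real TREE also uses the same save/rewind trick to peek ahead and decide between the I-shape and the L-shape.

    #include <string.h>
    #include <dir.h>       /* chdir() -- Borland-style; other compilers differ */

    /* set_dta()/find_first()/find_next() and dta[] are the INT 21h wrappers
     * and Disk Transfer Area from the Case One listing. */
    extern void set_dta(void *p);
    extern int  find_first(const char *spec, unsigned attrs);
    extern int  find_next(void);
    extern unsigned char dta[128];

    #define ATTR_DIR 0x10

    /* Save the 21-byte continuation area before descending into a
     * subdirectory, restore it afterwards, and resume the parent's search
     * with Find Next. (Printing the shapes is omitted here.) */
    static void walk(void)
    {
        unsigned char saved[21];
        char name[13];

        if (find_first("*.*", ATTR_DIR) != 0)
            return;
        do {
            if (!(dta[0x15] & ATTR_DIR) || dta[0x1E] == '.')
                continue;                    /* directories only; skip "." and ".." */

            memcpy(name, &dta[0x1E], 13);    /* the subdirectory we just found    */
            memcpy(saved, dta, 21);          /* push this level's search position */
            if (chdir(name) == 0) {
                walk();                      /* recurse; this clobbers the DTA    */
                chdir("..");
            }
            memcpy(dta, saved, 21);          /* pop the position...               */
        } while (find_next() == 0);          /* ...and resume where we left off   */
    }

    void tree(void)
    {
        set_dta(dta);                        /* one fixed DTA for all levels */
        walk();
    }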

Understanding the inner workings of TREE.COM necessitated further changes to EMU2. The information stored in the DTA needs to include not only an identifier associating the search with host-side state, but also a position in the host side search. That way TREE can successfully return to a previous position.
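Sketched in the same hypothetical host-side style as before (again, not EMU2’s actual layout), the reserved area might carry both pieces of information, so that restoring an old copy of the DTA automatically rewinds the position:

    #include <stdint.h>
    #include <string.h>

    #pragma pack(push, 1)
    struct guest_search_state {
        uint16_t cookie;          /* identifies the host-side search      */
        uint16_t position;        /* index of the next entry to hand out  */
        uint8_t  pad[17];         /* unused; keeps the area 21 bytes long */
    };
    #pragma pack(pop)

    struct host_search {          /* abridged; see the Case One sketch */
        uint16_t cookie;
        int      count;           /* number of entries in the host listing */
    };

    /* Find Next: the host search has already been located by the cookie.
     * Hand out the entry the guest DTA points at and advance the position
     * that is stored back into the DTA. */
    int emulate_find_next(uint8_t *guest_dta, const struct host_search *s)
    {
        struct guest_search_state st;
        memcpy(&st, guest_dta, sizeof st);

        if (s == NULL || st.position >= s->count)
            return -1;                        /* "no more files"               */

        /* ...fill the documented DTA fields from host entry st.position... */
        st.position++;                        /* next call continues from here */
        memcpy(guest_dta, &st, sizeof st);
        return 0;
    }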

Note that this does not break the heuristics for freeing host-side state. TREE only ever rewinds the search back by one entry. Thus when Find Next reaches the end of the search results, the host-side state can be discarded. Although TREE may rewind the search, it only needs to call Find Next again to see that there are (still) no further results.

Case Four: MS-DOS 6.0 COMMAND.COM

Since I had the MS-DOS 4.0 COMMAND.COM working reasonably well, I thought the COMMAND.COM from MS-DOS 6.0 would work too.

Not so. The DIR command only returned one result, which necessitated further digging.

For reasons that aren’t terribly obvious to me, COMMAND.COM uses FCB searches to query directory contents. Now, FCB searches work a little differently from “normal” directory searches.

INT 21h/11h (Find First FCB) takes an unopened FCB (File Control Block) as input. The FCB contains the file name (possibly with wildcards), and the call searches the current directory. The search results are then stored in the DTA (again set by INT 21h/1Ah).
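For reference, the classic (non-extended) FCB and a minimal Find First FCB call look like this, under the same 16-bit compiler assumptions as the earlier listings (byte packing assumed):

    #include <dos.h>       /* int86x, segread, FP_SEG/FP_OFF */

    /* Classic 37-byte FCB. For Find First FCB only the drive and the
     * blank-padded name/extension (which may contain '?' wildcards) need
     * to be filled in. */
    #pragma pack(push, 1)
    struct dos_fcb {
        unsigned char  drive;         /* 0 = default, 1 = A:, 2 = B:, ...    */
        char           name[8];       /* blank-padded, '?' wildcards allowed */
        char           ext[3];        /* blank-padded                        */
        unsigned short cur_block;
        unsigned short rec_size;
        unsigned long  file_size;
        unsigned short date;
        unsigned short time;
        unsigned char  reserved[8];   /* DOS-internal                        */
        unsigned char  cur_rec;
        unsigned long  random_rec;
    };
    #pragma pack(pop)

    /* INT 21h/11h: Find First FCB. DS:DX points at the unopened FCB; the
     * match is placed in the current DTA. Returns AL: 00h = found, FFh = no match. */
    int fcb_find_first(struct dos_fcb *fcb)
    {
        union REGS r; struct SREGS s;
        segread(&s);
        r.h.ah = 0x11;
        s.ds   = FP_SEG((void far *)fcb);
        r.x.dx = FP_OFF((void far *)fcb);
        int86x(0x21, &r, &r, &s);
        return r.h.al;
    }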

EMU2 originally associated the host-side search state with the DTA address for both FCB and non-FCB searches. While implementing fixes to cases One to Three, I saved the search state in the DTA.

But then I found that MS-DOS 6.0 COMMAND.COM calls Find First FCB, which places the first search result into the DTA, and then calls INT 21h/47h (Get Current Directory)… which happens to place the result into the same memory area that also holds the DTA.

This led to a facepalm moment when I read through the MS-DOS 4.0 source code in order to understand how FCB searches work. It turned out that the search continuation information is not placed into the DTA at all, but rather into the unopened FCB that is passed as input to both INT 21h/11h (Find First FCB) as well as INT 21h/12h (Find Next FCB). If only I had read the RBIL more carefully, the information was right there.

Not too surprisingly, it turned out that the magic 21-byte search continuation state area is actually the same for FCB and non-FCB searches, and its layout is derived from the unopened FCB format, going back to DOS 1.x.

One way or another, the search continuation area includes the input filename (possibly with wildcards) and attributes, to know what it’s looking for. It also needs to contain the cluster number of the directory being searched, as well as position within the directory.
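For local FAT volumes, RBIL describes the DOS 3.x flavor of that area roughly as follows. This is an approximation only; the offsets shift between DOS versions, which is exactly why the area is best treated as opaque.

    /* Approximate DOS 3.x search continuation state, 21 bytes (byte packing
     * assumed); not valid for DOS 2.x, redirectors, or emulators. */
    #pragma pack(push, 1)
    struct dos3_search_state {
        unsigned char  drive;           /* drive being searched              */
        char           templ[11];       /* 8.3 name template, '?' wildcards  */
        unsigned char  attrs;           /* search attribute mask             */
        unsigned short dir_entry;       /* position within the directory     */
        unsigned short dir_cluster;     /* starting cluster of the directory */
        unsigned char  reserved[4];
    };
    #pragma pack(pop)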

What’s Reasonable and What Isn’t?

The design (or lack thereof) of DOS directory searches raises an obvious question: Just how long are directory searches valid?

The answer is “as long as the directory being searched remains unmodified”. As long as a DOS process doing the searches has control, it can (but of course doesn’t have to!) ensure that directory contents don’t change. As soon as the DOS EXEC call is invoked to start a new process, it can’t be assumed that directory contents remain static.

In practice, searches remain valid indefinitely, because DOS makes no effort to ensure that directory searches are invalidated even when directory contents change. However, the term “valid” needs to be understood to mean that Find Next can be called, not that Find Next will return sensible data. It is easy to imagine a situation where a search is started, the directory is deleted, and the clusters it used to occupy are replaced with completely different data. Find Next will then attempt to process random file data as a directory, with unpredictable results.

That’s not even considering a situation when a program modifies the search continuation information (in the DTA or FCB). DOS has no way of preventing that, but fortunately DOS programs have no incentive to do so.

Redirectors (whatever method they use to interface with DOS) are thus in a difficult situation, because they often need to associate some “server side” state with a search (necessitated by the fact that the 21-byte search continuation area is likely not large enough to store all required data in it). Since there is no “Find Close” API, redirectors are forced to guess when a search is done.

An approach using LRU logic may not be sufficient. When performing recursive directory searches (such as TREE.COM), a program may start searching the root directory, descend into a sub-directory, and search hundreds or thousands of nested directories before returning back to the top-level directory. The search of the top-level directory must remain active on the server side the entire time, without being discarded.

That’s where heuristics help, because they greatly reduce the number of open searches, thus making it much more likely that still-active searches won’t be inadvertently recycled.

Fortunately well behaved DOS programs do not actively try to break such heuristics, and although they may do surprising things, the list of such surprises is (probably!) not endless.


53 Responses to Learn Something Old Every Day, Part XI: DOS Directory Searches are Bizarre

  1. Roger says:

    I worked on an SMB server (think alternative to Samba) way back when and had to deal with this issue. We were mapping to a Unix filesystem (case sensitive) and also had to deal with long and short filenames. The protocol included a per-file-entry blob so the client could resume from any entry of any previous listing. IIRC the final solution was reading all the directory contents into memory first, which was necessary to generate short names (e.g. needing ~1 and ~2 suffixes), and sorting entries into a deterministic order. That allowed resumption at any point at any time.

    What I do remember is that there were so many corner cases, as we had to test with 16 bit and 32 bit DOS and Windows programs. I was forced to rewrite the directory listing code *four* times!

  2. blacksheep says:

    I may be wrong about this, but MSDOS couldn’t manage asynchronous disk IO, so the directory structure should always remain static during any given read operation, including a directory listing. That makes it a valid assumption for the computing environment of the time. The x86 CPU was strictly single-threaded, in-order execution, with all software running on bare metal, as we say today. No emulation layers, virtualization, or containerization to fudge with the likelihood of IO race conditions. On a modern system, this assumption is a race condition waiting to happen, but back then… not so much.

    Sure, I can imagine how to break that assumption even on MSDOS running bare, as could anyone else if they took a short time to do so. I just doubt anyone would purposely screw with their system that way. It was a different time.

  3. Joshua Rodd says:

    A somewhat similar problem to the “FCBs are never closed” problem, with SHARE.EXE or various MVDMs taking similar approaches to a solution. I really don’t know why DOS 2.0 didn’t take a more definitive stance to slam the door shut on FCBs.

    Incidentally, one of the things most novices trying to write a DOS clone get wrong is how to handle FCBs.

  4. MiaM says:

    With 21 bytes it should be possible to store the search term, the result number and a pointer to the sector on disk that the directory starts at (or whichever unique way each directory can be referred to, disk-inode for *ix and so on) even on servers with multiple terabyte size partitions.

    That way it’s just a performance issue to decide what search results to cache. Even if everything is flushed, it’s still possible to come back and redo a search in directory #12345678 to return search result #42 or so.

    Preferably the 21 bytes would be used similarly to how MS-DOS uses them for local partitions.

    This probably requires a not insignificant rewrite of the search/next function in EMU2 though.

  5. Yuhong Bao says:

    FCBs with >32MB partitions were one of the problems of DOS 4.0 I believe. DOS 5.0 changed the handling of them to fix the problems.

  6. Michal Necasek says:

    Thanks, this is nice to hear! I’m not surprised that rewrites are needed. Reading the DOS API documentation gives people no idea what existing applications actually do.

    What EMU2 does is quite similar, it reads the entire host side directory and then generates short names — I guess that’s the only way to do it because you don’t know if you have to generate special mangled short names until you know if there are collisions on the “normal” 8.3 names or not. As in when you see “foo.c” you can map it to “FOO.C”… but if there’s also “Foo.c” then extra work is needed.

  7. Michal Necasek says:

    Yes. A DOS program did not generally need to worry about some other process changing the disk contents behind its back. But on the DOS API level, there was also no attempt to ensure that users could not mix API calls to search a directory with calls to change the same directory’s content.

    But as you say, people understood what not to do, and didn’t deliberately try to break things. It was hard enough to get a DOS based environment working reliably as it was.

  8. Michal Necasek says:

    Yes, it’s very similar, for the same reasons. There is actually an “FCB Close” call, but especially when reading files I don’t think applications need to call it at all, and conversely I don’t know to what extent calling “FCB Close” really means that an application is done with an FCB.

    I’m not surprised people get FCBs wrong, because they are completely backward when viewed from the perspective of modern API design. Where “modern” means 1970s UNIX.

  9. Michal Necasek says:

    No. As a previous comment says, you may need to generate unique short names, and that cannot be done piecemeal because you don’t even know if you need to do that until you’ve read the entire directory.

    I’m also not aware of any mechanism in common operating systems that would allow you to map an arbitrary (say) 32-bit number to a directory. Sure if you have an UNIX-y OS, there’s a 1:1 correspondence between inodes and directories… within a single file system. But you can have lots of those. And they can dynamically change.

  10. Michal Necasek says:

    I don’t think DOS 2.0 had any chance to ditch FCBs. “Here’s a new DOS, and by the way none of your old applications will work” does not sound like a winner. And once they had FCB support working, I doubt there was a huge incentive for removing it later. I’m sure they thought about it many times, and always found that ditching FCB support would have caused more problems than it solved.

  11. Derek says:

    The flaw with DOS 2.0 was allowing FCB calls to work in the current directory.

    MS should have only allowed them to work in the root directory, and required the handle calls for any other directory. That would then have forced a switch to the handle API, while retaining compatibility for existing programs.

    (Apparently searches for volume labels always ignore the current directory, and look at the root directory).

    However that wouldn’t have helped wrt the lack of a “Find Close” call.

  12. MiaM says:

    Well, you could read a full directory listing, generate short names, and just return a specific entry in that list, and then throw the list away. Repeat for each call to next. The only real reason for doing any cache within EMU2 would be to avoid the overhead of calling the underlying operating system – the actual caching is handled by the operating system though.

    Also, I can’t see any case where a DOS application relies on the generation of unique short names acting in a specific way, except that it has to be done the same way each time. So in your case, FOO.C could just be FOO.C while Foo.c could then become FOO~1.C.

    The only possible problem is if Foo.C first exists, is translated to FOO.C, and then later someone also adds FOO.C (outside EMU2 or any other 8+3 style thingie), which might end up making the newly added file not have its name translated while the existing file gets its name translated, changing things. But if a user adds FOO.C to a directory already containing Foo.C then they kind of have themselves to blame for any problems. A useful feature for EMU2 and other interactive software that has to do 8+3 conversions would be to have some sort of log window that informs the users of the existence of two files with the same name but with different case.

    Obtaining some sort of reference ID for a particular directory would of course be done differently on different operating systems. Seems like this might be a way to do it in modern Windows:
    https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-file_id_extd_dir_information (click my name for a clickable link). This is a 128-bit reference, so would use 8 of the 10 available bytes after 11 has been used for an 8+3 name/extension (+wildcard) combination, which is on the edge of too little to both contain a file result counter and a reference to which filesystem it refers to. But on the other hand, the 8+3 part can’t contain all 256 character codes so that part can be packed. Can’t remember the details but I would assume that it can contain most of the characters in the range 33-95 and some of the ones higher up, in particular the upper case versions of various international characters and whatnot. Given that EMU2 runs on a modern computer there is no problem with using multiplication and division to pack this data. Would likely only save something like 11 bits or perhaps 12 or even 13 bits, but still.

    And sure, there will likely be some system where EMU2 or similar programs can’t use this method

  13. MiaM says:

    Derek: It would likely have been severely unpopular with users if they had to have all data for existing programs in the root directory. Combine that with the fixed limit of files in the root directory and people might have hesitated to buy an XT with a hard disk and DOS 2.x rather than a PC without a hard disk and DOS 1.x

  14. Derek says:

    MiaM — Nah one could combine it with the SUBST command, if it had been implemented then. Or an equivalent to the CP/M “floating drive” (or the “load drive”) facility.

    That way users could have still used dedicated directories, appearing as drives, on a hard disk. The FCB APIs could still have worked.

    Moreover, s/w was being replaced/updated so quickly in that period (the upgrade churn), that the need for such would have quickly vanished.

    That said, even in 90’s using DOS 3,5,6 and DRDOS 5,6 we still made use of SUBST as it was convenient.

  15. Michal Necasek says:

    You can use all 8 bits in DOS file name searches, so you can’t just recycle some bits there. And you also need to remember the search attributes. I just don’t see how you could pack everything else into about 10 bytes. For DOS it works because one byte identifies the drive, two bytes identify the directory cluster, and two bytes define the position within the cluster. The rest can be read from the disk.

    The question also is not “is there a unique number that corresponds to a given directory” (there likely is) but “how do you find the directory given said number”. And from what I see, not every file system necessarily even provides such numbers. Then what?

    I’m also not sure how you imagine your method would perform. For every “Find Next” call, you’d have to search the entire host directory, sort it, generate short names, return the N-th result, and then throw it all away? And then repeat the exercise from scratch for the following “Find Next” call? That could get real slow real fast with bigger directories.

  16. Michal Necasek says:

    I think the “problem” was that adding support for directories (as in making the FCB system calls work in the current directory) was actually easy. So they did it. And for users I’m sure it was much easier than SUBST or APPEND, though especially SUBST was quite useful even without FCBs.

  17. Chris M. says:

    @Michal Necasek Apple did exactly what you stated with their Apple Filing Protocol in 1986. They assigned a unique 32-bit catalog ID number to every file and directory on a file server, and the protocol communicates exclusively with them over the line, as opposed to filenames. In theory you could have two identically named files in a directory with different CNIDs. This was also handy for supporting Macintosh file aliases (basically symbolic links).

    Apple built and sold AppleShare clients for DOS, so even they had to figure out the directory searches!

  18. Richard Wells says:

    Of course, this was much simpler under CP/M. All BDOS calls including find first and next passed through a single buffer. Calling any other BDOS call will effectively end the find sequence. Having the entire OS plus data structures squeeze into 8K does force some compromises.

    FCB support was dropped with the development of FAT32. There were still problems caused by the absence of FCBs and that was about 15 years after the development of file handles. It was sometimes recommended to keep a small FAT16 partition as part of a drive mostly formatted with FAT32 just to handle any programs requiring FCBs.

  19. Chris M. says:

    FCBs were causing problems even before that. I had some older database applications I had to support that relied on FCBs, and the developer gave TONS of warnings that you really shouldn’t go beyond running the software on anything newer than DOS 3.3.

  20. Michal Necasek says:

    I doubt that newer DOS versions didn’t work with FCBs… but there were likely additional possible complications. Judging e.g. by this.

  21. Rich Shealer says:

    @Derek – Even in the year 2024 I use the SUBST command to shorten long path names to my Visual Studio projects.

    C:\USERS\Rich\Documents\Projects\VS\A-E\A\Acme\Coyote

    Becomes J:\ allowing more folders in the project as needed without running into the directory length limits.

  22. Richard Wells says:

    The main problem with later versions of DOS was the limited number of FCBs allowed by default. Combine that with a multitasker and the system might need a huge number of FCBs without clear guidance in the documentation on how to get enough.

    There was also a change to FCB support with DOS 5 and 6. https://hwiegman.home.xs4all.nl/msdos/74932.htm I can’t imagine how it would adversely affect any real program but there are enough programs out there that I can’t rule out one requiring the DOS 4 or earlier FCB behavior.

  23. Yuhong Bao says:

    AFAIK part of the point of changing the FCB implementation in DOS 5.0 is that SHARE would no longer be required to be loaded.

  24. MiaM says:

    Would it really get slow if EMU2 has to call APIs to get a directory listing from the host for each call to search/next? Sure, calling those APIs would most likely cost context switches, but otherwise the actual data ends up in the host OS disk cache after the first call.

    Re name: Sure, every bit is used, but not every one of the technically possible character codes. For example you wouldn’t search for file names containing a line feed character. And more importantly, since DOS is case insensitive you can just convert all lower case characters to upper case. Seems like for CP437 there are 176 allowed characters (if we disallow lower case versions of the non-US-ASCII characters in CP437). That is about 69% of all technically possible characters, and thus 8+3 would fit in slightly less than 8 bytes if some math using modulo and whatnot is used to pack the data.

    (Let’s not forget that the system spec for EMU2 is a minimum 1GHz CPU, so any packing of the data and any directory listing sorting and generation of 8+3 names from long file names would run faster than the find/next DOS APIs did back in the days when the software people would run in EMU2 was written. Also, although SSDs are faster than mechanical disks, there is still a 30 times (or so, source: first random google search result) speed difference, so reading a directory listing the first time will take about 30 times longer than subsequent reads when it’s already in the host OS cache).

    And sure, I’m one of those who really hate code that is slower than it has to be, but in this case the only alternatives that will never fail are either what I suggest or to cache at least some metadata for every search that could possibly have further calls to next.

    Also, the first google search result, pointing to a Microsoft page, says that context switches cost about 5µs. (Admittedly I didn’t read what hardware they are referring to, or even if they are referring to Windows or some of their hosting platforms or whatnot. But still, 5µs is very little compared to how much time search/next took back in the days on the machines that EMU2 emulates).

    I think that just throwing away possible ongoing searches with some heuristic algorithm has a strong touch of DOS 1980’s, i.e. “let’s try fiddling around with things until it seems like it works”.

  25. John Elliott says:

    I ran into some of these issues reverse-engineering LocoLink, an interface which allows an Amstrad PCW (which uses a CP/M filesystem) to act as a file server to a PC running either the LocoScript word processor or a Win16 document converter. Both versions do it by implementing just enough of the INT 21h API to support the functionality they need, with annoying differences between the versions (such as exactly which bits of the FindFirst/FindNext data get shared across the link). Like TREE, the Win16 client does a depth-first search of directories so I had to maintain multiple open searches identified by cookies.

  26. John Elliott says:

    Even under CP/M you can get FCB weirdness – I remember writing a CP/M emulator and finding that one of the standard Digital Research utilities would happily call F_CLOSE on an FCB it had never opened. And another would poke about in the reserved FCB fields to set the file pointer (presumably so it would work on CP/M 1, which didn’t have functions to do this).

  27. Nix says:

    MiaM: You’ve suddenly turned an O(n) directory read via repeated FindNext calls into an O(n^2) one. Those blow up very, very fast…

    FYI, supporting programs that did things like this was one of the justifications for the existence of the seekdir/telldir operations in Unix (programs like… stateless network protocols like NFS). If you want to see a Unix/Linux filesystem hacker cry, ask them about seekdir/telldir (particularly given that these things *are* meant to work under concurrent modification: the interactions with rename() are especially cursed).

    I note that bugs were found about fifteen years back on, oh, all the BSDs (it predated their split!) which had caused seekdir() after unlink() in big directories to return the wrong entry — since *1991*. These functions are not just cursed nightmares to implement, they are hardly used… and then this sort of thing pops up and I’m reminded that their close analogue was routinely used in DOS.

  28. Joshua Rodd says:

    Off-topic, but has anyone else reviewed the MT-DOS 4.0 documentation from the “Ozzie drop”? It’s very clear MT-DOS 4.0 was the predecessor to OS/2 1.0. But in an even more bizarre fashion, MT-DOS 4.0 seems to have the memory management infrastructure from Windows 1.0 ported over but ready to use as a general purpose operating system.

    It makes the origins of the 1.x DOS Compatibility Box a little more obvious, if you view OS/2 1.0 as basically a protected-mode version of MT-DOS 4.0, much how Windows 3.0 was a protected-mode version of Windows, so programs didn’t need to bother with the real-mode Windows memory management anymore.

    The obvious open question is how Microsoft internally let MT-DOS 4.0, Windows, and OS/2 1.x get fractured. There was clearly a plan to keep them unified.

    Overall, the MT-DOS 4.0 plan seemed to be:

    – Provide an API so that a properly written program could run on plain DOS 3.x or MT-DOS 4.0 with either a very simple recompile or the exact same binary. This survived in the “family API” on OS/2 1.x+.

    – Encourage developers to use MT-DOS 4.0’s memory management so they could take advantage of more memory in a future, 286-enabled version.

    – Maintain full compatibility with 8086 machines. MT-DOS 4.0 runs on an 8086. Given the tight RAM constraints and how slow a typical 8086 was, this was not a worthy goal in retrospect.

    – Provide a pathway to a future protected-mode 80286 based operating system which could continue to run the same binaries.

    Notably, none of this took the “DOS Extender” approach. OS/2/NT didn’t take the DOS Extender approach either, but Windows 3.x/9x did. In retrospect, the DOS Extender approach worked better since people with plain old DOS but a 286 machine could run programs which needed more RAM without having to bother with a new OS.

  29. Michal Necasek says:

    I thought the MT-DOS connection to Windows and OS/2 was well known…

    Yes, MT-DOS (the “released” versions) used the same NE executable format as 16-bit Windows and OS/2. Windows 2.x (I think it was new there) actually had fairly elaborate EMS support, which plugged into the Windows memory management. But of course that was not a great fit for the 286.

    From the beginning Microsoft also made noises about an integrated DOS/Windows product. This was supposed to have happened many, many years before Windows 95.

    How everything became so fragmented… good question! It is known that Windows 2.x was supposed to be the end of the road, to be replaced by OS/2 and Presentation Manager. I strongly suspect that DOS 3.x was similarly considered a legacy product. From DOS 3.0/3.1 to 3.3 there was very little development. Microsoft was clearly focusing on OS/2 in the period of 1985 to 1990 and MT-DOS and Windows became unimportant.

    I strongly suspect that MT-DOS was so constrained by the 8086 architecture that it was not a viable product. Of course people said the same thing about OS/2 and the 286. MT-DOS was used for Microsoft Network servers and it probably was fine on a dedicated machine. It is quite interesting to consider a unified product strategy where MT-DOS and OS/2 could run the same console applications, and Windows and Presentation Manager ran the same GUI applications. That never happened, probably in part because of IBM’s own demands and product strategies. Whether it would have been even workable… who knows.

    Now if you think about it, in late 1988 Microsoft already started working on yet another next generation operating system (NT). So Microsoft had a DOS goldmine, but new development kept focusing on future operating systems that were so futuristic that they were barely usable on contemporary hardware. With Windows it took 5 years (1985 to 1990) for the hardware to catch up. With NT it also took years.

    I don’t see any brilliant strategy there — on the contrary, a lot of flailing, with a constantly changing product strategy, products delivered years late… but all backstopped by a huge fat cash cow (first DOS, then Windows) that made failures easy to recover from.

  30. Yuhong Bao says:

    There was a reason why I talked about Multitasking DOS 4.0 and WINOLDAP.

  31. Yuhong Bao says:

    (Later on with OS/2 this problem was handled by just banning all use of WINOLDAP altogether)

  32. Richard Wells says:

    Digital Research had many multitasking APIs over the years that got little use. Designing an API after the fact that requires rewriting code that already runs on the underlying OS in a way that gets developers excited is a difficult challenge. Every systems developer underestimates just how lazy application developers are.

    MT-DOS should still have provided decent speed on the targeted 8088 hardware. Excel managed to match 123 on the same machines using a similar memory model. Not that anyone was going to redesign programs around segments instead of overlays or simple giant blobs of code for the 10% of the market that wanted always running tiny business applications to track phone calls without all the problems of TSRs.

    If IBM didn’t want the MS design or later MS couldn’t keep using the IBM design, it made sense to scrap the existing design and replace it with a similar design that corrected all the flaws found in the earlier attempts.

  33. Yuhong Bao says:

    “It is quite interesting to consider a unified product strategy where MT-DOS and OS/2 could run the same console applications”
    AFAIK OS/2 Family API already existed.

  34. Richard Wells says:

    Family API and MT-DOS had rather different goals and different end results. Family API applications under DOS tended to be monolithic designs that weren’t friendly with overlays or other memory usage reduction techniques. MT-DOS could discard all unneeded code segments and run the program using only the currently active code segment. OS/2 did that for all programs automatically so the Family API’s weaknesses were ameliorated. That also would have killed off MT-DOS for OS/2 since 90% of the work specific to MT-DOS was incorporated in the OS/2 kernel and not needed anymore.

  35. vbdasc says:

    @Richard Wells:
    “There was also a change to FCB support with DOS 5 and 6. https://hwiegman.home.xs4all.nl/msdos/74932.htm

    It’s an interesting question what exactly did MS change in DOS 5 about FCBs. As far as I personally understand things (with info gathered from various sources):

    DOS 1: FCBs are the native and only way to access files. No meaningful limitations.

    DOS 2. FCBs are second-class citizens, behind file handles, but are still very much supported without meaningful limitations. The FCBs do contain certain internal filesystem info, but their size is sufficiently large for this, because DOS 2 only supports FAT12 which is rather simple.

    DOS 3. FAT16, network files and SHARE are introduced. This spells problems for FCBs. Microsoft still finds a way to cram the needed internal filesystem info into the FCB reserved area for FAT16 (by the way, the internal FCB format changes with every version of DOS!). Alas, this is impossible for remote files. So, SHARE is called to help. But the memory that SHARE can use is limited. So, the FCBS= parameter is introduced to pre-allocate a certain number of buffers in SHARE, each of which can hold the info for one FCB-accessed file, whether remote, FAT16-local or FAT12-local. By the way, FCBs also now each contain a link to an SFT entry, so ultimately, FCBs and file handles consume resources from the same pool. Effectively, now local files can still be opened with unlimited number of FCBs if SHARE is not loaded, but otherwise FCBS= goes into effect and limits that number.

    DOS 4 (and 3.31). BIGFAT16 is introduced (>32 Mb partitions). FCB size is now insufficient to contain all the needed per-file filesystem info, so to access files on such volumes with FCB, SHARE is needed, just like with remote files.

    DOS 5. MS realizes that informing users that SHARE is needed to access their files with FCB is confusing and even scary (at least to me it was, years ago) and moves the FCB info buffers from SHARE to the DOS core. Also, the second parameter of FCBS= disappears, for reasons unknown (maybe to nudge developers and users to stop using FCBs?)

    DOS 7 with FAT32. Microsoft decides to not enlarge the FCB info buffers again, and therefore to not support FCBs on FAT32.

    Unfortunately, this picture is only a speculation, at least until the DOS 5 source code surfaces (I know that DOS 6 source code leaked, but I’m too lazy to try and find it).

  36. Michal Necasek says:

    DOS 2.x handles FCBs more or less exactly the same way DOS 1.x did, with the addition of directory and hard disk support (but hard disk vs floppy makes no difference to FCBs there). As you say, FCB support in DOS 2.x is native and whatever weird things worked in DOS 1.x should still work in DOS 2.x.

    In DOS 3.0, FCB support was significantly reworked to support networking. The low level file I/O does not deal with FCBs at all, likely so that redirectors would not have to deal with FCBs (which they probably can’t, really). So starting with DOS 3.0, FCBs are mapped to SFs (System Files). There’s an LRU algorithm that recycles old FCBs. This is used even when SHARE is not loaded.

    From what I remember, the core FCB support logic did not change all that much between DOS 3.0 and DOS 6.x, though there was tinkering around the edges with SHARE and such. The way it works since DOS 3.0 is that file I/O always goes through the SFT, always. The trick is that in the absence of SHARE and networking, DOS is typically able to re-generate the SFT entry from the reserved FCB fields, even if the SFT entry was recycled in the meantime.

    With SHARE and/or redirector loaded, FCBs cannot be re-generated from the data in the FCB alone, and if the corresponding FCB is recycled, FCB I/O will fail.

  37. Chris M. says:

    Dug up what database program it was. It was called “Nutshell”, originally developed by Leading Edge Computers, and eventually evolved into FileMaker Pro. The readme file strongly recommended that you load SHARE.EXE and set a value for FCBS= in CONFIG.SYS on (Compaq) DOS 3.31 and later, particularly if you had a FAT partition greater than 32MB.

    In 1994, Fair Haven Software (who took over development after FileMaker was spun off) made the bold claim that FCBs were likely to be dropped post MS-DOS 5.0 in a readme for a replacement product called “Ultra-Plus”, which used modern file handles. Clearly they hadn’t tested MS-DOS 6.0 compatibility yet. They also did not want one running Nutshell over a network or under a multitasker.

    What’s weird is that the program was originally released after MS-DOS 2.0 was released, so it should have supported file handles to begin with!

  38. vbdasc says:

    Well, this is all logical, because under DOS 3.31 or 4.x, if you want to use FCBs in a >32 Mb volume (BIGFAT), you MUST load SHARE, and consequently, put a FCBS= line in your CONFIG.SYS file. Under DOS 5 or higher you do not need SHARE, but still need FCBS= .

    Database programs can be remarkably conservative. I know of a database program for DOS that was still being developed until 10 years ago or so, almost until the release of Windows 10.

  39. MiaM says:

    Wasn’t filemaker pro something that was mostly used on Mac?

    I.e. the PC/DOS version might have been the ugly duckling, sort of, that they didn’t put as much effort into as the Mac version?

    (Also internationally I would say that Filemaker is a bad name, as I would think that most people who aren’t native English speakers only associate “file” with the type of files you have in your file system in your computer, and absolutely not the type of files you have in your filing cabinet. (For example in Swedish we use the word “fil” for either a file on a file system, a lane on a road or a particular dairy product that doesn’t have a name in English, while the type of file you have in your filing cabinet would be a document or an “akt”, which is also the same word as for acts in for example a theater play)).

  40. Yuhong Bao says:

    AFAIK what changed in DOS 5.0 was that a “reserved” field in the FCB was changed from 16-bit to 22-bit to eliminate the 32MB limitation.

  41. Chris M. says:

    @MiaM

    FileMaker (a new program written from scratch) was basically the Macintosh version of Nutshell, mostly because they couldn’t use the name on a Mac product. Both were developed by a company called Nashoba Systems which was later bought out by ForeThought Inc. (developer of PowerPoint!). Later on they sold FileMaker to Apple, who sold it under the Claris brand. ForeThought eventually got taken over by Microsoft. Nutshell evolved independently and had a bit of a following (sorta like how old DOS versions of dBase had a following).

    The market was big enough that a company called Fairhaven Software bought Nutshell and kept on selling it. They later rewrote the whole thing into a proper relational database (using DOS file handles) and called the replacement “Ultra-Plus”. The company is still around offering consulting services to migrate the old DOS base databases to….. FileMaker Pro.

    I strongly suspect all the dBase and Nutshell holdouts are why the Dosbox team has all those warnings about it being for games only!

  42. vbdasc says:

    @Chris M. “I strongly suspect all the dBase and Nutshell holdouts are why the Dosbox team has all those warning about it being for games only!”

    Also don’t forget Clipper (which was immensely popular in many East European countries, at least) and Foxbase/Foxpro (although both Clipper and Fox* later developed powerful backward-compatible replacements for Windows and other OSes).

    Sorry for the off-topic though, because AFAIK neither Clipper nor Fox* used FCBs 🙂

  43. Jeff says:

    “Spurred by the release of the MS-DOS 4.0 source code, I thought I’d see if I could use EMU2 to build MS-DOS 4.0 on a non-DOS system. It did not go well.”

    Spurred by your post, I thought I’d see if my PC.js command-line utility could be used to build MS-DOS 4.0 on a non-DOS system. And it went pretty well. 😉 By default, it’s emulating a 16 MHz COMPAQ DeskPro, so the build takes a while, but it works.

    Details here (I haven’t felt inspired to write a blog post yet).

  44. Vlad Gnatov says:

    > The main problem with later versions of DOS was the limited number of FCBs allowed by default. Combine that with a multitasker and the system might need a huge number of FCBs without clear guidance in the documentation on how to get enough.

    You can have a practically unlimited number of _local_ FCBs.
    The FCB API is pretty badly designed as it exposes a system data structure (SFT) to the user program. In an attempt to mitigate that, DOS only reads the FCB at the open_fcb call and afterwards only updates it, but does not read it. The data read from the user FCB is stored in a separate SFT table of size m, where m is the first parameter in CONFIG.SYS’ FCBS= statement. If the user program opens more FCBs than this table can hold, a specific LRU algorithm is used to purge the oldest FCBSFT entry. That oldest FCB is still considered open and can be used, as the SFT entry can be recreated by reading the corresponding FCB in the user program. But only for local FCBs.

    >There was also a change to FCB support with DOS 5 and 6. https://hwiegman.home.xs4all.nl/msdos/74932.htm I can’t imagine how it would adversely affect any real program but there are enough programs out there that I can’t rule out one requiring the DOS 4 or earlier FCB behavior.

    The FCB API indeed changes in every version of MSDOS (including 7.x), but in this case it is just a change in the LRU algorithm and more or less harmless. Before DOS 5.x it skips the n youngest network FCBs even if they are older than some local FCBs, where n is the second parameter in CONFIG.SYS’ FCBS= statement. In DOS 5.x+ it skips all network FCBs and prefers to purge local FCBs first. If SHARE is loaded all FCBs are considered network FCBs.

  45. vbdasc says:

    This is certainly very interesting information. A few notes, though:

    From your information it follows that the second parameter of FCBS= only protects network files, no? Hence, a local file that happens to be the oldest will have its FCBSFT entry purged, even if technically it’s still open and can be recovered… But then some technical information from IBM and Microsoft (hence, authoritative) must be just plain wrong. Consider this quote from an MS manual for DOS 3.3, for example

    “[about FCBS=, ]
    The parameter protects the first y files from being closed
    in this manner. If is set equal to , then files will not be closed and the
    application program will be unable to open any files in excess of the maximum. ”

    According to your information, this sort of deadlock (where no new files can be opened) can only happen when there are no open local files. A fact that is not reflected in the MS documentation.

    “Before DOS 5.x it skips n youngest network FCBs”

    Surely you meant “n oldest network FCBs”?

  46. vbdasc says:

    D@mn, the posting software mangled my text. Reposting the damaged part

    “[about FCBS=x, y]
    The parameter protects the first y files from being closed
    in this manner. If y is set equal to x, then files will not be closed and the
    application program will be unable to open any files in excess of the maximum. ”

  47. Vlad Gnatov says:

    >From your information it follows that the second parameter of FCBS= only protects network files, no?
    It has been quite a while since I last saw DOS code, but I remember that there were two versions of the fcblru proc, the old one and the DOS 5 one (the former is commented out). The LRU pass in the old one was something like:
    for (i = 0; i < FCBSFT.size; i++)
        if (!netfcb && !isshare) { update LRU_candidate }
        else if (FCBSFT[i].age > oldest_protFCB) { update LRU_candidate }

    > Surely you meant “n oldest network FCBs”?
    It’s LRU, so youngest is correct.

  48. vbdasc says:

    “The LRU pass in old one was something like: …”

    Okay, I’ll take this as a “yes” answer to my question. When the files are local, the LRU algorithm is simple and straightforward, and no “protection” is offered. A note can be made that SHARE effectively makes all files networked, as you mentioned in your previous comment.

    “It’s LRU, so youngest is correct.”

    With all due respect, and at the risk of becoming obnoxious, I must disagree. Consider your full quote

    “Before DOS 5.x it skips [from purging] n youngest network FCBs even if they are older than some local FCBs, when n is the second parameter in CONFIG.SYS’ FCBS= statement.”

    What is the purpose of protecting the youngest files from purging? They are already protected by their youth, and the LRU won’t touch them by default anyway. Quite the contrary, the oldest files need protection from purging, because the default LRU will choose them first, and the second parameter should tweak the LRU algorithm so it would leave the n oldest files alone. This just makes sense.

  49. Richard Wells says:

    The network FCBs may be expected to be needed longer. So the newest network FCBs will be saved from the purge while local FCBs continue to be purged even ones created after the last protected network FCB. If more network FCBs are created, older network FCBs will be purged. This all keeps the local program from destroying the necessary network FCBs while those are still needed.

  50. Mr Morten says:

    Very unrelated (99%) but I don’t know who else to ask 😉 One of the things I remember from upgrading to Win95 was that floppy access was considerably faster than it used to be (in DOS, Win 3.1, OS/2 etc.). You could literally hear that reading was faster, the ‘clicking’ between tracks/cylinders happening at a higher frequency than in DOS. So things like copying a disk were much faster. What was the reason for this? Did they rely more on PIO transfers? I recall the machine would also seem slower / more jittery when working with the disk, so it appeared to be highly CPU demanding. This was on an IBM PS/1 (could make a difference – perhaps other BIOSes were using the same mechanism as Windows 95 natively).
