Hang with early DOS boot sector

While installing various versions of DOS for the DOS history series of articles, I was faced with a mysterious problem: Some versions of DOS would hang right away when booting from fixed disk, but not from floppy. I already knew that DOS 4.x is very sensitive to BIOS stack usage; if a BIOS needs more than 100 bytes or so of stack to process a disk read request, it will fail to boot DOS 4.x from fixed disk, even though the same DOS 4.x can access the same disk just fine when booted from floppy.

However, the hangs I was observing were happening with DOS 2.x and 3.x, and those do not have such tight stack usage requirements. I quickly realized that the problem is caused by a bug in the DOS boot sector: the boot sector code tries to optimize the loading of IBMBIO.COM and attempts to read a whole disk track at a time. That sounds like a good idea, but it’s not.

The boot sector is loaded at address 0:7c00h, or just under 32KB. The BIOS component of DOS (IBMBIO.COM) is loaded at address 70h:0 (in other words, 0:700h). The boot sector also sets up the top of stack at 0:7c00h, just below the boot sector code. There is therefore slightly under 30KB of room for IBMBIO.COM, which in the 2.x and 3.x versions of DOS is well under 20KB in size (just 4.5KB in DOS 2.0, in fact). In theory there should be no problem.

Unfortunately, the author of the DOS 2.0 boot sector was too eager to optimize the loading and not thinking far enough ahead. Loading a track at a time sounds clever, except when it’s not… On a modern disk, there are usually 63 sectors per track, or 31.5KB of data. If the boot sector reads that whole track, it will destroy the stack and overwrite itself. But back when the boot sector code was written, almost all fixed disks had 17 sectors per track, which meant that no amount of testing would have caught the bug. Floppy booting is no problem either, since even 1.44MB diskettes only have 18 sectors per track.
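
The arithmetic can be checked with a few lines of Python. This is only a sketch of the numbers quoted above; the constant and function names are made up for illustration, and the stack depth is ignored.

```python
SECTOR_SIZE = 512    # bytes per sector
LOAD_ADDR = 0x0700   # 70h:0, where IBMBIO.COM is loaded
STACK_TOP = 0x7C00   # top of stack, also the start of the boot sector

# Bytes available between the load address and the boot sector.
ROOM = STACK_TOP - LOAD_ADDR  # 0x7500 bytes, slightly under 30KB


def track_fits(sectors_per_track):
    """True if one whole track fits below the boot sector when loaded at 70h:0."""
    return sectors_per_track * SECTOR_SIZE <= ROOM


for spt in (17, 18, 63):
    print(spt, "sectors per track:", spt * SECTOR_SIZE, "bytes, fits:", track_fits(spt))
```

A 17-sector or 18-sector track fits with plenty of room to spare, while a 63-sector track is larger than the entire gap.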

Now, the above explains why an old version of DOS might hang when booted from a modern fixed disk. But it does not explain why some of my DOS 2.x and 3.x installs worked just fine, including a case with one install of DOS 2.0 booting fine and another persistently hanging.

After some head scratching, it turned out that the DOS partition size is key. Depending on how big the partition is, the FATs will have varying size, and IBMBIO.COM will start on a different sector relative to the start of the track. If IBMBIO.COM starts on, say, sector 30, the boot sector will load the remaining 34 sectors of the track and DOS will boot fine. If IBMBIO.COM starts on sector 1, DOS will hang. Similarly, if IBMBIO.COM starts on sector 60, the first read of four sectors will be fine, but the next read of 63 sectors will crash the system.
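
The three cases can be played through with a simplified model. This is a sketch based purely on the behavior described above, not a disassembly of the boot sector. It assumes every read runs from the current sector to the end of the track, that reads continue until at least `sectors_needed` sectors are loaded, and that 256 bytes below 0:7c00h are needed for the stack. The function names, the 256-byte figure and the 20-sector file size are all illustrative assumptions.

```python
SECTOR_SIZE = 512
LOAD_ADDR = 0x0700     # 70h:0
STACK_TOP = 0x7C00     # stack grows down from here
STACK_RESERVE = 256    # assumed stack allowance, not a measured value


def simulate_boot_reads(start_sector, sectors_needed, sectors_per_track=63):
    """Return a list of (load_address, sector_count, overruns) per read.

    start_sector is 1-based within the track. The simulation stops at the
    first read that would run into the stack or the boot sector.
    """
    reads = []
    addr = LOAD_ADDR
    sector = start_sector
    loaded = 0
    while loaded < sectors_needed:
        count = sectors_per_track - sector + 1      # rest of the current track
        end = addr + count * SECTOR_SIZE
        overruns = end > STACK_TOP - STACK_RESERVE
        reads.append((addr, count, overruns))
        if overruns:
            break
        addr = end
        loaded += count
        sector = 1                                  # next read starts a new track
    return reads


def boots(start_sector, sectors_needed, sectors_per_track=63):
    """True if no read in the sequence overruns the stack area."""
    return not any(r[2] for r in simulate_boot_reads(
        start_sector, sectors_needed, sectors_per_track))


for start in (30, 1, 60):
    print("start sector", start, "boots:", boots(start, sectors_needed=20))
```

Under these assumptions, sector 30 boots, sector 1 fails on the first read, and sector 60 fails on the second read. With 17 sectors per track every starting position works.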

For reference, a 30-cylinder partition on a typical disk with 63 sectors per track and 16 heads tends to cause problems. A 60-cylinder partition (close to the 32MB partition size limit) tends to work, but the exact behavior depends on the DOS version.
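
The link between FAT size and the starting sector follows from the usual FAT volume layout: reserved sectors, then the FATs, then the root directory, then the data area, where IBMBIO.COM has to be the first file. The sketch below only shows that relationship. The partition start, FAT sizes and root directory size are hypothetical inputs, not values taken from any particular FDISK or FORMAT version.

```python
SECTOR_SIZE = 512
DIR_ENTRY_SIZE = 32  # bytes per FAT directory entry


def first_data_sector_in_track(partition_start_lba, fat_sectors,
                               sectors_per_track=63, reserved_sectors=1,
                               num_fats=2, root_entries=512):
    """Return the 1-based sector-within-track where the data area begins.

    partition_start_lba is the absolute sector number (0-based) of the
    partition's boot sector; all defaults are illustrative assumptions.
    """
    root_dir_sectors = (root_entries * DIR_ENTRY_SIZE + SECTOR_SIZE - 1) // SECTOR_SIZE
    data_start_lba = (partition_start_lba + reserved_sectors
                      + num_fats * fat_sectors + root_dir_sectors)
    return data_start_lba % sectors_per_track + 1


# Doubling the FAT size moves the start of the data area within the track.
print(first_data_sector_in_track(partition_start_lba=63, fat_sectors=12))
print(first_data_sector_in_track(partition_start_lba=63, fat_sectors=24))
```

Whether a given result is harmless depends on how many sectors remain on that track, which is why two installs of the same DOS version can behave differently.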

The bug most likely affects all 2.x and 3.x versions of DOS prior to 3.3. Version 3.3 relaxed the restriction that IBMBIO.COM/IO.SYS must be contiguous (which enabled the track-at-once loading); this is documented by Microsoft in KB 66530. Removing the restriction required changes to the boot sector, because IBMBIO.COM/IO.SYS now had to be loaded cluster by cluster.

In DOS 4.0, IBMBIO.COM/IO.SYS grew beyond 30KB and could no longer be loaded in one go at all, even if it were contiguous (that would in essence always cause the hang problem described above). There just wasn’t enough space between 70h:0 and 0:7c00h anymore. The BIOS component was therefore loaded in two stages, which avoided the problem. However, the staged loading caused the previously mentioned issue with tight stack space—but that’s a different story.

This bug does not appear to be well known. The most likely reason is that extremely few people attempt installing DOS 3.2 or older on a computer with a multi-gigabyte disk. DOS 2.x is simply not useful for running any “modern” DOS software, and DOS 3.0/3.1/3.2 has a serious drawback in that it does not support 1.44MB floppy drives.

32 Responses to Hang with early DOS boot sector

  1. Yuhong Bao says:

    Not to mention the 32MB partition limit turns the 58 sectors per track limit into a non-issue anyway.

  2. michaln says:

    Sorry, that makes no sense. Can you please elaborate? What 58 SPT limit are you talking about?

  3. Yuhong Bao says:

    (7c00h – 700h) / 512 bytes = 58 (rounded down)

  4. Yuhong Bao says:

    Actually, it would be 57 sectors per track to allow for the stack itself.

  5. michaln says:

    There’s room for 58.5 sectors between 700h and 7C00h. The stack is extremely unlikely to need more than 256 bytes, so 58 sectors should be fine.

    But what does that have to do with the 32MB partition size limit?

  6. Yuhong Bao says:

    Well, DOS 3.2 and earlier did not support partitioning at all without third-party drivers.

  7. michaln says:

    That’s a rather misleading statement. DOS always supported partitioning (ever since hard disk support was added in DOS 2.0), but prior to 3.3 DOS could not access multiple DOS partitions at the same time. It was always possible to have multiple partitions present on a fixed disk, even multiple DOS partitions (but only one active/accessible).

  8. Yuhong Bao says:

    Yea, I think it was primarily for booting other OSes like Xenix.

  9. rauli says:

    Let’s suppose:
    1 – We have a hard disk partition [A] with MS DOS 3.0/3.1/3.2 installed on it, and crashing at boot (because of the issue described in this article).
    2 – We have MS DOS 3.30 installed on another disk [B] (another hard disk partition or a diskette).

    Replacing [A] boot sector with [B] boot sector… would make [A] boot?
    (replacing the boot sector, but maintaining the BPB part, of course)

  10. michaln says:

    It might help, but I can’t guarantee that it will work. I’d give it about 80% chance of success. If you do try, be very careful.

  11. rauli says:

    It works!
    But, if you take DOS 3.3 boot sector from a diskette, you have to maintain also byte at 01FD from the [A] boot sector (3rd from the end) which is 80h for a hard disk boot sector, and 00h for a diskette boot sector.
    I didn’t notice that byte, and it almost makes me quit…
    As I’ve said before, you also have to maintain the original BPB from [A] (for DOS 3.x it’s at bytes 0Bh to 1Dh, I think).
    After some rest I will try to boot DOS 2.x and 4 with this same method (using 3.3 boot sector). Just to try, but I think they will not boot.

  12. alex peter says:

    Anyone know how to get more than 4 partitions from DOS 3.2? Assuming the drive can handle more than 4 32 meg partitions. I heard AST and NEC came up with 8 32 meg partitions I think, but I'm looking for software I can use with DOS 3.2 that will give me more partitions. Any takers?

  13. alex peter says:

    I want to run my old games on my Tandy 1k. 3.3 won't run them due to too much conventional mem use but 3.2 does. Drawback is that I have an 8 bit IDE card that can take 2 gig partitions but the DOS it uses (6.22) definitely won't run my games. Tandy had a very weird video setup that used conventional mem to run its video RAM (sucks), but if I can circumvent the problem by using 3.2 then I got it licked. Assuming that I can get some sort of partitioning software or a version of FDISK that will give me more than the said 4 partition limitation in 3.2.

  14. michaln says:

    If they really supported more than 4 partitions, they probably had OEM DOS versions with adapted IO.SYS.

  15. Pingback: DOS boot hang update | OS/2 Museum

  16. ImperatorBanana says:

    I know you posted this a while ago, but *THANK YOU*! I’ve been using older versions of PC-DOS (2.10, 3.20) with modern replacement storage solutions on the PCJr (JrIDE+CF adapter, SDCartJr + SD Card) and noticed what seemed to be random configurations of partition size + DOS version + storage device would hang on boot or trigger other strange behavior that now makes perfect sense: stack smashing! A lot of the modern replacement storage solutions tend to be used with more modern DOS versions and present themselves as larger CHS values so this problem doesn’t seem to be very well known today either but again, as someone using older versions of DOS: thank you for publishing it!

  17. Michal Necasek says:

    Glad it helped! The problem was somewhat known back in the day (1987-ish) as it popped up with ESDI hard drives and such, but it was soon forgotten because newer DOS versions fixed it. Old DOS versions were written with the assumption that hard disks have 17 sectors per track or thereabouts, which was true for a number of years… until it wasn’t.

  18. ImperatorBanana says:

    I took a shot at disassembling the bootsector and writing an assemble-able version of it (at least in PC-DOS 2.10 with the geometry of the SD-Cart JR + 16GB flash drive: CHS=1024,255,63). Your description of the issue helped greatly with understanding what was really happening and labeling it all which was a nice fun project for the evenings, so thank you again!

    Once I got it assembling byte-for-byte (mostly…see the last statement below), I went forward with trying to patch it and, overall, the patch seems to work. It looks like there is a byte that tells it how many sectors IBMBIO.COM + IBMDOS.COM take up, so my logic was just (regardless of how many sectors are left on the track after the last directory sector) try to read that number of sectors. The read itself will technically fail if it hits the track boundary before reading all of the necessary sectors, but since the error message will indicate the number of sectors successfully read (unfortunately also including the sector it fails on in the count), I just confirm which error occurred, subtract the successfully read sectors from the number of sectors intended to read, update the RAM offset pointer and starting sector, and try the next read from the new starting position. The patch is about 4 bytes shorter than the original, so I put in some NOPs to keep everything else aligned.

    About the only thing I haven’t figured out is in MASM 2.0 how to get it to assemble the code to start at offset 0 of the output file (without it padding the beginning with a bunch of 0’s) while ensuring the memory access offsets are 7c00. I tried a few combinations of ORG and CODE SEGMENT AT values but had no luck. It’s pretty easy to just run a post processing pass to remove the leading 0’s though (in my github, the REFERENCE folder has an IBM C Compiler 1.0 compatible program to strip out the 0’s) so no worries there for me. They made some changes to the PC-DOS 3.2 bootsector so I haven’t yet gone through that to see if my patch logic can be applied similarly yet.

    Github if you were curious (has both the original and patched assemble-able and binary): https://github.com/RetroByten/DOS_BOOTSECTOR_DISASSEMBLY/tree/main/PCDOS210

  19. Michal Necasek says:

    Very cool!

    I don’t think you can get MASM to skip the zeros, or at least Microsoft did not know how to. Because in the MS-DOS 3.21 OAK, they assemble and link the boot sector and then use a DEBUG script to postprocess the resulting binary.

    The assembler as such could do it, the LEDATA OMF record specifies a starting address. But then the linker would add the padding anyway because MS LINK has no concept of segments not starting at zero.

  20. ImperatorBanana says:

    Oh neat, I’ll have to see if the OAK (or some descriptions of that debug script) are online somewhere to reference. It’s pretty awesome when you hit an issue, don’t really know how to solve it so you come up with an alternative workaround, and it turns out in real life that was pretty much the production workaround! Again, much thanks both for posting it and for the additional information on MASM/MS Link!!

  21. Michal Necasek says:

    I can just quote the relevant bits here. The build batch file for the boot sector did this:

    masm -Mx -t -I../../inc msboot.asm,msboot.obj;
    link @msboot.lrf;
    exe2bin msboot
    debug msboot.bin < debscr

    And the debscr script looked like this:

    m7d00 l 200 100
    rcx
    200
    w
    q

    The DEBUG script seems to make an awful lot of assumptions about what the commands do by default. But basically DEBUG loads the binary as if it were a .COM file (i.e. at offset 100h), then the script copies 512 bytes from offset 7D00h to 100h, and writes those 512 bytes to disk.

    ETA: The 'w' command as used there writes the count of bytes in BX:CX to the default file, which in this case is the file specified on the DEBUG.COM command line. And on startup, BX:CX contain the input file size, which is why BX does not need to be explicitly set. It's all very impenetrable without documentation.
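
For anyone without DEBUG at hand, the same post-processing step can be sketched in Python. This is not part of the OAK build. It assumes the output of exe2bin is padded so that the boot sector code sits at file offset 7C00h, which is what the 7D00h source address (100h load offset plus 7C00h) in the script implies. The function name is made up.

```python
BOOT_ORG = 0x7C00        # assumed file offset of the boot sector code
BOOT_SECTOR_SIZE = 0x200 # 512 bytes, matching "l 200" and "rcx 200"


def extract_boot_sector(image, org=BOOT_ORG, size=BOOT_SECTOR_SIZE):
    """Return the `size` bytes found at offset `org`, dropping the leading padding."""
    if len(image) < org + size:
        raise ValueError("image is too short to contain a boot sector at org")
    return image[org:org + size]


# Self-contained demonstration with a synthetic image: padding, then 512 bytes.
padding = bytes(BOOT_ORG)
payload = bytes(range(256)) * 2
sector = extract_boot_sector(padding + payload)
print(len(sector), sector == payload)
```

With a real build, the input would be the contents of msboot.bin read in binary mode, and the result would be written back out as the 512-byte boot sector.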

  22. ImperatorBanana says:

    Oh goodness, thank you again! For whatever reason I was fully aware DOS (I think ~2.0 and up?) supported the ">" operator but didn't pay enough attention to the manual to realize the "<" operator was also supported so it being that straightforward never crossed my mind! Also, thank you for the tidbit on BX:CX's initial state: I've been dumping the boot sectors / MBRs by writing the small int13 programs in debug (manually since I didn't know the scripting was possible with the "<" operator, now I know!) which destroy bx:cx so my "-w" command usage was always preceded by setting those registers. I probably would've guessed it was assuming a default state (the tutorial I followed for int13 ignored the setting of ES for the ES:BX buffer location and I came to the same conclusion), but for that script I would've ended up writing "rbx" followed by "0" anyway.

  23. Michal Necasek says:

    Yes, it was DOS 2.0 that added the standard input/output redirection. Where DOS 1.x was a clear CP/M workalike, DOS 2.0 (also) attempted to act like a minimalist/old UNIX.

    I didn’t know about the initial BX:CX values either. The trouble with DEBUG is that the built-in help is extremely terse (and nonexistent in old versions), and online documentation is no better. In the DOS 2.x days, DEBUG was documented in the DOS manual. By DOS 3.3, the DEBUG documentation was relegated to the Technical Reference, something that very few people had. A lot of the DEBUG functionality is quite non-obvious, and over time people forgot that DEBUG can actually do quite a bit.

    Possibly the most up-to-date official DEBUG documentation is in the PC DOS 7 Technical Update.

  24. Jeff says:

    This bug was something I decided to work around in the pc.js utility I’ve been working on recently.

    That utility allows you to start a PC XT or PC AT machine with a hard disk of any drive type or geometry you specify, along with the boot sector and boot files from any version of MS-DOS, PC DOS, or COMPAQ DOS you select from the pcjs web server.

    Unfortunately, that meant sometimes you’d end up with an unbootable drive, thanks to this bug. So I added some logic to the pc.js formatting code to push the start of the partition just far enough to ensure that IO.SYS/IBMBIO.COM always ends at the end of a track. Problem more-or-less solved.

    This wasn’t the only issue I had to work around, and I’m sure it won’t be the last. 😉

  25. Jeff says:

    I should add that when employing that trick, you have to use the same “bad math” that the DOS boot sector uses when calculating the number of sectors in IO.SYS/IBMBIO.COM: it divides the file size by 512 and then ALWAYS adds 1 to the result (it should have only added 1 if there was a remainder). So whenever the file is an exact multiple of 512, the boot sector will read one too many sectors.

    And even though DOS 2.x “precalculated” that number and stored it in the boot sector at offset 0x20, it still used the same bad math. DOS 3.x boot sectors actually read the file size from the directory entry and then do the bad math. Fortunately, the bad math is consistent across all versions — at least all versions also affected by the read-a-whole-track bug.
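
    The difference between the two calculations, as a quick sketch (function names are made up; the file sizes are arbitrary examples, not real IO.SYS sizes):

```python
SECTOR_SIZE = 512


def boot_sector_count(file_size):
    """Sector count as described above: divide by 512, then always add 1."""
    return file_size // SECTOR_SIZE + 1


def exact_count(file_size):
    """Sector count rounded up only when there is a remainder."""
    return (file_size + SECTOR_SIZE - 1) // SECTOR_SIZE


for size in (4608, 4609, 5120):
    print(size, boot_sector_count(size), exact_count(size))
```

    The two only disagree when the size is an exact multiple of 512, where the first one comes out one sector too high.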

  26. Michal Necasek says:

    That’s very cool! At first I didn’t understand how it could work, but then I read again and realized that you’re doing the equivalent of FDISK/FORMAT/SYS yourself. So you can take care of these little details.

    When I first ran into this bug, I didn’t realize how much disks with 17 sectors per track were a thing. The BIOS drive tables in the PC/AT and even the XT 286 only had 17-sector drives, nothing else. The original Deskpro 386 had 25/26 and 33/34 sector per track entries in its drive table, and that was when DOS 3.3 was around the corner.

  27. MiaM says:

    Side track: What disks and what controller used 25/26 and 33/34 sectors per track?

    17 sectors per track is what you automatically end up with with MFM encoding, 512 byte sectors and the data rate and rotation speed the classic ST506/412 disks used.

    I had always thought that any other number of sectors per track would use a controller that has its own BIOS that at least in some way overrides the motherboard BIOS, at least before IDE drives. Now I realize though that my memory might be fuzzy. I used to have an RLL card but I can’t remember if I had to select matching disk types in the BIOS setup or if I set it to no disks, or possibly if I had to select matching disk types but an incorrect number of sectors? (Unfortunately that card and the two disks I used it with were accidentally thrown away about 15 years ago – a bit sad. Therefore I can’t check how it was). I have some memory that it wasn’t possible to use this RLL card at the same time as a regular 16-bit MFM AT style ISA controller card.

    As a side track, I would say that the way PCs used to handle multiple hard disk cards was terrible before PCI. At best you might get a SCSI card going at the same time as MFM/RLL/ESDI/IDE, but you could rarely get two MFM/RLL/ESDI cards going at the same time.

  28. Michal Necasek says:

    Easy. MFM drives worked with 5 Mbit/sec and gave you 17 sectors per track. RLL drives added 50%, worked with 7.5 Mbit/sec, and used 26 sectors per track. Sometimes people formatted tracks with a spare sector for defect remapping, and then you got 25 sectors per track. ESDI drives used initially 10 Mbit/sec, so instead of 17 sectors you got 34 sectors per track. If one was reserved for remapping, you got 33 sectors per track. In all cases, these drives could be paired with a controller which used the PC/AT (aka WD1003, sometimes horribly misnamed as ST-506) host interface, using the system BIOS to access the drive.

    None of this requires its own BIOS *except* you have to have that entry in the drive table. That was initially a major stumbling block because with a classic PC/AT style BIOS, you simply had no way to use a disk geometry not already included in the system BIOS. Not without adding some kind of adapter ROM.

    For example the better models of the original Compaq Deskpro 386 used ESDI drives paired with Western Digital WD1005 controllers. These used the exact same register interface as standard PC/AT MFM controllers. All Compaq needed to do was make sure that the right entry was in the system BIOS drive tables, with 34 sectors per track.

  29. Jeff says:

    And speaking of drive tables, that’s why I created a slightly modified version of the DOS Master Boot Record (MBR) that includes a drive table alongside the partition table and installs it if there’s a non-zero entry. There were any number of “hacky” ways I could have made custom drive geometries work, but this seemed like the cleanest (and I suspect there were OEMs and drive manufacturers back in the day that came to a similar conclusion).

    When pc.js is building your custom hard disk, it adds that MBR to the image, along with your geometry, and you’re good to go. Both the IBM PC AT BIOS and the COMPAQ DeskPro 386 BIOS seem content with anything in the traditional CHS range.

    More details are here.

  30. MiaM says:

    Oh, I didn’t know that those controllers were register compatible with the classic MFM controller.

    (Or perhaps I did but had forgotten)

  31. Michal Necasek says:

    I think most RLL controllers were, and some ESDI controllers (perhaps a minority?) were too. A lot of people were somehow confused into thinking that ST-506 and ESDI are host interfaces, but they’re not. The same ST-506 drive might be attached to a PC through an XT or AT style hard disk controller, each with a completely different programming interface. Or the drive could be equipped with an ST-506 to SCSI adapter. Or with an ST-506 to IDE adapter (rare but it did exist, Compaq used those). Same with ESDI, there were ESDI to SCSI adapters, WD made AT hard disk compatible adapters, IBM had their own ESDI interface. All of those could have the exact same drive behind it, but the programming interface was completely different.

  32. Now that the source code to 4.00 has been published it would be possible to fix the bootloader?
