Someone recently asked an interesting question: Why do Microsoft C and compatible DOS compilers have no truncate() and/or ftruncate() library functions? And how does one resize files on DOS?
OK, that's actually two questions. The first one is easy enough to answer: Because XENIX had no truncate() or ftruncate() either. Instead, XENIX had a chsize() function which, sure enough, can be found in the Microsoft C libraries at least as far back as MS C 3.0 (early 1985).
The second question is rather more interesting. Files are resized on DOS by moving the file pointer to the desired size with the LSEEK function (INT 21h/42h), and then calling the WRITE function (INT 21h/40h) with a zero length (CX=0).
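For illustration, a minimal sketch of the two calls in C might look like this (my own example, not code from DOS or any SDK; it assumes a 16-bit DOS compiler that provides int86() in <dos.h>, such as Microsoft C):

    /* Sketch only: resize an open file to new_size bytes using raw DOS calls.
     * Assumes a 16-bit DOS C compiler with int86() in <dos.h>. */
    #include <dos.h>

    int dos_resize(int handle, unsigned long new_size)
    {
        union REGS r;

        r.h.ah = 0x42;                        /* INT 21h/42h: LSEEK */
        r.h.al = 0;                           /* from the start of the file */
        r.x.bx = handle;
        r.x.cx = (unsigned)(new_size >> 16);  /* CX:DX = new file position */
        r.x.dx = (unsigned)new_size;
        int86(0x21, &r, &r);
        if (r.x.cflag)
            return -1;

        r.h.ah = 0x40;                        /* INT 21h/40h: WRITE */
        r.x.bx = handle;
        r.x.cx = 0;                           /* zero length truncates or extends */
        int86(0x21, &r, &r);
        return r.x.cflag ? -1 : 0;
    }

If the new position is below the current size, the file is truncated; if it is above, the file is extended.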
Now, this mechanism is rather curious, because the handle-based file API in DOS 2.0 was modeled on XENIX, yet on UNIX systems, the write() function, when asked to transfer zero bytes, simply does nothing. If the mechanism didn't come from XENIX, where did it come from?
I thought I'd check the DOS 2.x source code. But the $Write function in XENIX2.ASM has absolutely no special handling of zero-size writes. It just performs common setup code and hands off the real work to $FCB_RANDOM_WRITE_BLOCK.
Was this behavior some kind of oversight? No, certainly not. The MS-DOS 2.0 Programmer’s Reference Manual is quite clear that writing zero bytes either truncates or extends a file.
But the source code points in the right direction. $FCB_RANDOM_WRITE_BLOCK is in fact INT 21h/28h. And that function is documented to change the file size when called with CX=0.
Is this some kind of CP/M heritage? No, it’s not. CP/M 2.2 had no mechanism for resizing files, and CP/M 3 had a separate BDOS function to change file size, nothing like the DOS mechanism.
In fact, this method of resizing files is unambiguously documented in the 86-DOS 0.3 Programmer's Manual, published in 1980. It is also documented in the preliminary 86-DOS manual of unclear vintage; in that version, the functionality is only documented to truncate files, not extend them. The 86-DOS 0.3 manual clearly states that both truncating and extending files can be achieved using this method.
It is thus clear that the DOS method of resizing files through zero-length writes originated in 86-DOS in 1980, and it is more or less guaranteed to be Tim Paterson’s invention. The 86-DOS method was adopted for handle-based I/O in DOS 2.0 by default, because the handle-based I/O was layered on top of FCB I/O.
Now let's briefly loop back to chsize(). Implementing a XENIX-compatible chsize() function on DOS is not entirely straightforward. For one thing, chsize() is not expected to move the file pointer, which means the current position needs to be saved and restored. Another problem is that when DOS extends a file, it just allocates whatever clusters happen to be available. But on XENIX, chsize() fills files with zeros when extending; therefore the DOS run-time library implementation must explicitly write the requisite number of zero bytes when extending a file. In that case, chsize() changes the file size implicitly, by writing to the file, rather than by explicitly asking DOS to increase the size.
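A rough sketch of such a chsize()-style routine, built on the run-time library's lseek() and write(), might look like the following (again my own illustration, not the actual Microsoft C source; the name my_chsize() is made up, and it assumes a zero-count write() is passed straight through to DOS):

    /* Sketch only: a chsize()-like routine built on lseek()/write() from <io.h>.
     * Assumes the C library passes a zero-byte write() straight to DOS. */
    #include <io.h>
    #include <stdio.h>      /* for SEEK_SET/SEEK_CUR/SEEK_END */
    #include <string.h>

    int my_chsize(int handle, long size)            /* hypothetical name */
    {
        static char zeros[512];
        long old_pos, cur_size;

        old_pos  = lseek(handle, 0L, SEEK_CUR);     /* remember the file pointer */
        cur_size = lseek(handle, 0L, SEEK_END);
        if (old_pos == -1L || cur_size == -1L)
            return -1;

        if (size <= cur_size) {
            /* Truncate: seek to the new size, then do a zero-length write. */
            lseek(handle, size, SEEK_SET);
            if (write(handle, zeros, 0) == -1)
                return -1;
        } else {
            /* Extend: write explicit zeros, matching XENIX chsize() semantics. */
            long todo = size - cur_size;
            memset(zeros, 0, sizeof(zeros));
            while (todo > 0) {
                unsigned chunk = todo > (long)sizeof(zeros) ? sizeof(zeros)
                                                            : (unsigned)todo;
                if (write(handle, zeros, chunk) != (int)chunk)
                    return -1;
                todo -= chunk;
            }
        }

        lseek(handle, old_pos, SEEK_SET);   /* chsize() must not move the pointer */
        return 0;
    }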
I admit that I haven't used the FCB API in DOS and I might misremember things, but:
One of the great things about DOS compared to CP/M was that it could have files of freely variable size. CP/M could, AFAIK, only have file sizes that are multiples of 128 bytes or so, which is also the reason why some file formats have to cope with padding garbage at the end of files.
Maybe this is connected to why the code to change file size came to be?
The directory in CP/M includes a list of blocks used by files. Think clusters on FAT. It may be faster to copy complete blocks and then trim the final block to the appropriate number of records instead of processing 128-byte records directly.
CP/M had a number of extensions for determining the exact byte count of the final record. Differences between them made these not very useful.
@Richard: This page contains all the gory details of CP/M’s incompatibilities regarding what the “final record byte count” means: https://www.seasip.info/Cpm/bytelen.html#lrbc
The big problem seems to be that (1) CP/M itself doesn't actually use the value; it just exposes an API for applications to set it and leaves it up to them to actually maintain it – maybe this was necessary given that CP/M's I/O was fundamentally block-oriented and a byte-oriented API was never introduced – and (2) more inexcusably, CP/M's documentation failed to clearly specify the meaning of the field, so two incompatible interpretations of it developed.
In any event, this feature was added in CP/M 3, whereas at the time Paterson was writing his DOS, CP/M 2 was the only publicly available version. It's understandable that he'd invent his own API given that the official CP/M one hadn't been publicly released yet, and he therefore could not have known about it.
Honestly, this is a wart on the face of DOS. Microsoft should have corrected it in DOS 2.0 by introducing an INT 21h handle-based subfunction that truncates/resizes like Unix. Such a function could be implemented by packaging the needed FCB calls (seeking, writing, etc.). Why MS didn't do that is unclear, but mimicking the Xenix idiosyncrasies seems the most likely reason.
vbdasc:
It's worth remembering that at some point Xenix apparently had the largest sales of any Unix-based/compatible/like operating system. Seen in that light, the decision seems more reasonable.
Btw, it also seems like the idea of even being able to seek within files, let alone adjust their length at will, wasn't always a thing on all disk operating systems. As an example, the Commodore 8-bit drives, with their DOS in the drive's ROM, can't seek within regular files. There is a special "REL" file type that is record-oriented and actually allows randomly accessing any "record" within a file, but otherwise there is no seek. Each sector contains two bytes that point to the next track+sector of the file, plus 254 bytes of data, and the directory only contains a pointer to the first sector of each file, so seeking would require reading the file from the start up to the desired point. It would of course be faster to do said seek within the drive and at least not have to transfer the data to the computer just to throw it away, but still.
If someone feels like doing it, it would be interesting to see a table of which 70’s/80’s micro computer disk operating systems allowed seeking and also allowed truncating and/or extending the size of existing files. Not interesting enough that I would take the time to do it though, but still 🙂
RT-11 did truncation as part of the final step of file creation. That was because RT-11 defaulted to allocating an excessive amount of disk space for any file.
A number of systems had record-based movement, but I think it was based on the assigned size of a record instead of CP/M's mandatory 128 bytes.
MS tended to go for a similar end result, not the same method, when trying for matching APIs, so I would never expect Xenix-DOS to emulate all the behaviors of Xenix.
The CX=0 case is a very clever piece of code. If the number of records isn't changed, it skips all the record-writing code and simply updates the directory. That was the type of efficiency that was needed to have a useful OS on a 64 KB machine. Oddly, Andy Johnson-Laird lists CP/M 2.2 with a function 40 that fills a block with NULLs, credited to requirements of MS COBOL. So CP/M got the NULL-fill behavior, like Xenix, which MS didn't provide in DOS.
I don't think it was influenced by Xenix, otherwise DOS would have a chsize() syscall. It looks to me like a change-file-size function was omitted from the initial API design and a workaround was later used as a fix.
I wonder if some DOS authors are available online for consultation.
Maybe tangentially connected: byte-granular file sizes were added to 86-DOS relatively early on, but Ctrl-Z is also recognized as an EOF character in DOS in various contexts.
As I clearly failed to explain, the DOS interface to change the file size was not derived from XENIX. As to why Microsoft did it the way they did, I think the decision-making went approximately like this: Option A, piggyback onto the existing FCB functionality to change the file size and get the same functionality in the handle-based API entirely for free; Option B, add a new API, write new code, and pay the memory cost for functionality that's quite rarely used.
It’s not hard to see why option A won.
They designed a new handle API and chose to save maybe 100 bytes at most with a kludge tied to the legacy API?
Well, that can be true.
Some related info: MS planned to add an explicit "Set File Size" operation to the IFSMGR FSD interface in Windows 9x but reverted to the "Write Zero Length" model, as I remember.
When you’re working with machines that have 64K RAM total (and that is what it was in DOS 2.0 days), throwing away 100 bytes is just not a great idea.
Interesting. They probably concluded that the “zero length write”, although idiosyncratic, actually does the job well enough. And they presumably needed to preserve the existing behavior anyway.
Side track: Were there any PCs with less than say 128k out in the wild, or is that kind of a theoretical thing?
At the time PCs had less than 256K, it would probably have seemed like a main reason for opting for a PC rather than something else was to break the 64K barrier.
AFAIK, the DOS 2 kernel used around 24K of RAM, so 64K was more like a theoretical minimum. But I agree with your reasoning. It's one of those things, like A20 or the EGA write-only registers, that bring immediate gains which stop mattering within a few years, but cause long-term pain/inconvenience.
The IBM DOS 2.1 manual has a list of programs affected by DOS 2.x changes. It notes that some require 128K, some 96K, and some 64K RAM. I can’t imagine why they’d specify that if there were no machines with 64K.
The DOS 2.0 manual says: “Because of the significant amount of function added, DOS Version 2.00 is considerably larger than previous versions. We recommend a minimum memory size of 64K bytes for DOS Version 2.00 (128K bytes if you are using a fixed disk).”
Although CP/M 2 doesn’t have an ‘official’ function to change a file size, the command processor does it unofficially by decrementing the length byte at FCB+0Fh, clearing byte FCB+0Eh (the top bit of which is zero if the directory entry needs to be written) and closing the file.
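For illustration only, the FCB fields involved look roughly like this (the struct mirrors the standard CP/M 2.2 FCB layout; bdos_close() is a hypothetical wrapper for BDOS function 16, Close File):

    /* Illustration only: the standard CP/M 2.2 FCB layout and the unofficial
     * shrink-by-one-record trick described above. */
    struct cpm_fcb {
        unsigned char dr;        /* 00h: drive code */
        unsigned char name[8];   /* 01h-08h: file name */
        unsigned char type[3];   /* 09h-0Bh: file type */
        unsigned char ex;        /* 0Ch: extent number */
        unsigned char s1;        /* 0Dh: reserved */
        unsigned char s2;        /* 0Eh: top bit clear = entry needs rewriting */
        unsigned char rc;        /* 0Fh: record count of the current extent */
        unsigned char d[16];     /* 10h-1Fh: allocation map */
        unsigned char cr;        /* 20h: current record */
        unsigned char r[3];      /* 21h-23h: random record number */
    };

    extern void bdos_close(struct cpm_fcb *fcb);   /* hypothetical wrapper */

    void shrink_by_one_record(struct cpm_fcb *fcb)
    {
        fcb->rc--;          /* one 128-byte record fewer in the final extent */
        fcb->s2 = 0;        /* top bit clear: BDOS must rewrite the directory entry */
        bdos_close(fcb);    /* BDOS function 16 flushes the change to disk */
    }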
All the original 5150s had the 64K motherboard and weren’t guaranteed to gain a memory card. DOS 2 needed to run on those well enough that the user would then see the benefit of increasing the memory. Given the way items like video ate into the Z-80’s memory map, the 64K PC didn’t fare that badly in comparison.
The other widespread IBM system that had less than 256K available and also used DOS 2 was the PC Jr. Technically, the Jr had 128K for the disk version but 32K of that would be assigned to video. The 96K available provided room for DOS 2.1 plus some small utility while reserving 64K for an application.
“Side track: Were there any PCs with less than say 128k out in the wild, or is that kind of a theoretical thing?”
MiaM: Curiously, model 1 of the IBM PC (5150) came with only 16KB of RAM, and model 3/813 came with 48KB. Granted, model 1 had no floppies and therefore wasn't intended to run DOS, but model 3/813 had a floppy.
For the sake of completeness: The PC BIOS cannot boot off the floppy in a system with less than 32K RAM. To be able to run DOS, PCs had to have at least 32K. 48K was probably a reasonable amount of memory to run DOS 1.x in 1981-82.
The bugs in the Oct 1982 BIOS suggest that few 5150s were shipped without all 4 motherboard banks being filled. IBM’s prices at launch made it just as affordable to have IBM install extra memory and drives as doing it oneself.
@Richard Wells:
That's okay, but there were two revisions of the IBM PC BIOS from 1981, and they didn't have that bug. By October 1982, IBM had already replaced the first batch of PCs with the models 014, 064 and 074 (all three discontinued in March 1983, in parallel with the introduction of the 256KB motherboards and the PC/XT), which all had 64KB on board (despite the erroneous Wikipedia claim that they all came with 16KB RAM). Things moved fast back then.
The first 5150s were all shipped with less than 64KB RAM standard (according to IBM documents), and this state of things remained for almost a year. Yes, one could probably order additional memory chips from IBM to be preinstalled, but it was an option.
@Richard Wells:
However, I thank you for reminding me of this bug. It got me thinking and I now have a weird riddle I can’t seem to understand.
I mean, look at the specs of the PC models 114, 164 and 174 from 1983 (and 104 from 1984); see https://ardent-tool.com/docs/pdf/GC20-8210-00_A_Guide_to_IBM_Personal_Computers_Apr85.pdf, pages 47 and onward. These all use the new motherboard revision (the 64KB-256KB one) and come with 64KB RAM standard (according to IBM themselves). This means that all four models have only one memory bank filled (because 64KB are soldered). But the only PC BIOS that supports the new board revision is the buggy one from October 1982, and the bug prevents normal operation unless all four memory banks are filled. How is this possible? I can't imagine that IBM shipped computers for years that didn't work without opening and tinkering with the hardware (model 104 was discontinued in 1987). It seems something is wrong with what we think we know about the IBM PC.
I recommend that you ACTUALLY open that link you have helpfully provided and look at the TOP of page 48 (where the models with 64KB RAM are listed). There you will see the following message:
“The following 5150 models have been withdrawn from marketing by IBM”.
And given the fact that we now know about that stupid bug, we can guess WHY they would be "withdrawn from marketing".
Technically, the bug didn't prevent them from being used with BASIC; it was only DOS that had trouble because of the wrong amount of memory being reported.
The error code 201 would stop the machine from getting to BASIC. This is the sort of issue that would yield a lot of service calls unless literally no one ordered a machine without a fully populated motherboard after 1982. The bug was most often noticed relatively recently when motherboards had banks with bad chips removed and the BIOS updated to permit the use of EGA/VGA cards.
That IBM failed to test the systems with less than all 4 banks filled was slightly out of character.
@Victor Klimenko
The documentation in question is from 1985. Of course the models 114, 164, 174 (introduced March 1983) were withdrawn from marketing by then. But they were all introduced MONTHS after the buggy BIOS was released, and were the FLAGSHIP models of the IBM PC (especially the 174). It's INCREDIBLE that IBM didn't fix the BIOS. Using a machine like the 174 (with two double-sided floppies) only for BASIC is, IMHO, just silly.
The last 5150 models weren't withdrawn before April 1987. It's amazing that IBM didn't bother to make an updated BIOS, and the last version remained the buggy one from October 1982.
I forgot to add that models 114, 164, 174 weren't withdrawn before June 1984. This means that they were actively marketed and sold for about 15 months. This is an awfully long time for models whose BIOS contains such a grave bug that limits their functionality.
> The IBM DOS 2.1 manual has a list of programs affected by DOS 2.x changes. It notes that some require 128K, some 96K, and some 64K RAM. I can’t imagine why they’d specify that if there were no machines with 64K.
Sorry for the posting delay, just found this in my TODO:
PCDOS 2.0 on a PC 5150 with 64K, practical test (big thanks to Jeff Parsons and his pcjs.org)
CHKDSK shows 65536/40960 bytes total/free memory
Program too big to fit in memory:
Flight Simulator
HomeBase 1.04A
Lotus 1-2-3 1A
MASM 1.0
PC Tools 1.03
Fails because of insufficient memory:
Executive Suite 1982: fails silently
MS Word 1.10
Turbo Pascal 2.00b: runs, but responds with "File too big" on any file load attempt
WordStar 3.20: not enuf memory
Works with issues:
IBM PC Diagnostics 1.02 (Adv): COMMAND.COM halts with memory allocation error
MS Multiplan 1.06: works with moderately sized sheets (with every few cells filled, the "Free" percentage decreases)
SuperCalc 1.10: works with moderately sized sheets
VisiCalc 1981: works with moderately sized sheets
Works:
NU 2.0: can hang if too many sectors are viewed; perhaps a bug (memory leak) in NU, needs testing in 2.01
donkey.bas: only with basica from the PCDOS 1.0 floppy; basica from 2.0 is too big
Conclusion: DOS 2.0 on 64K works well enough to run a few basic programs, load an example spreadsheet, and stimulate the user to buy more RAM.
Good information, thanks.
I will note that for DOS, it mattered not only what IBM sold but also what users had. For example IBM never sold any 96K PCs (let alone XTs), but it’s not difficult to imagine users had them. And 64K machines were clearly popular in the old days, whether IBM shipped them that way or users added 16K to a 48K system sold by IBM.