So I was working on improving a DOS emulator, when I found that something seemingly trivial wasn’t working right when COMMAND.COM was asked to do the following:
echo AB> foo.txt
echo CD>> foo.txt
Instead of ABCD, foo.txt contained ABBC.
I verified that yes, the right data was being passed to fwrite(), with the big caveat that what COMMAND.COM was doing wasn’t quite as straightforward as one might think:
- Open foo.txt
- Write ‘AB’
- Close foo.txt
- Open foo.txt
- Seek one byte backward from the end of the file
- Read one byte
- Write ‘CD’
- Close foo.txt
The reason for the complexity is that COMMAND.COM tries to handle the case where the file ends with a Ctrl-Z character (which wasn’t the case for me); if it does, the Ctrl-Z needs to be deleted. Somehow the seek/read/write sequence was confusing things. But why?
Sitting down with a debugger, I could just see how the C run-time library (Open Watcom) could be fixed to avoid this problem. But I could not shake a nagging feeling that such a basic bug would have had to be discovered and fixed years ago.
So I proceeded to write a simple test program which I could try with other compilers.
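The test program itself is not shown here. A minimal sketch of what such a test might look like follows, assuming the COMMAND.COM sequence is replayed directly through stdio. The names replay_append and mix_result are made up for illustration, and because the write deliberately follows the read with no intervening seek, what it reports depends entirely on the C library in use.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* What the library reported for the write that directly follows a read. */
struct mix_result {
    size_t written;     /* return value of the second fwrite() */
    int    had_error;   /* ferror() right after that fwrite() */
    int    saved_errno; /* errno right after that fwrite() */
};

/* Hypothetical helper: replays the open/write/close/open/seek/read/write
 * sequence on `path`. Returns 0 if the setup steps worked, -1 otherwise.
 * The outcome of the final write is stored in `res`. */
int replay_append(const char *path, struct mix_result *res)
{
    char last;
    FILE *f;

    assert(path != NULL && res != NULL);

    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    if (fwrite("AB", 1, 2, f) != 2) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "r+b");
    if (f == NULL)
        return -1;
    /* Seek one byte back from the end and read that byte. */
    if (fseek(f, -1L, SEEK_END) != 0 || fread(&last, 1, 1, f) != 1) {
        fclose(f);
        return -1;
    }

    /* Deliberately no fseek()/fflush() here: a write directly after a
     * read, which is the sequence under investigation. */
    errno = 0;
    res->written = fwrite("CD", 1, 2, f);
    res->had_error = ferror(f);
    res->saved_errno = errno;

    fclose(f);
    return 0;
}
```

Calling replay_append("foo.txt", &r) and then printing the three fields of r next to the contents of foo.txt is enough to compare runtimes.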
To my great surprise, the venerable Microsoft Visual C++ 6.0 as well as IBM C/C++ 3.6 for Windows both only wrote ‘AB’ to the output file! The ‘CD’ never got written at all.
I added further logging to determine that in both cases, the second fwrite() reported that it wrote zero bytes. But that’s where things got a bit weird.
For the Microsoft runtime, ferror() was set but errno was zero. For the IBM runtime, ferror() was clear but errno was set to 41. Which according to IBM’s errno.h header means EPUTANDGET … and what does that error even mean?
At this point, I knew I was doing something wrong. But what? For once, stackoverflow actually had the right answer! Amazing, that almost never happens.
Why Oh Why?
Of course one has to wonder… why is it like this? Having basic file I/O functions behave in this non-obvious way (either quietly failing or not writing the expected data, depending on the sequence of other function calls) is clearly sub-optimal.
It is obvious that it would not be rocket science for the C library to keep a record of whether the most recent I/O was a read or a write, and perform the appropriate flush or seek when switching directions. Indeed it’s clear that for example the IBM C runtime keeps track internally, and issues a very specific error when the correct sequencing is violated.
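As a rough illustration of that bookkeeping, here is a sketch of a thin wrapper that remembers the direction of the last operation and issues a no-op seek whenever the direction changes. The names tracked_file, tf_read and tf_write are hypothetical, and this is not how any particular C library implements it internally.

```c
#include <assert.h>
#include <stdio.h>

enum last_op { OP_NONE, OP_READ, OP_WRITE };

/* A FILE pointer plus a record of the most recent I/O direction. */
struct tracked_file {
    FILE        *fp;
    enum last_op last;
};

/* Called before every transfer. If the direction changes, perform a
 * no-op seek: a file positioning call satisfies the rule both for
 * write-then-read and for read-then-write. */
int tf_prepare(struct tracked_file *tf, enum last_op next)
{
    assert(tf != NULL && tf->fp != NULL);
    if (tf->last != OP_NONE && tf->last != next) {
        if (fseek(tf->fp, 0L, SEEK_CUR) != 0)
            return -1;
    }
    tf->last = next;
    return 0;
}

/* Reads up to n bytes; returns the number of bytes read. */
size_t tf_read(struct tracked_file *tf, void *buf, size_t n)
{
    if (tf_prepare(tf, OP_READ) != 0)
        return 0;
    return fread(buf, 1, n, tf->fp);
}

/* Writes n bytes; returns the number of bytes written. */
size_t tf_write(struct tracked_file *tf, const void *buf, size_t n)
{
    if (tf_prepare(tf, OP_WRITE) != 0)
        return 0;
    return fwrite(buf, 1, n, tf->fp);
}
```

The cost is one comparison per call plus an extra seek on each direction change. A caller that seeks on tf->fp directly merely triggers one redundant no-op seek later.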
The closest thing to an answer that I’ve been able to find is that “it’s always been this way”.
With a caveat that “always” means since circa 1979, not always always. Looking at the 1978 edition of K&R, it’s obvious why: The original K&R library only supported the read ("r"), write ("w"), and append ("a") modes for fopen(), with append being effectively a write. There was no update mode ("r+"), and hence reads and writes could not be mixed at all! That is very likely part of the puzzle.
By the time the oldest preserved ANSI C draft rolled out, the behavior was already set in stone. Consider how little things have changed over the years:
When a file is opened with update mode ('+' as the second or third character in the mode argument), both input and output may be performed on the associated stream. However, output may not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input may not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening a file with update mode may open or create a binary stream in some implementations.
(ANSI X3J11 C draft, 1988)
The ANSI C Rationale contains the following text:
A change of input/output direction on an update file is only allowed following a fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
The implication is that when the I/O buffer contains data, it’s not safe to switch read/write direction.
The published ANSI C89/ISO C90 is near identical to the draft Standard and does not bear repeating here. In C99, “may not” was replaced with “shall not” but little else changed:
When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.
(ISO C99, 1999)
Fast forward another (almost) quarter century, and we have this:
When a file is opened with update mode ('+' as the second or third character in the previously described list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.
(ISO C23, 2024)
As far as Standard C is concerned, this has provably not changed from 1988 to the present.
But of course the ANSI X3J11 Committee did not invent the C library. It worked on the basis of earlier documents, namely the elusive 1984 /usr/group Standard in case of the library.
While I couldn’t find a copy of the /usr/group Standard, the /usr/group committee likewise didn’t create the C library but rather tried to standardize existing implementations. Which means that the answer might lie in old UNIX manuals.
Even System V is too new and we have to look further back. The AT&T UNIX System III manual contains the following text in the fread manual page:
When a file is opened for update, both input and output may be done on the resulting stream. However, output may not be directly followed by input without an intervening fseek or rewind, and input may not be directly followed by output without an intervening fseek, rewind, or an input operation which encounters end of file.
(AT&T UNIX System III manual, 1980)
Hmm, that text from 1980 is rather similar to what ended up in ANSI C89. Sure, there was no fsetpos() yet (an ANSI C invention), and the text is oddly missing any mention of fflush(), even though flushing almost certainly made it OK to switch from writing to reading even then.
But it’s obvious that the restriction on switching between reading and writing on C library streams has been there for a very, very long time.
7th Edition UNIX (1979), even in the updated documentation from 1983, does not mention update mode for fopen() and hence does not offer any advice on switching read/write directions.
Current Practice
At least Linux (glibc) and FreeBSD allow free intermixing of reads and writes. The FreeBSD man page for fopen() states:
Reads and writes may be intermixed on read/write streams in any order, and do not require an intermediate seek as in previous versions of stdio. This is not portable to other systems, however; ISO/IEC 9899:1990 (“ISO C90”) and IEEE Std 1003.1 (“POSIX.1”) both require that a file positioning function intervene between output and input, unless an input operation encounters end-of-file.
In contrast, Microsoft’s library documentation (as of 2024) mirrors ISO C and states that flushing or seeking is required when changing read/write direction.
On the one hand, transparently handling the direction switching in the library is not outrageously difficult. On the other hand, doing so encourages programmers to write non-conforming C code which will fail in rather interesting ways on other implementations. As always, there are tradeoffs.
Old Source
Looking at historic source code proved quite interesting.
In 32V UNIX from 1979, fopen clearly opens files for either reading or writing, but not both (and any mode other than ‘w’ or ‘a’ means implicitly ‘r’!).
V6 UNIX from 1975 is too old to even have fopen(). System III from 1980 on the other hand supports update mode, and opening streams for update sets an explicit _IORW flag (and, as mentioned above, the System III documentation demands extra care when switching I/O direction).
Things get confusing with V7 UNIX from 1979. Although the documentation does not show any update mode option for fopen(), the actual implementation supports it. In fact the V7 code from 1979 is nearly identical to what was in System III a year later. Why? I don’t know.
And then there’s the 2BSD code, again from 1979. While the BSD fopen() has no provision for indicating update mode with the ‘+’ character, it allows specifying open modes like "rw", setting both the _IOREAD and _IOWRT flags. In fact the 2BSD man page for fopen explicitly lists "rw" and "ra" as supported open modes which allow both reading and writing, but there is nothing said about whether mixing fread() and fwrite() freely is allowed. There is also an explanatory README file with a note from November 1978 describing the change to allow mixed read and write access.
A 1977 paper by Dennis M. Ritchie, A New Input-Output Package, is quite clear that when fopen() was first conceived, a stream would support either reading or writing, but not both. It is also clear that users found this too restrictive and by 1979, there were at least two different implementations (AT&T and BSD) which allowed mixed read/write streams.
Notably in the BSD implementation, fopen() was modified to allow both reading and writing but fread() and fwrite() were not. It is not clear to me if the BSD code was robust enough to allow free mixing of reads and writes. The AT&T documentation has always been clear that it’s not allowed.
And as far as Standard C and POSIX are concerned, that has not changed until today. To write portable code, it is necessary to take some action when changing read/write direction. A dummy call such as
fseek( f, 0, SEEK_CUR );
is entirely sufficient to get the stream into a state where switching between reading and writing is safe.
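Put into context, the append sequence from the start of this post becomes portable once that dummy seek is placed between the read and the write. The following is only a sketch; the function name append_after_peek is made up for illustration and the Ctrl-Z handling is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: opens an existing, non-empty file for update,
 * reads its last byte, then appends `len` bytes from `data`.
 * Returns 0 on success, -1 on any failure (including an empty file,
 * where seeking one byte back from the end is not possible). */
int append_after_peek(const char *path, const char *data, size_t len)
{
    FILE *f;
    char last;
    int ok;

    assert(path != NULL && data != NULL);

    f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    /* Seek one byte back from the end and read that byte. */
    ok = fseek(f, -1L, SEEK_END) == 0 && fread(&last, 1, 1, f) == 1;

    /* Switching from reading to writing: the no-op seek is the
     * intervening file positioning call the Standard asks for. */
    if (ok)
        ok = fseek(f, 0L, SEEK_CUR) == 0;
    if (ok)
        ok = fwrite(data, 1, len, f) == len;

    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With foo.txt already containing AB, append_after_peek("foo.txt", "CD", 2) leaves ABCD in the file.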
I suppose oddities like this just happen when you have nearly 50 years of history behind you.
It’s long been clear to me that stdio is just badly designed — it’s
a “hide the complexity” affair, instead of a “bring it out in the open
and make it manageable” one. So these kinds of awful gotchas are to be
expected.
Just me 2 cents — the insight likely won’t help here.
With many storage devices, it was impossible to have both reads and writes. Paper tape was referred to as different logical devices to the punch versus the reader even if the physical device was designed to do both but not at the same time.
Changing from read or write to read and write for a single open command required a lot of behind the scenes work to correctly handle single direction devices and clearing of buffers. I never found it that hard to close a file and then reopen it in a different mode but I guess preferred programming methods have changed.
Magnetic tape too. Reading and writing can’t be intermixed easily.
What I realized is that, at least in the programs I write, most of the time a file is open either for reading or for writing. Rewriting files is certainly done, but it’s not something that happens all the time.
This is all good, but where does ABBC come from?
UNIX has such a nice byte-wise orthogonal interface, and stdio then
breaks it for the sake of compatibility and efficiency — a rationale
that hasn’t really held up for decades.
Convenience routines around {read,write}(2) certainly can’t hurt —
that’s why me wrote librdwr — but stdio just obscures and confuses
what’s really happening in a rather un-UNIXy manner.
Ironically, stdio may be more suited to environments like mess-dos than
it ever was for UNIX.
The single-byte fread() places ‘B’ in the FILE buffer. The following two-byte fwrite() appends ‘CD’, so the buffer now holds ‘BCD’. When the file is closed and the buffer flushed, the library writes two bytes… starting at the beginning of the buffer, so ‘BC’.
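That explanation can be replayed with a toy model of a stream that shares one buffer between reads and writes and only counts how many bytes it owes the file. This is a simplified illustration of the description above, not the actual Open Watcom code, and all names in it are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stream: one shared buffer, plus a count of bytes to write out. */
struct toy_stream {
    char   file[16];  /* stands in for the file on disk */
    size_t file_len;
    size_t file_pos;  /* OS-level file position */
    char   buf[16];   /* the shared FILE buffer */
    size_t buf_used;  /* bytes in buf, whether read or written */
    size_t dirty;     /* bytes the library believes it must flush */
};

/* Reading pulls bytes from the file into the buffer. */
void toy_read(struct toy_stream *s, size_t n)
{
    assert(s->file_pos + n <= s->file_len);
    assert(s->buf_used + n <= sizeof s->buf);
    memcpy(s->buf + s->buf_used, s->file + s->file_pos, n);
    s->buf_used += n;
    s->file_pos += n;
}

/* Writing appends after whatever the buffer already holds. */
void toy_write(struct toy_stream *s, const char *data, size_t n)
{
    assert(s->buf_used + n <= sizeof s->buf);
    memcpy(s->buf + s->buf_used, data, n);
    s->buf_used += n;
    s->dirty += n;
}

/* The flaw: flushing writes `dirty` bytes from the START of the buffer. */
void toy_flush(struct toy_stream *s)
{
    assert(s->file_pos + s->dirty < sizeof s->file);
    memcpy(s->file + s->file_pos, s->buf, s->dirty);
    s->file_pos += s->dirty;
    if (s->file_pos > s->file_len)
        s->file_len = s->file_pos;
    s->buf_used = 0;
    s->dirty = 0;
}

/* Replays seek(-1 from end), read 1, write "CD", close on a file "AB". */
void toy_replay(char out[16])
{
    struct toy_stream s;

    memset(&s, 0, sizeof s);
    memcpy(s.file, "AB", 2);
    s.file_len = 2;
    s.file_pos = 1;          /* one byte back from the end */
    toy_read(&s, 1);         /* buffer: "B"   */
    toy_write(&s, "CD", 2);  /* buffer: "BCD" */
    toy_flush(&s);           /* writes "BC" at offset 2 */
    memcpy(out, s.file, s.file_len);
    out[s.file_len] = '\0';
}
```

In this model toy_replay() yields ABBC. A seek between the read and the write would have emptied the buffer first, so the flush would start at the C.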
If you read the Dennis M. Ritchie paper, stdio was designed to be portable. It obviously pre-dates DOS by a few years, but it was designed to handle systems with record-oriented I/O and all kinds of strangeness. Which it does.
It also has quite good performance thanks to buffering, especially when doing single-character reads or writes.
But if you think compatibility and efficiency are worthless, there’s nothing to discuss.
dmr’s pursuit of generality, while laudable in general (heh), led to
him making another mistake like stdio, one that didn’t have as much
impact — STREAMS.
Me point about the reasons of compatibility and efficiency is that
they haven’t held up in this case: nearly everything is byte-wise now,
and machines are powerful (parallel) enough that pipeline stalls[0]
have become the greater evil.
So stdio was a design for the past and present, not the future. Yet
many still feel they’re stuck w/ it to this day. Sound familiar?
YMMV, of course, as always…
[0] A term normally used in microarchitecture, but it just as well
applies here.
Oh, and on the issue of short {read,write}s being inefficient, which
remains somewhat of an issue: me’s studied that problem, as well. Me
proposed solution is to implement hardware pipes, where the processor
provides protected access to the kernel-side buffer and only triggers
an interrupt when it’s filled up on write (if that happens before it’s
drained by another process) or empty on read (if that happens before
it’s replenished by another process).
(That’d certainly be more of a modernization than adding the nth vector
instruction set.)
When I tried your example on MS-DOS 6.22:
echo AB> foo.txt
echo CD>> foo.txt
type foo.txt
I see:
AB
CD
Note the newlines between each of the “echo” statements’ output. Which is what I think the expected behavior should be, not “ABCD”, since the echo command appends newlines to the output. DOS’s echo command doesn’t offer an equivalent to the Unix “echo -n” command.
Of course! Because you’re not running it in the emulator I’m working on. So you don’t get the problem. I probably didn’t explain it well enough in the blog post.
And you’re right about the newlines, I don’t think they can really be suppressed in DOS.
You got me thinking: is there a way to remove the buffer? Let’s say we don’t even want the speed increase… No, we can’t. We would need bit or byte transfers, and hard disks and other media work in sectors, at least the old media; newer media read and write even longer blocks. So we need buffers in our operating system.
Now that I think about File Control Blocks: in CP/M, doesn’t the application essentially manage the buffer? And that is part of what simplifies stdio.
CP/M uses 128-byte records, so anything that does bytewise I/O on it needs to buffer the data. (My recollection is that the C libraries I’ve seen for CP/M tend to emulate UNIX-style open()/read()/write() and then have a UNIXy stdio library on top of that, rather than implementing fopen()/fread()/fwrite() directly with CP/M API calls).
There were also several C implementations that supported DOS 1.x. I never worked with any of those, but I suspect they likewise provided open()/read()/write() and layered stdio on top.
Interesting topic!
Re designed for the past and the present, but not the future:
A problem is that we still use file I/O at all, treating files as streams.
Sure, there are cases where this is a really good idea.
But for maximum performance it’s better to map files as part of memory, and just let the virtual memory manager handle reads and writes.
The case where this might be a downside would be if you need to insert or delete data in the middle of a file, where you might end up doing a memory copy on the file mapped in memory, and for this specific case it would be good to have some sort of API to insert/delete parts of a file. Not sure how the OS would optimize this though; maybe keep track of where insertions/deletions have taken place where those aren’t an even sector size, and rearrange data on disk when the file is closed?
I recommend reading the reasoning for creating Varnish, the cache proxy software, which kind of both roasts Squid (the previous alternative, albeit that was focused on caching clients on a corporate network while Varnish focuses on caching nearby web servers), and also roasts programming in general that is designed like if the computer are made in the 1970’s, i.e. thinking that files and memory are different things and whatnot.
Re the past:
In addition to some systems not being able to open files for both reading and writing, it also wasn’t totally unheard of to not be able to position the file pointer freely. A classic example is the disk drives for the 8-bit Commodore computers. For the most part files are sequential, you either open them for read or write, and you just read or write them as streams, without any seeking. There is also a “relative file” which allows seeking but you have to create the file by setting a record size and file length at creation, and you can only seek to a specific record, not any arbitrary position. I might misremember the details; those files were really uncommon. They were obviously designed for database usage. And sure, although C compilers actually appeared in the second half of the 1980’s, I would guess that few if any commercial programs were written in C on that platform.
Fun fact side track: Although it was very rarely used, magnetic tape in the form of regular audio compact cassettes, could have an end-of-tape marker on the Commodore 8-bit computers, and if you tried reading past such marker, you would get an end of tape message. I think this was only used in books about programming.
In general re reading and writing the same opened file: I think it feels a bit weird to have the same position pointer for reading and writing. It somehow would feel better to have separate pointers.
Re magnetic tape: Although it would be more complicated than disks, it would be possible to allow mixing reading and writing the same way as a computer generally writes individual sectors rather than a full track (at least for older disks – I would assume that a modern hard disk rewrites a full track when you write data to a single sector?). For example for tapes with many tracks, like 9-track tape, I would think it’s possible to have a format where it’s possible to keep track of exact positioning (for example read existing data just before overwriting it) and thus be able to overwrite at least single sectors without ending up with incorrect timing, overwriting the start of the following sector or ending up with leftover garbage from the end of what’s supposed to be the overwritten sector. It would likely be much harder although not impossible to overwrite individual bytes.
@MiaM: while me appreciates your exposition, on the main point me’d
argue the direct opposite — as raw power increases, me foresees a
heavier move towards addressing memory as files, or the more general
case behind UNIX files: pipes. After all, pipes are the most general
case of them all, and having algorithms that operate on streams of data
allows all kinds of fancy stuff (including in-process imperative
“pipelines” that micro-massage data within a single process while
somewhat abstracting away any intermediate buffers; me’s already
implemented some of this).
There’s also the “transputer”, the designers of which aimed for
something very similar. The 80s are closer than you think.
Core used to be called “the memory file”, if me’d not too mistaken.
The 70s are closer than you think.
But the Commodore 64 is just ancient history. Honest! 😉
(Okay, to expound on that last one a *teensy* bit: it was quite
generic and moddable w/o complicated interfaces, which is something
we could really use right now.)
@MiaM But thinking that there is no hard disk is thinking 70’s, or at least it is for those who worked on Multics, released in 1969. I think that some IBM operating systems also had no concept of files, System 34/System 38 or something like that, but I’m not sure.
“(at least for older disks – I would assume that a modern hard disk rewrites a full track when you write data to a single sector?)” My understanding of modern hard disks is that they have a buffer that accepts multiple sectors, not necessarily contiguous. If the driver uses multi-sector transfers, it is faster.
“I would think it’s possible to have a format where it’s possible to keep track of exact positioning” at least in many/most? magnetic tapes and floppy drives physically the write gap-sector identifier-gap-data-gap so they could read/write individual sectors, the gaps are because the timing could be a little off or the rotation speed could have a little variation. Some old tape drives could read the sectors backwards and even the most advanced tape drive could write backwards or forwards. In modern hard disk I don’t know what the physical format is, but I would think something similar.
Rewriting a track to modify a sector only happens with tightly packed formats that lack sector gaps. The Amiga floppy format is one example.
Shingled hard drives have a special consideration where writing a sector may require rewriting sectors on the overlapping tracks as well. Flash can only keep a few blocks open at any time so as many writes as possible will need to combined for a given block or the drive won’t last too long.
Hard drives with built-in caches make life interesting. Many of the issues of performance are concealed from the OS but it could take several seconds for a write to actually complete. Not every error handler waits that long.
>But for maximum performance it’s better to map files as part of memory, and just let the virtual memory manager handle reads and writes.
The mmap() is a nice tech, especially now when 64-bit is mainstream and there is no real risk of address space exhaustion. But often there are no big benefits and even some disadvantages in using it.
The last time I tested read/write vs mmap was a few years ago (Fedora 25, FreeBSD 11), but I still roughly remember results:
– reads on small sizes(up to few M) were significantly faster, mapped memory reads were slightly faster on sizes from 10M.
– writes were always faster, significantly so on small sizes.
@Vlad Gnatov:
I assume that you just wrote test data, either just zeros or something generated on the fly (like increment/decrement a variable)?
If you process data sequentially you probably get good performance from treating files as streams.
But if you need to prepare blocks for writing, or parse read data in non-sequential order, I would think that mmap would be much faster, as you avoid first reading or writing between disk and memory within the OS (most likely using DMA) and then have the application copy data to/from the OS buffers.
Also if applications were “better” they could also have their internal storage act the same way as data stored on disk. Not sure how common this is, but I would think that a PDF reader would be a great example for this.
I suppose mmap’ing a file entails mucking with the page table, with the associated TLB activity and other memory mapper plumbing (Looks like Linux defers this activity to the first access to the mmap’d memory (which faults), and only then does the heavy plumbing). Anyway, I can easily imagine that a syscall or two to open a file and read a bunch of it can be faster for small files.
It’s also (to me anyways) questionable how useful mmap’ing is for the biggest files we tend to deal with… streaming video. On a 32-bit OS, it’s not even possible because the files are too big. On a 64-bit OS it is possible, but what is the benefit? The point of streaming A/V is that there is no need to deal with the file as a whole, only a comparatively tiny portion.
>I assume that you just wrote test data, either just zeros or something generated on the fly (like increment/decrement a variable)?
The test methodology was trivial:
For read tests prepare data from /dev/random, reboot and test the chosen size 10 times, reboot and test the next size.
For write tests data from /dev/random was read in a separate buffer first.
The statistics were processed with FreeBSD’s ministat(1).
>If you process data sequentially you probably get good performance from treating files as streams.
@rlkr:
I would assume that the kernel code is similar to the code that handles memory and swapping in general. The difference between mmap and general memory would just be that mmap maps memory blocks to the sectors on disk that form a specific file, while general memory maps to swap when required, kind of sort of.
@michal:
Agree that streaming video (and I add audio too) is the best anti-example for mmap.
In particular I would say that desktop OS:es have struggled with handling streaming from a local disk for decades. Not 100% sure but it seems like it’s actually solved in Windows 10, but I might be mistaken. The classic problem was that the disk cache code never understood that once audio or video had already been played the likelihood of it being accessed again shortly was almost zero, and thus it treated the data as if it were a word processor file and kept it in cache while throwing out other things from cache, or even swapped out code that hadn’t been in use for a long while.
I haven’t looked in to what is possible with memory mapping in different OS:es, but a reasonable thing would be to be able to map a specific part of a file, obviously as long as it fits the page size. That way it would be possible to map arbitrary large files.
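On POSIX systems this is possible: mmap() takes a file offset, which must be a multiple of the page size, and a length, so only a window of the file needs to be mapped. The sketch below assumes a POSIX environment; the helper name read_window is made up, and it copies the bytes out only to keep the example small.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copies `len` bytes starting at byte `offset` of
 * `path` into `out`, mapping only the pages that cover that range.
 * Returns 0 on success, -1 on failure or if the range is out of bounds. */
int read_window(const char *path, off_t offset, size_t len, void *out)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    off_t base;
    size_t skew, map_len;
    void *map;
    int fd;

    assert(path != NULL && out != NULL);
    if (page <= 0 || offset < 0 || len == 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || offset > st.st_size ||
        (off_t)len > st.st_size - offset) {
        close(fd);
        return -1;
    }

    /* Round the offset down to a page boundary, as mmap() requires. */
    base = offset - (offset % page);
    skew = (size_t)(offset - base);
    map_len = skew + len;

    map = mmap(NULL, map_len, PROT_READ, MAP_PRIVATE, fd, base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    memcpy(out, (const char *)map + skew, len);
    munmap(map, map_len);
    return 0;
}
```

A real application would keep the mapping around and work on it in place rather than copying, sliding the window as needed.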
I would think that for example audio editing and if it’s possible to map partial files also video editing would benefit from memory mapping the different source files.
Also: Video games might benefit from memory mapping a temporary file that is just copied to create a save state (or perhaps rather packed to save disk space).
I found the memory mapped databases to be useful only in a limited realm of problems. Once I moved beyond what the developers showcased in demos, the MMDB was rather sluggish. That is even without the problem that the database could be a challenge to move to other architectures. Andrew Crotty over at CMU has a fun little paper showcasing some of the challenges for memory mapping for databases.