In July 1990, Microsoft released a specification for Virtual DMA Services, or VDS. This happened soon after the release of Windows 3.0, one of the first (though not the first) providers of VDS. The VDS specification was designed for writers of real-mode driver code for devices which used DMA, especially bus-master DMA.
Why Bother?
Let’s start with some background information explaining why VDS was necessary and unavoidable.
In the days of PCs, XTs, and ATs, life was simple. In real mode, there was a fixed, immutable relationship between CPU addresses (16-bit segment plus 16-bit offset) and hardware-visible (20-bit or 24-bit) physical addresses. Any address could be trivially converted between segment:offset format and a physical address.
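That fixed mapping is simple arithmetic: physical = segment × 16 + offset. A minimal sketch in C (the function name is purely illustrative):

```c
#include <stdint.h>

/* Real-mode address translation: the segment is shifted left by four
   bits and the offset added. There is no translation layer on an
   8086/286-class machine; this is exactly what the hardware sees. */
uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + (uint32_t)off;
}
```

Note that the result can exceed 20 bits (segment FFFFh, offset FFFFh yields 10FFEFh), which is the origin of the infamous A20 wraparound issue.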
When the 386 came along, things got complicated. When paging was enabled, the CPU’s memory-management unit (MMU) inserted a layer of translation between “virtual” and physical addresses. Because paging could be (and often was) enabled in environments that ran existing real-mode code in the 386’s virtual-8086 (V86) mode, writers of V86-mode control software had to contend with a nasty problem related to DMA, caused by the fact that any existing real-mode driver software (which notably included the system’s BIOS) had no idea about paging.
The first and most obvious “problem child” was the floppy. The PC’s floppy drive subsystem uses DMA; when the BIOS programs the DMA controller, it uses the real-mode segmented 16:16 address to calculate the corresponding physical address, and uses that to program the DMA page register and the 8237 DMA controller itself.
That works perfectly well… until software tries to perform floppy reads or writes to or from paged memory. In the simplest case of an EMM386-style memory manager (no multiple virtual DOS machines), the problem strikes when floppy reads and writes target UMBs or the EMS page frame. In both cases, DOS/BIOS would typically work with a segmented address somewhere between 640K and 1M, but the 386’s paging unit translates this address to a location somewhere in extended memory, above 1 MB.
The floppy driver code in the BIOS does not and cannot know about this, and sets up the transfer to an address between 640K and 1M, where there often isn’t any memory at all. Floppy reads and writes are not going to work.
DMA Controller Virtualization
For floppy access, V86-mode control software (EMM386, DESQview, Windows/386, etc.) took advantage of its ability to intercept port I/O. The V86 control software intercepts some or all accesses to the DMA controller. When it detects an attempt to program the DMA controller with a real-mode address that does not directly correspond to the physical address, the control software needs to do extra work. In some cases, it is possible to simply change the address programmed into the DMA controller.
In other cases it’s not. The paged memory may not be physically contiguous. That is, real-mode software might be working with a 16 KB buffer, but the buffer could be stored in four or five 4K pages (five if the buffer is not page-aligned) that aren’t located next to each other in physical memory.
That’s something the PC DMA controller simply can’t deal with—it can only transfer to or from contiguous physical memory. And there are other potential problems. The memory could be above 16 MB, not addressable by the PC/AT memory controller at all. Or it might be contiguous but straddle a 64K boundary, which is another thing the standard PC DMA controller can’t handle.
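The constraints just described can be summed up in a few lines of code. The following is an illustrative sketch (not any real driver’s logic) of the checks that decide whether a physically contiguous buffer can be handed to the 8237 as-is:

```c
#include <stdint.h>

/* Sketch of 8237/PC-AT DMA addressability checks. The 16 MB limit
   comes from the AT's 24-bit page registers; in addition, a transfer
   on an 8-bit channel may not cross a 64 KB physical boundary,
   because the 8237's 16-bit address counter wraps within the 64 KB
   page selected by the page register. Returns 1 if the buffer is
   usable as-is, 0 if a bounce buffer (or splitting) is needed. */
int dma_8237_ok(uint32_t phys, uint32_t len)
{
    if (len == 0)
        return 0;
    if (phys + len > 0x1000000UL)            /* above 16 MB: unaddressable */
        return 0;
    if ((phys & 0xFFFF0000UL) !=
        ((phys + len - 1) & 0xFFFF0000UL))   /* crosses a 64 KB boundary */
        return 0;
    return 1;
}
```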
In such cases, the V86 control software must allocate a separate, contiguous buffer in physical memory that is fully addressable by the PC DMA controller. DMA reads must be directed into the buffer, and subsequently copied to the “real” memory which was the target of the read. For writes, memory contents must be first copied to the DMA buffer and then written to the device.
All this can be done more or less transparently by the V86 control software, at least for standard devices like the floppy.
DMA Bus Mastering
The V86 control software is helpless in the face of bus-mastering storage or network controllers. There is no standardized hardware interface, and often no I/O ports to intercept, either. Bus-mastering hardware does not use the PC DMA controller at all; it has its own DMA engine built in.
While bus mastering became very common with EISA and PCI, it had been around since the PC/AT, and numerous ISA-based bus-mastering controllers did exist.
This became a significant problem circa 1989. Not only were there several bus-mastering ISA SCSI HBAs on the market available as options (notably the Adaptec 1540 series), but major OEMs including IBM and Compaq were starting to ship high-end machines with a bus-mastering SCSI HBA (often MCA or EISA) as standard equipment.
The V86 control software had no mechanism to intercept and “fix up” bus-mastering DMA transfers. Storage controllers were especially critical, because chances were high that without some kind of intervention, software like Windows/386 or Windows 3.0 in 386 Enhanced mode wouldn’t even load, let alone work.
The first workaround was double buffering. Some piece of software, often a disk cache, would allocate a buffer in conventional memory (below 640K), and all disk accesses were funneled through that buffer.

This was far from ideal. Double buffering reduced the performance of expensive storage controllers that were supposed to be the best and fastest. And it ate precious conventional memory.
The Windows/386 Solution
Microsoft’s Windows/386 had to contend with all these problems. The optimal Win/386 solution was a native virtual driver, or VxD, which would interface with the hardware.
Due to lack of surviving documentation, it’s difficult to say exactly what services Windows/386 version 2.x offered. But we know exactly what Windows 3.0 offered when operating in 386 Enhanced mode.
Windows 3.0 came with the Virtual DMA Device aka VDMAD. This VxD virtualizes the 8237 DMA controller, but also offers several services intended to be used by drivers of bus-mastering DMA controllers.
For the worst-case scenario which requires double buffering, VDMAD manages a contiguous DMA buffer; this buffer can be requested using the VDMAD_Request_Buffer service and returned with VDMAD_Release_Buffer. While the VDMAD API could handle multiple buffers, Windows 3.x in reality only had one buffer. The buffer is in memory that is not necessarily directly addressable by the callers; the VDMAD_Copy_From_Buffer and VDMAD_Copy_To_Buffer APIs take care of this.
In some cases, double buffering is not needed. The VDMAD_Lock_DMA_Region API (and the corresponding VDMAD_Unlock_DMA_Region) can be used if the target memory is contiguous and accessible by the DMA controller. The OS will lock the memory, which means the underlying physical memory can’t be moved or paged out until it’s unlocked again. This is obviously necessary in a multi-tasking OS, because the target memory must remain in place until a DMA transfer is completed.
In the ideal scenario, a bus-mastering DMA controller supports scatter-gather. That is, the device itself can accept a list of memory descriptors, each with a physical memory address and corresponding length. Thus a buffer can be “scattered” in physical memory and “gathered” by the controller into a single entity. DMA controllers with scatter-gather are ideally suited for operating systems using paging. With scatter-gather, there is no need for double-buffering or any other workarounds.
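To sketch what scatter-gather buys: given the physical address of each page backing a buffer (here a plain array standing in for what the OS’s page tables would provide), building a list of (address, length) regions, and merging pages that happen to be physically adjacent, might look like this. All names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* One scatter/gather region: physical address plus length, in the
   spirit of the VDS Extended DDS region list. */
struct sg_region {
    uint32_t phys;
    uint32_t len;
};

/* Build a region list for a buffer starting at byte offset `off`
   into the first page, `len` bytes long. `pages[]` holds the physical
   address of each 4K page backing the buffer. Physically adjacent
   pages are merged into a single region. Returns the region count. */
size_t build_sg_list(const uint32_t *pages, uint32_t off, uint32_t len,
                     struct sg_region *out)
{
    size_t n = 0;
    size_t page = 0;
    while (len > 0) {
        uint32_t chunk = PAGE_SIZE - off;    /* bytes left in this page */
        if (chunk > len)
            chunk = len;
        uint32_t phys = pages[page] + off;
        if (n > 0 && out[n - 1].phys + out[n - 1].len == phys) {
            out[n - 1].len += chunk;         /* contiguous: extend region */
        } else {
            out[n].phys = phys;              /* discontiguous: new region */
            out[n].len = chunk;
            n++;
        }
        len -= chunk;
        off = 0;
        page++;
    }
    return n;
}
```

A controller with scatter-gather support consumes such a list directly; without it, the driver would have to fall back to locking a contiguous region or double buffering.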
The VDMAD_Scatter_Lock API takes the address of a memory buffer, locks its pages in memory, and fills in an “Extended DMA Descriptor Structure” (Extended DDS, or EDDS) with a list of physical addresses and lengths. The list from the EDDS is then supplied to the bus-mastering hardware. The VDMAD_Scatter_Unlock API unlocks the buffer once the DMA transfer is completed.
When it is available, using scatter/gather does not require any additional buffers and avoids extraneous copying. It takes full advantage of bus-mastering hardware. All modern DMA controllers (storage, networking, USB, audio, etc.) use scatter-gather, and all modern operating systems offer similar functionality to lock a memory region and return a list of corresponding physical addresses and lengths.
The VDMAD VxD also offers services to disable or re-enable default translation for standard 8237 DMA channels, and a couple of other minor services.
VDS, or Virtual DMA Services
Why the long detour into the details of a Windows VxD? Because in Windows 3.x, VDS is nothing more than a relatively thin wrapper around the VDMAD APIs. In fact VDS is implemented by the VDMAD VxD (the Windows 3.1 source code is in the Windows 3.1 DDK; unfortunately the Windows 3.0 DDK has not yet been recovered).
VDS offers the following major services:
- Lock and unlock a DMA region
- Scatter/gather lock and unlock a region
- Request and release a DMA buffer
- Copy into and out of a DMA buffer
- Disable and enable DMA translation
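These services are reached from real mode through INT 4Bh with AH=81h, with ES:DI pointing at a DMA Descriptor Structure (DDS). As a sketch, the basic 16-byte DDS layout from the VDS specification can be expressed as a packed C struct; the field names are my own, the offsets follow the spec:

```c
#include <stdint.h>
#include <stddef.h>

/* Basic DMA Descriptor Structure (DDS) per the VDS specification.
   The caller fills in the region size, offset, and segment, then
   issues INT 4Bh; the VDS provider fills in the buffer ID (when a
   DMA buffer is allocated) and the physical address. Packed to match
   the 16-byte in-memory layout expected by the interface. */
#pragma pack(push, 1)
struct vds_dds {
    uint32_t region_size;   /* 00h: size of region in bytes      */
    uint32_t region_offset; /* 04h: offset of region             */
    uint16_t segment;       /* 08h: real-mode segment (or selector) */
    uint16_t buffer_id;     /* 0Ah: filled in by the VDS provider */
    uint32_t phys_addr;     /* 0Ch: filled in by the VDS provider */
};
#pragma pack(pop)
```

The scatter/gather calls use the larger EDDS instead, which appends a count and a variable-length list of region descriptors to a similar header.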
These services correspond quite directly to the VDMAD APIs. VDS does provide a small amount of added value, though.
For example, the API to lock a DMA region can optionally rearrange memory pages to make the buffer physically contiguous, if it wasn’t already (needless to say, this may fail, and many VDS providers do not even implement this functionality). The API can likewise allocate a separate DMA buffer, optionally copy to it when locking, or optionally copy from the buffer when unlocking.
The VDS specification offers a list of possible DMA transfer scenarios, arranged from best to worst:
- Use scatter/gather. Of course, hardware must support this, and not all hardware does.
- Break up DMA requests into small pieces so that double-buffering is not required. This technique will help a lot, but won’t work in all cases (e.g. when the target buffer is not contiguous).
- Break up transfers and use the OS-provided buffer, which is at least 16K in size according to the VDS specification. This involves double-buffering and splitting larger transfers, hurting performance the most.
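The worst-case path boils down to a simple loop: copy a chunk into the DMA buffer, let the device transfer it, repeat. A sketch under the assumption of a 16 KB buffer; the function and callback names are purely illustrative, with the device transfer stubbed out as a callback:

```c
#include <stdint.h>
#include <string.h>

#define DMA_BUF_SIZE 16384u  /* minimum DMA buffer size per the VDS spec */

/* Stand-in for the actual device transfer out of the bounce buffer. */
typedef void (*dma_out_fn)(const uint8_t *buf, uint32_t len);

/* Trivial test hook: counts the bytes handed to the "device". */
uint32_t bytes_seen;
void count_out(const uint8_t *buf, uint32_t len)
{
    (void)buf;
    bytes_seen += len;
}

/* Double-buffered "write": the caller's data is copied into the
   DMA-safe bounce buffer one chunk at a time, and each chunk is
   then handed to the device. Returns the number of chunks issued. */
uint32_t double_buffered_write(const uint8_t *src, uint32_t total,
                               uint8_t *bounce, dma_out_fn dma_out)
{
    uint32_t chunks = 0;
    while (total > 0) {
        uint32_t n = total < DMA_BUF_SIZE ? total : DMA_BUF_SIZE;
        memcpy(bounce, src, n);   /* copy into the DMA-safe buffer  */
        dma_out(bounce, n);       /* device transfers from the buffer */
        src += n;
        total -= n;
        chunks++;
    }
    return chunks;
}
```

Every byte crosses the bus twice (once via the CPU copy, once via DMA), which is exactly why this scenario hurts performance the most.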
VDS Implementations
From the above it’s apparent that Windows 3.0 was likely the canonical VDS implementation. But it was far from the only one, and it wasn’t even the first one released. More or less any software using V86 mode and paging had to deal with the problem one way or another.
An instructive list can be found, for example, in the Adaptec ASW-1410 documentation, i.e. the DOS drivers for the AHA-154x SCSI HBAs. The ASPI4DOS.SYS driver could itself provide double-buffering, with all its downsides; this was not required with newer software which provided VDS. The list included the following:
- Windows 3.0 (only relevant in 386 Enhanced mode)
- DOS 5.0 EMM386
- QEMM 5.0
- 386MAX 4.08
- Generally, protected mode software with VDS support
A similar list was offered by IBM, additionally including 386/VM.
It appears that Quarterdeck’s QEMM 5.0, released in January 1990, may have been the first publicly available VDS implementation. Note that QEMM 5.0 shipped before Windows 3.0.
VDS was also implemented by OS/2. It wasn’t present in the initial OS/2 2.0 release but was added in OS/2 2.1.
Bugs
The VDS implementation in Windows 3.0 was rather buggy, and it’s obvious that at least some of the functionality was completely untested.
For example, the functions to copy to/from a DMA buffer (VDS functions 09h/0Ah) have a coding error which causes buffer size validation to spuriously fail more often than not; that is, the functions fail because they incorrectly determine that the destination buffer is too small when it’s really not. Additionally, the function to release a DMA buffer (VDS function 04h) fails to do so unless the flag to copy out of the buffer is also set.
There was of course a bit of a chicken and egg problem. VDS was to be used with real mode device drivers, none of which were supplied by Microsoft. It is likely that some of the VDS functionality in Windows 3.0 was tested with real devices prior to the release, but certainly not all of it.
VDS Users
In the Adaptec ASPI4DOS.SYS case, the driver utilizes VDS and takes over the INT 13h BIOS disk service for drives controlled by the HBA’s BIOS.
Newer Adaptec HBAs, such as the AHA-154xC and later, come with a BIOS which itself uses VDS. This poses an interesting issue because the BIOS must be prepared for VDS to come and go. That is not as unlikely as it might sound; for example on a system with just HIMEM.SYS loaded, there will be no VDS. If Windows 3.x in 386 Enhanced mode is started, VDS will be present and must be used, but when Windows terminates, VDS will be gone again.
This is not much of a problem for disk drivers; VDS presence can be checked before each disk transfer and VDS will be either used or not. It’s trickier for network drivers though. If a network driver is loaded when no VDS is present, it may set up receive buffers and program the hardware accordingly. For that reason, the VDS specification strongly suggests that VDS implementations should leave existing memory (e.g. conventional memory) in place, so that already-loaded drivers continue to work.
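The presence check itself is trivial: the VDS specification defines bit 5 (20h) of the byte at 0040:007Bh in the BIOS data area as the “VDS supported” flag. A sketch with the flag byte passed in as a parameter, so the logic can be shown (and tested) outside real mode:

```c
#include <stdint.h>

/* VDS presence is advertised by bit 5 (value 20h) of the byte at
   0040:007Bh in the BIOS data area. A real-mode driver would read
   that byte directly before each transfer; here it is a parameter. */
int vds_present(uint8_t bda_flags_7b)
{
    return (bda_flags_7b & 0x20) != 0;
}
```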
Not Just SCSI
Documentation for old software (such as Windows 3.0) often talks about “busmastering SCSI controllers” as if they were the only class of devices affected. That was never really true, but bus-mastering SCSI HBAs were by far the most widespread class of hardware affected by paging and DMA not playing along.
By 1990, the Adaptec 154x HBAs were already well established (the AHA-1540 was available since about 1987), and Adaptec was not the only vendor of bus-mastering SCSI HBAs.
There were also bus-mastering Ethernet adapters, which started appearing in 1989–1990, such as those based on the AMD LANCE or Intel 82586 controllers. Later PCI Ethernet adapters almost exclusively used bus mastering. Their DOS network drivers accordingly utilized VDS.
VDS Documentation
Microsoft released the initial VDS documentation in July 1990 in a self-extracting archive aptly named VDS.EXE (as documented in KB article Q63937). After the release of Windows 3.1, Microsoft published an updated VDS specification in October 1992, cunningly disguised in a file called PW0519T.TXT; said file was also re-published as KB article Q93469.
IBM also published VDS documentation in the PS/2 BIOS Technical Reference, without ever referring to ‘VDS’. The IBM documentation is functionally identical to Microsoft’s, although it was clearly written independently. It is likely that IBM was an early VDS user in PS/2 machines equipped with bus-mastering SCSI controllers.
Original VDS documentation is helpfully archived here, among other places.
Conclusion
VDS was a hidden workhorse that made bus-mastering DMA devices work transparently in DOS environments. It was driven by necessity, solving a problem that was initially obscure but, circa 1989, increasingly widespread. The interface was very similar to the API of the Windows 3.0 VDMAD VxD, but VDS was implemented by more or less every 386 memory manager. It was used by loadable DOS drivers but also by the ROM BIOS of post-1990 adapters.
Do you think VDS was behind the decision to limit EMM386 int 2f ax=4a15 (IO_Trap) functionality to ports above 100h?
DOS dev/emm386/iotrap.asm line 503:
> cmp dx,0100h ;Q: I/O Addr
> jae SHORT IOT_NotISASys ; N: check mapping regs
>IOT_Sys: ; Y: dispatch I/O trap handler
> xchg bx, dx ; BL = port address
> shl bx,1 ; BX = BX*2 (word table)
> call cs:DMATable[bx]
DMATable has entries for virtual Keyboard (A20) and PIC (why?), so those also could be the reason, but why cut everything instead of neat carveouts, why limit at all?
This is annoying when trying to emulate Sound Blaster, forces one to use JEMM with no such limitation.
Wouldn’t only the ports above 100h have optionally installed devices that would need specific drivers while the ports below are generally on every PC and thus should be trapped by EMM386 by default? What port below 100h does a Soundblaster need?
It does strike me as odd that Quadtel/HPMM gets the zero for VDS. It suggests there was another white paper not from MS that got the whole concept rolling.
If you declare that you only trap ports above 100h, then you can simply look at the value of DX in the exception handler and do what’s needed.

If you plan to support ports below 100h, then you need to look at the code which caused the exception and disassemble it to know whether the port number is encoded in the instruction or in DX, etc.

And since most devices that use ports below 100h don’t need remapping, it’s easier to just forbid them than to add all that complexity to EMM386.
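The asymmetry described above can be illustrated: the register forms of IN/OUT (opcodes ECh–EFh) take the port from DX, so a trap handler can read it straight from the saved register image, while the immediate forms (E4h–E7h) embed an 8-bit port number, necessarily below 100h, in the instruction itself, which the handler would have to fetch and decode. A hypothetical classifier:

```c
#include <stdint.h>

/* Classify the IN/OUT instruction at `code`, given the saved DX.
   Returns the port number being accessed, taken either from the
   immediate byte (E4h-E7h: IN AL/AX,imm8 and OUT imm8,AL/AX) or
   from DX (ECh-EFh: IN AL/AX,DX and OUT DX,AL/AX). Returns -2 if
   the opcode is not a simple IN/OUT. Illustrative only; a real
   handler must also cope with prefixes, etc. */
int io_trap_port(const uint8_t *code, uint16_t dx)
{
    uint8_t op = code[0];
    if (op >= 0xE4 && op <= 0xE7)   /* immediate form: port in the opcode stream */
        return code[1];
    if (op >= 0xEC && op <= 0xEF)   /* DX form: port in the DX register */
        return dx;
    return -2;
}
```

Note that the immediate forms can only ever address ports 00h–FFh, so a monitor that traps only ports at or above 100h never needs to decode an instruction at all.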
Remember that we are talking about an era where every byte had a non-trivial price measured in dollars (and bytes below the 1MB cutoff were even more precious).
I don’t think it was related to VDS. Ports below 100h were considered to be part of the system board and the EMM386 authors probably did not see a good reason why 3rd party software would need to intercept them (and several good reasons why they shouldn’t). As mentioned by Victor Khimenko, the simplification possible by ensuring that only the non-immediate forms of IN and OUT had to be dealt with could also be a factor.
Double-buffer memory often didn’t reduce conventional memory, since the memory in the real DOS session beyond the VM-launching routines was unused in the early 386 multitaskers. Still slow, though. Some of the competing designs chose to simply prevent user applications that would use DMA from running.
I am realizing that changing to Hyper-V has made it a lot more difficult to fire up an old Win/386 instance and go spelunking in Win386.EXE to find the precursor of VDMAD. There has to be something like that, or Win/386 would never have run.
There is another side to the bus-master problem – BIOS INT 13h implementations not using VDS services, or using them incorrectly. Hence solutions like SMARTDRV /DOUBLE_BUFFER, or the Win9x MSDOS.SYS DoubleBuffer setting and DBLBUFF.SYS.
A related theme also worth noting: hardware UMBs.
Sad that this was even necessary.
MS/DOS truly held back the PC industry at least 10 years. A proper MS/DOS kernel that provided all an application needed to interface with the hardware efficiently would have abstracted all this away behind an API, and drivers would be totally hidden.
The transition to 286 protected mode, and later 386 32-bit protected mode, would have been totally seamless.
OS/2 wouldn’t have even been necessary.
Yes, except… there were “proper” operating systems for the IBM PC, and users clearly did not care for them, they wanted DOS.
I wonder why they deemed it unnecessary to support 32-bit addressing for ISA DMA in later PCI chipsets? It couldn’t have cost much chip space (instead of having counters/registers for all 32 bits, it would anyway have had to drive the upper 8 bits low, and so on). Were the south bridge chips really that near the limit of what could reasonably fit on them? :O

Or was it perhaps that later south bridge chips didn’t even have 32 address lines? With that chip interfacing to the 24-bit ISA bus and containing/emulating legacy hardware, maybe it only had 24 address bits, and the north bridge took care of driving the upper 8 address bits low whenever the south bridge (itself or ISA bus-master cards) did any DMA transfer?
Btw, re untested code in Windows 3.0 – it can’t have been that hard to write a test suite that would actually verify that the code works. Sure, some things would have needed an actual bus-mastering card to test with, but bugs like the two you mention could easily have been caught without any bus-mastering ISA card.
I wonder how many of the bugs in Windows would not have existed if Microsoft had been better at testing their software? I know that writing dedicated test suites was way less common back in the day than nowadays, but still.
Re intercepting ports below hex 100: I would say that the A20 enable/disable thing should really be intercepted, but perhaps not by generic software but by the 386 extender thingie itself. Turning A20 on/off for whatever reason (not sure why “user” code would want to do that, but still?) would of course cause havoc. And thus the 386 extender thingie would need to be able to intercept ports even below hex 100. Or did they just ignore any software that might do bad things with the A20 enable/disable thing? :O
Also, re what operating systems were available: sure, there were Xenix and whatnot, but in addition to the limited set of software available, there were loads of XT-class machines in use that wouldn’t be a good fit for anything other than DOS, which also determined the minimum system that most software packages targeted. TBH, in hindsight, although I like the weirdness of the segmented protected mode of the 286 and I’ve always wanted to find the time to experiment with writing an OS for it, it was really bad that Intel made the 286 at all. They should have aimed directly at making what became the 386, or at least moved in that direction. Even a pure 16-bit processor like the 286 could have had the type of MMU the 386 had, including a V86 mode, making it possible to do all “386 things” except specifically running 32-bit code. If that had been available when IBM released the AT, I think the PC operating systems would have looked very different from what we actually got. We would likely have had multitasking operating systems appearing in the mid-80’s that could run multiple DOS applications at the same time.
Re 80286 — you have to understand that the chip design was effectively done before the IBM PC (and DOS) became a thing. You’d need a time machine to explain to the Intel engineers that the brain-damaged real mode was much, much more important than the far, far more powerful protected mode.
Re 32-bit ISA DMA — I strongly suspect it was lack of software. Because there were enough machines that could not do 32-bit ISA DMA, software had to deal with ISA DMA being limited to below 16 MB. That code worked everywhere, so why bother with a fancy alternative. Since widespread software was nonexistent, the additional hardware was unnecessary.