The 8086/8088 is a 16-bit processor and offsets within a 64K segment always wrap around. If a one-byte instruction at offset FFFFh is executed on an 8086, execution will continue at offset 0. This is simply a consequence of the Instruction Pointer (IP) being a 16-bit register.
Funny things happen when an access crosses a segment boundary. On an 8086, it will also wrap around; accessing a word at offset FFFFh will access one byte at offset FFFFh and one byte at offset 0 in a segment. Again, that is a consequence of 16-bit address calculations.
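The 8086 behavior described above can be sketched in C. This is only an illustrative model, not an emulator: the function names (`phys_addr_8086`, `next_ip_8086`, `read_word_8086`) and the flat 1 MB `mem` array are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 physical address: segment * 16 + offset, truncated to 20 address bits. */
uint32_t phys_addr_8086(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* IP is a 16-bit register, so advancing past FFFFh wraps to 0000h. */
uint16_t next_ip_8086(uint16_t ip, uint8_t insn_len)
{
    return (uint16_t)(ip + insn_len);
}

/* Word read modeled as two byte reads; the offset of the second byte is
 * computed in 16 bits, so a word at FFFFh takes its high byte from offset 0.
 * mem is assumed to point at a 1 MB (0x100000 byte) array. */
uint16_t read_word_8086(const uint8_t *mem, uint16_t seg, uint16_t off)
{
    assert(mem != 0);
    uint16_t off_hi = (uint16_t)(off + 1u);
    uint8_t lo = mem[phys_addr_8086(seg, off)];
    uint8_t hi = mem[phys_addr_8086(seg, off_hi)];
    return (uint16_t)(lo | (hi << 8));
}
```

With segment 1000h, a word read at offset FFFFh in this model combines the byte at physical address 1FFFFh with the byte at 10000h, the start of the same segment.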
The 80286 got a lot smarter about this. Segment protection prevents accesses that wrap around the end of a segment, for both data and instructions. The 80386 continued using the same logic.
The 286 and 386 support one special case, stack wraparound. When the 16-bit Stack Pointer (SP) is zero, pushing (say) a word on the stack will wrap around and the new SP will be FFFEh. This feature was required for 8086 compatibility, because a full size 64K stack needs to start with SP=0 (the pushes and pops must be aligned for the wraparound to occur; unaligned accesses will cause protection faults).
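A rough C model of that special case, assuming a plain 64K stack segment with limit FFFFh. The name `push_word_286` and the 0/-1 return convention are invented for this sketch, and the model only computes the new SP and checks the limit.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_LIMIT 0xFFFFu   /* assumed limit of a full 64K segment */

/* Model of a 16-bit word push on a 286/386.
 * Returns 0 and stores the new SP on success, -1 if the access would fault. */
int push_word_286(uint16_t sp, uint16_t *new_sp)
{
    assert(new_sp != 0);
    /* SP is decremented in 16 bits, so 0000h - 2 wraps to FFFEh. */
    uint16_t target = (uint16_t)(sp - 2u);
    /* The word occupies target and target + 1; both must be within the limit. */
    if ((uint32_t)target + 1u > SEG_LIMIT)
        return -1;
    *new_sp = target;
    return 0;
}
```

In this model a push with SP=0 succeeds and leaves SP at FFFEh, while a push with SP=1 would place a word at FFFFh, straddling the segment limit, and is reported as a fault.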
Does the instruction pointer also wrap around in a way similar to the stack segment?
Let’s consider the following simple DOS program:
        .model small
        .code

        ; Offset 0: reached only if execution "wraps" from FFFFh
        mov     dx, offset msg_bot
        mov     ah, 9
        int     21h
        mov     ax, 4C00h
        int     21h

_start:
        mov     ax, _DATA
        mov     ds, ax
        mov     dx, offset msg_str
        mov     ah, 9
        int     21h
        jmp     near_end

        org     0FFF8h
near_end:
        mov     dx, offset msg_top      ; 3 bytes, FFF8h-FFFAh
        mov     ah, 9                   ; 2 bytes, FFFBh-FFFCh
        int     21h                     ; 2 bytes, FFFDh-FFFEh
        inc     ax                      ; 1 byte at FFFFh

        .data
msg_bot db      'Wrapped around to start of segment',13,10,'$'
msg_top db      'Near top of code segment',13,10,'$'
msg_str db      'Entered program',13,10,'$'

        .stack
        end     _start
The program is constructed such that the one-byte ‘inc ax’ instruction is at offset FFFFh in the code segment.
When executed on a typical PC compatible system, the program will print the following:
C:\>wrap
Entered program
Near top of code segment
Wrapped around to start of segment
C:\>
Clearly the instruction pointer wrapped around 64K. Case closed.
But wait! Not so fast. Although it looks like the IP wrapped around, what actually happened is a bit more complicated, and much more interesting.
After executing ‘inc ax’ on a 386 compatible CPU, the EIP instruction pointer will not wrap to zero but rather advance to 10000h. This will trigger a #GP (General Protection) fault when attempting to execute the next instruction (of course, given that 10000h is past the 64K segment limit).
The #GP fault vector is 13 (0Dh). But in a PC compatible system, that is also the vector for hardware interrupt IRQ5. If there is nothing using IRQ5, the default BIOS handler will examine the interrupt controller state, decide that nothing happened, and execute IRET. Even if some peripheral is using IRQ5, the interrupt handler will eventually return with an IRET instruction.
And that’s where the trick is. When the #GP fault occurs in real mode, the CPU can only push a 16-bit code offset on the stack. Instead of 10000h, it pushes zero. When the interrupt handler returns, execution continues at offset zero rather than at the address where the fault actually occurred (offset 10000h).
In protected mode, the behavior is a bit more obvious; assuming that 32-bit interrupt handlers are used, the CPU will push the full 32-bit EIP value on the stack. An IRET instruction will not be able to return because it will #GP fault trying to transfer control to an offset past the segment limit.
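The difference between the two cases comes down to the width of the saved instruction pointer. The following C sketch models only that aspect; all function names are hypothetical, the limit check is simplified to a single comparison, and nothing else about interrupt delivery is modeled.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_LIMIT_64K 0xFFFFu

/* 386-style sequential advance: EIP is 32 bits wide, so FFFFh + 1 = 10000h. */
uint32_t advance_eip_386(uint32_t eip, uint32_t len)
{
    assert(len >= 1 && len <= 15);   /* x86 instructions are 1 to 15 bytes */
    return eip + len;
}

/* Simplified limit check: fetching at an offset past the limit faults. */
int fetch_faults(uint32_t eip, uint32_t limit)
{
    return eip > limit;
}

/* A real-mode interrupt frame holds a 16-bit IP: the high word of EIP is lost. */
uint16_t saved_ip_real_mode(uint32_t eip)
{
    return (uint16_t)eip;
}

/* A 32-bit protected-mode frame holds the whole EIP. */
uint32_t saved_eip_pm32(uint32_t eip)
{
    return eip;
}
```

Starting from the one-byte instruction at FFFFh, the model gives EIP=10000h, which fails the limit check. The real-mode frame then holds 0000h, so the return address passes the check and execution resumes at the start of the segment. The 32-bit frame still holds 10000h, so the return fails the same check again.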
The same DOS program shown above does not successfully run in an OS/2 VDM. That is a strong hint that DOS applications do not rely on such wraparound, because it would be relatively easy for OS/2 to support that.
Protected-mode 16-bit programs will usually terminate with some form of protection fault if they try to execute past 64K. It is only in the PC compatible DOS environment that the wraparound seemingly occurs, due to a combination of interrupts losing the high half of EIP and the #GP fault being aliased to a hardware interrupt.
Needless to say, 16-bit code segments on a 386 can have any segment limit, up to 4GB. No tool that I know of supports oversized 16-bit code segments (normal 16-bit near jumps and calls can only generate 16-bit offsets, but it is possible to produce 32-bit offsets in 16-bit code). The utility of such segments is extremely problematic in real mode, because every interrupt will lose the high word of EIP. In the end, it’s much more straightforward to use proper 32-bit code segments or at least multiple 16-bit code segments.
I really do not remember this detail well enough to claim it with certainty, but doesn’t #GP always push an error code as well, even in real mode? AFAIR it was a constant source of trouble: the need to distinguish between an exception and an external interrupt if the (A)PIC was programmed with the DOS base interrupt number, because the frame formats are different and a plain IRET simply did not work for exceptions.
I had a hard time finding this, but Intel says: “Exceptions do not return error codes in real-address mode.”
The need to distinguish between exceptions and hardware interrupts did exist, and it was problematic even so. DOS extenders (and other OSes) usually relocated the PIC base somewhere else to avoid trouble. The BIOS was generally not prepared to handle exceptions, except for some minor, well, exceptions (#UD triggered by LOCK prefixes, LOADALL, and such).
Yes, it might be relevant only for vm86 monitors. Thank you for the clarification. BTW, where did you find the Intel statement?
One of the effects in the Area 5150 demo uses IP wrapping (deliberately). There were some problems running that effect in DOSBox as a result (the effect in question otherwise works fine in DOSBox).
It’s in the “8086 Emulation” chapter, not in the chapter about exceptions and interrupts. Subsection “Interrupt and Exception Handling”.
Yeah, to get that right, one has to emulate an 8086/8088 and not any later CPU. Actually I’m not entirely certain where the 80186 falls, but the 286 and later will definitely not behave the same.
Hi,
Even more fun is tackling the I/O port range wraparound. From my testing, “current” AMD processors will fault if a 16-bit access is done to I/O port 0xffff or a 32-bit access is done to 0xfffe/0xffff (despite said I/O ports being enabled in the permission bitmaps with the extra byte…). Intel processors won’t fault. Also, at least some very old AMD chipsets will simply create an I/O transaction on the bus which exceeds the 16-bit range.
Ruik
Yes, that is a weirdness which I think started with the 386. It was documented that the I/O port space does not wrap around and it’s possible to generate accesses to I/O ports beyond FFFFh. I suspect it was not exactly a conscious design change but just a side effect of implementation changes. Neither wraparound nor going beyond 64K makes much sense, really.
Which AMDs fault, and is that even when the I/O permission bitmap is not in play? Faulting is IMHO the most sensible approach, really.
Hi,
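I tried with Linux, at least on an AMD Ryzen 2700X. Check with the following:
#include <stdio.h>
#include <sys/io.h>   /* iopl(), inb(), inw() (glibc, x86 Linux) */

int main(void)
{
    /* Raise the I/O privilege level so port access is allowed; needs root. */
    if (iopl(3) != 0) {
        perror("iopl");
        return 1;
    }
    /* The byte read at FFFFh is fine; the word read spans FFFFh and 10000h. */
    printf("%x %x\n", inb(0xffff), inw(0xffff));
    return 0;
}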
Compile, run as root and it will crash on inw on AMD (use gdb to check). On Intels it will just run fine. Note that recent kernels do not change IOPL to 3 but instead emulate that behavior with IO perm bitmap (to avoid userspace messing with interrupt flag).
Ruik