As part of a hobby project, I set out to reconstruct assembly source code that, when built with an old version of MASM, would exactly match an existing old binary. In the process I learned how old MASM versions worked, and why programmers hated MASM. Note that “old versions” in this context means MASM 5.x and older, i.e. anything before MASM 6.0.
The way old MASM works is relatively straightforward, but its documentation often explains it very poorly or not at all. MASM is a two-pass assembler, and that indirectly explains almost everything about its quirks. This is different from more modern N-pass assemblers, which automatically run multiple passes to resolve ambiguities.
The core of the problem is that MASM tries to be clever, but it’s not nearly clever enough. It is very questionable whether MASM’s cleverness is a solution or a problem; other assemblers are stricter, relying on programmers to resolve ambiguities. This perhaps puts slightly more of a burden on the programmer but results in more readable, consistent source code.
Most ambiguities result from the fact that, like most assemblers, MASM does not require symbols to be declared before they’re referenced. In the first pass, MASM generates “provisional” code, making guesses about what unknown symbols are. At the end of the first pass, all symbols are known (if they’re not, the assembly will fail).
In the second pass, MASM applies what it learned in the first pass and generates the final object code. If the guesses made in the first pass turn out to be incompatible with the second pass, MASM will report the dreaded “phase error”. More about that later.
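To make the two-pass mechanism concrete, here is a minimal toy model in Python. It is only a sketch under loud assumptions and not how MASM is actually implemented: the statement kinds `def` and `use`, the `needs_prefix` attribute, and the 3-byte and 4-byte instruction sizes are all made up for illustration. The only thing it tries to capture is the idea described above: pass 1 guesses the size of an instruction that references a not-yet-defined symbol, and pass 2 fails if that guess moved any symbol to a different address.

```python
class PhaseError(Exception):
    """Raised when a symbol's address differs between pass 1 and pass 2."""


def use_size(entry):
    """Byte size of a hypothetical instruction that references a symbol.

    `entry` is None while the symbol is still unknown (a forward reference).
    The sizes 3 and 4 are illustrative assumptions, not real encodings.
    """
    if entry is None:
        return 3  # pass-1 guess: assume the shorter form
    _address, needs_prefix = entry
    return 4 if needs_prefix else 3


def assemble(program):
    """Run both passes over a list of toy statements.

    ("def", name, needs_prefix) defines a symbol at the current location.
    ("use", name) is an instruction that references a symbol.
    Returns a dict mapping each symbol name to its address.
    """
    symbols = {}

    # Pass 1: assign addresses, guessing sizes for forward references.
    location = 0
    for stmt in program:
        if stmt[0] == "def":
            _, name, needs_prefix = stmt
            symbols[name] = (location, needs_prefix)
        else:
            location += use_size(symbols.get(stmt[1]))

    # Pass 2: every defined symbol is known now, so sizes are recomputed.
    # Each definition must land on the address recorded in pass 1.
    location = 0
    for stmt in program:
        name = stmt[1]
        if stmt[0] == "def":
            expected = symbols[name][0]
            if expected != location:
                raise PhaseError(
                    f"{name}: pass 1 address {expected}, pass 2 address {location}"
                )
        else:
            if name not in symbols:
                raise LookupError(f"undefined symbol: {name}")
            location += use_size(symbols[name])

    return {name: address for name, (address, _) in symbols.items()}
```

In this toy, `assemble([("def", "var", True), ("use", "var"), ("def", "done", False)])` succeeds because `var` is already known when it is used. Moving the definition of `var` to the end turns the use into a forward reference, the pass-1 guess comes out one byte too small, and `done` shifts in pass 2, which raises `PhaseError`. A real assembler has more ways to reconcile the two passes (for example, padding an instruction that turned out shorter than guessed), which this sketch deliberately leaves out.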