greatgib 4 minutes ago

As a side note of appreciation, I don't think he could have been more transparent about it: an LLM was used, but only for the proofreading.

pansa2 41 minutes ago

> Using `lea` […] is useful if both of the operands are still needed later on in other calculations (as it leaves them unchanged)

As well as making it possible to preserve the values of both operands, it’s also occasionally useful to use `lea` instead of `add` because it preserves the CPU flags.

Thorrez 23 minutes ago

>However, in this case it doesn’t matter; those top bits are discarded when the result is written to the 32-bit eax.

>Those top bits should be zero, as the ABI requires it: the compiler relies on this here. Try editing the example above to pass and return longs to compare.

Sorry, I don't understand. How could the compiler both discard the top bits, and also rely on the top bits being zero? If it's discarding the top bits, it won't matter whether the top bits are zero or not, so it's not relying on that.

  • 201984 5 minutes ago

    He's actually wrong on the ABI requiring the top bits to be 0. It only requires that the bottom 32 bits match the parameter, but the top bits of a 32-bit parameter passed in a 64-bit register can be anything (at least on Linux).

    You can see that in this godbolt example: https://godbolt.org/z/M1ze74Gh6

    The reason the code in his post works is because the upper 32 bits of the parameters going into an addition can't affect the low 32 bits of the result, and he's only storing the low 32 bits.
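
To make the point above concrete, here is a minimal C sketch of why garbage in the upper halves can't leak into the low 32 bits of a sum: carries only propagate upward. The helper names (`add_low32`, `with_garbage`, `check_low32_independent`) are made up for illustration and are not from the post or the godbolt link.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-bit add whose result is then truncated to 32 bits,
   as happens when only the low half (eax) is kept. */
static uint32_t add_low32(uint64_t a, uint64_t b) {
    return (uint32_t)(a + b);
}

/* Hypothetical helper: puts arbitrary "garbage" into the upper 32 bits. */
static uint64_t with_garbage(uint32_t low, uint32_t garbage) {
    return ((uint64_t)garbage << 32) | low;
}

/* Checks that garbage in the upper halves never changes the low 32 bits. */
static void check_low32_independent(void) {
    uint32_t x = 0xFFFFFFF0u, y = 0x20u;
    uint32_t clean = add_low32(x, y);
    uint32_t dirty = add_low32(with_garbage(x, 0xDEADBEEFu),
                               with_garbage(y, 0x12345678u));
    assert(clean == dirty);
    assert(clean == 0x10u); /* 0xFFFFFFF0 + 0x20 wraps to 0x10 in 32 bits */
}
```

Calling `check_low32_independent()` from a `main` should pass silently; the same would not hold for operations like a right shift or a division, where upper bits do flow downward.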

  • Joker_vD 15 minutes ago

    (Almost) any instruction on x64 that writes to a 32-bit register as its destination writes the lower 32 bits of the value into the lower 32 bits of the full 64-bit register and zeroes out the upper 32 bits of the full register. He touched on it in his previous note, "why xor eax, eax".

    But the funny thing is, the x64-specific supplement for the SysV ABI doesn't actually specify whether the top bits should be zeroes or not (and so whether the compiler can rely on, e.g., a function returning int having its upper 32 bits zeroed, or whether those could be garbage), and historically GCC and Clang have diverged in their behaviour.
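
As a rough illustration of the register-write rule described above, here is a small C model. It is only a model of the semantics, not real register access, and the function names are invented for this sketch. The 16-bit case is included for contrast: writes to 16-bit (and 8-bit) sub-registers leave the rest of the register alone.

```c
#include <assert.h>
#include <stdint.h>

/* Model of writing a 32-bit value to e.g. eax:
   the upper 32 bits of rax become zero, whatever they held before. */
static uint64_t write_reg32(uint64_t old_reg, uint32_t value) {
    (void)old_reg; /* the old contents do not survive */
    return (uint64_t)value;
}

/* Model of writing a 16-bit value to e.g. ax:
   the upper 48 bits of rax are preserved. */
static uint64_t write_reg16(uint64_t old_reg, uint16_t value) {
    return (old_reg & ~(uint64_t)0xFFFF) | value;
}

/* Exercises both models on a register full of recognisable bits. */
static void check_register_writes(void) {
    uint64_t rax = 0xDEADBEEF12345678ull;
    assert(write_reg32(rax, 5u) == 5ull);
    assert(write_reg16(rax, 0xABCDu) == 0xDEADBEEF1234ABCDull);
}
```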

miningape an hour ago

Loving this series! I'm currently implementing a z80 emulator (gameboy) and it's my first real introduction to CISC, and is really pushing my assembly / machine code skills - so having these blog posts come from the "other direction" is really interesting and gives me some good context.

I've implemented toy languages and bytecode compilers/vms before but seeing it from a professional perspective is just fascinating.

That being said it was totally unexpected to find out we can use "addresses" for addition on x86.

  • Joker_vD 44 minutes ago

    A seasoned C programmer knows that "&arr[index]" is really just "arr + index" :) So in a sense, the optimizer rewrote "x + y" into "(int)&(((char*)x)[y])", which looks scarier in C, I admit.
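
A quick sketch of that equivalence in C, using a real array so everything stays well-defined (the function names are hypothetical, chosen just for this example). Indexing is pointer addition, and in bytes the offset is the index scaled by the element size, which is the same shape of calculation `lea` performs.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the address of base[index]; by definition this is base + index. */
static int *element_address(int *base, size_t index) {
    return &base[index];
}

/* Checks the pointer identity and the byte offset it implies. */
static void check_index_is_addition(void) {
    int arr[8] = {0};
    size_t index = 3;
    assert(element_address(arr, index) == arr + index);
    /* In bytes: offset = index * sizeof(element). */
    assert((char *)&arr[index] - (char *)arr
           == (ptrdiff_t)(index * sizeof arr[0]));
}
```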

    • crote 11 minutes ago

      The horrifying side effect of this is that "arr[idx]" is equal to "idx[arr]", so "5[arr]" is just as valid as "arr[5]".

      Your colleagues would probably prefer if you forget this.
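
For the curious, a tiny C sketch of the commutative subscript. The function name is made up; the behaviour follows from `a[b]` being defined as `*(a + b)`.

```c
#include <assert.h>

/* Reads the element at index 5 using the "backwards" subscript form. */
static int sixth_element_backwards(const int *arr) {
    /* arr[5] is *(arr + 5); 5[arr] is *(5 + arr). Same address, same value. */
    assert(arr[5] == 5[arr]);
    return 5[arr];
}
```

The caller has to pass an array with at least six elements, since nothing here checks bounds.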

sethops1 37 minutes ago

This guy is tricking us into learning assembly! Get 'em!!

Joker_vD an hour ago

Honestly, x86 is not nearly as CISC as they come. It just has somewhat more developed addressing modes compared to the utterly anemic "register plus constant offset" one, and you are allowed to fold some load-arithmetic-store combinations into a single instruction. But that's it, no double- or triple-indexing or anything like what VAXen had.

    BINOP   disp(rd1+rd2 shl #N), rs

        vs.

    SHL     rTMP1, rd2, #N
    ADD     rTMP1, rTMP1, rd1
    LOAD    rTMP2, disp(rTMP1)
    BINOP   rTMP2, rTMP2, rs
    STORE   disp(rTMP1), rTMP2

And all it really takes to support this is just adding a second (smaller) ALU on your chip to do addressing calculations.
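
Here is a rough C model of the folded instruction above, with BINOP taken to be an add and memory modelled as a plain byte buffer. All names are invented for the sketch; the address calculation is the part the extra ALU would handle, and the three `memcpy`/add steps correspond to the LOAD, BINOP, and STORE of the long form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Effective address disp + rd1 + (rd2 << n), as a byte offset into memory. */
static size_t effective_address(size_t disp, size_t rd1, size_t rd2, unsigned n) {
    return disp + rd1 + (rd2 << n);
}

/* Models "ADD disp(rd1 + rd2 shl #n), rs": read-modify-write on memory.
   The caller must make sure the 4 bytes at the address lie inside mem. */
static void add_to_memory(uint8_t *mem, size_t disp, size_t rd1, size_t rd2,
                          unsigned n, uint32_t rs) {
    size_t ea = effective_address(disp, rd1, rd2, n);
    uint32_t tmp;
    memcpy(&tmp, mem + ea, sizeof tmp); /* LOAD  */
    tmp += rs;                          /* BINOP */
    memcpy(mem + ea, &tmp, sizeof tmp); /* STORE */
}
```

`memcpy` is used instead of a pointer cast so the model does not depend on alignment.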