Tuesday, October 31, 2017

The Case of the Four Missing Characters

Alternatively: Initialize Your Variables

So it's been almost a year since I've used this. Honestly, I thought that after ENTR 390 was over, I wouldn't have much else to say. And I didn't - until I ran into something else interesting. Oh well. I guess the 0 readers of the internet will get to hear about my splurges.
(BTW if I should take this down please do tell)

Setting the Scene

The class in question is EECS 370, Introduction to Computer Organization, probably the most hardware-oriented class in the Computer Science major. It's a different way of thinking, but adaptation is survival.

The project in question concerns linking and loading: compiling assembly files into object files, and linking multiple object files together to form a machine code "executable." This executable is then backwards-compatible with the simulation program written for the previous project.

Linking Woes

Linking by itself seems straightforward: copy over the text and data sections, then use the relocation table to figure out which fields need to be changed. Global references get resolved through the symbol table, and local ones with some math.
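To make the "some math" part concrete, here is a minimal sketch in C. It is not my actual project code: the Symbol struct, the function names, and the parameters are all made up for illustration, and it assumes the linked file lays out every text section first and every data section after that. It also assumes "Stack" resolves to whatever address the caller passes in for it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical symbol table entry: a global label and its address
 * in the final linked file. */
typedef struct {
    const char *label;
    int address;
} Symbol;

/* Relocate a local address from one object file.
 * localAddr is relative to that file (text first, then data).
 * textStart and dataStart are where that file's text and data
 * sections land in the linked file. */
int relocate_local(int localAddr, int textSize, int textStart, int dataStart)
{
    assert(localAddr >= 0);
    if (localAddr < textSize) {
        return textStart + localAddr;            /* label lives in text */
    }
    return dataStart + (localAddr - textSize);   /* label lives in data */
}

/* Resolve a global label. "Stack" is treated as a special label whose
 * address is supplied by the caller. Returns -1 if the label is unknown. */
int resolve_global(const Symbol *table, int count, const char *label,
                   int stackAddr)
{
    if (strcmp(label, "Stack") == 0) {
        return stackAddr;
    }
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].label, label) == 0) {
            return table[i].address;
        }
    }
    return -1;
}
```

For example, with a 3-line text section that lands at address 4 and a data section that lands at address 10, local address 2 is still in text and moves to 6, while local address 4 is the second data word and moves to 11.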

And indeed, that's what my code does. After a few minor headaches here and there with the math involved with locals, I've got my code down to a tee. Just a single test case, which is what they give us in the spec, but that's okay; the linker gives the same answer. A simple submission should do the trick...

Hint: the following student test cases exposed the student linker as buggy: countdown
Wait a minute, that can't be right. I manually linked it and assembled that with the INSTRUCTOR solution! And the machine code matches! What gives?

Sleepless Nights

Ok, it wasn't that dramatic. But this issue did bother me a bunch. What's going on here? Here's a hint from now:
But it works on my machine!
But obviously past me doesn't know that. Instead, I venture towards another solution:

More test cases! Obviously more tests should catch more bugs, right? Indeed, I did end up catching more buggy solutions with my new tests:

  • Multiply: Specifically, the smart multiplication algorithm from the previous project, which chooses the proper multiplier and multiplicand to minimize instructions executed. Here, I broke the choosing part of the algorithm into its own subroutine.
  • Times4: Another test case found in the spec, which simply uses a subroutine to multiply a number by 4.
  • Combination: The third part of the project, which involved writing a function that computes n choose r. The function also came with a driver file, which tells it to compute 4 choose 2.
I also caught an additional bug with the linker, where I forgot to consider the "Stack" label when resolving globals. It's gotta be fine now, right?
Hint: the following student test cases exposed the student linker as buggy: combination countdown multiply times4
Okay. Something's fishy here. I personally linked all these files manually and assembled them with the previous project's INSTRUCTOR solution, and the resulting machine code files were identical (apart from the final 0 I used as a sentinel for the label "Stack," but in both cases it points to the same place)! What the heck is happening?

A diff of the manually linked Combination and the Linker Combination, showing the extra 0 used as the "Stack" label being the only difference


It Works on My Machine!

Sadly, not every platform is the same. I finally had had enough of banging my head against the Windows wall. Maybe it was different on Linux?

So I decided to compile my code on Linux and run it, and voila! There's an anomaly!
The Windows and Linux Results are Different! But what could cause this?
After a bit of tweaking the debug output, I saw something quite odd:
Wait a minute... Why is allDataStart at THAT Value?
A bit of digging through my code, and the answer is obvious: I had never set allDataStart = 0. One simple mistake had cost so much time and effort. All for four missing characters. Not even, in fact, because two would suffice: "=0" works just as well as " = 0".
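For the curious, the pattern looks roughly like the sketch below. This is a reconstruction for illustration, not my actual code: the function name, the parameters, and the idea that allDataStart is a running sum of text section sizes are assumptions. The only part that comes straight from the bug is the initializer.

```c
#include <assert.h>

/* Hypothetical helper: the data sections begin right after all of the
 * text sections, so add up every file's text size. */
int all_data_start(const int *textSizes, int fileCount)
{
    assert(fileCount >= 0);

    /* The fix. Without "= 0" the loop below adds onto whatever happened
     * to be in this stack slot, which is undefined behaviour. */
    int allDataStart = 0;

    for (int i = 0; i < fileCount; i++) {
        allDataStart += textSizes[i];
    }
    return allDataStart;
}
```

Compiling with warnings turned on (for example -Wall -Wextra with GCC or Clang) often flags this kind of mistake, though not in every case.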

Fixed?

To keep things short and sweet, yes. Initializing the one variable I had neglected fixed the issue. And my recurring headaches.

As for why? I'm not too sure. Reading an uninitialized variable is undefined behaviour, of course, but on Windows it always seemed to start out at 0. On Linux, the environment the grader runs in, that doesn't seem to be the case. A cryptic value of 32768 comes up, and I'm not entirely sure why. It could have started as 32764, then gained 4 as it is supposed to, but 32764 is such an odd value. Maybe, in its uninitialized woes, Linux decides to have mercy and initialize it to a nice value?
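One guess, and it is only a guess: on 64-bit Linux, stack addresses usually begin with 0x7ffc through 0x7fff. If the stack slot that allDataStart ended up in still held the upper half of some old pointer, reading it as an int would give a number in the 32764 to 32767 range. The sketch below just checks that arithmetic. The sample address is invented, and nothing here proves that this is what happened in my program.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit value, returned as an int. */
int high_half_as_int(uint64_t value)
{
    return (int)(int32_t)(value >> 32);
}

/* Check the numbers from the post against a made-up stack address. */
void check_post_numbers(void)
{
    /* Invented example address; only the leading 0x7FFC matters. */
    int garbage = high_half_as_int(0x00007FFC12345678ULL);

    assert(garbage == 32764);     /* 0x7FFC in decimal */
    assert(garbage + 4 == 32768); /* plus 4, giving the value I saw */
}
```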

I don't know. Go ask someone who knows this better than me. I'm just happy this is finished.