Excellent Exceptions and Random Ramblings
Welcome to my Poorly Maintained, Rarely Updated slice of the Internet. Random ramblings about everything you can imagine (mostly CS) make it here... if I'm compelled enough to write about it. Usually because it was really interesting to me.
Tuesday, April 24, 2018
Tuesday, October 31, 2017
The Case of the Four Missing Characters
Alternatively: Initialize Your Variables
So it's been almost a year since I've used this. Honestly, I thought that after ENTR 390 was over, I wouldn't have much else to say. And I didn't, until I ran into something else interesting. Oh well. I guess the 0 readers of the internet will get to hear about my splurges.
(BTW if I should take this down please do tell)
Setting the Scene
The class in question is EECS 370, Introduction to Computer Organization, probably the most hardware-oriented class in the Computer Science major. It's a different way of thinking, but adaptation is survival.
The project in question concerns linking and loading: compiling assembly files into object files, and linking multiple object files together to form a machine code "executable." This executable is then backwards-compatible with the simulation program written for the previous project.
Linking Woes
Linking by itself seems straightforward: copy over the text and data sections, then use the relocation table to figure out which fields need to be changed, and use the symbol table to resolve global ones, and some math to resolve local ones.
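For the curious, here is a rough sketch of the "resolve global ones" step. It is not my actual project code: the Relocation struct, the resolveGlobals name, and the idea that the address lives in the low 16 bits of the word are all assumptions made for illustration, and data words that hold a full address would need the whole word replaced instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical relocation entry: which word to patch and which label it names.
struct Relocation {
    std::size_t wordIndex;  // index into the combined machine-code image
    std::string label;      // global label the word refers to
};

// Patch the low 16 bits of each relocated word with the label's final address.
// Returns false if a label is missing or an index is out of range.
bool resolveGlobals(std::vector<std::int32_t>& image,
                    const std::vector<Relocation>& relocations,
                    const std::unordered_map<std::string, std::int32_t>& symbols) {
    for (const Relocation& r : relocations) {
        const auto it = symbols.find(r.label);
        if (it == symbols.end() || r.wordIndex >= image.size()) {
            return false;
        }
        const std::int32_t word = image[r.wordIndex];
        // Keep the upper bits (opcode and registers), swap in the address.
        image[r.wordIndex] = (word & ~0xFFFF) | (it->second & 0xFFFF);
    }
    return true;
}
```

A special label like "Stack" would just be one more entry in the symbol table, which is exactly the kind of entry that is easy to forget.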
And indeed, that's what my code does. After a few minor headaches here and there with the math involved with locals, I've got my code down to a tee. Just a single test case, which is what they give us in the spec, but that's okay; the linker gives the same answer. A simple submission should do the trick...
Hint: the following student test cases exposed the student linker as buggy: countdown
Wait a minute, that can't be right. I manually linked it and assembled that with the INSTRUCTOR solution! And the machine code matches! What gives?
Sleepless Nights
Ok, it wasn't that dramatic. But this issue did bother me a bunch. What's going on here? Here's a hint from present-day me:
But it works on my machine!
But obviously past me doesn't know that. Instead, I venture towards another solution:
More test cases! Obviously more tests should catch more bugs, right? Indeed, I did end up catching more buggy solutions with my new tests:
- Multiply: The smart multiplication algorithm from the previous project, which chooses the proper multiplier and multiplicand to minimize instructions executed. Here, I broke the choosing part of the algorithm into its own subroutine.
- Times4: Another test case found in the spec, which simply uses a subroutine to multiply a number by 4.
- Combination: The third part of the project, which involved writing a function that computes n choose r. The function also came with a driver file, which has it compute 4 choose 2.
I also caught an additional bug with the linker, where I forgot to consider the "Stack" label when resolving globals. It's gotta be fine now, right?
Hint: the following student test cases exposed the student linker as buggy: combination countdown multiply times4
Okay. Something's fishy here. I personally linked all these files manually and assembled them with the previous project's INSTRUCTOR solution, and the resulting machine code files were identical (apart from the final 0 I used as a sentinel for the label "Stack," but in both cases it points to the same place)! What the heck is happening?
A diff of the manually linked Combination and the Linker Combination, showing the extra 0 used as the "Stack" label being the only difference
It Works on My Machine!
Sadly, not every platform is the same. I finally had had enough of banging my head against the Windows wall. Maybe it was different on Linux?
So I decided to compile my code on Linux and run it, and voila! There's an anomaly!
The Windows and Linux Results are Different! But what could cause this?
After a bit of tweaking the debug output, I saw something quite odd:
Wait a minute... Why is AllDataStart at THAT Value?
A bit of digging through my code, and the answer is obvious: I had never set allDataStart = 0. One simple mistake had cost so much time and effort. All for four missing characters. Not even, in fact, because two would suffice.
Fixed?
To keep things short and sweet, yes. Initializing the one variable I had neglected fixed the issue. And my recurring headaches.
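In case it helps anyone, here is a minimal sketch of the kind of code involved. Only the name allDataStart comes from my linker; the ObjectFile struct, the function name, and the assumption that all text sections are laid out before any data section are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-file header: sizes of the text and data sections in words.
struct ObjectFile {
    int textSize;
    int dataSize;
};

// Offset at which the combined data section begins, assuming every text
// section is placed before any data section.
int computeAllDataStart(const std::vector<ObjectFile>& files) {
    int allDataStart = 0;  // the fix: without "= 0" the sum starts from garbage
    for (const ObjectFile& f : files) {
        allDataStart += f.textSize;
    }
    return allDataStart;
}
```

Compiling with warnings turned on (GCC's -Wall, for instance) can flag a variable that is read before it is set, though it will not catch every case.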
As for why? I'm not too sure. Reading an uninitialized variable is undefined behaviour, of course, but on Windows it always seemed to come out as 0. On Linux, the environment the grader runs in, that doesn't seem to be the case. A cryptic value of 32768 comes up, and I'm not entirely sure why. It could have started as 32764 and then gained 4 as it is supposed to, but 32764 is such an odd value. Maybe, in its uninitialized woes, Linux decides to have mercy and initialize it to a nice value?
I don't know. Go ask someone who knows this better than me. I'm just happy this is finished.
Friday, December 9, 2016
Monday, December 5, 2016
The Solutions to your Issues Lie in the Most Obscure Places
Memory Management Revisited
So remember the issues that we were having with the Photon and its memory management? Turns out C++ is pretty hard. One simple mistyped character caused a cascade of instability.
One Character?
Yep. One character.
The model we used was essentially an operating system for the Photon, with the input device being a 5 way button thingy and the output device being the screen. The system itself was composed of "input responders," a rough classification that is used for anything that responds to user input accordingly.
This model made use of both Menus and Info screens. The menus consisted of both the location menu, which gives the location of the device using Google's Geolocation API, and several first aid menus, which act as collections of responses to first aid situations. Then come the info screens, which give users the information they need to respond to various situations.
The issue we were experiencing arose with the first aid menus. Though we had passed a parameter for the number of different situations on each menu, we forgot to use it while drawing the screen. Instead, we used a constant number 5, which happened to be more than the number of situations. And because this is C++, nothing stopped the Photon from reading past the end of the list into data that it shouldn't touch. Sometimes, some other data was there, which resulted in the random gibberish that sometimes popped up everywhere.
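Roughly, the fix looks like the sketch below. This is not our Photon code: Screen is a made-up stand-in for the LCD so the example runs on a desktop, and drawMenu is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the screen: collects the lines that would be drawn.
using Screen = std::vector<std::string>;

// Draw exactly itemCount entries. The bug described above amounts to writing
// "i < 5" here regardless of how many entries the menu really has.
void drawMenu(const char* const* items, std::size_t itemCount, Screen& screen) {
    for (std::size_t i = 0; i < itemCount; ++i) {
        screen.push_back(items[i]);
    }
}
```

With a menu of three entries, a hard-coded 5 reads two pointers past the end of the array, and whatever happens to be stored there gets treated as text.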
A simple fix made the device stable. Or, well, almost. There was another issue that impeded progress.
Be the Better Programmer
This issue arose from my attempt to be a good programmer. Whenever I compiled the code, the compiler always warned me about the deprecated conversion from Particle's String to a char array. So I decided to create a method that would solve this issue and convert it without the trouble.
As luck would have it, this method caused more trouble than it solved. In the conversion process, random characters were added on for some reason. Perhaps it was in the length method, or perhaps it was something else. But one way or another, these random characters were tacked on.
It was perhaps these random characters that caused issues similar to the ones we had solved beforehand. The library we used for the Nokia 5110, our LCD, made use of a bitmap for each character in order to store how the characters should be rendered. But because these random characters were not in the range of that array, the Photon once again began accessing garbage data. Best case scenario, it renders as blank. Common scenario, it renders as random data.
Turns out that, like most deprecated things, support still exists. So instead of going through the conversion ourselves, we let the Photon do it for us. It works just fine that way.
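A range check before indexing would have turned the garbage into blanks. Below is a sketch under assumptions, not the library's actual code: the 5-column glyph size, the table starting at the space character, and the three-entry font are all invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kGlyphWidth = 5;      // assumed columns per glyph
constexpr unsigned char kFirstChar = 0x20;  // assumed first character in the table
constexpr std::size_t kGlyphCount = 3;      // tiny table, for illustration only

using Glyph = std::array<std::uint8_t, kGlyphWidth>;

// Hypothetical font table covering only space, '!' and '"'.
constexpr std::array<Glyph, kGlyphCount> kFont = {{
    {{0x00, 0x00, 0x00, 0x00, 0x00}},
    {{0x00, 0x00, 0x5F, 0x00, 0x00}},
    {{0x00, 0x07, 0x00, 0x07, 0x00}},
}};

// Return the glyph for c, or the blank glyph when c falls outside the table.
const Glyph& glyphFor(unsigned char c) {
    if (c < kFirstChar) {
        return kFont[0];
    }
    const std::size_t index = static_cast<std::size_t>(c - kFirstChar);
    if (index >= kGlyphCount) {
        return kFont[0];
    }
    return kFont[index];
}
```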
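I never confirmed where the extra characters came from, but one common way to get trailing garbage in a hand-rolled conversion is to forget the terminating '\0'. Here is a sketch of a copy that always writes it. It uses std::string as a stand-in for Particle's String so that it runs anywhere, and copyToBuffer is a hypothetical helper, not the method I actually wrote.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copy text into a caller-provided buffer, truncating if needed, and always
// write the terminating '\0'. Returns the number of characters copied.
std::size_t copyToBuffer(const std::string& text, char* buffer, std::size_t capacity) {
    if (capacity == 0) {
        return 0;
    }
    const std::size_t n = text.size() < capacity - 1 ? text.size() : capacity - 1;
    std::memcpy(buffer, text.data(), n);
    buffer[n] = '\0';  // size() does not count this byte, so it is easy to forget
    return n;
}
```

Without that last write, anything that prints the buffer keeps reading until it stumbles onto a zero byte somewhere further along in memory.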
So why the Crashing?
To be honest, I can't tell. My best guess is that either the bitmap or the array was allocated near the end of memory, so accessing out-of-bounds data resulted in attempting to access memory that doesn't exist. And the Photon responds by crashing.
Can't blame it.
Of course, this is my best guess. Perhaps someone at Particle could explain this much better than I ever could.
Monday, November 28, 2016
Memory Management Fun
Flash?
There comes a time in every programmer's life when they ask themselves, "how much memory is my code using?" Now is that time.
Background: what are we doing?
Being a first aid kit add-on, the logical addition to our product is a way for people to find out first aid information in order to help people in trouble. So, of course, we decided to go and make essentially a new operating system for the Photon to run off of. What could go wrong?
C++ is Fun (this is a lie)
Though it's quite widely known and I do have a bit of experience, C++ still posed quite a few challenges in programming. To put it bluntly, I'm still working on it. The inclusion of libraries and a lot of other factors have made this project quite a bit more complicated than I would have liked, but this is the life that I've chosen.
Crashing and Memory Management
The Photon and its firmware are both quite fickle, and the slightest perturbation has caused countless crashes in our testing. Code that works fine one moment crashes the next, sometimes (though rarely) without any changes at all. Much more common is random crashing caused by removal of code that has nothing to do with the cause of the crash!
When I was first prototyping the code, I decided to include an abbreviation field to store a menu's abbreviation. But I never used this field, and so I believed that I could free up some critical space by getting rid of it. No dice, however, as the program crashed as soon as I pressed any buttons. No debug output could explain it, and at this point this memory-eating field still exists.
Another fun bug came with changing the information that our device provided. I noticed that some characters were being cut off, so I decided to fix it in my code. And again, it just decided to crash randomly?
This issue is very confusing, and the only explanation I have is that the code is not allocating enough memory. After all, I am using an array of char pointer pointers as a 2 dimensional array, and I do not know the specifics of how the Photon manages its memory. Thus, attempting to read the memory at a certain point may exceed the bounds of the array and read other information that is complete gibberish.
But if this were the case, there still remains a puzzling bug. The location menu currently has no functionality built into its center button. In the final product, pressing the center button should call the Geolocation methods we worked so tediously on before and then display the results on the screen. But right now, we are still prototyping this "Operating System" so that functionality is not included. Despite this, pressing the button appears to be a toggle of some sort. Specifically, it toggles whether some gibberish appears on the menu or not. We added tracing calls to see whether some other subclass's method was being called instead, but it was not. The code within the method itself is just the tracing output.
Color me surprised. These random crashes happen for no apparent reason, either. It's quite frustrating trying to develop in such an environment. Perhaps it would be better to try and develop something similar for a computer instead, just to try and flesh out the concept.
Thursday, November 10, 2016
Video Editing is Hard
In Our Best Interests
To not watch the proof of concept video. But if you insist, you can check it out here. Because I don't want to upload the video again.
Also it uses flash. Did we just get out of a time machine a decade in the past?
Broken Code and Broken Dreams
Push to Master
Don't do it if your code doesn't work. Please. And if someone opens an issue, please work on resolving it.
Sorry
I had to get that rant out of my system. I realize that many of these Particle library devs are volunteers who do what they do out of love. So I'd like to take a moment and thank all the people working on Open Source projects right now. Y'all the real MVPs.
Cutting it Close
Our proof of concept's requirements weren't really that stringent. That being said, I did want to add as much functionality as possible. In fact, I made an entire post about getting the Geolocation to work. What didn't make the cut into the proof of concept is somewhat more interesting.
Hardware that Failed
Our Nokia 5110 LCD came with a part that we had trouble identifying at first. Turns out, it's a Texas Instruments CD4050BE, a fancy part that made wiring much trickier. In fact, we weren't entirely sure what its purpose was. With a little digging, we found out that the LCD worked just fine without this part. And sure enough, it did. So now there's a CD4050BE just sitting around in our design lab. In case anyone needs it.
The other component we were playing around with was pulsesensor.com's (appropriately enough) pulse sensor. In our demo testing of the hardware, we were able to get it working just fine. But that was 2 weeks before our proof of concept demo. We were content with it working, and so decided to move on to the LCD and Geolocation, figuring that getting the pulse sensor working would be easy enough.
Turns out, it's not that simple. In the two weeks that had passed, the author of the code released an update. Appropriately enough, our code broke. It was time to go hunting for a reason. Fortunately, Particle's online IDE's libraries make use of Github. The repository was just a click away. And checking the changelogs and the Readme gave us an answer as to why it was broken. Previous versions had included a dependency library along with the library code. The latest version separated them, so that it was necessary to include both libraries in the code.
So we went back and did that and... nothing. Instead of complaining about being unable to find a necessary library, it was complaining about an error within the library itself. Quite puzzling indeed.
We still had a working demo with example code, though. So we booted it up, and sure enough it worked just fine. It was running version 1.5.0, while the newest version was 1.5.1. So instead of including 1.5.1 in our proof of concept code, we decided to go with 1.5.0. After all, the example code worked fine. Why wouldn't the rest of it work?
Well, for some bizarre and unknown reason, that code broke too. Same exact error as the 1.5.1 library, even though the example code worked fine. Our demo code with the 1.5.0 code still worked, and so did reusing the example code for 1.5.0. But when we tried to migrate the Pulse Sensor code to our proof of concept, it broke every single time.
At this point, we had already invested nearly an hour trying to get the pulse sensor to work. No matter what we tried, it simply refused to. So instead of wasting more time, we decided to cut it. For the proof of concept, we only really needed a location and a screen to display that location. So we decided to work on those for a change.
Hardware that didn't quite Fail
To be quite honest, hardware and software are both hard. And for us in the Hardware IoT section that involves programming our devices, getting both to work together is even more difficult. Case in point: what felt like many years spent getting the screen to work. Aside from cutting out that weird Texas Instruments part, we did have to do quite a bit of work to get it working. And the other hard part about it: if it's not working 100%, it won't do anything.
Actually, that was a lie. The backlight controls and the screen control are separate. Aside from that, however, we had zero feedback as to where our errors were.
And speaking of errors, there were quite a few of them getting the screen to work. Our Photon currently has a significant number of its pins in use just for the screen. Misplace any one of them, and the entire thing fails and nobody knows why. In fact, we broke it multiple times over the course of testing. Mostly, it was accidentally pulling out a wire. But sometimes it was trickier. We spent a good 20 minutes trying to get the screen working once, only to find out that somehow a wire had moved from A3 to A4. Moving it back quickly fixed the issue.
One Grand Realization
Experience really does matter in this line of work. There are so many different things that all have to be working in perfect harmony. Even though I had quite a bit of experience coming into this class, I've still been stuck for many hours on problems that turn out to have simple solutions. And it's not because I refuse to learn from my past mistakes. No, it's because there are so many possible mistakes to make that it becomes impossible to not make one eventually. Even the most experienced of engineers will end up hooking up a wire incorrectly. Some may spend hours trying to diagnose the issue with the machine in its current state. Others may decide to simply tear down their machine and start over.
Sometimes I feel like I should do the latter.