When an FPU is not an FPU
After finishing up the loose ends, I went to look at the XOs. Mavrothal has already ported the bulk of FatdogArm to XO-4, with only minor nuisances left, so I thought I'd attack the XO-1.75, where Mav had run into seemingly intractable problems: the Xorg desktop won't start (illegal instructions), autoconf won't start (illegal instructions), wpa_supplicant won't start (illegal instructions) (e.g. here). I thought it would be an interesting challenge. And a challenge it was, indeed!
I booted off Mav's FatdogArm XO build and was immediately faced with the same problem. And it's not only those; even innocuous programs like "tar" or "ps" would crash with illegal instructions. In the beginning we thought that the crash could be caused by the accidental use of NEON instructions (which the XO-1.75's CPU, the Armada 610 SoC, doesn't have) - perhaps used by pixman (used by Xorg) and by encryption code (wpa_supplicant). But surely "tar" and especially "ps" have nothing to do with SIMD processing, so it is unlikely that NEON has anything to do with it.
Gdb was no help - it was itself crashing. I followed the same general steps that I took when troubleshooting the Seamonkey illegal instruction, but they didn't get me far.
I got a major clue when I managed to run "tar" without crashing: when I ran "tar --version" it worked well, showed the version number, and did not crash. But if I ran "tar" by itself, I got the crash. This told me that the crash must happen after dynamic linking is done; otherwise it would always crash no matter what.
"Tar" is a rather big application, but "ps" is small and it also crashed. So I hacked the "ps" source and started peppering it with printf statements to see where the crash happened. This got me to the function that crashed - the "uptime" function. But why did it crash? Why did merely entering the function make the entire program crash (the body of the function never got executed)?
I did a few tests and after a while I managed to create a 5-line C program that crashed too:
#include <stdio.h>
int main(void) {
    float a = 3.0;
    printf("%f\n", a);
}
So was it "printf" that was bad? Was there something wrong with my glibc build? But this same program also crashed when I ran it on the Fedora that shipped with the XO. I'm quite sure Fedora's libc is fine because other Fedora apps work fine without crashing. So it had to be my 5-line program.
So I went to "objdump -d" to look at the generated assembly code. I don't know much ARM assembly language; I just hoped that my knowledge of x86 assembly would carry me through --- and it did. A few further experiments pinpointed the problem to this innocuous instruction: vcvt.f64.f32 d16,s15 (that's a floating-point conversion instruction, converting from single precision to double precision, taking its source from register s15 and storing the result in register d16; it is there because a float passed to printf is promoted to double). So the crash is caused by a floating-point instruction, which isn't surprising because FatdogArm is built from the ground up as a "hardfloat" system.
But there is one problem - that instruction worked on the A10! So what gives? Both the A10 SoC and the Armada 610 SoC have FPUs which conform to VFPv3, so the instruction should work on both, right?
Wrong. The ARM FPU doesn't come in a single variant. In fact the VFPv3 FPU comes in 4 (four) variants, two of which are common. Look here for the details; one can see that ARM supports multiple versions of VFPv3 (if you count VFPv2 and VFPv4, you'll get even more variants, but we already know that our SoCs only support VFPv3 so we can ignore the rest).
The two major common variants are VFPv3 (also known as VFPv3-d32 or the full VFPv3, which comes with 32 double-precision registers), and VFPv3-d16 (which comes with half the number of registers, only 16). And this confirms that the Armada 610 only supports VFPv3-d16. In fact, one can look at /proc/cpuinfo and see the same information - but I didn't see it before. The XO-1.75's /proc/cpuinfo contains the CPU flags "vfp vfpv3 vfpv3d16" and I thought that combination meant that the CPU supports all of them. Well, I was wrong; the flags mean that while "vfpv3" instructions are supported, only the "vfpv3d16" subset will work.
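The reading of the flags described above can be written down as a small check. This is only a sketch: the function names has_feature and vfpv3_dregs are made up for illustration, and the rule it applies ("vfpv3d16" present means only 16 registers) is simply the interpretation given in this post, not something taken from kernel documentation.

```c
#include <string.h>

/* Return 1 if `name` appears as a whole whitespace-separated token
 * in `features` (e.g. the flags line from /proc/cpuinfo), else 0. */
int has_feature(const char *features, const char *name)
{
    size_t len = strlen(name);
    const char *p = features;

    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == features) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += 1; /* partial match only, keep searching */
    }
    return 0;
}

/* Number of double-precision registers implied by the flags, following
 * the interpretation in this post (an assumption, not a specification):
 *   no "vfpv3"           -> 0  (no VFPv3 at all)
 *   "vfpv3" + "vfpv3d16" -> 16 (d0..d15 only)
 *   "vfpv3" alone        -> 32 (d0..d31) */
int vfpv3_dregs(const char *features)
{
    if (!has_feature(features, "vfpv3"))
        return 0;
    if (has_feature(features, "vfpv3d16"))
        return 16;
    return 32;
}
```

Called with the XO-1.75 flags quoted above, "vfp vfpv3 vfpv3d16", vfpv3_dregs returns 16, which is exactly the hint I overlooked.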
Armed with this information, it is easy to see what the problem is. With 32 registers, one has registers named "d0" to "d31". With 16 registers, one only has "d0" to "d15". And what was the instruction again? "vcvt.f64.f32 d16,s15" --- so that was the problem: it was trying to access a register ("d16") that doesn't exist in vfpv3-d16 - and of course, one will get an "illegal instruction" for that.
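The same check can be applied mechanically to disassembly text. The sketch below assumes you feed it only the mnemonic-and-operands part of an "objdump -d" line (not the address or the raw encoding columns); max_dreg and needs_d32 are hypothetical helper names, and NEON "q" registers are not considered.

```c
#include <ctype.h>
#include <stdlib.h>

/* Return the highest double-precision register number (the N in "dN",
 * 0..31) named in `text`, or -1 if no such register is mentioned. */
int max_dreg(const char *text)
{
    int best = -1;
    const char *p;

    for (p = text; *p != '\0'; p++) {
        char *end;
        long n;

        /* A register name is 'd' followed by at least one digit ... */
        if (*p != 'd' || !isdigit((unsigned char)p[1]))
            continue;
        /* ... and must not be the tail of a longer word (e.g. "add1"). */
        if (p != text && (isalnum((unsigned char)p[-1]) || p[-1] == '_'))
            continue;

        n = strtol(p + 1, &end, 10);
        /* Reject if more word characters follow the digits. */
        if (isalnum((unsigned char)*end) || *end == '_')
            continue;
        /* Reject leading zeros such as "d016", which is not a register. */
        if (p[1] == '0' && end != p + 2)
            continue;
        if (n <= 31 && n > best)
            best = (int)n;
    }
    return best;
}

/* 1 if the text names a register that only exists with 32 registers. */
int needs_d32(const char *text)
{
    return max_dreg(text) >= 16;
}
```

For the offending instruction, needs_d32("vcvt.f64.f32 d16,s15") is 1, while the same conversion written with d7 as the destination would give 0.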
The solution? Easy - just re-compile the application with "-mfpu=vfpv3-d16" (instead of the default "-mfpu=vfpv3") and then my 5-line C program worked. I did the same for "ps" and it worked too.
Of course, it is one thing to fix one program; it is another matter to fix the entire distro (which apparently is the task that I need to do if I want FatdogArm to run on XO-1.75 ...)
After note: I could have short-circuited all of the above if I had just looked at Fedora's gcc build-time configure flags (by running "gcc -v"), which would have told me that it is configured for vfpv3-d16 instead of vfpv3. At the very least Debian publishes explicitly what its hardfloat port will run on.
Seamonkey illegal instruction
The Seamonkey compilation for FatdogArm is finally done. I must have compiled it over 10 times - I lost count already. Once I overcame the libxul.so linking failure problem, everything went smoothly - except that the resulting binary refused to run, with a simple message "illegal instructions" or sometimes "segmentation fault". And this happened for all subsequent compiles, no matter what configure flags I used.
I went through 4 or 5 more compiles before I realised what was wrong, and this is the story.
Once I was quite certain that the crash had nothing to do with the configure flags, I tried to use gdb to figure out the crash, but it didn't really help. It didn't give a meaningful stack trace and it refused to disassemble the location that contained the illegal instruction. I couldn't even put a breakpoint on or near the crash location.
Then I ran strace. It gave me something useful - the last system call made before the crash (which is an open of /proc/self/auxv). There were a few locations in the seamonkey code base that do this, but they all seemed a bit illogical to me (a few of the locations are in libs which didn't even get compiled in because of the configure flags I used).
Then I looked at the content of /proc/self/auxv itself. Among other things, it gave me the base address of the dynamic linker (ld-linux.so). Comparing that address with the addresses shown in strace, it became clear that those calls - including the last one before the crash - were not seamonkey's; they were in fact calls made by the dynamic linker. This was confirmed by looking at /proc/self/maps - those addresses were indeed mapped to ld-linux.so. What does it mean? It means that the crash happened before seamonkey itself started to run; it happened in the dynamic linker itself, when it was preparing seamonkey for execution.
The only way for that crash to happen is if the executable (or one of its dynamic libraries) is corrupted.
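The address comparison described above boils down to a range check against one line of the maps file. A minimal sketch follows, assuming the usual "start-end perms offset dev inode path" layout with hexadecimal addresses; the helper names parse_maps_range and addr_in_maps_line are invented for this example.

```c
#include <stdio.h>

/* Parse the "start-end" address range at the beginning of one line of a
 * maps file. Returns 0 on success, -1 if the line does not start with
 * two hexadecimal numbers separated by '-'. */
int parse_maps_range(const char *line, unsigned long *start,
                     unsigned long *end)
{
    return sscanf(line, "%lx-%lx", start, end) == 2 ? 0 : -1;
}

/* Return 1 if `addr` falls inside the mapping described by `line`
 * (start inclusive, end exclusive), 0 otherwise. */
int addr_in_maps_line(const char *line, unsigned long addr)
{
    unsigned long start, end;

    if (parse_maps_range(line, &start, &end) != 0)
        return 0;
    return addr >= start && addr < end;
}
```

Feeding it the line that names ld-linux.so together with an address taken from the strace output answers the question directly: if the result is 1, the call came from the dynamic linker and not from seamonkey.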
But who or what can corrupt freshly compiled binaries? It can't be gcc or the linker (unless somehow seamonkey managed to trigger very subtle and obscure toolchain bugs), because all the other binaries I have built so far work perfectly. As it turns out this was the culprit: https://wiki.mozilla.org/Elfhack, a Mozilla build step that rewrites the relocation sections of the built libraries after linking.
Once I realised this, it was pretty straightforward to add "--disable-elf-hack" to the configure flags. The resulting seamonkey worked very well - indeed, now the Calendar function is working too (it didn't work in 2.19). I wonder what happened between 2.19 and 2.20, because I certainly didn't use that switch when I built 2.19, and 2.19 compiled cleanly on the first attempt. I could have tried to look at the differences but for now I'm happy that SM 2.20 works.
SM 2.20 will be the default browser in the alpha2 release of FatdogArm.
Speeding up ARM compilation
Compiling on the ARM (Mele) is slow. I recently tried to build the latest version of Seamonkey (SM), 2.20, for FatdogArm. As I said in this post, each build takes about 15-18 hours on the Mele. Yes, it is that slow. Now in the case of SM 2.20 it is even worse. Even after 24 hours it failed to build - because apparently 512MB RAM is no longer enough to build it (it fails at the link phase). I tried twice - each attempt taking about 24 hours - and both failed. That's two days wasted.
I wanted to try again, but I can't be spending 24 hours on every build. There must be a better way than this.
Fortunately, there is - enter distcc, a distributed C compiler. distcc allows you to combine the power of several machines to simultaneously compile a single package. The machines don't even have to be identical; if they have the appropriate cross-compiler installed, they can join the compilation cluster of another platform.
Using distcc, the ARM machine acts as the main controller, spreading the compilation load to other, faster machines. From the package build standpoint, however, it still looks and feels like native compilation. Thus you get all the ease of native compilation with the speed of cross-compilation.
Using Fatdog64 in combination with FatdogArm, I managed to cut down the "compilation phase" of SM 2.20 from 15 hours to 3 hours (plus another 3 hours for linking either way). That's a 5-fold increase in speed for the compilation phase, and at about 6 hours per build it allows me to perform 4 builds in 24 hours.
Do I have your attention now? This is how to do it.
The fight with SM is still on-going though. Despite the fact that I can compile much faster now, SM still refuses to build because it thrashes the swapfile during the libxul.so linking stage. In simple words - it is running out of memory. My Mele only has 512MB and apparently that's not enough, and I don't have any other hardware with more memory, so I'm now building it using distcc-assisted Qemu (see here for details of running FatdogArm in Qemu) - using the Vexpress emulation with 1GB RAM. This is similar to what Aboriginal Linux does. For the SM build, however, it is very slow, because while the C and C++ compilation is much faster, SM has tons of python code as part of its build system - and python code can't be farmed out to other machines. Still, without distcc, I wouldn't even consider doing it in Qemu at all.
How to run FatdogArm under Qemu
Haha, only yesterday I said I had written the final article for the FatdogArm series. Well, never say never. A bloke in the forum (well, not just any bloke, he's actually a good friend of mine too - cheers Mick!) asked me how to run FatdogArm in Qemu, and why not, since I wrote the original HOWTO to get Puppy Arm running under Qemu too.
So I wrote the article, and then spent quite a bit of time trying to figure out a good combination of kernel and qemu-supported platform to provide an environment that can run with:
- 512MB or 1GB RAM
- ARMv7
The writing was not so bad; finding a working combination, on the other hand, actually took a bit of effort - since much of the documentation is "in source form" (as in, you'd better read the source code to know what's going on, mate!). In the end I settled on the latest 3.4 stable kernel (similar to the one I use for the Mele) and the Vexpress and Realview platforms for the emulation.
Here is the article, enjoy: How to run FatdogArm under Qemu.
The final FatdogArm porting article
This is the final article of the FatdogArm porting series, about optimising your build and making use of available SoC/platform features. Since these are usually highly platform-specific, the article has less depth than the others and instead tries to provide a bird's eye view of the available strategies.
It is a difficult subject as well, because many times the information required to use the platform/SoC fully is not available (or only available under NDA); and anyone who is "outside" of the circle and still wishes to use them must do the hard work of "reverse-engineering" to figure out how to write the drivers for them --- for this, I cheer and salute the many reverse-engineering teams, from linux-sunxi and many others (the Lima graphics and nouveau teams), whose unending efforts have provided the rest of us with usable drivers for devices which otherwise could only act as resistors.
Although this is supposedly the final article, I may expand and revise the series as the need arises or when new information is revealed. But for now, this is the end.
FatdogArm alpha is released.
FatdogArm page: http://distro.ibiblio.org/fatdog/web/arm-index.html
Release notes: http://distro.ibiblio.org/fatdog/web/arm-latest.html.
Forum thread: http://murga-linux.com/puppy/viewtopic.php?t=88307.
How to adopt FatdogArm for other platforms: AdoptingFatdogArm.
FatdogArm on OLPC XO laptop
In my previous post I said:
In case you are wondering, I already have another target system for FatdogArm - all will be revealed in due time.
Well the cat is out of the bag: FatdogArm has been "adopted" by the Puppy_on_OLPC project to run on One Laptop Per Child - OLPC's XO laptop.
I have been working with mavrothal (a forum member from the Puppy Linux Forum). mavrothal is the key person and project lead for the PuppyLinux XO (XOpup for short) project. XOpup is a modified version of Puppy Linux specially created to run on the XO laptop.
When XO transitioned their laptop hardware from x86 (AMD Geode and VIA C7-based) to ARM, the XOpup project slowed down because there was no Puppy Linux for the ARM platform (well, there were Puppy Lui for Mele and Puppy Sap6, but neither had been updated for a long while); thus there was no base on which to build XOpup for the ARM-based XO.
FatdogArm (and Fatdog) is a fork of Puppy Linux which still maintains the spirit and ideals of Puppy Linux; plus it tries to be SoC-agnostic as much as possible; thus it is a natural conclusion to use FatdogArm as a base to continue XOpup - and the collaboration was born.
I am still waiting for (and looking forward to!) the arrival of my XO-1.75 laptop; meanwhile mavrothal has gone ahead and made FatdogArm run on the XO-4 laptop, here with a picture: http://murga-linux.com/puppy/viewtopic.php?p=721450#721450
Exciting days afoot!
FatdogArm - the last hurdle
The last few days were hard work, choosing and deciding on the package manager to use (spoiler: I've finally decided on Slackware's pkgtools and slapt-get/gslapt); and then converting packages from paco's format into pkgtools, adding descriptions and dependency information, etc.
The choice of the package manager is especially important. In case one wonders why a lowly package manager is important, please consider what its job is, here. You may be surprised.
Along with that, I have reduced the size of the basesfs from 450MB to a more manageable 280MB by uninstalling less-used packages (they are in the repository in case you need them), getting rid of /usr/share/locale (that's 130MB worth of stuff in itself), and removing all the static libraries except those required by gcc (saves 30MB). I could go lower by taking out the docs and using xz compression instead of gzip - in fact I can easily go down to less than 200MB, but that can wait (and I like having all the docs available locally too).
All in all, FatdogArm is now nearing completion. I just need to tie up some loose ends, write some other articles I have planned, etc. The alpha release of the FatdogArm image is imminent. It only works on the Mele, but due to the way it is built, it is easy to adopt it for other systems too. In fact, now that I think about it, I may write an article on how exactly to do that. (In case you are wondering, I already have another target system for FatdogArm - all will be revealed in due time.)
Touchscreen input for FatdogArm
The tablet proved to be a distraction. I was supposed to start doing something about the package management, but instead I was tinkering with the touchscreen to try to get it working.
I was rather lucky: the touchscreen was supported by the kernel I built, so it was a matter of modprobing the correct driver, followed by building tslib and xf86-input-tslib (the touchscreen driver for Xorg). With this, the touchscreen works like a touchpad: I can move the pointer, tap-to-click, and tap-to-drag.
Combined with xvkbd (a virtual keyboard), one gets a rather complete input system on the tablet: mouse and keyboard entry. However, I can tell you that using mouse idioms on a touchscreen is very inefficient indeed, unless large adjustments are made.
The main toolkit on FatdogArm currently is GTK2 which isn't touch-aware. I tried reading a document with evince - it works, but to scroll I need to tap the scrollbar and the scrollbar is tiny. It could be made better if I use a larger scrollbar, but still.
GTK3 is rumoured to support touch gestures. I suppose I should try that - when the tablet returns.
Ref: Touchscreen Input article.
FatdogArm on Tablet
Fatdog Arm reaches another milestone today - it is now capable of booting straight to the desktop. Booting to the command line only takes 5 seconds (that includes the 3-second wait for the SD card media to be recognised); booting to the full desktop takes about 25 seconds (maybe because of the slow SD card, gzip decompression, etc). Once in the desktop, the applications are zippy (they are a bit sluggish to start but once running, they work fast). We're talking about the entire build here - nothing cut down yet, all the 450MB gzip-compressed SFS goodness with all the toolchains, headers, static libraries, and what have you.
As a bonus, I put the same stuff on the micro SD card for the ARM tablet I mentioned in the previous post, and here is the picture of that tablet running Fatdog Arm.
Unfortunately, I am going to lose that tablet soon (my mum is going on a trip and she will take that tablet with her).
But yes, apart from that, I am excited!!