When an FPU is not an FPU
After finishing up the loose ends, I went to look at the XOs. Mavrothal has already ported the bulk of FatdogArm to XO-4, with only minor nuisances left, so I thought I'd attack XO-1.75, which Mav had attempted, only to hit intractable problems: the Xorg desktop won't start (illegal instructions), autoconf won't start (illegal instructions), wpa_supplicant won't start (illegal instructions) (e.g. here). I thought it would be an interesting challenge. And a challenge it was, indeed!
I booted off Mav's FatdogArm XO build and was immediately faced with the same problem. And it was not only those programs; even innocuous ones like "tar" or "ps" would crash with illegal instructions. In the beginning we thought that the crash could be caused by the accidental use of NEON instructions (which XO-1.75's CPU, the Armada 610 SoC, doesn't have) - perhaps used by pixman (used by Xorg) and by encryption code (wpa_supplicant). But surely "tar" and especially "ps" have nothing to do with SIMD processing, so it was unlikely that NEON had anything to do with it.
Gdb was no help - it was crashing itself. I followed the same general steps that I took when troubleshooting the Seamonkey illegal instruction, but they didn't get me far.
I got a major clue when I managed to run "tar" without crashing: "tar --version" worked well, showing the version number with no crash, but running "tar" by itself crashed. This told me that the cause of the crash must happen after dynamic linking is done; otherwise it would always crash no matter what.
"Tar" is a rather big application, but "ps" is small and it also crashed. So I hacked the "ps" source and started peppering it with printf statements to see where the crash happened. This got me to the function that crashed - the "uptime" function. But why did it crash? Why did merely entering the function crash the entire program (the body of the function never got executed)?
I did a few tests and after a while I managed to create a 5-line C program that will crash too:
#include <stdio.h>
int main(void) {
    float a=3.0;
    printf("%f\n",a);
}
So was it "printf" that was bad? Was there something wrong with my glibc build? But this same program also crashed when I ran it on the Fedora that shipped with the XO. I'm quite sure Fedora's libc is fine because other Fedora apps work fine, without crashing. So it had to be my 5-line program.
So I went to "objdump -d" to look at the generated assembly code. I don't know much ARM assembly language; I just hoped that my knowledge of x86 assembly would carry me through - and it did. A little further experimentation pinpointed the problem to this innocuous instruction: vcvt.f64.f32 d16,s15 (that's a floating-point conversion instruction, converting from single precision to double precision, taking its source from register s15 and storing the result in register d16). So the crash was caused by a floating-point instruction, which isn't surprising because FatdogArm is built from the ground up as a "hardfloat" system.
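Why would a program that only ever declares a float need a single-to-double conversion at all? C's default argument promotions convert a float to double whenever it is passed through the "..." part of a variadic function such as printf, and on a hardfloat VFP build that conversion is presumably where the vcvt.f64.f32 comes from. Here is a minimal sketch of the promotion itself; sum_varargs is a hypothetical helper of mine, not something from "ps" or glibc.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: adds up `count` variadic arguments.
 * A float passed through "..." is promoted to double by the caller,
 * so the callee must always read it back with va_arg(ap, double). */
double sum_varargs(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);   /* never va_arg(ap, float) */
    va_end(ap);
    return total;
}
```

Calling sum_varargs(1, a) with "float a=3.0;" makes the caller widen a to double before the call, exactly as the printf("%f\n",a) call in the 5-line program does.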
But there was one problem - that instruction worked on the A10! So what gives? Both the A10 SoC and the Armada 610 SoC have FPUs which conform to VFPv3, so the instruction should work on both, right?
Wrong. The ARM FPU doesn't come in a single variant. In fact the VFPv3 FPU comes in 4 (four) variants, two of which are common. Look here for the details; one can see that ARM supports multiple versions of VFPv3 (if you count VFPv2 and VFPv4, you'll get even more variants, but we already know that our SoCs only support VFPv3 so we can ignore the rest).
The two common variants are VFPv3 (also known as VFPv3-d32 or the full VFPv3, which comes with 32 double-precision registers), and VFPv3-d16 (which comes with half the number of registers, only 16). And this confirms that the Armada 610 only supports VFPv3-d16. In fact, one can look at /proc/cpuinfo and see the same information - but I hadn't noticed it before. XO-1.75's /proc/cpuinfo contains cpuflags of "vfp vfpv3 vfpv3d16" and I thought that combination meant the CPU supports all of them. Well, I was wrong; the flags mean that while "vfpv3" instructions are supported, only the "vfpv3d16" subset will work.
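The reading of the cpuinfo flags described above can be sketched in a few lines of C. This is only an illustration: has_cpu_flag and vfpv3_double_regs are hypothetical names of mine, the caller is assumed to pass in just the flag list (the text after the colon), and treating "vfpv3 without vfpv3d16" as the full 32-register variant is my assumption, following the interpretation above.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `flag` appears as a whole whitespace-separated word in
 * `features`, 0 otherwise. Whole-word matching matters here because
 * "vfpv3" is a prefix of "vfpv3d16". */
int has_cpu_flag(const char *features, const char *flag)
{
    size_t n;
    const char *p;

    assert(features != NULL && flag != NULL);
    n = strlen(flag);
    if (n == 0)
        return 0;

    p = features;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == features) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p++;   /* keep searching after a partial match */
    }
    return 0;
}

/* Number of double-precision registers to assume from the flag list:
 * 0 if there is no vfpv3 at all, 16 if vfpv3d16 is listed, else 32
 * (the last case is an assumption, see the text above). */
int vfpv3_double_regs(const char *features)
{
    if (!has_cpu_flag(features, "vfpv3"))
        return 0;
    return has_cpu_flag(features, "vfpv3d16") ? 16 : 32;
}
```

For the XO-1.75 flag list "vfp vfpv3 vfpv3d16" this gives 16, which is the answer I should have read off /proc/cpuinfo in the first place.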
Armed with this information, it is easy to see what the problem is. With 32 registers, one has registers named "d0" to "d31". With 16 registers, one only has "d0" to "d15". And what was the instruction again? "vcvt.f64.f32 d16,s15" - so that was the problem: it was trying to access a register ("d16") that doesn't exist in vfpv3-d16 - and of course, one will get an "illegal instruction" for that.
The solution? Easy - just re-compile the application with "-mfpu=vfpv3-d16" (instead of the default -mfpu=vfpv3) and then my 5-line C program worked. I did the same for "ps" and it worked too.
Of course, it is one thing to fix one program; it is another to fix the entire distro (which is apparently the task I need to do if I want FatdogArm to run on XO-1.75 ... )
After note: I could have short-circuited all of the above if I had just looked at Fedora's gcc build-time configure flags (by running "gcc -v"), which would have told me that it is configured for vfpv3-d16 instead of vfpv3. At the very least Debian publishes explicitly what its hardfloat port will run on.