Bug 812928
Summary: | clisp is FTBFS on ARM | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Peter Robinson <pbrobinson> |
Component: | clisp | Assignee: | Jerry James <loganjerry> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Priority: | unspecified |
Version: | rawhide | CC: | green, gwync, loganjerry, Robert.Harley |
Hardware: | Unspecified | OS: | Unspecified |
Doc Type: | Bug Fix | Type: | Bug |
Last Closed: | 2012-11-25 15:25:14 UTC | Bug Blocks: | 245418 |
Description
Peter Robinson
2012-04-16 15:21:05 UTC
It's running tests, and is in the middle of running the foreign function tests when it hits an illegal instruction. This is almost certainly a bug in the ffcall library. There's a bunch of assembly language in ffcall to form the glue between C and Lisp. I can see numerous places where things can go wrong, starting with the fact that the assembly language files are generated from C sources, and in the ARM case were last generated with gcc 2.95.

Regenerating the assembly language files involves some voodoo. Once I regenerate them, ffcall starts failing its testsuite on i386 due to accessing 4 bytes past the end of the stack. Sigh. I'm looking into this, but it may take me a while just to figure out how to not break the Intel platform. Once I've done that, I'll probably come begging on IRC for access to an ARM device so I can see if I've fixed this problem.

Works for me :)

No, ffcall is not the problem. Clisp contains some ARM assembly, namely this (lightly macroexpanded for readability):

    /* extern uintD* copy_loop_up (uintD* sourceptr, uintD* destptr, uintC count);
           entry
               a1 = source pointer
               a2 = destination pointer
               a3 = count of words to store
           exit
               a1 = address of last word stored + 1
               a2 - a4, ip destroyed
    */
            .global copy_loop_up    /* word aligned copy loop up */
    copy_loop_up:
            ANDS    a4,a3,#3                /* multiple of 4 words ? */
            BEQ     copy_loop_up_l1         /* yup, so branch */
            CMP     a4,#2                   /* copy the first 1-3 words */
            LDR     a4,[a1],#4              /* to align the total to a multiple */
            STR     a4,[a2],#4              /* of 4 words */
            LDRGE   a4,[a1],#4
            STRGE   a4,[a2],#4
            LDRGT   a4,[a1],#4
            STRGT   a4,[a2],#4
    copy_loop_up_l1:
            BICS    a4,a3,#3                /* set counter to multiple of 4 */
            MOVEQ   a1,a2                   /* return addr of last word stored */
            MOVEQS  pc,lr                   /* if zero then we're done */
            STMFD   sp!,{v1,lr}             /* save work regs */
    copy_loop_up_l2:
            LDMIA   a1!,{a3,v1,ip,lr}       /* copy 4 words in one go */
            STMIA   a2!,{a3,v1,ip,lr}
            SUBS    a4,a4,#8                /* decrement counter by 8 */
            LDMGEIA a1!,{a3,v1,ip,lr}       /* if count still positive then copy */
            STMGEIA a2!,{a3,v1,ip,lr}       /* 4 more words */
            BGT     copy_loop_up_l2         /* and loop */
            MOV     a1,a2                   /* return addr of last word stored */
            LDMFD   sp!,{v1,pc}^            /* restore work regs and return */

The illegal instruction error is being triggered on that very last instruction, "LDMFD sp!,{v1,pc}^". I don't read ARM assembly. Can someone who does tell me what is wrong with that instruction?
For completeness, here is how GDB disassembles that function in the executable:

    0x00106b08 <+0>:     ands    r3, r2, #3
    0x00106b0c <+4>:     beq     0x106b2c <copy_loop_up_l1>
    0x00106b10 <+8>:     cmp     r3, #2
    0x00106b14 <+12>:    ldr     r3, [r0], #4
    0x00106b18 <+16>:    str     r3, [r1], #4
    0x00106b1c <+20>:    ldrge   r3, [r0], #4
    0x00106b20 <+24>:    strge   r3, [r1], #4
    0x00106b24 <+28>:    ldrgt   r3, [r0], #4
    0x00106b28 <+32>:    strgt   r3, [r1], #4
    0x00106b2c <+36>:    bics    r3, r2, #3
    0x00106b30 <+40>:    moveq   r0, r1
    0x00106b34 <+44>:    movseq  pc, lr
    0x00106b38 <+48>:    push    {r4, lr}
    0x00106b3c <+52>:    ldm     r0!, {r2, r4, r12, lr}
    0x00106b40 <+56>:    stmia   r1!, {r2, r4, r12, lr}
    0x00106b44 <+60>:    subs    r3, r3, #8
    0x00106b48 <+64>:    ldmge   r0!, {r2, r4, r12, lr}
    0x00106b4c <+68>:    stmiage r1!, {r2, r4, r12, lr}
    0x00106b50 <+72>:    bgt     0x106b3c <copy_loop_up_l2>
    0x00106b54 <+76>:    mov     r0, r1
    0x00106b58 <+80>:    ldm     sp!, {r4, pc}^

We recommend you don't do custom ARM assembler but let gcc and co work it out as it tends to generate better optimisation than hand crafted ASM based on our options

(In reply to comment #4)
> We recommend you don't do custom ARM assembler but let gcc and co work it
> out as it tends to generate better optimisation than hand crafted ASM based
> on our options

The clisp project has assembly language routines for a variety of processors in order to perform various arithmetic operations where the C semantics differ from the Common Lisp semantics. Admonishing upstream to use C instead of assembler will do no good, as the C semantics are not correct. Admonishing me to use C instead of assembler is completely useless, as I am not an upstream developer. I'm just the poor package maintainer who has to try to get this code working on a platform I don't know much about.

Do you know why that instruction triggers an illegal instruction error? Is it, in fact, illegal or have we discovered a bug in the ARM emulator used by qemu? (How does ARM koji work? Is it building on a real ARM processor, or is it emulated, too?)
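For readers who, like the maintainer, do not read ARM assembly, the behaviour of the copy_loop_up routine quoted earlier in this thread can be sketched in C. This is only an illustrative reference, not clisp's code: the name copy_loop_up_ref is made up here, and the 32-bit typedefs for uintD and uintC are an assumption (clisp's real definitions depend on its build configuration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit digits and counters, as on 32-bit ARM. */
typedef uint32_t uintD;
typedef uint32_t uintC;

/* Hypothetical C reference for copy_loop_up: copies `count` words from
   sourceptr to destptr, walking upward, and returns the address just
   past the last word stored (what the assembly leaves in a1). */
uintD *copy_loop_up_ref(const uintD *sourceptr, uintD *destptr, uintC count)
{
    assert(count == 0 || (sourceptr != NULL && destptr != NULL));

    /* Head: copy count % 4 words one at a time, so that what remains
       is a multiple of 4 (the LDR/STR, LDRGE/STRGE, LDRGT/STRGT part). */
    for (uintC head = count & 3u; head > 0; head--)
        *destptr++ = *sourceptr++;

    /* Body: copy the remaining words four at a time, which is what each
       LDMIA/STMIA pair does with registers a3, v1, ip and lr. */
    for (uintC left = count & ~(uintC)3u; left > 0; left -= 4) {
        destptr[0] = sourceptr[0];
        destptr[1] = sourceptr[1];
        destptr[2] = sourceptr[2];
        destptr[3] = sourceptr[3];
        sourceptr += 4;
        destptr += 4;
    }
    return destptr;
}
```

The assembly unrolls the body so that each loop iteration can move up to 8 words, which is why it subtracts 8 from the counter; the sketch keeps one 4-word block per iteration for readability.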
(In reply to comment #5)
> (In reply to comment #4)
> > We recommend you don't do custom ARM assembler but let gcc and co work it
> > out as it tends to generate better optimisation than hand crafted ASM based
> > on our options
>
> The clisp project has assembly language routines for a variety of processors
> in order to perform various arithmetic operations where the C semantics
> differ from the Common Lisp semantics. Admonishing upstream to use C
> instead of assembler will do no good, as the C semantics are not correct.
> Admonishing me to use C instead of assembler is completely useless, as I am
> not an upstream developer. I'm just the poor package maintainer who has to
> try to get this code working on a platform I don't know much about.

I'm not "Admonishing upstream to use C" at all.... we "recommend" because in just about all situations the compiler optimises properly. If that's not the case for clisp that's fine, but it's your job as the maintainer of the package in Fedora to work with upstream to fix the issue.

> Do you know why that instruction triggers an illegal instruction error? Is
> it, in fact, illegal or have we discovered a bug in the ARM emulator used by
> qemu? (How does ARM koji work? Is it building on a real ARM processor, or
> is it emulated, too?)

No, we, like mainline primary architectures, use HW and only HW to build. Ultimately it needs to run on the whole selection of ARM hardware just like we need to run on a whole selection of x86 HW. I'll ask someone who knows the ARM architecture more intimately than I to check the exact instructions. In the meantime please interact with upstream in parallel.

    LDMFD sp!,{v1,pc}^

This instruction means "LoaD Multiple registers from a Full Descending stack". Register sp is the Stack Pointer and the exclamation mark (!) means write back the new stack pointer after loading. The registers to load are v1 and pc, the Program Counter. The function earlier pushed v1 and lr, the Link Register, so this instruction will return to the caller.

The problem is probably the caret (^) at the end. On old 26-bit ARMs it was for restoring flags but nowadays it's reserved for privileged modes like exception handlers. You should probably just delete it.

Then again this function is just a memcpy() so it's not clear why it's hand-coded. Perhaps to gain a few cycles by using the fact that the pointers and size are multiples of whole words, not just bytes...

- Rob.

Oh, gosh, I'm sorry. I've been really overloaded at $DAYJOB lately and am not keeping up with my Fedora responsibilities very well. I had already figured this out. Removing the caret is necessary, but insufficient. I also had to change instructions of the form "MOVS pc,lr" to "BX lr", else I would get illegal instruction errors. The clisp build now goes quite a bit further, but still ultimately fails with what appears to be a garbage collector-related problem. I've asked upstream for help with that, with no response so far. I need to collect some more information and send it upstream again, I guess.

Jerry, if you have something to test, I'd be happy to try it on my Pi.

Thanks, Jon.

The F18 and Rawhide git branches both contain fixes for the ARM assembly language problems. However, the ARM build still fails with some kind of garbage collector problem. I don't have any clear idea on how to track down the source of the problem, nor easy access to any ARM hardware. :-(

I'll poke around with it on my Pi and see what I can do. Would shell access to it help?

Thanks, Jon.

It looks like I can now have an ARM VM on my x86_64 host. I'll try using that to track down the problem. Wish me luck.

Good luck. :)

Builds in F-18
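As a footnote to Rob's explanation of "LDMFD sp!,{v1,pc}^", the caret corresponds to a single bit (the S bit, bit 22) in the 32-bit ARM load/store-multiple encoding, and the "!" to the W bit (bit 21). The sketch below decodes those fields. The two instruction words used in the usage note were worked out by hand from that encoding and are not taken from this bug report, and the names ldm_fields, decode_ldm and strip_caret are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: fields of an ARM load/store-multiple word. */
struct ldm_fields {
    bool block_transfer;  /* bits 27:25 == 0b100 */
    bool load;            /* L, bit 20: 1 = LDM, 0 = STM */
    bool writeback;       /* W, bit 21: the "!" after the base register */
    bool s_bit;           /* S, bit 22: the trailing "^" */
    unsigned base;        /* Rn, bits 19:16 (13 = sp) */
    uint16_t reglist;     /* bits 15:0, one bit per register r0..r15 */
};

/* Split an instruction word into the fields above. */
struct ldm_fields decode_ldm(uint32_t insn)
{
    struct ldm_fields f;
    f.block_transfer = ((insn >> 25) & 7u) == 4u;
    f.load = (insn >> 20) & 1u;
    f.writeback = (insn >> 21) & 1u;
    f.s_bit = (insn >> 22) & 1u;
    f.base = (insn >> 16) & 0xFu;
    f.reglist = (uint16_t)(insn & 0xFFFFu);
    return f;
}

/* Clear the S bit, i.e. drop the "^" from an LDM/STM instruction word. */
uint32_t strip_caret(uint32_t insn)
{
    assert(((insn >> 25) & 7u) == 4u);  /* only valid for LDM/STM words */
    return insn & ~(UINT32_C(1) << 22);
}
```

Under these assumptions, 0xE8FD8010 decodes as a load from base register 13 (sp) with writeback, the S bit set and r4 (v1) plus pc in the register list, and strip_caret(0xE8FD8010) gives 0xE8BD8010, the same instruction without the caret.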