Bug 812928

Summary: clisp is FTBFS on ARM
Product: [Fedora] Fedora Reporter: Peter Robinson <pbrobinson>
Component: clisp Assignee: Jerry James <loganjerry>
Status: CLOSED NEXTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: green, gwync, loganjerry, Robert.Harley
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-25 15:25:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 245418    

Description Peter Robinson 2012-04-16 15:21:05 UTC
I can't really see why it's failing

http://arm.koji.fedoraproject.org/taskinfo?taskID=729920

Comment 1 Jerry James 2012-04-18 20:28:19 UTC
It's running tests, and is in the middle of running the foreign function tests when it hits an illegal instruction.  This is almost certainly a bug in the ffcall library.

There's a bunch of assembly language in ffcall to form the glue between C and Lisp.  I can see numerous places where things can go wrong, starting with the fact that the assembly language files are generated from C sources, and in the ARM case were last generated with gcc 2.95.  Regenerating the assembly language files involves some voodoo.  Once I regenerate them, ffcall starts failing its testsuite on i386 due to accessing 4 bytes past the end of the stack.  Sigh.

I'm looking into this, but it may take me a while just to figure out how to not break the Intel platform.  Once I've done that, I'll probably come begging on IRC for access to an ARM device so I can see if I've fixed this problem.

Comment 2 Peter Robinson 2012-04-18 23:29:06 UTC
Works for me :)

Comment 3 Jerry James 2012-05-21 16:50:12 UTC
No, ffcall is not the problem.  Clisp contains some ARM assembly, namely this (lightly macroexpanded for readability):

/* extern uintD* copy_loop_up (uintD* sourceptr, uintD* destptr, uintC count);
       entry
               a1 = source pointer
               a2 = destination pointer
               a3 = count of words to store
       exit
               a1 = address of last word stored + 1
               a2 - a4, ip destroyed */
        .global copy_loop_up          /* word aligned copy loop up */
copy_loop_up:
        ANDS    a4,a3,#3        /* multiple of 4 words ? */
        BEQ     copy_loop_up_l1 /* yup, so branch */
        CMP     a4,#2           /* copy the first 1-3 words */
        LDR     a4,[a1],#4      /* to align the total to a multiple */
        STR     a4,[a2],#4      /* of 4 words */
        LDRGE   a4,[a1],#4
        STRGE   a4,[a2],#4
        LDRGT   a4,[a1],#4
        STRGT   a4,[a2],#4
copy_loop_up_l1:
        BICS    a4,a3,#3        /* set counter to multiple of 4 */
        MOVEQ   a1,a2           /* return addr of last word stored */
        MOVEQS  pc,lr           /* if zero then we're done */
        STMFD   sp!,{v1,lr}     /* save work regs */
copy_loop_up_l2:
        LDMIA   a1!,{a3,v1,ip,lr} /* copy 4 words in one go */
        STMIA   a2!,{a3,v1,ip,lr}
        SUBS    a4,a4,#8          /* decrement counter by 8 */
        LDMGEIA a1!,{a3,v1,ip,lr} /* if count still positive then copy */
        STMGEIA a2!,{a3,v1,ip,lr} /* 4 more words */
        BGT     copy_loop_up_l2   /* and loop */
        MOV     a1,a2             /* return addr of last word stored */
        LDMFD   sp!,{v1,pc}^      /* restore work regs and return */

The illegal instruction error is being triggered on that very last instruction, "LDMFD sp!,{v1,pc}^".  I don't read ARM assembly.  Can someone who does tell me what is wrong with that instruction?  For completeness, here is how GDB disassembles that function in the executable:

0x00106b08 <+0>:     ands    r3, r2, #3
0x00106b0c <+4>:     beq     0x106b2c <copy_loop_up_l1>
0x00106b10 <+8>:     cmp     r3, #2
0x00106b14 <+12>:    ldr     r3, [r0], #4
0x00106b18 <+16>:    str     r3, [r1], #4
0x00106b1c <+20>:    ldrge   r3, [r0], #4
0x00106b20 <+24>:    strge   r3, [r1], #4
0x00106b24 <+28>:    ldrgt   r3, [r0], #4
0x00106b28 <+32>:    strgt   r3, [r1], #4
0x00106b2c <+36>:    bics    r3, r2, #3
0x00106b30 <+40>:    moveq   r0, r1
0x00106b34 <+44>:    movseq  pc, lr
0x00106b38 <+48>:    push    {r4, lr}
0x00106b3c <+52>:    ldm     r0!, {r2, r4, r12, lr}
0x00106b40 <+56>:    stmia   r1!, {r2, r4, r12, lr}
0x00106b44 <+60>:    subs    r3, r3, #8
0x00106b48 <+64>:    ldmge   r0!, {r2, r4, r12, lr}
0x00106b4c <+68>:    stmiage r1!, {r2, r4, r12, lr}
0x00106b50 <+72>:    bgt     0x106b3c <copy_loop_up_l2>
0x00106b54 <+76>:    mov     r0, r1
0x00106b58 <+80>:    ldm     sp!, {r4, pc}^

Comment 4 Peter Robinson 2012-05-21 17:03:01 UTC
We recommend you don't do custom ARM assembler but let gcc and co work it out as it tends to generate better optimisation than hand crafted ASM based on our options

Comment 5 Jerry James 2012-05-21 17:31:54 UTC
(In reply to comment #4)
> We recommend you don't do custom ARM assembler but let gcc and co work it
> out as it tends to generate better optimisation than hand crafted ASM based
> on our options

The clisp project has assembly language routines for a variety of processors in order to perform various arithmetic operations where the C semantics differ from the Common Lisp semantics.  Admonishing upstream to use C instead of assembler will do no good, as the C semantics are not correct.  Admonishing me to use C instead of assembler is completely useless, as I am not an upstream developer.  I'm just the poor package maintainer who has to try to get this code working on a platform I don't know much about.

Do you know why that instruction triggers an illegal instruction error?  Is it, in fact, illegal or have we discovered a bug in the ARM emulator used by qemu?  (How does ARM koji work?  Is it building on a real ARM processor, or is it emulated, too?)

Comment 6 Peter Robinson 2012-05-21 18:02:00 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > We recommend you don't do custom ARM assembler but let gcc and co work it
> > out as it tends to generate better optimisation than hand crafted ASM based
> > on our options
> 
> The clisp project has assembly language routines for a variety of processors
> in order to perform various arithmetic operations where the C semantics
> differ from the Common Lisp semantics.  Admonishing upstream to use C
> instead of assembler will do no good, as the C semantics are not correct. 
> Admonishing me to use C instead of assembler is completely useless, as I am
> not an upstream developer.  I'm just the poor package maintainer who has to
> try to get this code working on a platform I don't know much about.

I'm not "Admonishing upstream to use C" at all.... we "recommend" because in just about all situations the compiler optimises properly. If that's not the case for clisp that's fine but it's your job as the maintainer of the package in Fedora to work with upstream to fix the issue.

> Do you know why that instruction triggers an illegal instruction error?  Is
> it, in fact, illegal or have we discovered a bug in the ARM emulator used by
> qemu?  (How does ARM koji work?  Is it building on a real ARM processor, or
> is it emulated, too?)

No, we, like the mainline primary architectures, use HW and only HW to build. Ultimately it needs to run on the whole selection of ARM hardware just like we need to run on a whole selection of x86 HW.

I'll ask someone who knows the ARM architecture more intimately than I do to check the exact instructions. In the meantime, please interact with upstream in parallel.

Comment 7 r3obh 2012-06-19 15:20:29 UTC
LDMFD   sp!,{v1,pc}^

This instruction means "LoaD Multiple registers from a Full Descending stack".  Register sp is the Stack Pointer and the exclamation mark (!) means write back the new stack pointer after loading.  The registers to load are v1 and pc, the Program Counter.  The function earlier pushed v1 and lr, the Link Register, so this instruction will return to the caller.

The problem is probably the caret (^) at the end.  On old 26-bit ARMs it was for restoring flags but nowadays it's reserved for privileged modes like exception handlers.  You should probably just delete it.

Then again this function is just a memcpy() so it's not clear why it's hand-coded.  Perhaps to gain a few cycles by using the fact that the pointers and size are multiples of whole words, not just bytes...
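
For reference, here is a rough C sketch of what the routine appears to do, going by the header comment in the assembly above.  The name copy_loop_up_c is hypothetical, and treating uintD and uintC as 32-bit unsigned types is an assumption made for illustration; clisp's real definitions may differ.  The assembly moves 4 or 8 words per loop iteration, while this sketch moves 4 at a time, which should give the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uintD;  /* assumption: one 32-bit word */
typedef uint32_t uintC;  /* assumption: unsigned word count */

/* Hypothetical C rendering of copy_loop_up: copy count words upward and
   return the address just past the last word stored. */
uintD *copy_loop_up_c(const uintD *sourceptr, uintD *destptr, uintC count)
{
    assert(count == 0 || (sourceptr != NULL && destptr != NULL));

    /* Head: copy count % 4 words (0 to 3), so that the remaining
       count is a multiple of 4. */
    for (uintC head = count & 3u; head > 0; head--)
        *destptr++ = *sourceptr++;

    /* Body: copy the remaining words four at a time, as the
       LDMIA/STMIA pair does in the assembly. */
    for (uintC rest = count & ~(uintC)3u; rest > 0; rest -= 4) {
        destptr[0] = sourceptr[0];
        destptr[1] = sourceptr[1];
        destptr[2] = sourceptr[2];
        destptr[3] = sourceptr[3];
        sourceptr += 4;
        destptr += 4;
    }

    /* Same return value as the assembly: the advanced destination pointer. */
    return destptr;
}
```

Apart from the return value, this is equivalent to memcpy(destptr, sourceptr, count * sizeof(uintD)) for non-overlapping buffers, so a C fallback looks feasible if the assembly cannot be made to work.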

- Rob.

Comment 8 Jerry James 2012-06-19 16:23:05 UTC
Oh, gosh, I'm sorry.  I've been really overloaded at $DAYJOB lately and am not keeping up with my Fedora responsibilities very well.  I had already figured this out.  Removing the caret is necessary, but insufficient.  I also had to change instructions of the form "MOVS pc,lr" to "BX lr", else I would get illegal instruction errors.  The clisp build now goes quite a bit further, but still ultimately fails with what appears to be a garbage collector-related problem.  I've asked upstream for help with that, with no response so far.  I need to collect some more information and send it upstream again, I guess.

Comment 9 Gwyn Ciesla 2012-09-20 17:23:37 UTC
Jerry, if you have something to test, I'd be happy to try it on my Pi.

Comment 10 Jerry James 2012-09-20 19:40:30 UTC
Thanks, Jon.  The F18 and Rawhide git branches both contain fixes for the ARM assembly language problems.  However, the ARM build still fails with some kind of garbage collector problem.  I don't have any clear idea on how to track down the source of the problem, nor easy access to any ARM hardware. :-(

Comment 11 Gwyn Ciesla 2012-09-21 12:54:08 UTC
I'll poke around with it on my Pi and see what I can do.  Would shell access to it help?

Comment 12 Jerry James 2012-09-21 14:24:08 UTC
Thanks, Jon.  It looks like I can now have an ARM VM on my x86_64 host.  I'll try using that to track down the problem.  Wish me luck.

Comment 13 Gwyn Ciesla 2012-09-21 14:25:21 UTC
Good luck. :)

Comment 14 Peter Robinson 2012-11-25 15:25:14 UTC
Builds in F-18