Bug 1035773 - Since glibc-2.18.90-11.fc21, ruby sigabrts in test suite on ARM
Summary: Since glibc-2.18.90-11.fc21, ruby sigabrts in test suite on ARM
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1033546
Reported: 2013-11-28 12:57 UTC by Vít Ondruch
Modified: 2016-11-24 12:41 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-14 13:43:48 UTC


Attachments
Ruby crash log (13.68 KB, text/x-log)
2013-11-28 12:57 UTC, Vít Ondruch
latest vs -10 (6.95 KB, text/x-log)
2013-11-29 15:01 UTC, Vít Ondruch

Description Vít Ondruch 2013-11-28 12:57:54 UTC
Created attachment 830216 [details]
Ruby crash log

Since glibc-2.18.90-11.fc21, Ruby fails to pass its test suite on ARM (see attached log). Could you please help identify and fix this issue? Thanks.

Comment 1 Carlos O'Donell 2013-11-28 14:32:09 UTC
Could you please double check something for me?

Does Ruby still pass the test suite with glibc-2.18.90-10.fc21 or glibc-2.18.90-9.fc21?

It would be very useful if you could tell me which version worked; that way I can take a closer look.

Thank you!

Comment 2 Vít Ondruch 2013-11-28 15:03:30 UTC
In my logs, I can see that I tested 2.18-6 and 2.18.90 -1, -2, -5, -9, -10, -11, and -13, so basically every release that has a successful build available. -11 was the first to fail.

Comment 3 Carlos O'Donell 2013-11-28 19:21:08 UTC
(In reply to Vít Ondruch from comment #2)
> In my logs, I can see that I tested 2.18-6 and 2.18.90 -1, -2, -5, -9, -10,
> -11, and -13, so basically every release that has a successful build
> available. -11 was the first to fail.

Thanks.

Just to confirm: we're talking about rawhide?

Are you certain it isn't something else in rawhide that is causing the problem?

The ARM specific changes in -11 were:

+2013-10-04  Will Newton  <will.newton@linaro.org>
+
+       * sysdeps/arm/__longjmp.S (NO_THUMB): Remove define.
+       (__longjmp): Use Thumb supported instructions.
+       * sysdeps/unix/sysv/linux/arm/____longjmp_chk.S (NO_THUMB):
+       Remove define.
+
+       * sysdeps/arm/setjmp.S (NO_THUMB): Remove define.
+       (__sigsetjmp): Use Thumb supported instructions.
+
+2013-10-03  Will Newton  <will.newton@linaro.org>
+
+       * sysdeps/arm/__longjmp.S (__longjmp): Demangle fp, sp
+       and lr when restoring register values.
+       * sysdeps/arm/include/bits/setjmp.h (JMP_BUF_REGLIST): Remove
+       sp and lr from list and replace fp with a4.
+       * sysdeps/arm/jmpbuf-unwind.h (_jmpbuf_sp): New function.
+       (_JMPBUF_UNWINDS_ADJ): Call _jmpbuf_sp.
+       * sysdeps/arm/setjmp.S (__sigsetjmp): Mangle fp, sp and lr
+       before storing register values.
+       * sysdeps/arm/sysdep.h (LDST_GLOBAL): New macro.
+       * sysdeps/unix/sysv/linux/arm/sysdep.h (PTR_MANGLE): New macro.
+       (PTR_DEMANGLE): Likewise. (PTR_MANGLE2): Likewise.
+       (PTR_DEMANGLE2): Likewise.
+

So we've started using pointer mangling as a security feature to obfuscate fp, sp, and lr.
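To illustrate the idea, here is a minimal sketch of pointer mangling, assuming the mangling step is an XOR with a secret per-process guard value. The names `demo_guard`, `demo_ptr_mangle`, and `demo_ptr_demangle` are hypothetical and are not glibc's macros; the real guard is chosen at process startup rather than being a constant.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-process pointer guard.
   A fixed constant is used only to keep this sketch deterministic. */
static const uintptr_t demo_guard = (uintptr_t)0x5bd1e995u;

/* Mangle a register value before it is stored in the jump buffer. */
static uintptr_t demo_ptr_mangle(uintptr_t value)
{
    return value ^ demo_guard;
}

/* XOR is its own inverse, so demangling applies the same operation. */
static uintptr_t demo_ptr_demangle(uintptr_t stored)
{
    return stored ^ demo_guard;
}
```

The stored value only makes sense to code that knows the guard, so anything that reads the saved registers directly sees an obfuscated value rather than a usable address.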

Here is a build without the ARM specific changes (closer to -10), you can get it from here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=6237791

Next steps:
- Can you please try reproducing the ruby testsuite failures with the scratch build?
- Can you please try downgrading only glibc to -10 to see if the testsuite failures still persist?

I'd like to rule out any other component.

Comment 4 Vít Ondruch 2013-11-29 15:01:15 UTC
Created attachment 830695 [details]
latest vs -10

(In reply to Carlos O'Donell from comment #3)
> Just to confirm: we're talking about rawhide?

Yes, definitely. The above-mentioned versions are not available in older releases, if I am not mistaken.

> Are you certain it isn't something else in rawhide that is causing the
> problem?

Well, that is a good question. But by fiddling with the glibc version, I was able to find that older versions work while -11 and newer do not. Of course, it might be some transitional problem.

> The ARM specific changes in -11 were:
> 
> +2013-10-04  Will Newton  <will.newton@linaro.org>
> +
> +       * sysdeps/arm/__longjmp.S (NO_THUMB): Remove define.
> +       (__longjmp): Use Thumb supported instructions.
> +       * sysdeps/unix/sysv/linux/arm/____longjmp_chk.S (NO_THUMB):
> +       Remove define.
> +
> +       * sysdeps/arm/setjmp.S (NO_THUMB): Remove define.
> +       (__sigsetjmp): Use Thumb supported instructions.
> +
> +2013-10-03  Will Newton  <will.newton@linaro.org>
> +
> +       * sysdeps/arm/__longjmp.S (__longjmp): Demangle fp, sp
> +       and lr when restoring register values.
> +       * sysdeps/arm/include/bits/setjmp.h (JMP_BUF_REGLIST): Remove
> +       sp and lr from list and replace fp with a4.
> +       * sysdeps/arm/jmpbuf-unwind.h (_jmpbuf_sp): New function.
> +       (_JMPBUF_UNWINDS_ADJ): Call _jmpbuf_sp.
> +       * sysdeps/arm/setjmp.S (__sigsetjmp): Mangle fp, sp and lr
> +       before storing register values.
> +       * sysdeps/arm/sysdep.h (LDST_GLOBAL): New macro.
> +       * sysdeps/unix/sysv/linux/arm/sysdep.h (PTR_MANGLE): New macro.
> +       (PTR_DEMANGLE): Likewise. (PTR_MANGLE2): Likewise.
> +       (PTR_DEMANGLE2): Likewise.
> +
> 
> So we've started using pointer mangling as a security feature to obfuscate
> fp, sp, and lr.
> 
> Here is a build without the ARM specific changes (closer to -10), you can
> get it from here:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=6237791

The build against this version of glibc passes. It seems we are getting closer \o/

> Next steps:
> - Can you please try reproducing the ruby testsuite failures with the
> scratch build?

It all started with this scratch build I got from mtasaka:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6220848

and this is a fresh one, with the latest glibc:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6238619

> - Can you please try downgrading only glibc to -10 to see if the testsuite
> failures still persist?

See the attached log. I built with the latest components and then downgraded glibc.

Comment 5 Carlos O'Donell 2013-12-10 01:35:01 UTC
(In reply to Vít Ondruch from comment #4)
> > So we've started using pointer mangling as a security feature to obfuscate
> > fp, sp, and lr.
> > 
> > Here is a build without the ARM specific changes (closer to -10), you can
> > get it from here:
> > http://koji.fedoraproject.org/koji/taskinfo?taskID=6237791
> 
> The build against this version of glibc passes. It seems we are getting
> closer \o/

Just to confirm: the ruby testsuite is passing when built against the scratch build I provided (which has pointer mangling support disabled)?

Cheers,
Carlos.

Comment 6 Vít Ondruch 2013-12-10 09:32:44 UTC
(In reply to Carlos O'Donell from comment #5)
> the ruby testsuite is passing when built against the scratch
> build I provided (which has pointer mangling support disabled)?

Yes, exactly.

Comment 7 Carlos O'Donell 2013-12-10 15:41:29 UTC
(In reply to Vít Ondruch from comment #6)
> (In reply to Carlos O'Donell from comment #5)
> > the ruby testsuite is passing when built against the scratch
> > build I provided (which has pointer mangling support disabled)?
> 
> Yes, exactly.

Thanks, I've reached out to Will Newton (Linaro) to look into this since his code caused the regression.

https://sourceware.org/ml/libc-alpha/2013-12/msg00340.html

Comment 8 Vít Ondruch 2013-12-11 22:21:38 UTC
Thanks, Carlos. I am trying to get a ruby-core developer to join the discussion [1]. Not sure if I will succeed.


[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/59052

Comment 9 Vít Ondruch 2013-12-12 22:39:16 UTC
I've got a response from the Ruby developers; could you please forward this reply?

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/59078

Thanks

Comment 10 Carlos O'Donell 2013-12-12 23:22:27 UTC
(In reply to Vít Ondruch from comment #9)
> I've got a response from the Ruby developers; could you please forward this reply?
> 
> http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/59078

I also have a new patch for you to test, I'll scratch build a new glibc for you shortly.

The short answer is that while Ruby's inspection of jmp_buf is a violation of the standard and undefined behaviour, we can't be jerks about this and break Ruby. So upstream is going to avoid encrypting the fp register on 32-bit ARM to fix Ruby until such time that Ruby can get an alternate implementation in place.
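For readers unfamiliar with the technique, here is a rough sketch of the kind of jmp_buf inspection being discussed, assuming a conservative collector that spills registers with setjmp and then treats any saved word that falls inside the heap range as a possible reference. The function names are hypothetical and this is not Ruby's actual code; reading jmp_buf contents like this is exactly the non-portable part.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the words whose value falls inside [lo, hi).
   A conservative collector treats each such word as a possible reference. */
size_t demo_count_candidates(const uintptr_t *words, size_t n,
                             uintptr_t lo, uintptr_t hi)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (words[i] >= lo && words[i] < hi)
            hits++;
    }
    return hits;
}

/* Spill the callee-saved registers into a jmp_buf and scan the raw bytes.
   The layout and encoding of jmp_buf are private to the C library, so the
   result depends on the platform and library version. */
size_t demo_scan_saved_registers(uintptr_t lo, uintptr_t hi)
{
    jmp_buf regs;
    uintptr_t words[sizeof(jmp_buf) / sizeof(uintptr_t)];

    memset(&regs, 0, sizeof regs);
    setjmp(regs); /* used only to save registers; never jumped back to */
    memcpy(words, &regs, sizeof words);
    return demo_count_candidates(words, sizeof words / sizeof words[0], lo, hi);
}
```

If a live object's only reference sits in a register that the library stores in mangled form, the scan sees a value outside the heap range and misses it, so the object can be freed while still in use.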

Cheers,
Carlos.

Comment 11 Carlos O'Donell 2013-12-14 15:40:05 UTC
Filed long-term upstream bug:
https://bugs.ruby-lang.org/issues/9249

Comment 12 Vít Ondruch 2013-12-16 08:11:09 UTC
Thank you Carlos. I appreciate that.

Comment 13 Carlos O'Donell 2014-01-07 20:38:33 UTC
Vit,

My apologies for the delay.

Scratch build with fix:
http://koji.fedoraproject.org/koji/taskinfo?taskID=6370942

Please tell me if the scratch build fixes the problem.

Comment 14 Vít Ondruch 2014-01-14 13:07:04 UTC
Hi Carlos, thanks for the scratch build. I was able to build ruby-2.0.0.353-17 with that version of glibc.

Comment 15 Carlos O'Donell 2014-01-14 13:43:48 UTC
(In reply to Vít Ondruch from comment #14)
> Hi Carlos, thanks for the scratch build. I was able to build
> ruby-2.0.0.353-17 with that version of glibc.

Excellent, I'll pass this to upstream and we'll get this fixed in glibc, and then synchronize rawhide.

Comment 16 Vít Ondruch 2014-01-15 09:00:20 UTC
Thanks! I appreciate that.

Comment 17 Carlos O'Donell 2014-01-15 18:17:29 UTC
(In reply to Vít Ondruch from comment #16)
> Thanks! I appreciate that.

Fixed upstream by this commit.

commit 2f10c4d6901e7a4c4ad294cc5bb8ece6547f4f62
Author: Will Newton <will.newton@linaro.org>
Date:   Tue Dec 10 16:26:38 2013 +0000

    ARM: Don't apply pointer encryption to the frame pointer
    
    The frame pointer register is rarely used for that purpose on ARM and
    applications that look at the contents of the jmp_buf may be relying
    on reading an unencrypted value. For example, Ruby uses the contents
    of jmp_buf to find the root set for garbage collection so relies on
    this pointer value being unencrypted. Without this patch the Ruby
    testsuite fails with a segmentation fault.
    
    ports/ChangeLog.arm:
    
    2013-01-14  Will Newton  <will.newton@linaro.org>
    
        * sysdeps/arm/__longjmp.S: Don't apply pointer encryption
        to fp register.
        * sysdeps/arm/setjmp.S: Likewise.
        * sysdeps/arm/include/bits/setjmp.h (JMP_BUF_REGLIST): Add
        fp to register list, remove a4.
        * sysdeps/unix/sysv/linux/arm/sysdep.h (PTR_MANGLE_LOAD):
        New macro.
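
In C terms, the effect of the commit can be sketched roughly as follows, assuming XOR-based mangling with a guard value. The struct and function names are hypothetical; the actual change is in the ARM assembly files listed above.

```c
#include <stdint.h>

/* Hypothetical model of the three registers discussed in this bug. */
struct demo_saved_regs {
    uintptr_t fp;
    uintptr_t sp;
    uintptr_t lr;
};

/* Save: sp and lr are still mangled, fp is stored as-is after the fix. */
struct demo_saved_regs demo_save(uintptr_t fp, uintptr_t sp, uintptr_t lr,
                                 uintptr_t guard)
{
    struct demo_saved_regs saved;
    saved.fp = fp;
    saved.sp = sp ^ guard;
    saved.lr = lr ^ guard;
    return saved;
}

/* Restore: demangle sp and lr, copy fp back unchanged. */
void demo_restore(const struct demo_saved_regs *saved, uintptr_t guard,
                  uintptr_t *fp, uintptr_t *sp, uintptr_t *lr)
{
    *fp = saved->fp;
    *sp = saved->sp ^ guard;
    *lr = saved->lr ^ guard;
}
```

With this layout, code that reads the saved fp slot directly sees the real register value, while the stack pointer and return address stay obfuscated.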

Comment 18 Vít Ondruch 2014-02-10 10:52:21 UTC
Ruby is built now: https://koji.fedoraproject.org/koji/buildinfo?buildID=497192

