Bug 519081 - Random crashes with ld-linux loader on x86_64
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: x86_64 Linux
Priority: high  Severity: high
Assigned To: Andreas Schwab
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Depends On: 519391
Blocks: 519226
Reported: 2009-08-24 19:46 EDT by Shawn Starr
Modified: 2009-08-31 13:33 EDT
Doc Type: Bug Fix
Last Closed: 2009-08-31 08:38:14 EDT
Description Shawn Starr 2009-08-24 19:46:12 EDT
Description of problem:
With some compiled programs I have been able to crash the linux loader. This seems to be x86_64 specific.

Version-Release number of selected component (if applicable):
glibc-2.10.90-14.x86_64

How reproducible:
Quite often, more than 80% of the time.

Steps to Reproduce:
1. Run certain compiled programs. I cannot say which ones in general, but I have been able to crash the loader with two of them.

Results from debug:
(gdb) run --check --cache /root/kdebase-dev/runtime/doc/kcontrol/passwords/index.cache.bz2 /root/kdebase/runtime/doc/kcontrol/passwords/index.docbook
Starting program: /usr/local/kde4/bin/meinproc4 --check --cache /root/kdebase-dev/runtime/doc/kcontrol/passwords/index.cache.bz2 /root/kdebase/runtime/doc/kcontrol/passwords/index.docbook
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 15646.

Program received signal SIGILL, Illegal instruction.
_dl_x86_64_restore_sse () at ../sysdeps/x86_64/dl-trampoline.S:223
223             vmovdqa %fs:RTLD_SAVESPACE_SSE+0*YMM_SIZE, %ymm0
(gdb) bt full
#0  _dl_x86_64_restore_sse () at ../sysdeps/x86_64/dl-trampoline.S:223
No locals.
#1  0x00007ffff7deb6e5 in _dl_fixup (l=<value optimized out>, reloc_arg=<value optimized out>) at ../elf/dl-runtime.c:126
        version = <value optimized out>
        flags = <value optimized out>
        reloc = <value optimized out>
        sym = 0x7ffff403d5b0
        result = 0x7ffff7fdd990
        value = <value optimized out>
        __PRETTY_FUNCTION__ = "_dl_fixup"
#2  0x00007ffff7df1bb5 in _dl_runtime_resolve () at ../sysdeps/x86_64/dl-trampoline.S:41
No locals.
#3  0x0000000000405520 in main (argc=5, argv=0x7fffffffe4c8) at /root/kdelibs/kdoctools/meinproc.cpp:179
        buf = "\001\000\000\000\377\177\000\000\230i\375\367\377\177\000\000`\326\377\377\377\177\000\000\000\000\000\000\000\000\000\000h\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\001\000\000\000\256-\356\325\310d\375\367\377\177\000\000\240\327\377\377\377\177\000\000\000\000\000\000\000\000\000\000H\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\001\000\000\000\377\177\000\000\000`\375\367\377\177\000\000\320\327\377\377\377\177\000\000\000\000\000\000\000\000\000\000\070\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\001\000\000\000\000\000\000\000\300v\375\367\377\177\000\000\000\330\377\377\377\177\000\000\000\000\000\000\000\000\000\000(\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\000\300\375\367\377\177\000\000\300v\375\367\377\177\000\000\000`\375\367\377\177\000\000\310d\375\367\377\177\000\000\320\024\376\367\377\177\000\000\230i\375\367\377\177\000\000\260\351\375\367\377\177\000\000\000\320\375\367\377\177\000\000\310\324\375\367\377\177\000\000\220\331\375\367\377\177\000\000\000\000\000\000\000\000\000\000\205B\222\367\377\177\000\000\200\334\377\377\377\177\000\000\000\000\240\260\377\377\377\377\000\000vp\275\357\377\377\340\246\377\367\377\177\000\000\025\000\000\000\000\000\000\000\fB\222\367\377\177\000\000\200\334\377\377\377\177\000\000\000\000\000\261\377\377\377\377\000\000vp\275"...
        noout = true
        cmd = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63ba00, static codecForCStrings = 0x0}
        xmllint = 0x63d3a0
        n = 0
        pwd_buffer = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63b750, static codecForCStrings = 0x0}
        file = {d_ptr = 0x63c180}
        catalogs = {static shared_null = {ref = {_q_value = 182}, alloc = 0, size = 0, data = 0x60c818 "", array = ""}, static shared_empty = {ref = {_q_value = 2}, alloc = 0, size = 0,
            data = 0x7ffff78bf8f8 "", array = ""}, d = 0x63d1c0}
        exe = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63c8f0, static codecForCStrings = 0x0}
        aboutData = {d = 0x614860}
        params = {{p = 0x7fffffffdaa0, d = 0x7fffffffdaa0}}
        app = <incomplete type>
        ins = {_vptr.KComponentData = 0x7ffff7259470, d = 0x62d1f0}
        checkFile = {d_ptr = 0x63b100}
        __PRETTY_FUNCTION__ = "int main(int, char**)"
        tss = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x7fff00000001, static codecForCStrings = 0x0}
        options = {d = 0x613a90}
        args = 0x628ba0
        srcdir = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x60c7e0, static codecForCStrings = 0x0}
        checkFilename = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63af10, static codecForCStrings = 0x0}
        index = false

Additional info:

I also am able to crash loader when using LD_BIND_NOW=true

LD_BIND_NOW=true /usr/local/kde4/lib64/kde4/libexec/start_kdeinit_wrapper +kcminit_startup

^^ but it seems to only crash when used in the bash shell script 'startkde'.

Removing LD_BIND_NOW does not cause an illegal instruction either. I don't know if the two are related (they may be).
Comment 1 Shawn Starr 2009-08-24 20:23:38 EDT
Removing LD_BIND_NOW does NOT cause an illegal instruction either. I don't know if
the two are related (they may be).
Comment 2 Andreas Schwab 2009-08-25 04:48:22 EDT
What is the value of have_avx at this point?
Comment 3 Shawn Starr 2009-08-25 11:20:37 EDT
How do I get this? gdb says that the symbol is not in the current context.
Comment 4 Shawn Starr 2009-08-25 11:21:14 EDT
model name      : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz

flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
Comment 5 Shawn Starr 2009-08-25 12:45:16 EDT
This CPU cannot have AVX extensions; no CPUs sold so far have them.
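
The same conclusion can be read off the flags line in comment 4. The following is only a minimal sketch in C of that check, assuming the flags are available as one whitespace-separated string; has_cpu_flag, q6600_flags and check_q6600 are hypothetical names used for illustration, and the flags string is a shortened excerpt of the one posted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` occurs as a whole whitespace-separated token
   in `flags` (the text after "flags :" in /proc/cpuinfo). */
static bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    if (len == 0)
        return false;

    for (const char *p = strstr(flags, flag); p != NULL;
         p = strstr(p + 1, flag)) {
        bool at_start = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool at_end = p[len] == '\0' || p[len] == ' ' ||
                      p[len] == '\t' || p[len] == '\n';
        if (at_start && at_end)
            return true;
    }
    return false;
}

/* Shortened excerpt of the Q6600 flags line from comment 4. */
static const char q6600_flags[] =
    "fpu vme de pse tsc msr pae mce cx8 apic mmx fxsr sse sse2 ss ht "
    "pni ssse3 cx16 lahf_lm";

/* Hypothetical self-check: SSE2 is listed, AVX is not. */
static void check_q6600(void)
{
    assert(has_cpu_flag(q6600_flags, "sse2"));
    assert(!has_cpu_flag(q6600_flags, "avx"));
    /* "sse3" is a substring of "ssse3" but not a token of its own. */
    assert(!has_cpu_flag(q6600_flags, "sse3"));
}
```

Matching whole tokens rather than substrings matters here, because several flag names contain one another (ss, sse, ssse3).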
Comment 6 Ulrich Drepper 2009-08-25 14:22:17 EDT
I have a patch upstream which probably fixes this.  This doesn't have to do with processor versions etc.  Andreas will build a new glibc RPM hopefully soon.
Comment 7 Andreas Schwab 2009-08-26 08:15:14 EDT
Should be fixed in 2.10.90-16.
Comment 8 Tom Lane 2009-08-27 00:49:49 EDT
I have been fighting this problem for a little while now, and I can answer Andreas's question in c#2: have_avx is zero:

Program terminated with signal 4, Illegal instruction.
#0  0x00002b6737c2c229 in _dl_x86_64_restore_sse ()
   from /lib64/ld-linux-x86-64.so.2
(gdb) x/i $pc
0x2b6737c2c229 <_dl_x86_64_restore_sse+9>:      vmovdqa %fs:0x80,%ymm0
(gdb) x/12i _dl_x86_64_restore_sse
0x2b6737c2c220 <_dl_x86_64_restore_sse>:
    cmpl   $0x0,0x20bd29(%rip)        # 0x2b6737e37f50
0x2b6737c2c227 <_dl_x86_64_restore_sse+7>:
    js     0x2b6737c2c27a <_dl_x86_64_restore_sse+90>
0x2b6737c2c229 <_dl_x86_64_restore_sse+9>:      vmovdqa %fs:0x80,%ymm0
0x2b6737c2c233 <_dl_x86_64_restore_sse+19>:     vmovdqa %fs:0x90,%ymm1
...
(gdb) x/w 0x2b6737e37f50
0x2b6737e37f50: 0x00000000
(gdb) bt
#0  0x00002b6737c2c229 in _dl_x86_64_restore_sse ()
   from /lib64/ld-linux-x86-64.so.2
#1  0x00002b6737c256e5 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#2  0x00002b6737c2bbb5 in _dl_runtime_resolve ()
   from /lib64/ld-linux-x86-64.so.2

I am not sure how it is that control might reach _dl_x86_64_restore_sse without having previously passed through _dl_x86_64_save_sse, as the assembly coding clearly expects must always be the case.   However, I have more than enough evidence now to say that it sometimes *does* do that, especially in multithreaded programs.  I would suggest that the quickest and most reliable fix is to duplicate the have_avx initialization logic into _dl_x86_64_restore_sse too.  Dunno if that's what the upstream patch does.
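
To illustrate the failure mode described above, here is a small C model, not the glibc code and not the upstream patch. It assumes have_avx is a tri-state flag (0 = not probed yet, positive = AVX usable, negative = AVX not usable), which is consistent with the cmpl/js sequence in the disassembly but is still an assumption. All names (have_avx_model, probe_avx, init_have_avx and the two restore_* functions) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the loader's flag (assumed tri-state):
   0 = not probed yet, 1 = AVX usable, -1 = AVX not usable. */
static int have_avx_model = 0;

/* Hypothetical stand-in for the CPUID probe; reports "no AVX",
   as on the Core 2 Quad from comment 4. */
static bool probe_avx(void)
{
    return false;
}

/* Initialization that, per the comment above, only the save path runs. */
static void init_have_avx(void)
{
    if (have_avx_model == 0)
        have_avx_model = probe_avx() ? 1 : -1;
    assert(have_avx_model != 0);
}

/* Mirrors the disassembly: only a negative flag branches away, so an
   unprobed 0 falls through to the vmovdqa/%ymm instructions. */
static bool restore_takes_avx_path_unguarded(void)
{
    return have_avx_model >= 0;
}

/* Suggested variant: repeat the initialization before testing the flag. */
static bool restore_takes_avx_path_guarded(void)
{
    init_have_avx();
    return have_avx_model > 0;
}
```

With the flag still at 0, the unguarded check selects the AVX path even though the probe would report no AVX, which matches the SIGILL on a CPU without AVX; the guarded variant selects the SSE path.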
Comment 9 Rex Dieter 2009-08-28 08:07:35 EDT
Re-opening, since no glibc builds > 15 are available yet (koji builds have failed).
Comment 10 Andreas Schwab 2009-08-28 08:11:29 EDT
It's blocked by nss.
Comment 11 Andreas Schwab 2009-08-31 05:18:48 EDT
Fixed.
Comment 12 Ngo Than 2009-08-31 08:30:04 EDT
glibc-2.10.90-17 seems broken: every application started with LD_BIND_NOW=true crashes, for example: LD_BIND_NOW=true gdb
Comment 13 Ulrich Drepper 2009-08-31 08:38:14 EDT
Don't hijack bugs with completely unrelated reports.  Open a new bug, if necessary.
Comment 14 Rex Dieter 2009-08-31 08:45:33 EDT
In fairness, it was mentioned here in comment #1 too,
"I also am able to crash loader when using LD_BIND_NOW=true", 
but tracking 1 bug per report is preferable for everyone.
