Bug 519081

Summary: Random crashes with ld-linux loader on x86_64
Product: [Fedora] Fedora
Reporter: Shawn Starr <shawn.starr>
Component: glibc
Assignee: Andreas Schwab <schwab>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: rawhide
CC: drepper, jakub, jcm, rdieter, schwab, tgl, than
Keywords: Reopened
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-08-31 12:38:14 UTC
Bug Depends On: 519391
Bug Blocks: 519226

Description Shawn Starr 2009-08-24 23:46:12 UTC
Description of problem:
With some compiled programs I have been able to crash the linux loader. This seems to be x86_64 specific.

Version-Release number of selected component (if applicable):
glibc-2.10.90-14.x86_64

How reproducible:
Quite often, more than 80% of the time.

Steps to Reproduce:
1. Run certain compiled programs. I cannot say exactly which ones are affected, but I have been able to crash the loader with two of them.

Results from debug:
(gdb) run --check --cache /root/kdebase-dev/runtime/doc/kcontrol/passwords/index.cache.bz2 /root/kdebase/runtime/doc/kcontrol/passwords/index.docbook
Starting program: /usr/local/kde4/bin/meinproc4 --check --cache /root/kdebase-dev/runtime/doc/kcontrol/passwords/index.cache.bz2 /root/kdebase/runtime/doc/kcontrol/passwords/index.docbook
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 15646.

Program received signal SIGILL, Illegal instruction.
_dl_x86_64_restore_sse () at ../sysdeps/x86_64/dl-trampoline.S:223
223             vmovdqa %fs:RTLD_SAVESPACE_SSE+0*YMM_SIZE, %ymm0
(gdb) bt full
#0  _dl_x86_64_restore_sse () at ../sysdeps/x86_64/dl-trampoline.S:223
No locals.
#1  0x00007ffff7deb6e5 in _dl_fixup (l=<value optimized out>, reloc_arg=<value optimized out>) at ../elf/dl-runtime.c:126
        version = <value optimized out>
        flags = <value optimized out>
        reloc = <value optimized out>
        sym = 0x7ffff403d5b0
        result = 0x7ffff7fdd990
        value = <value optimized out>
        __PRETTY_FUNCTION__ = "_dl_fixup"
#2  0x00007ffff7df1bb5 in _dl_runtime_resolve () at ../sysdeps/x86_64/dl-trampoline.S:41
No locals.
#3  0x0000000000405520 in main (argc=5, argv=0x7fffffffe4c8) at /root/kdelibs/kdoctools/meinproc.cpp:179
        buf = "\001\000\000\000\377\177\000\000\230i\375\367\377\177\000\000`\326\377\377\377\177\000\000\000\000\000\000\000\000\000\000h\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\001\000\000\000\256-\356\325\310d\375\367\377\177\000\000\240\327\377\377\377\177\000\000\000\000\000\000\000\000\000\000H\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\001\000\000\000\377\177\000\000\000`\375\367\377\177\000\000\320\327\377\377\377\177\000\000\000\000\000\000\000\000\000\000\070\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\001\000\000\000\000\000\000\000\300v\375\367\377\177\000\000\000\330\377\377\377\177\000\000\000\000\000\000\000\000\000\000(\353\213\367\377\177\000\000\364\237\336\367\377\177\000\000\000\300\375\367\377\177\000\000\300v\375\367\377\177\000\000\000`\375\367\377\177\000\000\310d\375\367\377\177\000\000\320\024\376\367\377\177\000\000\230i\375\367\377\177\000\000\260\351\375\367\377\177\000\000\000\320\375\367\377\177\000\000\310\324\375\367\377\177\000\000\220\331\375\367\377\177\000\000\000\000\000\000\000\000\000\000\205B\222\367\377\177\000\000\200\334\377\377\377\177\000\000\000\000\240\260\377\377\377\377\000\000vp\275\357\377\377\340\246\377\367\377\177\000\000\025\000\000\000\000\000\000\000\fB\222\367\377\177\000\000\200\334\377\377\377\177\000\000\000\000\000\261\377\377\377\377\000\000vp\275"...
        noout = true
        cmd = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63ba00, static codecForCStrings = 0x0}
        xmllint = 0x63d3a0
        n = 0
        pwd_buffer = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63b750, static codecForCStrings = 0x0}
        file = {d_ptr = 0x63c180}
        catalogs = {static shared_null = {ref = {_q_value = 182}, alloc = 0, size = 0, data = 0x60c818 "", array = ""}, static shared_empty = {ref = {_q_value = 2}, alloc = 0, size = 0,
            data = 0x7ffff78bf8f8 "", array = ""}, d = 0x63d1c0}
        exe = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63c8f0, static codecForCStrings = 0x0}
        aboutData = {d = 0x614860}
        params = {{p = 0x7fffffffdaa0, d = 0x7fffffffdaa0}}
        app = <incomplete type>
        ins = {_vptr.KComponentData = 0x7ffff7259470, d = 0x62d1f0}
        checkFile = {d_ptr = 0x63b100}
        __PRETTY_FUNCTION__ = "int main(int, char**)"
        tss = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x7fff00000001, static codecForCStrings = 0x0}
        options = {d = 0x613a90}
        args = 0x628ba0
        srcdir = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x60c7e0, static codecForCStrings = 0x0}
        checkFilename = {static null = {<No data fields>}, static shared_null = {ref = {_q_value = 47}, alloc = 0, size = 0, data = 0x60c7fa, clean = 0, simpletext = 0, righttoleft = 0,
            asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value = 57}, alloc = 0, size = 0, data = 0x7ffff78bfa3a, clean = 0, simpletext = 0,
            righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x63af10, static codecForCStrings = 0x0}
        index = false

Additional info:

I also am able to crash loader when using LD_BIND_NOW=true

LD_BIND_NOW=true /usr/local/kde4/lib64/kde4/libexec/start_kdeinit_wrapper +kcminit_startup

^^ but it seems to only crash when used in the bash shell script 'startkde'.

Removing LD_BIND_NOW does cause an illegal instruction either. I don't know if the two are related (they may).

Comment 1 Shawn Starr 2009-08-25 00:23:38 UTC
Correction: removing LD_BIND_NOW does NOT cause an illegal instruction either. I don't know if the two are related (they may be).

Comment 2 Andreas Schwab 2009-08-25 08:48:22 UTC
What are the contents of have_avx at this point?

Comment 3 Shawn Starr 2009-08-25 15:20:37 UTC
How do I get this? gdb says that symbol is not in the current context.

Comment 4 Shawn Starr 2009-08-25 15:21:14 UTC
model name      : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz

flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority

Comment 5 Shawn Starr 2009-08-25 16:45:16 UTC
This CPU cannot have AVX extensions; no CPUs sold have them yet.
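
For reference, the AVX feature bit can also be queried directly instead of being read from /proc/cpuinfo. The following is a minimal sketch, assuming GCC or Clang with the <cpuid.h> helper header; the function name cpu_reports_avx is made up for illustration. It tests only the CPU feature bit (CPUID leaf 1, ECX bit 28) and does not check whether the OS has enabled YMM state.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical helper: returns 1 if CPUID reports the AVX feature bit,
   0 otherwise (including on non-x86 targets). */
int cpu_reports_avx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* Leaf 1, ECX bit 28 is the AVX feature flag. */
    return (int)((ecx >> 28) & 1u);
#else
    return 0;
#endif
}
```

On a CPU whose flags line lacks "avx", such as the one in comment 4, this would be expected to return 0.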

Comment 6 Ulrich Drepper 2009-08-25 18:22:17 UTC
I have a patch upstream which probably fixes this.  This doesn't have to do with processor versions etc.  Andreas will build a new glibc RPM hopefully soon.

Comment 7 Andreas Schwab 2009-08-26 12:15:14 UTC
Should be fixed in 2.10.90-16.

Comment 8 Tom Lane 2009-08-27 04:49:49 UTC
I have been fighting this problem for a little while now, and I can answer Andreas's question in c#2: have_avx is zero:

Program terminated with signal 4, Illegal instruction.
#0  0x00002b6737c2c229 in _dl_x86_64_restore_sse ()
   from /lib64/ld-linux-x86-64.so.2
(gdb) x/i $pc
0x2b6737c2c229 <_dl_x86_64_restore_sse+9>:      vmovdqa %fs:0x80,%ymm0
(gdb) x/12i _dl_x86_64_restore_sse
0x2b6737c2c220 <_dl_x86_64_restore_sse>:
    cmpl   $0x0,0x20bd29(%rip)        # 0x2b6737e37f50
0x2b6737c2c227 <_dl_x86_64_restore_sse+7>:
    js     0x2b6737c2c27a <_dl_x86_64_restore_sse+90>
0x2b6737c2c229 <_dl_x86_64_restore_sse+9>:      vmovdqa %fs:0x80,%ymm0
0x2b6737c2c233 <_dl_x86_64_restore_sse+19>:     vmovdqa %fs:0x90,%ymm1
...
(gdb) x/w 0x2b6737e37f50
0x2b6737e37f50: 0x00000000
(gdb) bt
#0  0x00002b6737c2c229 in _dl_x86_64_restore_sse ()
   from /lib64/ld-linux-x86-64.so.2
#1  0x00002b6737c256e5 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#2  0x00002b6737c2bbb5 in _dl_runtime_resolve ()
   from /lib64/ld-linux-x86-64.so.2
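
The address examined with x/w above comes from the RIP-relative operand of the cmpl instruction: the displacement is added to the address of the following instruction. A small sketch of that arithmetic (the function name is invented for illustration):

```c
#include <stdint.h>

/* Effective address of a RIP-relative operand: the address of the *next*
   instruction plus the sign-extended 32-bit displacement. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}
```

With the values from the disassembly, rip_relative_target(0x2b6737c2c227, 0x20bd29) gives 0x2b6737e37f50, the location gdb annotated and the one that holds zero.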

I am not sure how it is that control might reach _dl_x86_64_restore_sse without having previously passed through _dl_x86_64_save_sse, as the assembly coding clearly expects must always be the case.   However, I have more than enough evidence now to say that it sometimes *does* do that, especially in multithreaded programs.  I would suggest that the quickest and most reliable fix is to duplicate the have_avx initialization logic into _dl_x86_64_restore_sse too.  Dunno if that's what the upstream patch does.
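
To make the failure mode concrete, here is a simplified C model of the flag handling. It is not glibc source. It assumes a tri-state encoding inferred from the disassembly above (zero means "not yet determined", negative means "no AVX", positive means "AVX usable"), and all function names are hypothetical. The CPUID probe is replaced by a parameter so the model runs anywhere. Whether the upstream patch takes the same approach is not stated in this report.

```c
#include <assert.h>

/* Hypothetical model, not glibc source.
   have_avx: 0 = not yet determined, 1 = AVX usable, -1 = no AVX. */
enum restore_path { PATH_SSE, PATH_AVX };

static int have_avx;   /* starts at zero, like the value seen in gdb */

/* Stand-in for the CPUID probe; the caller supplies the answer. */
static void init_have_avx(int cpu_has_avx)
{
    if (have_avx == 0)
        have_avx = cpu_has_avx ? 1 : -1;
}

/* Models the save routine: it performs the lazy initialization first. */
enum restore_path model_save(int cpu_has_avx)
{
    init_have_avx(cpu_has_avx);
    return have_avx < 0 ? PATH_SSE : PATH_AVX;
}

/* Models the restore routine as disassembled: only a sign test ("js"),
   so a flag that is still zero falls through to the AVX instructions. */
enum restore_path model_restore_as_shipped(void)
{
    return have_avx < 0 ? PATH_SSE : PATH_AVX;
}

/* Models the suggested fix: repeat the initialization before the test. */
enum restore_path model_restore_with_init(int cpu_has_avx)
{
    init_have_avx(cpu_has_avx);
    assert(have_avx != 0);
    return have_avx < 0 ? PATH_SSE : PATH_AVX;
}

/* Puts the flag back into its "not yet determined" state. */
void model_reset(void)
{
    have_avx = 0;
}
```

In this model, calling model_restore_as_shipped() before any save selects PATH_AVX even on a CPU without AVX, which corresponds to the SIGILL seen here; model_restore_with_init(0) selects PATH_SSE instead.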

Comment 9 Rex Dieter 2009-08-28 12:07:35 UTC
Re-opening, since no glibc builds > 15 are available yet (koji builds have failed).

Comment 10 Andreas Schwab 2009-08-28 12:11:29 UTC
It's blocked by nss.

Comment 11 Andreas Schwab 2009-08-31 09:18:48 UTC
Fixed.

Comment 12 Than Ngo 2009-08-31 12:30:04 UTC
glibc-2.10.90-17 seems broken: every application started with LD_BIND_NOW=true crashes, for example: LD_BIND_NOW=true gdb

Comment 13 Ulrich Drepper 2009-08-31 12:38:14 UTC
Don't hijack bugs with completely unrelated reports.  Open a new bug, if necessary.

Comment 14 Rex Dieter 2009-08-31 12:45:33 UTC
In fairness, it was mentioned here in comment #1 too,
"I also am able to crash loader when using LD_BIND_NOW=true", 
but tracking 1 bug per report is preferable for everyone.