Bug 519081
Summary: | Random crashes with ld-linux loader on x86_64 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Shawn Starr <shawn.starr> |
Component: | glibc | Assignee: | Andreas Schwab <schwab> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rawhide | CC: | drepper, jakub, jcm, rdieter, schwab, tgl, than |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2009-08-31 12:38:14 UTC | Type: | --- |
Bug Depends On: | 519391 | ||
Bug Blocks: | 519226 |
Description
Shawn Starr
2009-08-24 23:46:12 UTC
Removing LD_BIND_NOW does NOT cause an illegal instruction either. I don't know if the two are related (they may be).

What are the contents of have_avx at this point?

How do I get this? gdb says that symbol is not in the current context.

    model name : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority

This CPU could not have AVX extensions; no CPUs sold have them yet.

I have a patch upstream which probably fixes this. This doesn't have to do with processor versions etc. Andreas will build a new glibc RPM, hopefully soon.

Should be fixed in 2.10.90-16.

I have been fighting this problem for a little while now, and I can answer Andreas's question in c#2: have_avx is zero:

    Program terminated with signal 4, Illegal instruction.
    #0 0x00002b6737c2c229 in _dl_x86_64_restore_sse () from /lib64/ld-linux-x86-64.so.2
    (gdb) x/i $pc
    0x2b6737c2c229 <_dl_x86_64_restore_sse+9>: vmovdqa %fs:0x80,%ymm0
    (gdb) x/12i _dl_x86_64_restore_sse
    0x2b6737c2c220 <_dl_x86_64_restore_sse>: cmpl $0x0,0x20bd29(%rip) # 0x2b6737e37f50
    0x2b6737c2c227 <_dl_x86_64_restore_sse+7>: js 0x2b6737c2c27a <_dl_x86_64_restore_sse+90>
    0x2b6737c2c229 <_dl_x86_64_restore_sse+9>: vmovdqa %fs:0x80,%ymm0
    0x2b6737c2c233 <_dl_x86_64_restore_sse+19>: vmovdqa %fs:0x90,%ymm1
    ...
    (gdb) x/w 0x2b6737e37f50
    0x2b6737e37f50: 0x00000000
    (gdb) bt
    #0 0x00002b6737c2c229 in _dl_x86_64_restore_sse () from /lib64/ld-linux-x86-64.so.2
    #1 0x00002b6737c256e5 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
    #2 0x00002b6737c2bbb5 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2

I am not sure how it is that control might reach _dl_x86_64_restore_sse without having previously passed through _dl_x86_64_save_sse, as the assembly code clearly expects must always be the case.
However, I have more than enough evidence now to say that it sometimes *does* do that, especially in multithreaded programs. I would suggest that the quickest and most reliable fix is to duplicate the have_avx initialization logic into _dl_x86_64_restore_sse too. Dunno if that's what the upstream patch does.

Re-opening, since no glibc builds > 15 are available yet (koji builds have failed).

It's blocked by nss.

Fixed.

glibc-2.10.90-17 seems broken: every application started with LD_BIND_NOW=true crashes, for example: LD_BIND_NOW=true gdb

Don't hijack bugs with completely unrelated reports. Open a new bug, if necessary.

In fairness, it was mentioned here in comment #1 too, "I also am able to crash loader when using LD_BIND_NOW=true", but tracking 1 bug per report is preferable for everyone.
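
For readers following the disassembly, the failure mode can be modelled in a few lines of portable C. This is only a sketch and not glibc's code: it assumes the flag is tri-state (0 meaning "not yet probed", positive meaning AVX usable, negative meaning not usable), which is an inference from the `cmpl`/`js` pair and the zero value seen in gdb. The names `have_avx_model`, `cpu_has_avx_stub`, `restore_as_disassembled` and `restore_with_init` are hypothetical, and the stub simply pretends to be a CPU without AVX.

```c
#include <assert.h>

/* Which register-restore path the model would take. */
enum reg_path { PATH_SSE, PATH_AVX };

/* Assumed tri-state flag: 0 = not yet probed, > 0 = AVX usable,
   < 0 = AVX not usable. Stands in for the word read via x/w above. */
static int have_avx_model = 0;

/* Hypothetical stand-in for the real CPU probe; always reports
   "no AVX", as on the Core 2 Quad from the report. */
static int cpu_has_avx_stub(void)
{
    return 0;
}

/* Probe once and leave the flag nonzero afterwards. */
static void init_flag_if_needed(void)
{
    if (have_avx_model == 0)
        have_avx_model = cpu_has_avx_stub() ? 1 : -1;
    assert(have_avx_model != 0);
}

/* Mirrors "cmpl $0x0,flag; js sse_path": only a negative flag selects
   the SSE path, so an unprobed 0 falls through to the AVX path. */
static enum reg_path restore_as_disassembled(void)
{
    return have_avx_model < 0 ? PATH_SSE : PATH_AVX;
}

/* Variant along the lines suggested in the report: run the
   initialization before testing the flag. */
static enum reg_path restore_with_init(void)
{
    init_flag_if_needed();
    return have_avx_model < 0 ? PATH_SSE : PATH_AVX;
}
```

With the flag still 0, `restore_as_disassembled()` picks the AVX path even though the stub CPU has no AVX, which corresponds to the `vmovdqa ... %ymm0` that raised the illegal instruction. `restore_with_init()` probes first and picks the SSE path. Whether the upstream patch takes this approach is not stated in the report.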