Bug 556584
Summary: | crash when running createrepo due to glibc's malloc checking | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bill Nottingham <notting> |
Component: | glibc | Assignee: | Andreas Schwab <schwab> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | | |
Version: | rawhide | CC: | dmalcolm, fweimer, hongjiu.lu, ivazqueznet, jakub, james.antill, jane.lv, jlaska, jonathansteffan, jvillalo, luyu, rvokal, schwab |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | glibc-2.11.90-15 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2010-03-14 13:44:48 UTC | Type: | --- |
Description
Bill Nottingham
2010-01-18 19:08:37 UTC
python-2.6.4-6.fc13 was only just built; python-2.6.4-5.fc13 was built on Friday IIRC (for bug #555943), so it's more likely to be that one.

FWIW, the specific error check within free():

    errstr = "free(): corrupted unsorted chunks";

seems to have been added to glibc in this upstream commit:
http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=f6887a0d9a55f5c80c567d9cb153c1c6582410f9

Not sure whether this error checking is highlighting an already-present bug, or whether there's an issue (a false positive?) in the check itself.

Created attachment 385233
Backtrace provided by notting

Looks like a "<str> + <unicode>" operation (frame 8).

If you add the following to the top of the script you'll get a ton of debug information, which may provide further hints as to what's going wrong (but may obscure the crash):

    def tf(frame, event, arg):
        print "frame: %s, code: %s, locals: %s, event: %s, arg: %s" \
            % (frame, frame.f_code, frame.f_locals, event, arg)

    import sys
    sys.settrace(tf)

Here's a version of tf that tries to indent, based on stack depth:

    def tf(frame, event, arg):
        def depth(f):
            if f.f_back:
                return depth(f.f_back) + 1
            else:
                return 0
        print "%scode: %s, locals: %s, event: %s, arg: %s" \
            % (' ' * depth(frame), frame.f_code, frame.f_locals, event, arg)

According to IRC chat: most locals were optimized out, but notting was able to query this, in either frame 9 or 10:

    (gdb) p (PyTypeObject *)v->ob_type->tp_str
    "str"
    (gdb) p (char*)((PyStringObject*)v)->ob_sval
    $19 = 0x85c197c "\n<package pkgid=\"c06516a49e897ff1592bba62ea48176212a35fccf10b0507eae2251ecc3a2bd4\" name=\"crystalspace-debuginfo\" arch=\"x86_64\">\n <version epoch=\"0\" ver=\"1.2.1\" rel=\"6.fc12\"/>\n"

so the code is doing "<str> + <unicode>", with the left-hand side as above.

It appeared that "unicode" may have been NULL in the call to unicode_dealloc, which shouldn't happen; possibly a refcounting error in a unicode object?

(In reply to comment #6)
> (gdb) p (PyTypeObject *)v->ob_type->tp_str

I believe this should read:

    (gdb) p ((PyTypeObject *)v->ob_type)->tp_str

Testing it with ElectricFence yields the following crash.

    Breakpoint 2, __memcpy_ssse3_rep () at ../sysdeps/i386/i686/multiarch/memcpy-ssse3-rep.S:121
    121 ENTRANCE
    0xbfffbccc: 0xad2abc <PyUnicodeUCS4_Concat+172> 0xb1ff7c58 0xb215cc58 0x883a4

Assuming the last word is actually the length it's trying to copy, that looks high. No obvious reason why it would be doing that, though.

notting: I filed bug 556975 to track the difficulty we had querying variables in gdb with this build of python.

That looks like a bug in the optimized memcpy/memset.
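
A rough way to sanity-check the length in the ElectricFence dump above is to decode the stack words by hand. The sketch below assumes the four words after the address are the i386 cdecl frame at entry to memcpy (return address, dst, src, len), which is the same guess made in the comment; the helper name parse_memcpy_frame is hypothetical, and the code is Python 3 rather than the Python 2 used by the scripts in this bug.

```python
import re

# Stack words at the memcpy breakpoint, copied from the comment above.
DUMP = "0xbfffbccc: 0xad2abc <PyUnicodeUCS4_Concat+172> 0xb1ff7c58 0xb215cc58 0x883a4"

# Bytes per code point in a UCS4 build of Python 2.
UCS4_UNIT = 4


def parse_memcpy_frame(dump_line):
    """Split a gdb-style memory dump line into memcpy's assumed cdecl frame.

    Assumption: the words are, in order, the return address, dst, src and len.
    """
    _, _, words_part = dump_line.partition(":")
    # Drop symbolic annotations such as <PyUnicodeUCS4_Concat+172>.
    words_part = re.sub(r"<[^>]*>", "", words_part)
    words = [int(w, 16) for w in words_part.split()]
    if len(words) != 4:
        raise ValueError("expected four words: return address, dst, src, len")
    ret, dst, src, length = words
    return {"ret": ret, "dst": dst, "src": src, "len": length}


frame = parse_memcpy_frame(DUMP)
print("bytes:", frame["len"])
print("UCS4 code points:", frame["len"] // UCS4_UNIT)
```

Under those assumptions the length is 557,988 bytes, or 139,497 UCS4 code points. Whether that is plausible depends on the size of the unicode operand, which the thread does not show.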
Can you send me the output of

    # uname -a
    # cat /proc/cpuinfo

Also, please show me how to reproduce the bug with a minimal setup. If you can give me a testcase in C, I will fix it.

The test case is, unfortunately, not very minimal. It's running createrepo across ~5G of packages.

[notting@bastion2 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
stepping : 6
cpu MHz : 2992.498
cache size : 6144 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 7483.59

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
stepping : 6
cpu MHz : 2992.498
cache size : 6144 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc up pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 7483.59

Kernel is a xen domU, 2.6.18-164.6.1.el5xen.

Sorry, that was the wrong box.
*Actual* /proc/cpuinfo:

[notting@releng2 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2400.084
cache size : 8192 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8]
bogomips : 6002.17

Kernel is 2.6.18-164.2.1.el5xen.

If you can find a testcase in C that I can debug with gdb, I will take a look.

Can you run the same thing with the same glibc on a Core 2 machine? A different memcpy will be used in that case.

You can use LD_AUDIT to check the parameters passed to memcpy calls.

(In reply to comment #9)
> notting: I filed bug 556975 to track the difficulty we had querying variables
> in gdb with this build of python.

notting: it looks like with a newer build of python (python-2.6.4-8) and/or a newer gdb this one should be more amenable to debugging; if it can be reproduced with the newer build of python, we could use that to try to isolate a saner reproducer.

Any news here?

Have not had time to set up the reproducing case again.

Created attachment 390509
A patch to use unsigned conditional jump

This patch may fix the problem. memcpy uses a signed conditional jump, so
if you copy more than 2GB of data, it will get it wrong. Please give it a try. Thanks.
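
To illustrate why a signed conditional jump goes wrong for lengths above 2GB on a 32-bit target, here is a minimal Python 3 sketch. It is not the code from memcpy-ssse3-rep.S: the function names and the THRESHOLD value are hypothetical, and ctypes is used only to emulate 32-bit wraparound.

```python
import ctypes

# Hypothetical small-copy cutoff, chosen for illustration only.
THRESHOLD = 48


def signed_below(a, b):
    """Compare two 32-bit values the way a signed jump (jl-style) would."""
    return ctypes.c_int32(a).value < ctypes.c_int32(b).value


def unsigned_below(a, b):
    """Compare two 32-bit values the way an unsigned jump (jb-style) would."""
    return ctypes.c_uint32(a).value < ctypes.c_uint32(b).value


for length in (16, 0x883a4, 0x80000000, 0xC0000000):
    # Lengths of 2GB or more wrap to negative numbers when read as signed,
    # so the signed comparison wrongly treats them as "small".
    print(hex(length), signed_below(length, THRESHOLD), unsigned_below(length, THRESHOLD))
```

For lengths below 2GB both comparisons agree; from 0x80000000 upward only the unsigned comparison gives the intended answer, which is the kind of error the patch title describes.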

Any updates? If my patch is the right fix, I'd like to push it upstream. Thanks.

Haven't had a chance to test.

glibc-2.11.90-13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/glibc-2.11.90-13

Created attachment 395514
A patch to fix memcpy

I found another bug in memcpy-ssse3-rep.S. This patch fixes it.

glibc-2.11.90-14 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/glibc-2.11.90-14

glibc-2.11.90-14 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update glibc'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F13/FEDORA-2010-2658

glibc-2.11.90-15 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/glibc-2.11.90-15

glibc-2.11.90-15 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update glibc'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/glibc-2.11.90-15

glibc-2.11.90-15 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.