Bug 1018325
Summary: | valgrind memcheck segfaults on s390x | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Miroslav Franc <mfranc> |
Component: | valgrind | Assignee: | Mark Wielaard <mjw> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Miloš Prchlík <mprchlik> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 7.0 | CC: | hkario, jakub, mbenitez, mcermak, mfranc, mjw, mprchlik, nathans, ohudlick |
Target Milestone: | rc | Keywords: | TestBlocker |
Target Release: | --- | | |
Hardware: | s390x | | |
OS: | Linux | | |
Fixed In Version: | valgrind-3.9.0-1.2.el7 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
: | 1025888 | Environment: | |
Last Closed: | 2014-06-13 11:23:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Bug Blocks: | 1025888 | | |
Comment 1
Mark Wielaard
2013-10-11 18:54:58 UTC
It seems to be 2 issues.

==19206== Process terminating with default action of signal 11 (SIGSEGV)
==19206== Bad permissions for mapped region at address 0x4045000
==19206== at 0x4032EBC: memcpy (in /usr/lib64/valgrind/vgpreload_memcheck-s390x-linux.so)
==19206== by 0x400BCF9: _dl_new_object (in /usr/lib64/ld-2.17.so)
==19206== by 0x4006757: _dl_map_object_from_fd (in /usr/lib64/ld-2.17.so)

Seems to happen always.

==19204== Jump to the invalid address stated on the next line
==19204== at 0x5D6: ???
==19204== by 0x40276F5: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-s390x-linux.so)
==19204== by 0x400BCF9: _dl_new_object (in /usr/lib64/ld-2.17.so)
==19204== by 0x40444AF: ???
==19204== Address 0x5d6 is not stack'd, malloc'd or (recently) free'd
==19204==
==19204==
==19204== Process terminating with default action of signal 11 (SIGSEGV)
==19204== Bad permissions for mapped region at address 0x5D6
==19204== at 0x5D6: ???
==19204== by 0x40276F5: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-s390x-linux.so)
==19204== by 0x400BCF9: _dl_new_object (in /usr/lib64/ld-2.17.so)
==19204== by 0x40444AF: ???

Disappears when using --run-libc-freeres=no

There are two memcpy overrides that cause the trouble with memcheck. I don't fully understand why yet. But with these two commented out the memcheck tests are as good as on other platforms on s390x:

--- valgrind-3.9.0.orig/memcheck/mc_replace_strmem.c 2013-11-01 20:42:43.152061965 +0100
+++ valgrind-3.9.0/memcheck/mc_replace_strmem.c 2013-11-01 20:44:28.543802133 +0100
@@ -880,9 +880,9 @@
     the overlap check; sigh; see #275284. */
  MEMMOVE(VG_Z_LIBC_SONAME, memcpyZAGLIBCZu2Zd2Zd5) /* memcpy@GLIBC_2.2.5 */
  MEMCPY(VG_Z_LIBC_SONAME, memcpyZAZAGLIBCZu2Zd14) /* memcpy@@GLIBC_2.14 */
- MEMCPY(VG_Z_LIBC_SONAME, memcpy) /* fallback case */
+ // MEMCPY(VG_Z_LIBC_SONAME, memcpy) /* fallback case */
  MEMCPY(VG_Z_LD_SO_1, memcpy) /* ld.so.1 */
- MEMCPY(VG_Z_LD64_SO_1, memcpy) /* ld64.so.1 */
+ // MEMCPY(VG_Z_LD64_SO_1, memcpy) /* ld64.so.1 */
  /* icc9 blats these around all over the place. Not only in the main
     executable but various .so's. They are highly tuned and read
     memory beyond the source boundary (although work correctly and

But helgrind and drd are still very broken. There are two tests that need to be disabled to get the testsuite to finish:
- helgrind/tests/pth_destroy_cond.vgtest
- drd/tests/annotate_order_1.vgtest

This might or might not be related to the following glibc bug: https://bugzilla.redhat.com/show_bug.cgi?id=1020637

We have a patch/workaround for the memcheck segfault. Tracking the helgrind/drd issues in a new bug #1025888.

Still investigating. The issue is a bit hard because it is triggered when the dynamic loader calls some functions that valgrind memcheck intercepts through vgpreload_memcheck-s390x-linux.so. These functions run under the valgrind simulator. e.g.

valgrind --fullpath-after= --run-libc-freeres=no ls
==3667== Memcheck, a memory error detector
==3667== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==3667== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==3667== Command: ls
==3667==
==3667==
==3667== Process terminating with default action of signal 11 (SIGSEGV)
==3667== Bad permissions for mapped region at address 0x4006000
==3667== at 0x400EEBC: memcpy (/usr/src/debug/valgrind-3.8.1/memcheck/mc_replace_strmem.c:885)
==3667== by 0x4B982D6CF9: _dl_new_object (/usr/src/debug/glibc-2.17-c758a686/elf/dl-object.c:88)
==3667== by 0x4B982D1757: _dl_map_object_from_fd (/usr/src/debug/glibc-2.17-c758a686/elf/dl-load.c:1051)
==3667==
==3667== HEAP SUMMARY:
==3667== in use at exit: 0 bytes in 0 blocks
==3667== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==3667==
==3667== All heap blocks were freed -- no leaks are possible
==3667==
==3667== For counts of detected and suppressed errors, rerun with: -v
==3667== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault

(Use --run-libc-freeres=no to prevent valgrind trying to clean up after the crash, which will cause even more trouble.)

It is hard to get to those with gdb/vgdb. But when disabling them or when compiling them with -O0 those issues disappear, but they then pop up somewhere else. e.g.

./vg-in-place --fullpath-after= --run-libc-freeres=no perl --version
==3673== Memcheck, a memory error detector
==3673== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==3673== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==3673== Command: perl --version
==3673==
==3673==
==3673== Process terminating with default action of signal 11 (SIGSEGV)
==3673== Bad permissions for mapped region at address 0x4020000
==3673== at 0x4B982E6816: memmove (/usr/src/debug/glibc-2.17-c758a686/string/memmove.c:112)
==3673== by 0x4B982D2449: _dl_map_object_from_fd (/usr/src/debug/glibc-2.17-c758a686/elf/dl-load.c:1570)
==3673==
==3673== HEAP SUMMARY:
==3673== in use at exit: 0 bytes in 0 blocks
==3673== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==3673==
==3673== All heap blocks were freed -- no leaks are possible
==3673==
==3673== For counts of detected and suppressed errors, rerun with: -v
==3673== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
./vg-in-place: line 31: 3673 Segmentation fault VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

Note that again the call comes from the dynamic linker, but this time it goes to the actual glibc memmove implementation.

They can also be seen with the none tool. e.g.

valgrind --tool=none --fullpath-after= --run-libc-freeres=no python
==5739== Nulgrind, the minimal Valgrind tool
==5739== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote.
==5739== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==5739== Command: python
==5739==
==5739==
==5739== Process terminating with default action of signal 11 (SIGSEGV)
==5739== Access not within mapped region at address 0x4B98A4A000
==5739== at 0x4B9841111E: __strncpy_chk (/usr/src/debug/glibc-2.17-c758a686/debug/strncpy_chk.c:88)
==5739== by 0x4B98987C7F: calculate_path (/usr/include/bits/string3.h:120)
==5739== by 0x4B989883ED: Py_GetProgramFullPath (/usr/src/debug/Python-2.7.5/Modules/getpath.c:723)
==5739== by 0x4B9897DF81: _PySys_Init (/usr/src/debug/Python-2.7.5/Python/sysmodule.c:1475)
==5739== by 0x4B9897709D: Py_InitializeEx (/usr/src/debug/Python-2.7.5/Python/pythonrun.c:222)
==5739== by 0x4B98988BC3: Py_Main (/usr/src/debug/Python-2.7.5/Modules/main.c:546)
==5739== If you believe this happened as a result of a stack
==5739== overflow in your program's main thread (unlikely but
==5739== possible), you can try to increase the size of the
==5739== main thread stack using the --main-stacksize= flag.
==5739== The main thread stack size used in this run was 8388608.
==5739==
Segmentation fault

Note that the last one isn't triggered by the dynamic loader.

These can also be looked at through vgdb:

valgrind --vgdb-error=0 --tool=none --fullpath-after= --run-libc-freeres=no python
==6951== Nulgrind, the minimal Valgrind tool
==6951== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote.
==6951== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==6951== Command: python
==6951==
==6951== (action at startup) vgdb me ...
==6951==
==6951== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==6951== /path/to/gdb python
==6951== and then give GDB the following command
==6951== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=6951
==6951== --pid is optional if only one valgrind process is running
==6951==

gdb python
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-41.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "s390x-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/python2.7...Reading symbols from /usr/lib/debug/usr/bin/python2.7.debug...done.
done.
(gdb) target remote | vgdb --pid=6951
Remote debugging using | vgdb --pid=6951
relaying data between gdb and process 6951
Reading symbols from /lib/ld64.so.1...Reading symbols from /usr/lib/debug/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib/ld64.so.1
0x0000004b982cc2d0 in _start () from /lib/ld64.so.1
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000004b9841111e in __strncpy_chk (s1=<optimized out>, s2=<optimized out>, n=18446744073709551605, s1len=<optimized out>) at strncpy_chk.c:88
88 }
(gdb) list
83 do
84 *++s1 = '\0';
85 while (--n > 0);
86
87 return s;
88 }

Note that gdb shows the original code of the program. You can also get a disassembly of the original code. But the problem is most likely a miscompiled instruction from the valgrind VEX jit.

Created attachment 821176
mystrncpy.c reproducer when build with -march=z10 -O2 gcc 3.8.1
Based on the last crash when running python under valgrind --tool=none, I created the attached reproducer mystrncpy.c. It is based on the strncpy_chk.c code from glibc.
gcc -march=z10 -g -Wall -O2 -o mystrncpy mystrncpy.c
valgrind --tool=none ./mystrncpy "/usr/bin:/root/bin"
==27444== Nulgrind, the minimal Valgrind tool
==27444== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote.
==27444== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==27444== Command: ./mystrncpy /usr/bin:/root/bin
==27444==
==27444==
==27444== Process terminating with default action of signal 11 (SIGSEGV)
==27444== Access not within mapped region at address 0x80025000
==27444== at 0x80000730: mystrncpy_chk (mystrncpy.c:85)
==27444== by 0x8000052F: main (mystrncpy.c:97)
==27444== If you believe this happened as a result of a stack
==27444== overflow in your program's main thread (unlikely but
==27444== possible), you can try to increase the size of the
==27444== main thread stack using the --main-stacksize= flag.
==27444== The main thread stack size used in this run was 8388608.
==27444==
Segmentation fault
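The zero-fill loop shown in the gdb listing earlier in this report (strncpy_chk.c lines 83-85) is a do/while that decrements n before testing it, so it can only terminate correctly when it is entered with n >= 1. The following is a minimal sketch of that loop shape, not the attached reproducer. The helper name zero_fill_capped and its cap parameter are hypothetical and exist only so the runaway case can be observed without writing out of bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not glibc code): same do/while shape as the
   zero-fill loop in strncpy_chk.c, but it stops after `cap` writes so
   that the wrap-around case cannot run past the end of `buf`.
   Returns the number of bytes written. */
static size_t zero_fill_capped(char *buf, size_t n, size_t cap)
{
    size_t written = 0;

    assert(cap > 0);   /* buf must hold at least one byte */

    do {
        if (written == cap)
            break;                 /* artificial safety stop */
        buf[written++] = '\0';
    } while (--n > 0);             /* n == 0 on entry wraps to SIZE_MAX here */

    return written;
}
```

Entered with n == 3 the loop writes three bytes and stops. Entered with n == 0 the decrement wraps around and only the cap stops it, which would be consistent with the very large value of n that gdb reported at the crash.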
(In reply to Mark Wielaard from comment #12)
> Created attachment 821176
> mystrncpy.c reproducer when build with -march=z10 -O2 gcc 3.8.1

Sorry I meant gcc 4.8.2.

It will crash under either valgrind 3.8.1 or valgrind 3.9.0 in the same way.

Fix from upstream by Christian Borntraeger.

vex: r2798 - /trunk/priv/guest_s390_toIR.c

Author: cborntra
Date: Thu Nov 7 21:37:28 2013
New Revision: 2798

Log:
Fix Bug 327284. The condition code of risbg was not correct.
This instruction might be used by gcc for masking out bits, e.g. code like
n &= 3;
if (n == 0)
might result in
risbg %r4,%r4,62,128+63,0
je <target>

The old code set the condition code depending on the operand before masking. Fix it. This patch also indicates that we need test suite coverage for risbg and friends.

Modified:
trunk/priv/guest_s390_toIR.c

Modified: trunk/priv/guest_s390_toIR.c
==============================================================================
--- trunk/priv/guest_s390_toIR.c (original)
+++ trunk/priv/guest_s390_toIR.c Thu Nov 7 21:37:28 2013
@@ -7606,7 +7606,7 @@
  put_gpr_dw0(r1, binop(Iop_And64, mkexpr(op2), mkU64(mask)));
  }
  assign(result, get_gpr_dw0(r1));
- s390_cc_thunk_putS(S390_CC_OP_LOAD_AND_TEST, op2);
+ s390_cc_thunk_putS(S390_CC_OP_LOAD_AND_TEST, result);

  return "risbg";
  }

*** Bug 1025888 has been marked as a duplicate of this bug. ***

I don't see errors when running python under openssl so it looks fixed to me.

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.
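The effect of the one-line fix can be illustrated with a toy model in plain C. This is a sketch under stated assumptions, not VEX code. All function names below are hypothetical, the model covers only the specific form risbg %r4,%r4,62,128+63,0 quoted in the commit log (keep the two low bits, zero the rest, no rotation), and it assumes the signed load-and-test condition codes 0 for zero, 1 for negative and 2 for positive.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 64-bit load-and-test style condition code (assumed encoding):
   0 = zero, 1 = negative, 2 = positive. */
static int cc_load_and_test(int64_t v)
{
    if (v == 0)
        return 0;
    return v < 0 ? 1 : 2;
}

/* Toy model of "risbg %r4,%r4,62,128+63,0": select bits 62..63 (the two
   least significant bits in s390x bit numbering), zero everything else. */
static uint64_t risbg_keep_low2(uint64_t r4)
{
    return r4 & UINT64_C(3);
}

/* Condition code as described for the old translation:
   derived from the operand before masking. */
static int cc_before_fix(uint64_t r4)
{
    return cc_load_and_test((int64_t)r4);
}

/* Condition code after the fix: derived from the masked result. */
static int cc_after_fix(uint64_t r4)
{
    return cc_load_and_test((int64_t)risbg_keep_low2(r4));
}

/* "je" branches when the condition code is 0. */
static int je_taken(int cc)
{
    assert(cc >= 0 && cc <= 3);
    return cc == 0;
}
```

For n == 4 the masked result is 0, so the branch for `if (n == 0)` should be taken. In this model cc_after_fix(4) yields 0 and the branch is taken, while cc_before_fix(4) yields 2 and the branch is skipped, so execution would continue with n == 0 into code that assumes n is nonzero.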