Bug 1018325

Summary: valgrind memcheck segfaults on s390x
Product: Red Hat Enterprise Linux 7
Reporter: Miroslav Franc <mfranc>
Component: valgrind
Assignee: Mark Wielaard <mjw>
Status: CLOSED CURRENTRELEASE
QA Contact: Miloš Prchlík <mprchlik>
Severity: high
Docs Contact:
Priority: high
Version: 7.0
CC: hkario, jakub, mbenitez, mcermak, mfranc, mjw, mprchlik, nathans, ohudlick
Target Milestone: rc
Keywords: TestBlocker
Target Release: ---
Hardware: s390x
OS: Linux
Whiteboard:
Fixed In Version: valgrind-3.9.0-1.2.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1025888
Environment:
Last Closed: 2014-06-13 11:23:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1025888
Attachments:
Description: mystrncpy.c reproducer when build with -march=z10 -O2 gcc 3.8.1
Flags: none

Comment 1 Mark Wielaard 2013-10-11 18:54:58 UTC
Could you try with valgrind --run-libc-freeres=no?
That should prevent valgrind from trying to free up glibc memory at exit.

Comment 2 Mark Wielaard 2013-10-14 13:25:46 UTC
There seem to be two issues.

==19206== Process terminating with default action of signal 11 (SIGSEGV)
==19206==  Bad permissions for mapped region at address 0x4045000
==19206==    at 0x4032EBC: memcpy (in /usr/lib64/valgrind/vgpreload_memcheck-s390x-linux.so)
==19206==    by 0x400BCF9: _dl_new_object (in /usr/lib64/ld-2.17.so)
==19206==    by 0x4006757: _dl_map_object_from_fd (in /usr/lib64/ld-2.17.so)

This one seems to happen always.

==19204== Jump to the invalid address stated on the next line
==19204==    at 0x5D6: ???
==19204==    by 0x40276F5: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-s390x-linux.so)
==19204==    by 0x400BCF9: _dl_new_object (in /usr/lib64/ld-2.17.so)
==19204==    by 0x40444AF: ???
==19204==  Address 0x5d6 is not stack'd, malloc'd or (recently) free'd
==19204== 
==19204== 
==19204== Process terminating with default action of signal 11 (SIGSEGV)
==19204==  Bad permissions for mapped region at address 0x5D6
==19204==    at 0x5D6: ???
==19204==    by 0x40276F5: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-s390x-linux.so)
==19204==    by 0x400BCF9: _dl_new_object (in /usr/lib64/ld-2.17.so)
==19204==    by 0x40444AF: ???

This one disappears when using --run-libc-freeres=no.

Comment 6 Mark Wielaard 2013-11-01 19:58:38 UTC
Two memcpy overrides cause the trouble with memcheck.
I don't fully understand why yet, but with these two commented out, the memcheck tests on s390x are as good as on other platforms:

--- valgrind-3.9.0.orig/memcheck/mc_replace_strmem.c	2013-11-01 20:42:43.152061965 +0100
+++ valgrind-3.9.0/memcheck/mc_replace_strmem.c	2013-11-01 20:44:28.543802133 +0100
@@ -880,9 +880,9 @@
     the overlap check; sigh; see #275284. */
  MEMMOVE(VG_Z_LIBC_SONAME, memcpyZAGLIBCZu2Zd2Zd5) /* memcpy.5 */
  MEMCPY(VG_Z_LIBC_SONAME,  memcpyZAZAGLIBCZu2Zd14) /* memcpy@@GLIBC_2.14 */
- MEMCPY(VG_Z_LIBC_SONAME,  memcpy) /* fallback case */
+ // MEMCPY(VG_Z_LIBC_SONAME,  memcpy) /* fallback case */
  MEMCPY(VG_Z_LD_SO_1,      memcpy) /* ld.so.1 */
- MEMCPY(VG_Z_LD64_SO_1,    memcpy) /* ld64.so.1 */
+ // MEMCPY(VG_Z_LD64_SO_1,    memcpy) /* ld64.so.1 */
  /* icc9 blats these around all over the place.  Not only in the main
     executable but various .so's.  They are highly tuned and read
     memory beyond the source boundary (although work correctly and

But helgrind and drd are still very broken. There are two tests that need to be disabled to get the testsuite to finish:

- helgrind/tests/pth_destroy_cond.vgtest
- drd/tests/annotate_order_1.vgtest

This might or might not be related to the following glibc bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1020637
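The intercept names in the patch (for example memcpyZAZAGLIBCZu2Zd14, commented as memcpy@@GLIBC_2.14) are written in valgrind's Z-encoding, which escapes characters that cannot appear in C identifiers. A minimal decoder sketch can help when reading them. Note that `z_decode` is a hypothetical helper, not a valgrind function, and it only handles the escapes visible in this patch plus the soname wildcard (ZA for '@', Za for '*', Zd for '.', Zu for '_', ZZ for a literal 'Z'); the full table is documented in valgrind's pub_tool_redir.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder for a SUBSET of valgrind's Z-encoding.
   Every escape turns two input chars into one output char, so the
   decoded string is never longer than the input; this lets us check
   the buffer size once up front (conservatively).
   Returns 0 on success, -1 on an unknown escape or a too-small buffer. */
static int z_decode(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (strlen(in) >= out_size)
        return -1;                      /* may not fit with the terminator */

    while (*in != '\0') {
        char c = *in++;
        if (c == 'Z') {
            switch (*in++) {
            case 'A': c = '@'; break;   /* symbol version separator */
            case 'a': c = '*'; break;   /* soname wildcard */
            case 'd': c = '.'; break;
            case 'u': c = '_'; break;
            case 'Z': c = 'Z'; break;   /* literal Z */
            default:  return -1;        /* unknown escape or trailing 'Z' */
            }
        }
        out[o++] = c;
    }
    out[o] = '\0';
    return 0;
}
```

For example, decoding "memcpyZAGLIBCZu2Zd2Zd5" yields "memcpy@GLIBC_2.2.5", and "libcZdsoZa" (which, as far as we know, is what VG_Z_LIBC_SONAME expands to on Linux) yields "libc.so*".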

Comment 7 Mark Wielaard 2013-11-01 20:21:43 UTC
We have a patch/workaround for the memcheck segfault.
Tracking the helgrind/drd issues in a new bug #1025888.

Comment 11 Mark Wielaard 2013-11-07 12:57:14 UTC
Still investigating. The issue is a bit hard to track down because it is triggered when the dynamic loader calls functions that valgrind memcheck intercepts through vgpreload_memcheck-s390x-linux.so. These replacement functions run under the valgrind simulator, e.g.:

valgrind --fullpath-after= --run-libc-freeres=no ls
==3667== Memcheck, a memory error detector
==3667== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==3667== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==3667== Command: ls
==3667== 
==3667== 
==3667== Process terminating with default action of signal 11 (SIGSEGV)
==3667==  Bad permissions for mapped region at address 0x4006000
==3667==    at 0x400EEBC: memcpy (/usr/src/debug/valgrind-3.8.1/memcheck/mc_replace_strmem.c:885)
==3667==    by 0x4B982D6CF9: _dl_new_object (/usr/src/debug/glibc-2.17-c758a686/elf/dl-object.c:88)
==3667==    by 0x4B982D1757: _dl_map_object_from_fd (/usr/src/debug/glibc-2.17-c758a686/elf/dl-load.c:1051)
==3667== 
==3667== HEAP SUMMARY:
==3667==     in use at exit: 0 bytes in 0 blocks
==3667==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==3667== 
==3667== All heap blocks were freed -- no leaks are possible
==3667== 
==3667== For counts of detected and suppressed errors, rerun with: -v
==3667== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault

(Use --run-libc-freeres=no to prevent valgrind from trying to clean up after the crash, which would cause even more trouble.)

It is hard to get at those functions with gdb/vgdb.
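Conceptually, a memcheck string-function replacement such as memcpy checks its arguments and then does the copy itself in plain C, so the copy runs on the simulated CPU instead of in glibc's optimized code. The sketch below mirrors only that shape, under stated assumptions: `ranges_overlap` and `checked_memcpy` are hypothetical names, and the real MEMCPY macro in memcheck/mc_replace_strmem.c reports overlaps through valgrind's own error machinery rather than an out-parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Do [dst, dst+len) and [src, src+len) overlap? The pointers are compared
   as integers because they need not point into the same object. */
static int ranges_overlap(const void *dst, const void *src, size_t len)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    if (len == 0)
        return 0;
    return d < s + len && s < d + len;
}

/* Hypothetical checking memcpy: record whether the ranges overlap,
   then copy byte by byte with an ordinary C loop. */
static void *checked_memcpy(void *dst, const void *src, size_t len,
                            int *overlapped)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(overlapped != NULL);
    *overlapped = ranges_overlap(dst, src, len);
    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
    return dst;
}
```

Because a replacement like this is ordinary compiled code running under the JIT, a wrongly translated instruction inside it can crash the guest even though the C source is correct.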

When those intercepts are disabled, or compiled with -O0, these crashes disappear, but they then pop up somewhere else, e.g.:

./vg-in-place --fullpath-after= --run-libc-freeres=no perl --version
==3673== Memcheck, a memory error detector
==3673== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==3673== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==3673== Command: perl --version
==3673== 
==3673== 
==3673== Process terminating with default action of signal 11 (SIGSEGV)
==3673==  Bad permissions for mapped region at address 0x4020000
==3673==    at 0x4B982E6816: memmove (/usr/src/debug/glibc-2.17-c758a686/string/memmove.c:112)
==3673==    by 0x4B982D2449: _dl_map_object_from_fd (/usr/src/debug/glibc-2.17-c758a686/elf/dl-load.c:1570)
==3673== 
==3673== HEAP SUMMARY:
==3673==     in use at exit: 0 bytes in 0 blocks
==3673==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==3673== 
==3673== All heap blocks were freed -- no leaks are possible
==3673== 
==3673== For counts of detected and suppressed errors, rerun with: -v
==3673== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
./vg-in-place: line 31:  3673 Segmentation fault      VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

Note that again the call comes from the dynamic linker, but this time it goes to the actual glibc memmove implementation.

They can also be seen with the none tool, e.g.:

valgrind --tool=none --fullpath-after= --run-libc-freeres=no python
==5739== Nulgrind, the minimal Valgrind tool
==5739== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote.
==5739== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==5739== Command: python
==5739== 
==5739== 
==5739== Process terminating with default action of signal 11 (SIGSEGV)
==5739==  Access not within mapped region at address 0x4B98A4A000
==5739==    at 0x4B9841111E: __strncpy_chk (/usr/src/debug/glibc-2.17-c758a686/debug/strncpy_chk.c:88)
==5739==    by 0x4B98987C7F: calculate_path (/usr/include/bits/string3.h:120)
==5739==    by 0x4B989883ED: Py_GetProgramFullPath (/usr/src/debug/Python-2.7.5/Modules/getpath.c:723)
==5739==    by 0x4B9897DF81: _PySys_Init (/usr/src/debug/Python-2.7.5/Python/sysmodule.c:1475)
==5739==    by 0x4B9897709D: Py_InitializeEx (/usr/src/debug/Python-2.7.5/Python/pythonrun.c:222)
==5739==    by 0x4B98988BC3: Py_Main (/usr/src/debug/Python-2.7.5/Modules/main.c:546)
==5739==  If you believe this happened as a result of a stack
==5739==  overflow in your program's main thread (unlikely but
==5739==  possible), you can try to increase the size of the
==5739==  main thread stack using the --main-stacksize= flag.
==5739==  The main thread stack size used in this run was 8388608.
==5739== 
Segmentation fault

Note that the last one isn't triggered by the dynamic loader.

These can also be looked at through vgdb:

valgrind --vgdb-error=0 --tool=none --fullpath-after= --run-libc-freeres=no python
==6951== Nulgrind, the minimal Valgrind tool
==6951== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote.
==6951== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==6951== Command: python
==6951== 
==6951== (action at startup) vgdb me ... 
==6951== 
==6951== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==6951==   /path/to/gdb python
==6951== and then give GDB the following command
==6951==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=6951
==6951== --pid is optional if only one valgrind process is running
==6951== 

gdb python
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-41.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "s390x-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/python2.7...Reading symbols from /usr/lib/debug/usr/bin/python2.7.debug...done.
done.
(gdb) target remote | vgdb --pid=6951
Remote debugging using | vgdb --pid=6951
relaying data between gdb and process 6951
Reading symbols from /lib/ld64.so.1...Reading symbols from /usr/lib/debug/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib/ld64.so.1
0x0000004b982cc2d0 in _start () from /lib/ld64.so.1
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000004b9841111e in __strncpy_chk (s1=<optimized out>, s2=<optimized out>, 
    n=18446744073709551605, s1len=<optimized out>) at strncpy_chk.c:88
88	}
(gdb) list
83	  do
84	    *++s1 = '\0';
85	  while (--n > 0);
86	
87	  return s;
88	}

Note that gdb shows the original code of the program, and you can also get a disassembly of the original code. But the problem is most likely an instruction miscompiled by the valgrind VEX JIT.
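The value gdb prints for n, 18446744073709551605, is 2^64 - 11: what a 64-bit size_t holds after being decremented 11 times past zero. This is consistent with the listing above, where the zero-fill loop `do *++s1 = '\0'; while (--n > 0);` decrements before testing, so if a decrement-then-test loop like this is ever entered with n == 0 it wraps to the maximum value and keeps writing until it runs off the mapping. (In optimized code the printed n may not be exact, so treat this as a hint rather than proof.) A minimal sketch of the arithmetic, with a hypothetical iteration cap so that the n == 0 case terminates:

```c
#include <assert.h>
#include <stdint.h>

/* Value gdb printed for n in __strncpy_chk: 2^64 - 11. */
#define GDB_REPORTED_N 18446744073709551605ULL

/* Counts iterations of a "do { ... } while (--n > 0);" loop, stopping
   early at 'limit' so the n == 0 case ends instead of running ~2^64 times. */
static uint64_t zero_fill_iterations(uint64_t n, uint64_t limit)
{
    uint64_t iters = 0;

    assert(limit > 0);               /* a zero cap would never be reached */
    do {
        iters++;                     /* stands in for *++s1 = '\0' */
        if (iters == limit)
            break;
    } while (--n > 0);               /* from n == 0 this wraps to UINT64_MAX */
    return iters;
}
```

zero_fill_iterations(3, 1000) returns 3, while zero_fill_iterations(0, 1000) only stops because it hits the cap.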

Comment 12 Mark Wielaard 2013-11-07 14:37:18 UTC
Created attachment 821176 [details]
mystrncpy.c reproducer when build with -march=z10 -O2 gcc 3.8.1

Based on the last crash when running python under valgrind --tool=none, I created the attached reproducer mystrncpy.c. It is based on strncpy_chk.c from glibc.

gcc -march=z10 -g -Wall -O2 -o mystrncpy mystrncpy.c
valgrind --tool=none ./mystrncpy "/usr/bin:/root/bin"

valgrind --tool=none ./mystrncpy "/usr/bin:/root/bin"
==27444== Nulgrind, the minimal Valgrind tool
==27444== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote.
==27444== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==27444== Command: ./mystrncpy /usr/bin:/root/bin
==27444== 
==27444== 
==27444== Process terminating with default action of signal 11 (SIGSEGV)
==27444==  Access not within mapped region at address 0x80025000
==27444==    at 0x80000730: mystrncpy_chk (mystrncpy.c:85)
==27444==    by 0x8000052F: main (mystrncpy.c:97)
==27444==  If you believe this happened as a result of a stack
==27444==  overflow in your program's main thread (unlikely but
==27444==  possible), you can try to increase the size of the
==27444==  main thread stack using the --main-stacksize= flag.
==27444==  The main thread stack size used in this run was 8388608.
==27444== 
Segmentation fault
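The attachment itself is not reproduced here. As an illustration only, this is a hypothetical strncpy-style function loosely following the structure of glibc's generic strncpy_chk.c: copy in rounds of four bytes, then handle the `n & 3` leftover bytes, then zero-fill. The `rest = n & 3; if (rest == 0)` step is the kind of mask-and-test that, according to the upstream commit log, gcc may compile to RISBG plus a conditional jump. Unlike the glibc loop shown in gdb, the zero-fill here tests before decrementing, so it cannot wrap even if it is reached with nothing left to fill. `sketch_strncpy` is not the attached mystrncpy.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strncpy-style copy with the same three phases as glibc's
   generic strncpy_chk.c. Not the attached reproducer. */
static char *sketch_strncpy(char *dst, const char *src, size_t n)
{
    size_t i = 0;       /* bytes written so far */
    size_t rest = 0;    /* bytes still owed to dst */

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Phase 1: whole rounds of four bytes. */
    for (size_t n4 = n >> 2; n4 > 0; n4--) {
        for (int k = 0; k < 4; k++) {
            char c = src[i];
            dst[i++] = c;
            if (c == '\0') {
                rest = n - i;
                goto zero_fill;
            }
        }
    }

    /* Phase 2: the 0..3 leftover bytes. With -march=z10, gcc may turn
       this mask-and-test into RISBG followed by a conditional jump. */
    rest = n & 3;
    if (rest == 0)
        return dst;
    for (;;) {
        char c = src[i];
        dst[i++] = c;
        if (--rest == 0)        /* rest >= 1 here, so this cannot wrap */
            return dst;
        if (c == '\0')
            break;
    }

zero_fill:
    /* Phase 3: pad with NULs, testing before decrementing. */
    while (rest > 0) {
        dst[i++] = '\0';
        rest--;
    }
    return dst;
}
```

Like strncpy, it writes exactly n bytes and leaves the result unterminated when src has n or more characters.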

Comment 13 Mark Wielaard 2013-11-07 14:43:05 UTC
(In reply to Mark Wielaard from comment #12)
> Created attachment 821176 [details]
> mystrncpy.c reproducer when build with -march=z10 -O2 gcc 3.8.1

Sorry, I meant gcc 4.8.2.
It will crash under either valgrind 3.8.1 or valgrind 3.9.0 in the same way.

Comment 14 Mark Wielaard 2013-11-07 22:01:00 UTC
Fix from upstream by Christian Borntraeger.

vex: r2798 - /trunk/priv/guest_s390_toIR.c

Author: cborntra
Date: Thu Nov  7 21:37:28 2013
New Revision: 2798

Log:
Fix Bug 327284. The condition code of risbg was not correct.
This instruction might be used by gcc for masking out bits,
e.g. code like
  n &= 3;
  if (n == 0)

might result in
        risbg   %r4,%r4,62,128+63,0
        je      <target>

The old code set the condition code depending on the operand before
masking. Fix it. This patch also indicates that we need test suite
coverage for risbg and friends.



Modified:
    trunk/priv/guest_s390_toIR.c

Modified: trunk/priv/guest_s390_toIR.c
==============================================================================
--- trunk/priv/guest_s390_toIR.c (original)
+++ trunk/priv/guest_s390_toIR.c Thu Nov  7 21:37:28 2013
@@ -7606,7 +7606,7 @@
       put_gpr_dw0(r1, binop(Iop_And64, mkexpr(op2), mkU64(mask)));
    }
    assign(result, get_gpr_dw0(r1));
-   s390_cc_thunk_putS(S390_CC_OP_LOAD_AND_TEST, op2);
+   s390_cc_thunk_putS(S390_CC_OP_LOAD_AND_TEST, result);

    return "risbg";
 }
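To see why this fix matters for the `n &= 3; if (n == 0)` pattern, here is a small C model of RISBG with the zero-remaining-bits flag (the 128 added to the fourth operand), together with the signed load-and-test condition code (0 for zero, 1 for negative, 2 for positive). The `cc_buggy` field models the old VEX behaviour by deriving the condition code from the rotated operand before masking, as we read the diff; all names here are hypothetical and this is not VEX code.

```c
#include <assert.h>
#include <stdint.h>

/* Signed load-and-test condition code: 0 if zero, 1 if negative, 2 if positive. */
static int cc_load_and_test(uint64_t v)
{
    if (v == 0)
        return 0;
    return (v >> 63) ? 1 : 2;          /* sign bit set means negative */
}

static uint64_t rotl64(uint64_t x, unsigned r)
{
    r &= 63;
    return r ? (x << r) | (x >> (64 - r)) : x;
}

/* Mask selecting bits start..end in s390 numbering (bit 0 = MSB);
   the range wraps around when start > end. */
static uint64_t bit_range_mask(unsigned start, unsigned end)
{
    uint64_t m = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        int selected = (start <= end) ? (bit >= start && bit <= end)
                                      : (bit >= start || bit <= end);
        if (selected)
            m |= (uint64_t)1 << (63 - bit);
    }
    return m;
}

typedef struct {
    uint64_t r1;     /* new value of the first operand register */
    int cc_fixed;    /* condition code taken from the result (fixed VEX) */
    int cc_buggy;    /* condition code taken from the unmasked operand (old VEX) */
} risbg_model;

/* RISBG r1,r2,i3,i4,i5: rotate r2 left by i5 and insert bits i3..i4 into r1;
   if bit 0x80 of i4 is set, all other bits of r1 are zeroed. */
static risbg_model risbg(uint64_t r1, uint64_t r2,
                         unsigned i3, unsigned i4, unsigned i5)
{
    assert(i3 <= 0xFF && i4 <= 0xFF && i5 <= 0xFF);   /* 8-bit immediates */
    uint64_t rot = rotl64(r2, i5 & 63);
    uint64_t mask = bit_range_mask(i3 & 63, i4 & 63);
    uint64_t res = (i4 & 0x80) ? (rot & mask) : ((r1 & ~mask) | (rot & mask));
    risbg_model m = { res, cc_load_and_test(res), cc_load_and_test(rot) };
    return m;
}
```

For n = 8, `risbg %r4,%r4,62,128+63,0` leaves 0 in r4. The fixed condition code is 0, so `je` is taken; the buggy one is 2, so `je` falls through and the code carries on with n == 0, which is consistent with the wrapped counter seen in the python crash.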

Comment 15 Mark Wielaard 2013-11-07 23:36:40 UTC
*** Bug 1025888 has been marked as a duplicate of this bug. ***

Comment 16 Hubert Kario 2013-11-12 14:12:50 UTC
I don't see errors when running python under openssl so it looks fixed to me.

Comment 19 Ludek Smid 2014-06-13 11:23:52 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.