Bug 1232158 - drpm valgrind tests fail (memcpy/memmove confusion)
Summary: drpm valgrind tests fail (memcpy/memmove confusion)
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: valgrind
Version: 22
Hardware: ppc64
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Mark Wielaard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: PPCTracker
TreeView+ depends on / blocked
 
Reported: 2015-06-16 07:52 UTC by Dan Horák
Modified: 2015-07-23 08:56 UTC (History)
12 users (show)

(edit)
Clone Of:
(edit)
Last Closed: 2015-07-19 01:55:37 UTC


Attachments (Terms of Use)
valgrind_v_v log (32.54 KB, text/plain)
2015-07-03 09:34 UTC, Rafael Fonseca
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
KDE Software Compilation 349828 None None None Never

Description Dan Horák 2015-06-16 07:52:34 UTC
valgrind tests fail on ppc64le - http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=2548997

from build.log
...
Scanning dependencies of target check
make[3]: Leaving directory '/builddir/build/BUILD/drpm-0.1.3'
make -f CMakeFiles/check.dir/build.make CMakeFiles/check.dir/build
make[3]: Entering directory '/builddir/build/BUILD/drpm-0.1.3'
/usr/bin/ctest --output-on-failure
Test project /builddir/build/BUILD/drpm-0.1.3
    Start 1: drpm_test
1/2 Test #1: drpm_test ........................   Passed    0.01 sec
    Start 2: drpm_test_valgrind
2/2 Test #2: drpm_test_valgrind ...............***Failed    1.15 sec
==13906== Memcheck, a memory error detector
==13906== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==13906== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==13906== Command: ./drpm_test drpm_test_file_1.drpm drpm_test_file_2.drpm drpm_test_file_3.rpm drpm_test_file_4 drpm_test_file_5
==13906== 
==13906== Source and destination overlap in memcpy(0x405afe8, 0x405aff0, 32)
==13906==    at 0x489D970: memcpy (in /usr/lib64/valgrind/vgpreload_memcheck-ppc64be-linux.so)
==13906==    by 0x402B2EB: memmove (in /usr/lib64/ld-2.21.so)
==13906==    by 0x401A74F: _dl_map_object_deps (in /usr/lib64/ld-2.21.so)
==13906==    by 0x400625B: dl_main (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4027C23: _dl_sysdep_start (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4003BD7: _dl_start_final (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4007D63: _dl_start (in /usr/lib64/ld-2.21.so)
==13906==    by 0x400338F: _start (in /usr/lib64/ld-2.21.so)
==13906== 
==13906== Source and destination overlap in memcpy(0xfff00e266, 0xfff00e268, 8)
==13906==    at 0x489D970: memcpy (in /usr/lib64/valgrind/vgpreload_memcheck-ppc64be-linux.so)
==13906==    by 0x402B2EB: memmove (in /usr/lib64/ld-2.21.so)
==13906==    by 0x401A8EF: _dl_map_object_deps (in /usr/lib64/ld-2.21.so)
==13906==    by 0x400625B: dl_main (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4027C23: _dl_sysdep_start (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4003BD7: _dl_start_final (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4007D63: _dl_start (in /usr/lib64/ld-2.21.so)
==13906==    by 0x400338F: _start (in /usr/lib64/ld-2.21.so)
==13906== 
[==========] Running 7 test(s).
[ RUN      ] test_drpm_read_err_mock
[       OK ] test_drpm_read_err_mock
[ RUN      ] test_drpm_read_err_input
[       OK ] test_drpm_read_err_input
[ RUN      ] test_drpm_read_ok
[       OK ] test_drpm_read_ok
[ RUN      ] test_drpm_get_uint
[       OK ] test_drpm_get_uint
[ RUN      ] test_drpm_get_string
[       OK ] test_drpm_get_string
[ RUN      ] test_drpm_destroy_err
[       OK ] test_drpm_destroy_err
[ RUN      ] test_drpm_destroy_ok
[       OK ] test_drpm_destroy_ok
[==========] 7 test(s) run.
[  PASSED  ] 7 test(s).
==13906== Source and destination overlap in memcpy(0xfff00f0f8, 0xfff00f100, 32)
==13906==    at 0x489D970: memcpy (in /usr/lib64/valgrind/vgpreload_memcheck-ppc64be-linux.so)
==13906==    by 0x402B2EB: memmove (in /usr/lib64/ld-2.21.so)
==13906==    by 0x401CE4F: _dl_sort_fini (in /usr/lib64/ld-2.21.so)
==13906==    by 0x401D197: _dl_fini (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4B64D13: __run_exit_handlers (in /usr/lib64/libc-2.21.so)
==13906==    by 0x4B64D5B: exit (in /usr/lib64/libc-2.21.so)
==13906==    by 0x4B44D0F: generic_start_main.isra.0 (in /usr/lib64/libc-2.21.so)
==13906==    by 0x4B44F37: (below main) (in /usr/lib64/libc-2.21.so)
==13906== 
==13906== Source and destination overlap in memcpy(0xfff00ef96, 0xfff00ef98, 8)
==13906==    at 0x489D970: memcpy (in /usr/lib64/valgrind/vgpreload_memcheck-ppc64be-linux.so)
==13906==    by 0x402B2EB: memmove (in /usr/lib64/ld-2.21.so)
==13906==    by 0x401CF4F: _dl_sort_fini (in /usr/lib64/ld-2.21.so)
==13906==    by 0x401D197: _dl_fini (in /usr/lib64/ld-2.21.so)
==13906==    by 0x4B64D13: __run_exit_handlers (in /usr/lib64/libc-2.21.so)
==13906==    by 0x4B64D5B: exit (in /usr/lib64/libc-2.21.so)
==13906==    by 0x4B44D0F: generic_start_main.isra.0 (in /usr/lib64/libc-2.21.so)
==13906==    by 0x4B44F37: (below main) (in /usr/lib64/libc-2.21.so)
==13906== 
==13906== 
==13906== HEAP SUMMARY:
==13906==     in use at exit: 0 bytes in 0 blocks
==13906==   total heap usage: 445 allocs, 445 frees, 5,596,603 bytes allocated
==13906== 
==13906== All heap blocks were freed -- no leaks are possible
==13906== 
==13906== For counts of detected and suppressed errors, rerun with: -v
==13906== ERROR SUMMARY: 32 errors from 4 contexts (suppressed: 0 from 0)
50% tests passed, 1 tests failed out of 2
Total Test time (real) =   1.16 sec
The following tests FAILED:
	  2 - drpm_test_valgrind (Failed)
CMakeFiles/check.dir/build.make:52: recipe for target 'CMakeFiles/check' failed
...


Version-Release number of selected component (if applicable):
drpm-0.1.3-1.fc22

Comment 1 Dan Horák 2015-06-16 07:54:12 UTC
but this might be a glibc bug actually ...

Comment 2 Dan Horák 2015-06-16 08:11:50 UTC
from a chat with glibc maintainer

<dhorak> siddhesh: hi, can't https://bugzilla.redhat.com/show_bug.cgi?id=1232158 be a ppc64 glibc (ld.so) bug actually?
<siddhesh> dhorak: my reading of this is that ld.so calls memmove, which seems to be handled by the memcpy interceptor function in valgrind
<siddhesh> memmove can take overlapping arguments
<siddhesh> so I'd look at valgrind first

Comment 3 Mark Wielaard 2015-07-02 16:29:50 UTC
Should this be moved to the valgrind component?

Could you run with valgrind -v -v?
That should show the function intercepts.

Comment 4 Dan Horák 2015-07-03 07:48:02 UTC
as can be seen from http://ppc.koji.fedoraproject.org/koji/packageinfo?packageID=19521 - build in f23 succeeds, but f22 and f21 fails. Can't f23 valgrind contain additional ppc64 related fixes?

Rafael, can you take a look on Mark's request?

Comment 5 Mark Wielaard 2015-07-03 08:20:13 UTC
(In reply to Dan Horák from comment #4)
> as can be seen from
> http://ppc.koji.fedoraproject.org/koji/packageinfo?packageID=19521 - build
> in f23 succeeds, but f22 and f21 fails. Can't f23 valgrind contain
> additional ppc64 related fixes?

OK, now I see. That is odd indeed. On f23 both ppc64 and ppc64le succeed, but on both f21 and f22 ppc64 fails while ppc64le succeeds.

f21 is fairly old. But f22 and f23/master shouldn't have any differences that I believe can explain this.

So it must be some interaction issue between glibc and valgrind on f21/f22 on ppc64 which was somehow fixed for f23/rawhide. And that didn't impact ppc64le.

I don't have any theory yet what that could be.

Comment 6 Mark Wielaard 2015-07-03 08:43:00 UTC
Maybe related, maybe not. The following change is in glibc 2.22:

commit b269211467795c71ae0ceb0ce79f2fb6614f33c9
Author: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
Date:   Wed Jan 21 07:41:46 2015 -0500

    powerpc: wordcopy/memmove cleanup for ppc64
    
    This patch cleanup some multiarch code related to memmmove
    optimization. Initial IFUNC support added specialized wordcopy
    symbols which turned in local IFUNC calls used by memmove default
    implementation.
    
    This change by removing then and used the optimized memmove instead
    for supported chips.

Which glibc version does f22/f23 have?

Comment 7 Dan Horák 2015-07-03 08:51:23 UTC
glibc-2.21.90-16.fc23.ppc64
and
glibc-2.21-5.fc22.ppc64p7

Comment 8 Rafael Fonseca 2015-07-03 09:34:00 UTC
Created attachment 1045800 [details]
valgrind_v_v log

Attached is the log output of running valgrind -v -v for the package.

Comment 9 Mark Wielaard 2015-07-03 12:53:04 UTC
Thanks to siddish for the analysis.

What we think is happening is that in glibc 2.20 powerpc enabled MEMCPY_OK_FOR_FORWARD_MEMMOVE (commit glibc-2.19-778-gd6f68bb) which makes memmove just call memcpy in some cases. Normally this would not be an issue. valgrind would already have intercepted memmove so the memmove -> memcpy call wouldn't be there. But... this is a memmove copy in ld.so, which valgrind doesn't intercept. Valgrind does intercept memcpy in ld.so however. So now we see a memmove that calls into memcpy and valgrind complains that memcpy has overlaps.

This was "fixed" in the commit mentioned in comment #6 (commit glibc-2.21-50-gb269211) for glibc 2.22. Which renames the powerpc memmove to __ppc_memmove for internal calls. So now valgrind will not see/intercept that internal memcpy -> memmove call, and so won't do the overlap checking.

If we are right then the fix for glibc 2.20 and 2.21 on powerpc64 is to also intercept memmove calls in ld.so and not just libc.so in valgrind (just like we already do for memcpy).

Comment 10 Mark Wielaard 2015-07-03 12:55:59 UTC
(In reply to Mark Wielaard from comment #9)
> Thanks to siddish for the analysis.

Siddhesh of course. Sorry. If you credit someone, you should at least get their name right. My apologies.

Comment 11 Mark Wielaard 2015-07-05 22:56:46 UTC
I think I know what the solution is, but I am unable to replicate.

-bash-4.3$ valgrind ./drpm_test drpm_test_file_1.drpm drpm_test_file_2.drpm drpm_test_file_3.rpm drpm_test_file_4 drpm_test_file_5
==5071== Memcheck, a memory error detector
==5071== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==5071== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==5071== Command: ./drpm_test drpm_test_file_1.drpm drpm_test_file_2.drpm drpm_test_file_3.rpm drpm_test_file_4 drpm_test_file_5
==5071== 
[==========] Running 7 test(s).
[ RUN      ] test_drpm_read_err_mock
[       OK ] test_drpm_read_err_mock
[ RUN      ] test_drpm_read_err_input
[       OK ] test_drpm_read_err_input
[ RUN      ] test_drpm_read_ok
[       OK ] test_drpm_read_ok
[ RUN      ] test_drpm_get_uint
[       OK ] test_drpm_get_uint
[ RUN      ] test_drpm_get_string
[       OK ] test_drpm_get_string
[ RUN      ] test_drpm_destroy_err
[       OK ] test_drpm_destroy_err
[ RUN      ] test_drpm_destroy_ok
[       OK ] test_drpm_destroy_ok
[==========] 7 test(s) run.
[  PASSED  ] 7 test(s).
==5071== 
==5071== HEAP SUMMARY:
==5071==     in use at exit: 0 bytes in 0 blocks
==5071==   total heap usage: 426 allocs, 426 frees, 16,321,313 bytes allocated
==5071== 
==5071== All heap blocks were freed -- no leaks are possible
==5071== 
==5071== For counts of detected and suppressed errors, rerun with: -v
==5071== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

My system has glibc-2.21-5.fc22.ppc64. Is glibc-2.21-5.fc22.ppc64p7 a special version?

Comment 12 Mark Wielaard 2015-07-06 11:12:01 UTC
The patch that I assume will fix this is:

diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c
index d4e5449..0f366bf 100644
--- a/shared/vg_replace_strmem.c
+++ b/shared/vg_replace_strmem.c
@@ -1141,6 +1141,10 @@ static inline void my_exit ( int x )
 #if defined(VGO_linux)
  MEMMOVE(VG_Z_LIBC_SONAME, memmove)
  MEMMOVE(VG_Z_LIBC_SONAME, __GI_memmove)
+ /* See bug #349828 Override for ld64.so.1 like memcpy, because for some
+    arches MEMCPY_OK_FOR_FORWARD_MEMMOVE is set, which might cause memmove
+    to call memcpy.  */
+ MEMMOVE(VG_Z_LD64_SO_1, memmove)
 
 #elif defined(VGO_darwin)
 # if DARWIN_VERS <= DARWIN_10_6

But since I have been unable to replicate the original issue I am not 100% sure this is the correct approach.

Comment 13 Mark Wielaard 2015-07-07 12:41:46 UTC
Turns out this only happens with the ppc64p7 glibc package variant, which only is installed on some newer powerpc hardware. With that it is easy to replicate (even /bin/ls will fail to run under valgrind). The proposed fix in comment #12 does indeed work.

Comment 14 Fedora Update System 2015-07-07 20:10:43 UTC
valgrind-3.10.1-13.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/valgrind-3.10.1-13.fc22

Comment 15 Matěj Chalk 2015-07-08 13:51:25 UTC
This problem occurs on f21 as well (http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=2620726).

Could this be fixed by a similar update for Fedora 21?

Comment 16 Fedora Update System 2015-07-08 15:41:00 UTC
valgrind-3.10.1-13.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/valgrind-3.10.1-13.fc21

Comment 17 Mark Wielaard 2015-07-08 15:41:40 UTC
(In reply to Matěj Chalk from comment #15)
> This problem occurs on f21 as well
> (http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=2620726).
> 
> Could this be fixed by a similar update for Fedora 21?

Yes. Update submitted.

Comment 18 Fedora Update System 2015-07-10 19:14:12 UTC
Package valgrind-3.10.1-13.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing valgrind-3.10.1-13.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-11310/valgrind-3.10.1-13.fc22
then log in and leave karma (feedback).

Comment 19 Fedora Update System 2015-07-19 01:55:37 UTC
valgrind-3.10.1-13.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2015-07-23 08:56:57 UTC
valgrind-3.10.1-13.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.