SSE optimized string/memory functions may read beyond the memory boundary when it is safe to do so. Valgrind doesn't handle them. I got tons of messages like: ==11893== Invalid read of size 8 ==11893== at 0x52FCB4C: __GI_strncmp (in /lib64/libc-2.10.90.so) ==11893== by 0x52A71B3: _nl_load_locale_from_archive (in /lib64/libc-2.10.90. so) ==11893== by 0x52A6371: _nl_find_locale (in /lib64/libc-2.10.90.so) ==11893== by 0x52A5A0B: setlocale (in /lib64/libc-2.10.90.so) ==11893== by 0xA4F682: gcc_init_libintl (intl.c:54)
Hmm, this is a real problem making valgrind's output pretty much useless on rawhide right now.
Is there any chance we can get this fixed quickly? A lot of people are depending that debugging helpers such as vg work to fix the remaining bugs in F12!
Created attachment 359971 [details] Test case for strlen I also hit this bug over the weekend. My simple test case is attached.
*** Bug 526912 has been marked as a duplicate of this bug. ***
If I'm not mistaken this is also the root cause why valgrind shows false-positives for memory leaks. One example from the pidgin plugin I'm trying to debug: Linux l3f1199 2.6.31.1-56.fc12.i686 #1 SMP Tue Sep 29 16:32:02 EDT 2009 i686 i686 i386 GNU/Linux ==9609== 1 bytes in 1 blocks are possibly lost in loss record 30 of 8,851 ==9609== at 0x402794C: malloc (vg_replace_malloc.c:195) ==9609== by 0x4E336B4: g_malloc (gmem.c:131) ==9609== by 0x4E4C2A9: g_strdup (gstrfuncs.c:102) ==9609== by 0x643AB8F: sipmsg_parse_msg (sipmsg.c:43) 33: struct sipmsg *sipmsg_parse_msg(const gchar *msg) { ... 43: smsg->body = g_strdup(tmp + 4); gstrfuncs.c: 101: length = strlen (str) + 1; 102: new_str = g_new (char, length); 103: memcpy (new_str, str, length); gmem.c: 131: mem = glib_mem_vtable.malloc (n_bytes); The corresponding free function: 240: void sipmsg_free(struct sipmsg *msg) { ... 261: g_free(msg->body); Personally I would move this error to high priority, because valgrind is completely useless :(
The patch in https://bugs.kde.org/show_bug.cgi?id=206013 seems to work ok for me. I've put my local builds with that patch up at http://mclasen.fedorapeople.org/
(In reply to comment #6) > The patch in https://bugs.kde.org/show_bug.cgi?id=206013 seems to work ok for > me. > I've put my local builds with that patch up at http://mclasen.fedorapeople.org/ Thanks Matthias for the package. Guys, please update this bug with your feedback about the patched package Matthias has put on http://mclasen.fedorapeople.org/. If it works well enough for you we'll update the rawhide valgrind package with that patch.
Tested on i686: the invalid reads are gone, so I only get heap errors in my testing. False positives are still there though, but I guess that is a different problem.
(In reply to comment #8) > Tested on i686: the invalid reads are gone, so I only get heap errors in my > testing. > > False positives are still there though, but I guess that is a different > problem. Thanks for testing. As for the possible false positive, I'd recommand to file a different bug with a bit more information, so that we can track it with more detail. Did the package you tested make valgrind useful for you or is it still useless? Most of the programs I have tested it on didn't spit any false positive for me so far, unless some python related program. But even there, I haven't really investigated so I am not sure those are false positive yet. Thanks.
(In reply to comment #9) > > Did the package you tested make valgrind useful for you or is it still useless? It is still useless, unfortunately. When I have time I'll try running valgrind and that SAME package on a F11 machine. AFAIR the false-positives started once I updated my development to F12-rawhide.
(In reply to comment #9) > > As for the possible false positive, I'd recommand to file a different bug with > a bit more information, so that we can track it with more detail. See bug #527230
Thanks Stefan.
I've built valgrind-3.5.0-2 with this patch into dist-12-updates-candidate, but I see a bunch of testsuite regressions compared to 3.5.0-1 (which admittedly was run against much older glibc). Some regressions are caused by glibc change to print -nan instead of nan for NaN values with sign bit set, but others, e.g. coolo_strlen are clearly ifunc related: --- coolo_strlen.stderr.exp 2009-08-19 15:37:29.000000000 +0200 +++ coolo_strlen.stderr.out 2009-10-12 18:46:42.000000000 +0200 @@ -1,2 +1,32 @@ +valgrind: m_oset.c:535 (vgPlain_OSetGen_Lookup): Assertion 't' failed. + at 0x........: report_and_quit (m_libcassert.c:145) + by 0x........: vgPlain_assert_fail (m_libcassert.c:217) + by 0x........: vgPlain_OSetGen_Lookup (m_oset.c:535) + by 0x........: vgPlain_redir_change_ifunc_target (m_redir.c:509) + by 0x........: vgPlain_scheduler (scheduler.c:1404) + by 0x........: run_a_thread_NORETURN (syswrap-linux.c:91) + +sched status: + running_tid=1 + +Thread 1: status = VgTs_Runnable + at 0x........: * (vg_preloaded.c:92) + by 0x........: _dl_fixup (in /lib64/ld-2.10.90.so) + by 0x........: _dl_runtime_resolve (in /lib64/ld-2.10.90.so) + by 0x........: main (coolo_strlen.c:7) ...
I am looking at this.
Jakub, thanks for all of the work. On F12, I was seeing way too much noise Invalid read of size 8 in e.g., __GI_strlen and __GI_strcmp. I installed the latest from koji, valgrind.x86_64 1:3.5.0-6, and it seems to have solved the problems.