Bug 518247

Summary: Valgrind doesn't handle SSE optimized string/memory functions
Product: [Fedora] Fedora Reporter: H.J. Lu <hongjiu.lu>
Component: valgrindAssignee: Dodji Seketeli <dodji>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: chemobejk, dodji, erik-fedora, jakub, kalevlember, lpoetter, mclasen, meyering, tbzatek, zkabelac
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-11-04 15:28:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 473302    
Attachments:
Description Flags
Test case for strlen none

Description H.J. Lu 2009-08-19 16:33:37 UTC
SSE optimized string/memory functions may read beyond the memory
boundary when it is safe to do so. Valgrind doesn't handle them.
I got tons of messages like:

==11893== Invalid read of size 8
==11893==    at 0x52FCB4C: __GI_strncmp (in /lib64/libc-2.10.90.so)
==11893==    by 0x52A71B3: _nl_load_locale_from_archive (in /lib64/libc-2.10.90.
so)
==11893==    by 0x52A6371: _nl_find_locale (in /lib64/libc-2.10.90.so)
==11893==    by 0x52A5A0B: setlocale (in /lib64/libc-2.10.90.so)
==11893==    by 0xA4F682: gcc_init_libintl (intl.c:54)

Comment 1 Lennart Poettering 2009-08-24 16:03:21 UTC
Hmm, this is a real problem making valgrind's output pretty much useless on rawhide right now.

Comment 2 Lennart Poettering 2009-09-01 16:52:37 UTC
Is there any chance we can get this fixed quickly? A lot of people are depending that debugging helpers such as vg work to fix the remaining bugs in F12!

Comment 3 Richard W.M. Jones 2009-09-07 09:08:21 UTC
Created attachment 359971 [details]
Test case for strlen

I also hit this bug over the weekend.  My simple
test case is attached.

Comment 4 Jakub Jelinek 2009-10-02 13:41:09 UTC
*** Bug 526912 has been marked as a duplicate of this bug. ***

Comment 5 Stefan Becker 2009-10-03 09:07:55 UTC
If I'm not mistaken this is also the root cause why valgrind shows false-positives for memory leaks. One example from the pidgin plugin I'm trying to debug:

Linux l3f1199 2.6.31.1-56.fc12.i686 #1 SMP Tue Sep 29 16:32:02 EDT 2009 i686 i686 i386 GNU/Linux

==9609== 1 bytes in 1 blocks are possibly lost in loss record 30 of 8,851
==9609==    at 0x402794C: malloc (vg_replace_malloc.c:195)
==9609==    by 0x4E336B4: g_malloc (gmem.c:131)
==9609==    by 0x4E4C2A9: g_strdup (gstrfuncs.c:102)
==9609==    by 0x643AB8F: sipmsg_parse_msg (sipmsg.c:43)

 33: struct sipmsg *sipmsg_parse_msg(const gchar *msg) {
...
 43: 	smsg->body = g_strdup(tmp + 4);

gstrfuncs.c:
101:       length = strlen (str) + 1;
102:       new_str = g_new (char, length);
103:       memcpy (new_str, str, length);

gmem.c:
131:       mem = glib_mem_vtable.malloc (n_bytes);

The corresponding free function:

240: void sipmsg_free(struct sipmsg *msg) {
...
261: 	g_free(msg->body);

Personally I would move this error to high priority, because valgrind is completely useless :(

Comment 6 Matthias Clasen 2009-10-04 20:12:31 UTC
The patch in https://bugs.kde.org/show_bug.cgi?id=206013 seems to work ok for me.
I've put my local builds with that patch up at http://mclasen.fedorapeople.org/

Comment 7 Dodji Seketeli 2009-10-05 07:50:13 UTC
(In reply to comment #6)
> The patch in https://bugs.kde.org/show_bug.cgi?id=206013 seems to work ok for
> me.
> I've put my local builds with that patch up at http://mclasen.fedorapeople.org/  

Thanks Matthias for the package.

Guys, please update this bug with your feedback about the patched package Matthias has put on http://mclasen.fedorapeople.org/. If it works well enough for you we'll update the rawhide valgrind package with that patch.

Comment 8 Stefan Becker 2009-10-05 10:24:11 UTC
Tested on i686: the invalid reads are gone, so I only get heap errors in my testing.

False positives are still there though, but I guess that is a different problem.

Comment 9 Dodji Seketeli 2009-10-05 12:11:09 UTC
(In reply to comment #8)
> Tested on i686: the invalid reads are gone, so I only get heap errors in my
> testing.
> 
> False positives are still there though, but I guess that is a different
> problem.

Thanks for testing.

As for the possible false positive, I'd recommand to file a different bug with a bit more information, so that we can track it with more detail.

Did the package you tested make valgrind useful for you or is it still useless?

Most of the programs I have tested it on didn't spit any false positive for me so far, unless some python related program. But even there, I haven't really investigated so I am not sure those are false positive yet.

Thanks.

Comment 10 Stefan Becker 2009-10-05 12:22:14 UTC
(In reply to comment #9)
> 
> Did the package you tested make valgrind useful for you or is it still useless?

It is still useless, unfortunately. When I have time I'll try running valgrind and that SAME package on a F11 machine. AFAIR the false-positives started once I updated my development to F12-rawhide.

Comment 11 Stefan Becker 2009-10-05 13:49:07 UTC
(In reply to comment #9)
> 
> As for the possible false positive, I'd recommand to file a different bug with
> a bit more information, so that we can track it with more detail.

See bug #527230

Comment 12 Dodji Seketeli 2009-10-05 13:53:44 UTC
Thanks Stefan.

Comment 13 Jakub Jelinek 2009-10-12 17:43:03 UTC
I've built valgrind-3.5.0-2 with this patch into dist-12-updates-candidate, but I see a bunch of testsuite regressions compared to 3.5.0-1 (which admittedly was run against much older glibc).  Some regressions are caused by glibc change to print -nan instead of nan for NaN values with sign bit set, but others, e.g.
coolo_strlen are clearly ifunc related:

--- coolo_strlen.stderr.exp     2009-08-19 15:37:29.000000000 +0200
+++ coolo_strlen.stderr.out     2009-10-12 18:46:42.000000000 +0200
@@ -1,2 +1,32 @@


+valgrind: m_oset.c:535 (vgPlain_OSetGen_Lookup): Assertion 't' failed.
+   at 0x........: report_and_quit (m_libcassert.c:145)
+   by 0x........: vgPlain_assert_fail (m_libcassert.c:217)
+   by 0x........: vgPlain_OSetGen_Lookup (m_oset.c:535)
+   by 0x........: vgPlain_redir_change_ifunc_target (m_redir.c:509)
+   by 0x........: vgPlain_scheduler (scheduler.c:1404)
+   by 0x........: run_a_thread_NORETURN (syswrap-linux.c:91)
+
+sched status:
+  running_tid=1
+
+Thread 1: status = VgTs_Runnable
+   at 0x........: * (vg_preloaded.c:92)
+   by 0x........: _dl_fixup (in /lib64/ld-2.10.90.so)
+   by 0x........: _dl_runtime_resolve (in /lib64/ld-2.10.90.so)
+   by 0x........: main (coolo_strlen.c:7)
...

Comment 14 Dodji Seketeli 2009-10-12 21:24:46 UTC
I am looking at this.

Comment 15 Jim Meyering 2009-10-26 20:08:19 UTC
Jakub, thanks for all of the work.
On F12, I was seeing way too much noise Invalid read of size 8 in e.g., __GI_strlen and __GI_strcmp.

I installed the latest from koji, valgrind.x86_64 1:3.5.0-6, and it seems to have solved the problems.