512121 – Valgrind --leak-check=full crashes the application

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	valgrind
Sub Component:
Version:	11
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-07-16 12:56 UTC by Milan Crha
Modified:	2009-08-06 20:14 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-08-03 07:18:18 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Output of "strace -o strace-audacity.txt audacity" (205.48 KB, text/plain) 2009-07-28 13:59 UTC, Tom London

Description Milan Crha 2009-07-16 12:56:46 UTC

I tried to use valgrind to check evolution, but it crashes the application as soon as the GUI is up and running. I also tried a smaller application, and gtk-demo behaves much the same, so I guess it's an issue with valgrind.

Steps to reproduce:
1) Run this command in a console:
   $ valgrind --leak-check=full gtk-demo

Once the application GUI has initialized, this error is shown on the console:
Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:108, function pa_mutex_unlock(). Aborting.
115	m_syswrap/syscall-x86-linux.S: No such file or directory.
Cannot access memory at address 0xfffffffffffffff3
Cannot access memory at address 0xfffffffffffffff3

and the application crashes. Without valgrind it works fine. The BugBuddy report is attached, but I see nothing useful in it.

I see this on Fedora 11 after yesterday's update. Some package versions that I think are relevant:
valgrind-3.4.1-3.i586
gtk2-2.16.2-1.fc11.i586
gtk2-debuginfo-2.16.1-4.fc11.i586
gcc-4.4.0-4.i586
gcc-c++-4.4.0-4.i586
gdb-6.8.50.20090302-33.fc11.i586

Doing pretty much the same on Fedora 10 doesn't trigger this. There I have valgrind-3.3.0-4.x86_64.

Comment 1 Jakub Jelinek 2009-07-28 08:21:37 UTC

Of course valgrind massively changes the timings; that's the only respect in which using valgrind matters here. In rawhide it crashes even without valgrind.

I've instrumented libpthread a little, and the first failure from pthread_mutex_unlock that I get is:

failure at 152 owner 0 tid 14832 lock 14832 count 0 nusers 1
==14832==    at 0xA75D388: __pthread_mutex_unlock_full (pthread_mutex_unlock.c:152)
==14832==    by 0xA75E357: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:201)
==14832==    by 0x3B5E6427EC: pa_cond_wait (in /usr/lib64/libpulsecommon-0.9.15.so)
==14832==    by 0x3B5EA341BF: pa_threaded_mainloop_wait (in /usr/lib64/libpulse.so.0.8.0)
==14832==    by 0xB6943CC: pulse_driver_open (in /usr/lib64/libcanberra-0.12/libcanberra-pulse.so)
==14832==    by 0x3B62E0BC79: (within /usr/lib64/libcanberra.so.0.1.5)
==14832==    by 0x3B62E03307: (within /usr/lib64/libcanberra.so.0.1.5)
==14832==    by 0x3B62E03B5B: ca_context_play_full (in /usr/lib64/libcanberra.so.0.1.5)
==14832==    by 0x3B63A02514: ca_gtk_play_for_widget (in /usr/lib64/libcanberra-gtk.so.0.0.5)
==14832==    by 0xA551538: (within /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so)
==14832==    by 0x3B5A037ABD: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2000.4)
==14832==    by 0x3B5A03B277: (within /lib64/libglib-2.0.so.0.2000.4)

This is a PI recursive mutex. owner, lock, count and nusers are the values of the mutex->__data.__* fields; tid is the current thread's tid.
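
For readers unfamiliar with the term, a "PI recursive mutex" is a pthread mutex created with both the recursive type and the priority-inheritance protocol. The following is a minimal sketch of how such a mutex can be set up and exercised with the standard pthread calls. The function names init_pi_recursive_mutex and lock_unlock_roundtrip are made up for this illustration and come from neither PulseAudio nor glibc. An uncontended lock/unlock like this may be handled entirely in user space, so the sketch is not expected to reproduce the failure by itself; it only shows where the return value that pa_mutex_unlock() asserts on comes from.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Initialise *m as a recursive, priority-inheriting (PI) mutex.
 * Returns 0 on success or the first pthread error code, for example
 * ENOTSUP if PI mutexes are unavailable on this system.
 * (Hypothetical helper, named for this illustration only.) */
int init_pi_recursive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock the mutex `depth` times from one thread, then unlock it the same
 * number of times. Returns the first non-zero pthread return value, or 0
 * if every call succeeded. (Hypothetical helper as well.) */
int lock_unlock_roundtrip(int depth)
{
    pthread_mutex_t m;
    int rc = init_pi_recursive_mutex(&m);
    if (rc != 0)
        return rc;

    int held = 0;
    while (rc == 0 && held < depth) {
        rc = pthread_mutex_lock(&m);
        if (rc == 0)
            held++;
    }

    /* Always release what we hold; remember only the first error seen. */
    while (held > 0) {
        int urc = pthread_mutex_unlock(&m);
        if (rc == 0)
            rc = urc;
        held--;
    }

    pthread_mutex_destroy(&m);
    return rc;
}
```

A caller would check the result the same way the PulseAudio assertion does, for example assert(lock_unlock_roundtrip(2) == 0), keeping in mind that ENOTSUP is a legitimate answer on systems without PI futex support.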

Comment 2 Jakub Jelinek 2009-07-28 08:26:23 UTC

*** Bug 513854 has been marked as a duplicate of this bug. ***

Comment 3 Jakub Jelinek 2009-07-28 09:14:27 UTC

The problem is that during pthread_mutex_unlock the futex call with FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG returns ENOSYS, and that error is actually returned by valgrind itself:
   switch(ARG2) {
   case VKI_FUTEX_WAIT:
   case VKI_FUTEX_WAIT | VKI_FUTEX_PRIVATE_FLAG:
      if (ARG4 != 0)
         PRE_MEM_READ( "futex(timeout)", ARG4, sizeof(struct vki_timespec) );
      break;

   case VKI_FUTEX_REQUEUE:
   case VKI_FUTEX_REQUEUE | VKI_FUTEX_PRIVATE_FLAG:
   case VKI_FUTEX_CMP_REQUEUE:
   case VKI_FUTEX_CMP_REQUEUE | VKI_FUTEX_PRIVATE_FLAG:
      PRE_MEM_READ( "futex(futex2)", ARG5, sizeof(Int) );
      break;

   case VKI_FUTEX_WAKE:
   case VKI_FUTEX_WAKE | VKI_FUTEX_PRIVATE_FLAG:
   case VKI_FUTEX_FD:
      /* no additional pointers */
      break;

   default:
      SET_STATUS_Failure( VKI_ENOSYS );   // some futex function we don't understand
      break;
   }
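
To make the effect of that default branch concrete, here is a small standalone model of the switch above. It is only a sketch: demo_futex_pre_check is a hypothetical name, the DEMO_ constants simply repeat the numeric values from <linux/futex.h> so that no kernel headers are needed, and nothing here is valgrind's actual code or its eventual fix. The model reports ENOSYS for any operation the quoted switch has no case for, which includes the PI operations.

```c
#include <assert.h>
#include <errno.h>

/* Futex operation numbers with the same values as in <linux/futex.h>,
 * given a DEMO_ prefix so the sketch builds without kernel headers. */
enum {
    DEMO_FUTEX_WAIT         = 0,
    DEMO_FUTEX_WAKE         = 1,
    DEMO_FUTEX_FD           = 2,
    DEMO_FUTEX_REQUEUE      = 3,
    DEMO_FUTEX_CMP_REQUEUE  = 4,
    DEMO_FUTEX_LOCK_PI      = 6,
    DEMO_FUTEX_UNLOCK_PI    = 7,
    DEMO_FUTEX_PRIVATE_FLAG = 128
};

/* Hypothetical stand-in for the quoted wrapper: returns 0 when the quoted
 * switch has a case for `op`, and ENOSYS when `op` would reach its
 * default branch. The case labels mirror the excerpt one for one. */
int demo_futex_pre_check(int op)
{
    switch (op) {
    case DEMO_FUTEX_WAIT:
    case DEMO_FUTEX_WAIT | DEMO_FUTEX_PRIVATE_FLAG:
    case DEMO_FUTEX_REQUEUE:
    case DEMO_FUTEX_REQUEUE | DEMO_FUTEX_PRIVATE_FLAG:
    case DEMO_FUTEX_CMP_REQUEUE:
    case DEMO_FUTEX_CMP_REQUEUE | DEMO_FUTEX_PRIVATE_FLAG:
    case DEMO_FUTEX_WAKE:
    case DEMO_FUTEX_WAKE | DEMO_FUTEX_PRIVATE_FLAG:
    case DEMO_FUTEX_FD:
        return 0;
    default:
        /* "some futex function we don't understand" */
        return ENOSYS;
    }
}
```

With this model, demo_futex_pre_check(DEMO_FUTEX_UNLOCK_PI | DEMO_FUTEX_PRIVATE_FLAG) yields ENOSYS while the WAIT and WAKE variants yield 0, which matches the behaviour described above.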

Now, for F12 when not running under valgrind, I wonder whether it is a similar case of that syscall failing. Can anyone strace it?

Comment 4 Tom London 2009-07-28 13:59:43 UTC

Created attachment 355413 [details]
Output of "strace -o strace-audacity.txt audacity"

After updating gcc and glibc packages:

Updated:
  gcc.x86_64 0:4.4.1-3   glibc.x86_64 0:2.10.90-10   libgcj.x86_64 0:4.4.1-3  

Dependency Updated:
  cpp.x86_64 0:4.4.1-3                   gcc-c++.x86_64 0:4.4.1-3               
  gcc-gfortran.x86_64 0:4.4.1-3          glibc-common.x86_64 0:2.10.90-10       
  glibc-devel.x86_64 0:2.10.90-10        glibc-headers.x86_64 0:2.10.90-10      
  libgcc.x86_64 0:4.4.1-3                libgfortran.x86_64 0:4.4.1-3           
  libgomp.x86_64 0:4.4.1-3               libstdc++.x86_64 0:4.4.1-3             
  libstdc++-devel.x86_64 0:4.4.1-3      

Complete!
[root@tlondon ~]# 


I get this crash when running audacity under strace (rhythmbox crashes too):
[tbl@tlondon ~]$ strace -o audacity-strace.txt audacity
Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:108, function pa_mutex_unlock(). Aborting.
ptrace: Operation not permitted.
/home/tbl/3336: No such file or directory.
No stack.
[tbl@tlondon ~]$

Could this part of the trace be useful?
open("/var/lib/dbus/machine-id", O_RDONLY) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=33, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f74fcb95000
read(20, "1e4008bf6214497396dedf114a4d23e9\n"..., 4096) = 33
close(20)                               = 0
munmap(0x7f74fcb95000, 4096)            = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 20
fcntl(20, F_GETFD)                      = 0
fcntl(20, F_SETFD, FD_CLOEXEC)          = 0
setsockopt(20, SOL_SOCKET, SO_PRIORITY, [6], 4) = 0
fcntl(20, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(20, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
connect(20, {sa_family=AF_FILE, path="/home/tbl/.pulse/1e4008bf6214497396dedf114a4d23e9:runtime/native"...}, 110) = 0
futex(0x2f1b6e0, FUTEX_UNLOCK_PI_PRIVATE, 0) = 0
futex(0x2e13724, 0x8b /* FUTEX_??? */, 1) = 0
write(2, "Assertion 'pthread_mutex_unlock(&"..., 126) = 126
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(3336, 3336, SIGABRT)             = 0
--- SIGABRT (Aborted) @ 0 (0) ---
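
The 0x8b value that strace prints as FUTEX_??? can be decoded by hand: a futex op is a command number optionally OR-ed with FUTEX_PRIVATE_FLAG (128) and FUTEX_CLOCK_REALTIME (256). The sketch below does that split using the numeric values from <linux/futex.h>. The function name demo_decode_futex_op and the DEMO_ macros are invented for this illustration, and the output format is not strace's own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag bits with the same values as in <linux/futex.h>. */
#define DEMO_FUTEX_PRIVATE_FLAG   128
#define DEMO_FUTEX_CLOCK_REALTIME 256

/* Command names indexed by command number (0..12), following the
 * numbering in <linux/futex.h>. */
static const char *const demo_futex_names[] = {
    "FUTEX_WAIT", "FUTEX_WAKE", "FUTEX_FD", "FUTEX_REQUEUE",
    "FUTEX_CMP_REQUEUE", "FUTEX_WAKE_OP", "FUTEX_LOCK_PI",
    "FUTEX_UNLOCK_PI", "FUTEX_TRYLOCK_PI", "FUTEX_WAIT_BITSET",
    "FUTEX_WAKE_BITSET", "FUTEX_WAIT_REQUEUE_PI", "FUTEX_CMP_REQUEUE_PI"
};

/* Hypothetical helper: write a readable form of `op` into buf (at most
 * len bytes, always NUL-terminated when len > 0) and return buf. */
const char *demo_decode_futex_op(int op, char *buf, size_t len)
{
    /* Strip the flag bits to get the bare command number. */
    int cmd = op & ~(DEMO_FUTEX_PRIVATE_FLAG | DEMO_FUTEX_CLOCK_REALTIME);
    size_t count = sizeof demo_futex_names / sizeof demo_futex_names[0];
    const char *name = (cmd >= 0 && (size_t)cmd < count)
                           ? demo_futex_names[cmd]
                           : "FUTEX_???";

    snprintf(buf, len, "%s%s%s", name,
             (op & DEMO_FUTEX_PRIVATE_FLAG) ? "|FUTEX_PRIVATE_FLAG" : "",
             (op & DEMO_FUTEX_CLOCK_REALTIME) ? "|FUTEX_CLOCK_REALTIME" : "");
    return buf;
}
```

Since 0x8b is 128 + 11, the helper reports FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG for it, and 0x87 decodes to FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG, the call that strace prints as FUTEX_UNLOCK_PI_PRIVATE one line earlier.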

Comment 5 Ulrich Drepper 2009-07-28 16:45:09 UTC

I've checked a patch for this in upstream. It is untested, since I don't have a kernel that new which actually works on my machines.

Comment 6 Jakub Jelinek 2009-08-03 07:18:18 UTC

The valgrind bug is fixed in the rawhide valgrind packages, and the glibc bug is fixed in the rawhide glibc packages.

Comment 7 Milan Crha 2009-08-03 09:17:34 UTC

F11 has "just been released"; can that be ported there too, please? Not having basic development tools available in Fedora is not the best thing, at least from my point of view.

Comment 8 Milan Crha 2009-08-06 19:45:30 UTC

I'm not going to install a broken rawhide, because I'm supposed to be developing another application. Even though I don't understand what the problem is with "backporting" the patches for this to F11, I'm willing to compile my own local *upstream* versions of valgrind/glibc so that I can keep using the tools I need for my work. What are the exact upstream versions/commits in which this bug was fixed, please?

Comment 9 Marc-Andre Lureau 2009-08-06 20:14:44 UTC

Obviously, other people and distributions might be interested, so I would also appreciate those pointers. Thanks!
