Description of problem:
After updating to the newest gcc/glibc, I am getting this message when I start applications that use sound:

    Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:108, function pa_mutex_unlock(). Aborting.

So far, I've seen this with rhythmbox, audacity, qemu-kvm, totem, ... Is this a side effect of the "f12 rebuild"? Will this be fixed when the pulseaudio packages are rebuilt?

Version-Release number of selected component (if applicable):
    pulseaudio-libs-glib2-0.9.16-2.test2.fc12.x86_64
    glibc-headers-2.10.90-7.1.x86_64
    glibc-devel-2.10.90-7.1.x86_64
    pulseaudio-libs-0.9.16-2.test2.fc12.x86_64
    glibc-common-2.10.90-7.1.x86_64
    pulseaudio-debuginfo-0.9.16-2.test2.fc12.x86_64
    pulseaudio-utils-0.9.16-2.test2.fc12.x86_64
    pulseaudio-module-gconf-0.9.16-2.test2.fc12.x86_64
    pulseaudio-0.9.16-2.test2.fc12.x86_64
    glibc-2.10.90-7.1.x86_64
    pulseaudio-module-bluetooth-0.9.16-2.test2.fc12.x86_64
    gcc-4.4.1-3.x86_64
    pulseaudio-module-x11-0.9.16-2.test2.fc12.x86_64

How reproducible: Every time, all the time.
Didn't provide versions of failing apps: rhythmbox-0.12.3-1.fc12.x86_64 totem-2.27.2-1.fc12.x86_64 audacity-1.3.8-0.2.betamp3.fc12.x86_64 qemu-kvm-0.10.50-13.kvm88.fc12.x86_64
I rebuilt audacity locally, and still get the same error:

    [tbl@tlondon audacity-src-1.3.8]$ ./audacity
    Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:108, function pa_mutex_unlock(). Aborting.

Also, I notice 2 "runaway" processes:

    [root@tlondon ~]# top
    top - 11:14:57 up 2:08, 4 users, load average: 2.07, 2.35, 2.67
    Tasks: 168 total, 3 running, 165 sleeping, 0 stopped, 0 zombie
    Cpu(s): 98.0%us, 1.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
    Mem: 3884452k total, 2298684k used, 1585768k free, 90604k buffers
    Swap: 5963768k total, 0k used, 5963768k free, 1367748k cached

      PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
     1628 tbl   20  0  519m 6676 5332 R 97.9  0.2 116:44.18 canberra-gtk-pl
     1654 tbl   20  0  441m  11m 9276 R 97.2  0.3  97:06.87 gpk-update-icon
     1292 root  20  0  292m  75m  12m S  1.0  2.0   3:22.12 Xorg
     2007 tbl   20  0  457m  18m 9.9m S  1.0  0.5   1:13.39 npviewer.bin
    29782 root  20  0 13020 1192  864 R  1.0  0.0   0:00.22 top

The first is canberra-gtk-play, the second is gpk-update-icon. Not sure the second is related to this....
I've tried downgrading to version 0.9.15, which used to work somehow before... With 0.9.15 my desktop doesn't crash for now (as it did with 0.9.16), but e.g. mplayer must be used with -ao oss (and the volume changed via aumix) to get any sound out of my machine. This pulseaudio thing is becoming a really horrible nightmare..... It cannot easily be purged from my system...
I've recovered "working pulseaudio and sound apps" by downgrading glibc and gcc packages: cpp-4.4.1-1.x86_64.rpm libgcc-4.4.1-1.x86_64.rpm gcc-4.4.1-1.x86_64.rpm libgcj-4.4.1-1.x86_64.rpm gcc-c++-4.4.1-1.x86_64.rpm libgfortran-4.4.1-1.x86_64.rpm gcc-gfortran-4.4.1-1.x86_64.rpm libgomp-4.4.1-1.x86_64.rpm glibc-2.10.90-3.x86_64.rpm libstdc++-4.4.1-1.x86_64.rpm glibc-common-2.10.90-3.x86_64.rpm libstdc++-devel-4.4.1-1.x86_64.rpm glibc-devel-2.10.90-3.x86_64.rpm nscd-2.10.90-3.x86_64.rpm glibc-headers-2.10.90-3.x86_64.rpm With this, pulseaudio functions great for me....
Assigning to glibc hence. Hey you, glibc/gcc folks, you broke mutexes!
Also see bug 497650
Have you verified that you are never calling pthread_mutex_unlock in a different thread than the one that actually acquired the mutex? For PI mutexes EPERM in pthread_mutex_unlock is then returned (similarly for error checking mutex). There haven't been any changes in the PI mutex glibc code for months btw., but most of the work is done in the kernel, not glibc.
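To illustrate the EPERM case described above, here is a minimal sketch using an error-checking mutex (not a PI one, so it does not depend on kernel PI futex support). The names demo_mutex, unlock_from_other_thread and unlock_result_for_non_owner are made up for this example and are not taken from PulseAudio or glibc.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_mutex;   /* hypothetical name, for illustration */

/* Runs in a second thread that never locked demo_mutex. */
static void *unlock_from_other_thread(void *result)
{
    *(int *)result = pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Returns what pthread_mutex_unlock() reported to the non-owning thread. */
int unlock_result_for_non_owner(void)
{
    pthread_mutexattr_t attr;
    pthread_t thread;
    int result = -1;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&demo_mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&demo_mutex);      /* this thread is the owner */
    assert(rc == 0);
    rc = pthread_create(&thread, NULL, unlock_from_other_thread, &result);
    assert(rc == 0);
    pthread_join(thread, NULL);

    rc = pthread_mutex_unlock(&demo_mutex);    /* the owner may unlock */
    assert(rc == 0);
    pthread_mutex_destroy(&demo_mutex);
    return result;
}
```

On a POSIX-conforming implementation the value returned should be EPERM, which is the kind of non-zero result that would trip an assertion of the form pthread_mutex_unlock(...) == 0.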
Am seeing hangs in firefox (3.6b1 nightly from last week), top of the stack looks like this:

    (gdb) where
    #0  0x00007fbfee69a314 in pa_tls_get () from /usr/lib64/libpulsecommon-0.9.15.so
    #1  0x00007fbfee699b52 in ?? () from /usr/lib64/libpulsecommon-0.9.15.so
    #2  0x00007fbfee699fe5 in pa_thread_self () from /usr/lib64/libpulsecommon-0.9.15.so
    #3  0x00007fbfee8e473b in ?? () from /usr/lib64/libpulse.so.0
    #4  0x00007fbfee8e5133 in pa_threaded_mainloop_wait () from /usr/lib64/libpulse.so.0
    #5  0x00007fbfeeafe832 in pulse_driver_open () from /usr/lib64/libcanberra-0.14/libcanberra-pulse.so
    #6  0x0000003a5400bc52 in ?? () from /usr/lib64/libcanberra.so.0
    #7  0x0000003a540034a8 in ?? () from /usr/lib64/libcanberra.so.0
    #8  0x000000000001183f in ?? ()
    #9  0x00007fbff8b8a000 in ?? ()
    #10 0x00007fbff0d09c40 in ?? ()
    #11 0x0000003a54003cb5 in ca_context_play_full () from /usr/lib64/libcanberra.so.0
    #12 0x0000003a54004060 in ca_context_play () from /usr/lib64/libcanberra.so.0

And we just spin our wheels at 100% user-mode CPU after that. Was working with glibc-2.10.90-3.x86_64, but broke today with glibc-2.10.90-7.1.x86_64. For what it's worth, I'm pretty sure it's not the kernel (am running a self-compiled 2.6.31-rc3-mmotm0716 here), because firefox was working this morning, and broke after 'yum update', and there have been no reboots in the meantime.
Quick-n-dirty *partial* workaround:

    % export CANBERRA_DRIVER="null"
    % your_busted_pulseaudio_program_here

Not a total fix, but will at least keep canberra from triggering the problem behind the program's back. Programs invoking pulseaudio directly will still have issues, of course. (Ended up having to grovel through the libcanberra sources to find that...)
As shown in #512121, this is nothing new in F12. *** This bug has been marked as a duplicate of bug 512121 ***
Reopening, as this is a different problem. It should happen on x86-64 only ATM. The problem is that with 2.6.31ish kernels, where the FUTEX_WAIT_REQUEUE_PI/FUTEX_CMP_REQUEUE_PI futex ops are newly supported, the kernel locks mutex->__data.__lock, but nothing then updates mutex->__data.__owner to the current thread id and mutex->__data.__count to 1, so the next pthread_mutex_unlock fails when using recursive PI mutexes. Why does PA do that btw? You need to be extremely careful with those only to call pthread_cond_*wait with the mutex lock nesting exactly 1, otherwise really bad things happen.
(In reply to comment #11)
> Why does PA do that btw? You need to be extremely careful with those only to
> call pthread_cond_*wait with the mutex lock nesting exactly 1, otherwise really
> bad things happen.

Hmm? PA does what exactly? Is that a question about recursive use of mutexes? I am pretty sure that libcanberra only takes the lock once. The lock is recursive only because we export it pretty much directly to the clients, and in some situations it did make sense to have it recursive for that.
So let me clarify this: PA creates that mutex as PI and as recursive since it is exported to the caller almost directly. And if the caller uses RT then he wants PI and in some other cases he might want a recursive lock. However, in most cases it's completely safe to go without either. For testing I have now disabled PI for the mutex in question and things work correctly again. I see some other issue here, which is probably related. In some situations pthread_cond_wait() returns with nothing having happened, and then causes us to enter a spin loop since we call cond_wait in a loop.
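For readers unfamiliar with the combination discussed here, this is a rough sketch of how a mutex that is both recursive and priority-inheriting can be set up with plain POSIX calls. It is not PulseAudio's actual code, and the function names init_recursive_pi_mutex and surplus_unlock_result are invented for the example. It also shows that unlocking when the calling thread is not recorded as the owner yields EPERM.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise *m as a recursive, priority-inheriting mutex.
 * Returns 0 on success or the error code of the failing call. */
int init_recursive_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock twice, unlock twice, then unlock once too often.
 * Returns the result of the surplus unlock (or the init error). */
int surplus_unlock_result(void)
{
    pthread_mutex_t m;
    int rc = init_recursive_pi_mutex(&m);
    if (rc != 0)
        return rc;                  /* e.g. ENOTSUP if PI is unavailable */

    rc = pthread_mutex_lock(&m);    /* nesting 1 */
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);    /* nesting 2 */
    assert(rc == 0);
    rc = pthread_mutex_unlock(&m);  /* back to nesting 1 */
    assert(rc == 0);
    rc = pthread_mutex_unlock(&m);  /* fully released */
    assert(rc == 0);

    rc = pthread_mutex_unlock(&m);  /* caller no longer owns the mutex */
    pthread_mutex_destroy(&m);
    return rc;
}
```

Where PI mutexes are supported, the surplus unlock should report EPERM. Dropping the pthread_mutexattr_setprotocol() call gives the non-PI variant that was used for testing above.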
(In reply to comment #13)
> I see some other issue here, which is probably related. In some situations
> pthread_cond_wait() returns with nothing having happened, and then causes us to
> enter a spin loop since we call cond_wait in a loop.

Tracking that in bug 513629.
(In reply to comment #14)
> (In reply to comment #13)
>
> > I see some other issue here, which is probably related. In some situations
> > pthread_cond_wait() returns with nothing having happened, and then causes us to
> > enter a spin loop since we call cond_wait in a loop.
>
> Tracking that in bug 513629.

pthread_cond_wait is defined to allow spurious returns. See:

http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_cond_timedwait.html

"When using condition variables there is always a boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_wait() or pthread_cond_timedwait() functions may occur. Since the return from pthread_cond_wait() or pthread_cond_timedwait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return."
Basically any time you call pthread_cond_wait, it must be in a loop, and you have to pair a call to pthread_mutex_lock with it. The pattern looks like this:

    pthread_mutex_lock (&lock);
    while (!my_condition)
        pthread_cond_wait (&cv, &lock);

So it looks like the code in thread-mainloop.c:pa_threaded_mainloop_signal

    if (wait_for_accept && m->n_waiting > 0)
        pa_cond_wait(m->accept_cond, m->mutex);

needs to be a while loop. But the next one in pa_threaded_mainloop_wait I'm less sure how it's supposed to work. I'm not able to immediately wrap my head around the design of the whole file, but it looks like pulse needs to take this into account.
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> >
> > > I see some other issue here, which is probably related. In some situations
> > > pthread_cond_wait() returns with nothing having happened, and then causes us to
> > > enter a spin loop since we call cond_wait in a loop.
> >
> > Tracking that in bug 513629.
>
> pthread_cond_wait is defined to allow spurious returns. See:
>
> http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_cond_timedwait.html
>
> "When using condition variables there is always a boolean predicate involving
> shared variables associated with each condition wait that is true if the thread
> should proceed. Spurious wakeups from the pthread_cond_wait() or
> pthread_cond_timedwait() functions may occur. Since the return from
> pthread_cond_wait() or pthread_cond_timedwait() does not imply anything about
> the value of this predicate, the predicate should be re-evaluated upon such
> return."

The _wait() is only called with the mutex locked. The wait that matters here is the one in pa_threaded_mainloop_wait(), not the one in pa_threaded_mainloop_signal(), which is almost never used (but probably should be fixed). Also, let's keep track of that in 513629.
Of course _wait() has to be always called with the mutex locked, that's the requirement. The spurious wakeups may happen even then, so if pa doesn't loop checking the condition, it is buggy.
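A self-contained sketch of the predicate loop being discussed, for reference. The names ready, producer and wait_until_ready are invented for this example and have nothing to do with the PulseAudio or libcanberra sources; the point is only that the waiter re-checks a shared predicate after every return from pthread_cond_wait(), so a spurious wakeup is harmless.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;          /* the predicate, protected by lock */

/* Sets the predicate under the lock, then wakes the waiter. */
static void *producer(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Blocks until the producer has set the predicate; returns its value. */
bool wait_until_ready(void)
{
    pthread_t thread;
    bool seen;
    int rc = pthread_create(&thread, NULL, producer, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&lock);
    while (!ready)                      /* re-check after every wakeup */
        pthread_cond_wait(&cv, &lock);  /* releases lock while blocked */
    seen = ready;
    pthread_mutex_unlock(&lock);

    pthread_join(thread, NULL);
    return seen;
}
```

Because the predicate is tested before the first wait, the waiter also behaves correctly if the producer signals before the waiter ever reaches pthread_cond_wait().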
PA doesn't loop. The _wait() is exported 1:1 to the applications as part of the API. And the software I wrote does all the loops, including libcanberra, which is the software in question here. You can easily verify this in the sources: http://git.0pointer.de/?p=libcanberra.git;a=blob;f=src/pulse.c;h=2a61c8fbc5d3a418f7cfd71d45bf1dab17985645;hb=HEAD#l375 Look for the _wait()'s there and you'll see that they are all inside of loops, with _lock() called before that loop and _unlock() after. Really, that's not the problem here. The problem is apparently that _wait() does not actually wait for some reason and we enter a spin loop.
Should be fixed in rawhide (2.10.90-11). If not, reopen.
Just for the record, I updated to 2.10.90-11 and rebooted. No longer get the error, and sound/pulseaudio apps now work.
Confirming - 90-11 eliminates the assert failures in Pulseaudio library calls, and seems to have fixed the Firefox hang (which was almost certainly bug #513629).