Bug 548989 - PI mutexes are broken (again)
Summary: PI mutexes are broken (again)
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Andreas Schwab
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 514060 538638 541359 541420 548259 549247 552512 552544 552553 552590 552595 553756 555262
Depends On:
Blocks: F13Beta, F13BetaBlocker
 
Reported: 2009-12-19 20:58 UTC by Bruno Wolff III
Modified: 2010-01-21 09:46 UTC
CC List: 27 users

Fixed In Version: glibc-2.11.90-10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-01-21 09:46:55 UTC
Type: ---
Embargoed:


Attachments
simple testcase (2.05 KB, text/plain)
2009-12-21 19:31 UTC, Michal Schmidt
possible fix (941 bytes, patch)
2009-12-30 16:45 UTC, Michal Schmidt
Fix pthread_cond_wait with requeue-PI on i386 (1.84 KB, patch)
2009-12-30 21:56 UTC, Michal Schmidt

Description Bruno Wolff III 2009-12-19 20:58:05 UTC
Description of problem:
Recently (about a week ago), applications trying to use PulseAudio started crashing. PulseAudio has not been upgraded since before the problem started, so something else must have triggered it. A sample error message from gmplayer:

Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:108, function pa_mutex_unlock(). Aborting.

Version-Release number of selected component (if applicable):
pulseaudio-0.9.21-3.fc13.i686

How reproducible:
I am seeing this with gmplayer, xmms (with both the ALSA and PulseAudio plugins), and the games Tremulous and Battle for Wesnoth. One odd thing: if I started Wesnoth with the sound sliders set to no sound and then raised them, sound worked. Restarting the game once the sliders were set so that music would play failed with essentially the same error message as gmplayer.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Lennart Poettering 2009-12-20 12:11:25 UTC
I am pretty sure this is again the fault of the PI mutex code in the kernel. Will verify.

Comment 2 Lennart Poettering 2009-12-20 13:31:23 UTC
Yes, verified. Either the kernel or glibc are at fault here. Reassigning.

(RT folks, why do you break this every second month? Broken PI mutex unlocking is a recurring story...)

Comment 3 Lennart Poettering 2009-12-20 13:41:15 UTC
*** Bug 538638 has been marked as a duplicate of this bug. ***

Comment 4 Lennart Poettering 2009-12-20 13:44:27 UTC
*** Bug 548259 has been marked as a duplicate of this bug. ***

Comment 5 Lennart Poettering 2009-12-20 14:57:56 UTC
*** Bug 541420 has been marked as a duplicate of this bug. ***

Comment 6 Lennart Poettering 2009-12-20 14:59:07 UTC
*** Bug 541359 has been marked as a duplicate of this bug. ***

Comment 7 Owen Taylor 2009-12-20 15:14:32 UTC
*** Bug 549134 has been marked as a duplicate of this bug. ***

Comment 8 Michal Schmidt 2009-12-20 17:37:30 UTC
(In reply to comment #7)
> *** Bug 549134 has been marked as a duplicate of this bug. ***  

That one has a different backtrace. The assertion failure was in pa_pstream_send_tagstruct_with_creds, not in pa_mutex_unlock like it was in all the others. Not sure if it is really a duplicate.

And all the other duplicates are from i686.

Comment 9 Michal Schmidt 2009-12-20 17:38:36 UTC
(In reply to comment #2)
> Yes, verified. Either the kernel or glibc are at fault here.

Lennart,
how do you verify it? Do you have a simple test case?

Comment 10 Bill Davidsen 2009-12-21 02:57:25 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > *** Bug 549134 has been marked as a duplicate of this bug. ***  
> 
> That one has a different backtrace. The assertion failure was in
> pa_pstream_send_tagstruct_with_creds, not in pa_mutex_unlock like it was in all
> the others. Not sure if it is really a duplicate.
> 
> And all the other duplicates are from i686.  

I looked at these since my report was classified as a duplicate as well. I'm uncertain whether this is just damage in another place from a common problem, or whether all metacity bugs were simply classified as duplicates. On initial reading I'm with you: I don't think 549134 is the same thing.

I do think that what we are all seeing on i686 is more common and should be addressed first; then 549134 can be retested.

Comment 11 Lennart Poettering 2009-12-21 11:10:28 UTC
(In reply to comment #9)
> (In reply to comment #2)
> > Yes, verified. Either the kernel or glibc are at fault here.
> 
> Lennart,
> how do you verify it? Do you have a simple test case?  

The PA dev tree contains a relatively simple example, which I used, but it's not exactly trivial to compile because the dev tree pulls in quite a few dependencies.

Comment 12 Lennart Poettering 2009-12-21 11:12:50 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > *** Bug 549134 has been marked as a duplicate of this bug. ***  
> 
> That one has a different backtrace. The assertion failure was in
> pa_pstream_send_tagstruct_with_creds, not in pa_mutex_unlock like it was in all
> the others. Not sure if it is really a duplicate.

Oops. Good catch. I have split this up again now. I got confused by the backtrace since it also included an _unlock() call...

Comment 13 Lennart Poettering 2009-12-21 12:18:19 UTC
*** Bug 549247 has been marked as a duplicate of this bug. ***

Comment 14 Lennart Poettering 2009-12-21 12:20:18 UTC
bug 549247 suggests glibc needs fixing, not the kernel. Tentatively reassigning.

Comment 15 Bruno Wolff III 2009-12-21 16:23:45 UTC
I also tried going back to an older glibc. Downgrading to glibc-2.11.90-3.i686 (and the other packages built from the same source RPM) got sound working again.

Comment 16 Michal Schmidt 2009-12-21 19:31:57 UTC
Created attachment 379689 [details]
simple testcase

I was able to reproduce in an i686 KVM guest with F12 + glibc-2.11.90-4 from Rawhide. Here's a minimal testcase, loosely based on thread-mainloop-test.c from Pulseaudio.

Comment 17 Adam Williamson 2009-12-22 15:48:00 UTC
This should block beta as basic sound functionality is a beta criterion.

Comment 18 Rahul Sundaram 2009-12-29 14:34:36 UTC
*** Bug 514060 has been marked as a duplicate of this bug. ***

Comment 19 Michal Schmidt 2009-12-30 16:45:15 UTC
Created attachment 380966 [details]
possible fix

glibc-2.11.90-4 introduced requeue-PI support on i386 (x86_64 already had it). It seems the problem is in the new code.

This is the upstream commit:
http://sourceware.org/git/?p=glibc.git;a=commit;h=75956694f3f80a1c32389c95069641f52c236c8b

I reviewed it and I believe I found the bug in pthread_cond_wait. I am attaching a possible fix. So far it's completely untested, I'm building glibc with it now.

Comment 20 Michal Schmidt 2009-12-30 21:56:19 UTC
Created attachment 381009 [details]
Fix pthread_cond_wait with requeue-PI on i386

The previous patch caused a segfault. Here's a new one. It works for me. It fixes pthread_cond_timedwait too, though I tested only pthread_cond_wait so far.

Comment 21 Michal Schmidt 2009-12-30 23:03:44 UTC
Scratch build of glibc with the patch in Koji:
http://kojipkgs.fedoraproject.org/scratch/michich/task_1896278/

Looks like I broke something else though. From the build.log:
tst-robustpi8: pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
Didn't expect signal from child: got `Aborted'

Comment 22 Owen Taylor 2010-01-05 18:48:18 UTC
*** Bug 552512 has been marked as a duplicate of this bug. ***

Comment 23 Owen Taylor 2010-01-05 18:48:24 UTC
*** Bug 552553 has been marked as a duplicate of this bug. ***

Comment 24 Owen Taylor 2010-01-05 18:48:32 UTC
*** Bug 552590 has been marked as a duplicate of this bug. ***

Comment 25 Owen Taylor 2010-01-05 18:48:37 UTC
*** Bug 552595 has been marked as a duplicate of this bug. ***

Comment 26 Michal Schmidt 2010-01-08 16:10:45 UTC
(In reply to comment #21)
> Scratch build of glibc with the patch in Koji:
> http://kojipkgs.fedoraproject.org/scratch/michich/task_1896278/
> 
> Looks like I broke something else though. From the build.log:
> tst-robustpi8: pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion
> `(-(e)) != 3 || !robust' failed.
> Didn't expect signal from child: got `Aborted'  

I could neither reproduce this testsuite failure locally nor did it occur in another try in Koji. Dinakar Guniguntala (the author of the requeue-PI patch for i386) found nothing wrong with my fix.

Did anyone test the Koji build? Did it fix the bug? Were there any problems?
It seems to have expired already, but here's the repeated one:
http://kojipkgs.fedoraproject.org/scratch/michich/task_1904693/

Comment 27 Ola Thoresen 2010-01-08 20:06:47 UTC
Just tested the latest build, and at least YouTube videos work fine in Firefox again.

glibc-devel-2.11.90-4.m3.i686
glibc-common-2.11.90-4.m3.i686
glibc-debuginfo-2.11.90-4.i686
glibc-headers-2.11.90-4.m3.i686
glibc-2.11.90-4.m3.i686

Comment 28 Lennart Poettering 2010-01-08 20:55:22 UTC
*** Bug 553756 has been marked as a duplicate of this bug. ***

Comment 29 Scorporat 2010-01-08 22:04:54 UTC
Recompiled glibc-2.11.90-4 with the latest patch; it works fine for me.

Comment 30 Stepan Kasal 2010-01-08 22:15:23 UTC
(In reply to comment #26)
> Did anyone test the Koji build? Did it fix the bug? Were there any problems?
> It seems to have expired already, but here's the repeated one:
> http://kojipkgs.fedoraproject.org/scratch/michich/task_1904693/  

I can confirm that this build of glibc* fixes the problem for me.

Comment 31 Dodji Seketeli 2010-01-10 12:56:05 UTC
(In reply to comment #30)
> > Did anyone test the Koji build? Did it fix the bug? Were there any problems?
> > It seems to have expired already, but here's the repeated one:
> > http://kojipkgs.fedoraproject.org/scratch/michich/task_1904693/  
> 
> I can confirm that this build of glibc* fixes the problem for me.

Yes, it fixed it for me too. I am currently running Rawhide with those RPMs on i686 just fine.

Comment 32 Lennart Poettering 2010-01-11 22:39:46 UTC
*** Bug 552544 has been marked as a duplicate of this bug. ***

Comment 33 Owen Taylor 2010-01-14 21:12:12 UTC
*** Bug 555262 has been marked as a duplicate of this bug. ***

Comment 34 Michal Schmidt 2010-01-16 13:37:14 UTC
The patch is now applied upstream:
http://sourceware.org/git/?p=glibc.git;a=commit;h=893549c5a06956d2559391a3ffdeb6ded53b65c0

