Bug 559467 - Regression : PA 0.9.21-4 jumps to 100% CPU when playing sound
Summary: Regression : PA 0.9.21-4 jumps to 100% CPU when playing sound
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: pulseaudio
Version: 12
Hardware: All
OS: Linux
high
high
Target Milestone: ---
Assignee: Lennart Poettering
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 558656 558981 564383 564525 565883 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-01-28 08:32 UTC by Didier
Modified: 2010-03-06 03:42 UTC (History)
9 users (show)

Fixed In Version: pulseaudio-0.9.21-5.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-23 23:20:09 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
pacmd ls output (21.92 KB, text/plain)
2010-02-18 06:20 UTC, udo
no flags Details
pulseaudio -vvvvv (102.95 KB, text/plain)
2010-02-18 06:26 UTC, udo
no flags Details
another -vvvv run, more v's used (103.03 KB, text/plain)
2010-02-18 07:39 UTC, udo
no flags Details
another pacmd capture (24.17 KB, text/plain)
2010-02-18 13:35 UTC, Jeff Guerdat
no flags Details
another fuser capture (156 bytes, text/plain)
2010-02-18 13:36 UTC, Jeff Guerdat
no flags Details
another pulseaudio capture (78.07 KB, text/plain)
2010-02-18 13:37 UTC, Jeff Guerdat
no flags Details
compressed strace output (52.67 KB, application/octet-stream)
2010-02-19 23:09 UTC, Jeff Guerdat
no flags Details

Description Didier 2010-01-28 08:32:13 UTC
Description of problem:
After updating to pulseaudio-0.9.21-4.fc12, pulseaudio jumps to 100% CPU usage when sound is played.


Going back to pulseaudio-0.9.21-2.fc12 fixes the issue.



Stracing the 100% CPU process yields a loop in 'ppoll' (which goes about as far as my limited debugging capacities permit) :



[pid 11783] <... ioctl resumed> , 0x7f1bf8f11c50) = 0
[pid 11776] <... ppoll resumed> )       = 1 ([{fd=27, revents=POLLHUP}])
[pid 11783] ppoll([{fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=35, events=POLLOUT|POLLERR|POLLNVAL}], 3, {0, 119632000}, NULL, 8 <unfinished ...>
[pid 11776] ppoll([{fd=4, events=POLLIN}, {fd=27, events=0}, {fd=21, events=POLLIN}, {fd=20, events=POLLIN}, {fd=34, events=POLLIN}, {fd=9, events=POLLIN}, {fd=7, events=POLLIN}, {fd=22, events=POLLIN}, {fd=29, events=POLLIN}, {fd=33, events=POLLIN}, {fd=32, events=POLLIN}, {fd=31, events=POLLIN}, {fd=30, events=POLLIN|POLLERR|POLLHUP}, {fd=30, events=0}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=16, events=POLLIN}, {fd=19, events=POLLIN}, {fd=15, events=POLLIN|POLLERR|POLLHUP}, {fd=15, events=0}, {fd=14, events=POLLIN}, {fd=13, events=POLLIN}, {fd=6, events=POLLIN}], 23, NULL, NULL, 8) = 1 ([{fd=27, revents=POLLHUP}])
[pid 11776] ppoll([{fd=4, events=POLLIN}, {fd=27, events=0}, {fd=21, events=POLLIN}, {fd=20, events=POLLIN}, {fd=34, events=POLLIN}, {fd=9, events=POLLIN}, {fd=7, events=POLLIN}, {fd=22, events=POLLIN}, {fd=29, events=POLLIN}, {fd=33, events=POLLIN}, {fd=32, events=POLLIN}, {fd=31, events=POLLIN}, {fd=30, events=POLLIN|POLLERR|POLLHUP}, {fd=30, events=0}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=16, events=POLLIN}, {fd=19, events=POLLIN}, {fd=15, events=POLLIN|POLLERR|POLLHUP}, {fd=15, events=0}, {fd=14, events=POLLIN}, {fd=13, events=POLLIN}, {fd=6, events=POLLIN}], 23, NULL, NULL, 8) = 1 ([{fd=27, revents=POLLHUP}])

[... etc ...]

Comment 1 Didier 2010-01-28 09:41:03 UTC
Strange : the above was reproducible on a Dell i1720 (deleting ~/.pulse between trials), but appearantly on first sight does not occur on an ASUS EeePC 901 ...


i1720 :
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)

EeePC901 :
Intel 82801G (ICH7)

Comment 2 Lennart Poettering 2010-02-12 14:55:46 UTC
Hmm, do you think you could do a git bisect run for me? Youd need to know your way around with compiling PA and git.

Comment 3 Lennart Poettering 2010-02-12 15:05:49 UTC
(if someone is able and willing to do the git bisect, what I am interested in is the path between v0.9.21..stable-queue in git://git.0pointer.de/pulseaudio.git)

Comment 4 Lennart Poettering 2010-02-12 15:06:53 UTC
*** Bug 558981 has been marked as a duplicate of this bug. ***

Comment 5 Lennart Poettering 2010-02-12 15:08:59 UTC
*** Bug 558656 has been marked as a duplicate of this bug. ***

Comment 6 Didier 2010-02-12 15:38:19 UTC
Well, I'm away for some days. I'llget back to you when I'm back (recompiling PA is no problem, but I am a complete novice with git).

Comment 7 David Rees 2010-02-13 03:05:38 UTC
Add me to the list - strace looks exactly the same as the reporter and have just downgraded to pulseaudio-0.9.21-2.fc12.

Unfortunately not familiar with compiling PA and git to bisect unless you can give some pointers...

Comment 8 Lennart Poettering 2010-02-15 13:39:15 UTC
*** Bug 564383 has been marked as a duplicate of this bug. ***

Comment 9 Lennart Poettering 2010-02-15 13:40:46 UTC
*** Bug 564525 has been marked as a duplicate of this bug. ***

Comment 10 udo 2010-02-16 16:52:48 UTC
Over here I see the same happen.
Hardware used:
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)

Comment 11 Lennart Poettering 2010-02-18 03:04:26 UTC
*** Bug 565883 has been marked as a duplicate of this bug. ***

Comment 12 Lennart Poettering 2010-02-18 03:33:50 UTC
I have tried for multiple hours to reproduce this issue, but was unable to. I have sent myself gazillions of emails to trigger the new mail sounds but things never behaved badly. So I am really not sure what is going on here. Can anybody reduce this to a simple reproducible test case that I can reliably run here?

Anyway, as an educated guess here's what I believe what could be happening here:

1) PA suspends (i.e. closes) the audio devices because it is idle

2) some legacy app (Flash?) opens the audio device bypassing PA, and can do so because the device is not used because PA is idle.

3) Thunderbird wants to play audio due to an incoming mail, and plays audio to PA. Now, PA tries to unsuspend (i.e. reopen) the audio devices, but cannot because the legacy app still blocks the device. PA will hence leave the devices suspended. Because Thunderbird didn't set the PA_STREAM_FAIL_ON_SUSPEND flag when playing the stream it will be queued up. This happens a couple of times. So a more and more streams are queued up.

4) Eventually the limit of concurrent streams in PA is reached (defined to 32 right now). Now clients are refused. Some client doesn't accept that, and requests a new connection when the first try fails.

5) We enter a busy loop due to that, you'll see 100% CPU used.

Now, this is just a guess. To verify these issues, I'd like to ask you all to reproduce the issue locally and then run "pacmd ls" and attach the output here. Also, at the same time please run "fuser -v /dev/dsp* /dev/snd/*" and attach it here too. Finally, if you could run PA in debug mode (Pass -vvvvv while running it in a terminal, and set autospawn=no in ~/pulse/client.conf) and paste the output here that is generated when you enter this lockup I should be able to track this down.

The reason I came up with this guess is because I can make thunderbird freeze by issuing "pacmd suspend 1" (which is a manual hard suspend of PA's device access) and then wait for the new mail sound. So one lesson here is that Thunderbird needs to be fixed not to queue up event sounds, but instead use PA_STREAM_FAIL_ON_SUSPEND -- which libcanberra, the event sound library they should be using, of course does.

Comment 13 Lennart Poettering 2010-02-18 03:39:28 UTC
Hmm, the reason why thundebird isn't setting PA_STREAM_FAIL_ON_SUSPEND seems to be that they still use the legacy esd protocol instead of pulseaudio's native protocol.

Here's another thing I'd like you to ask to test: please edit /etc/pulse/default.pa and comment the "load-module module-esound-protocol-unix" line. Now restart PA and try to reproduce the issue. Does this fix things for you?

Comment 14 udo 2010-02-18 06:20:08 UTC
$ fuser -v /dev/dsp* /dev/snd/*
Cannot stat /dev/dsp*: No such file or directory
Cannot stat /dev/dsp*: No such file or directory
                     USER        PID ACCESS COMMAND
/dev/snd/controlC0:  udo       14822 F.... pulseaudio
/dev/snd/pcmC0D0p:   udo       14822 F...m pulseaudio

Comment 15 udo 2010-02-18 06:20:50 UTC
Created attachment 394859 [details]
pacmd ls output

pacmd ls output

Comment 16 udo 2010-02-18 06:26:10 UTC
Created attachment 394860 [details]
pulseaudio  -vvvvv

Comment 17 udo 2010-02-18 07:39:32 UTC
Created attachment 394865 [details]
another -vvvv run, more v's used

Comment 18 udo 2010-02-18 09:36:55 UTC
Did we provide enough of the right info/logging?
Do we need to provide more data?

Comment 19 Lennart Poettering 2010-02-18 11:58:10 UTC
*** Bug 565883 has been marked as a duplicate of this bug. ***

Comment 20 Jeff Guerdat 2010-02-18 13:35:49 UTC
Created attachment 394898 [details]
another pacmd capture

Comment 21 Jeff Guerdat 2010-02-18 13:36:43 UTC
Created attachment 394899 [details]
another fuser capture

Comment 22 Jeff Guerdat 2010-02-18 13:37:34 UTC
Created attachment 394900 [details]
another pulseaudio capture

Comment 23 Jeff Guerdat 2010-02-18 13:41:12 UTC
Added my various captures.  One thing that was noted in my original bug report is that this ONLY happens when I use a non-system sound for new mail.  The little beep causes no problem (because of the length it plays? because it's not a .wav?) but use of the wav file results in the 100% CPU every time.  Playing the wav any other way works fine - it's only in T'bird.

Comment 24 udo 2010-02-18 13:59:49 UTC
I think I can confirm comment 23; I am using the notify.wav from a software vendor in Redmond, WA, USA.

Comment 25 Jeff Guerdat 2010-02-18 14:06:22 UTC
Well, mine's a cow so we can rule out dark forces...  ;)

Comment 26 Lennart Poettering 2010-02-19 22:33:01 UTC
Could someone try to do what I suggested in comment 13 and see if that fixes the problem?

Comment 27 Lennart Poettering 2010-02-19 22:38:34 UTC
I think I would be able to track this down if someone could send me a full strace -f of PA when this happens. The small strace excerpt in the original post shows that that poll() returns POLLHUP on fd 27 which is not processed. Now the problem is, what is fd 27? the strace should allow us to figure that out by going backwards and looking for where it got created. 

A full strace will be substantial in size, so better don't attach it here, but upload it somewhere or mail it to me.

Comment 28 Jeff Guerdat 2010-02-19 22:41:25 UTC
Sorry, I did try the suggestion in comment 13 and found it made no difference at all.  I forgot to post a reply.

Comment 29 Jeff Guerdat 2010-02-19 23:09:33 UTC
Created attachment 395194 [details]
compressed strace output

full strace output

Comment 30 Jeff Guerdat 2010-02-19 23:10:23 UTC
Hopefully the attachment I just put up in tar.gz format is what you're after.

Comment 31 udo 2010-02-20 15:53:11 UTC
Do you have enough strace data etc etc to analyse the root cause?

I hope you understand that most people with a 'modern' distro might be affected by this bug, because these ship pulseaudio, especially if they use a less dull signal for their incoming mail.
They will have at least one core wasting all cycles to this problematic issue.
That means a LOT of wasted power!! (100% CPU versus typically ~20% or so?)

So please HURRY fixing this bug.
Please.

Comment 32 Patrick Bakker 2010-02-20 17:44:09 UTC
This bug seems to happen to me several times a day. I'm not sure of exactly how to reproduce it but it seems to happen to me fairly often when I have two firefox windows open on two different monitors. If I'm playing some flash video on one and then open a new window on my second monitor to browse other websites it appears to be happen more frequently. It may be triggered by navigating to a web page on the second instance of firefox that also uses flash.

Comment 33 udo 2010-02-20 18:22:11 UTC
(In reply to comment #32)
So you mean no thunderbird connection at all? That would mean a second way to trigger this bug.
Can you supply a strace as asked in comment 27?

Comment 34 Patrick Bakker 2010-02-20 20:08:18 UTC
comment 13 fix of commenting out the esound line makes no difference. However, I may be seeing a combination of another problem as well. Sometimes there is a complete system lock-up but sometimes it may just be that the keyboard is stuck in some spots. When the system slowed right, I closed all my open programs, I tried to open a terminal to type strace but the terminal just sits there and won't accept any keyboard input. The caps lock key doesn't toggle the light at all. Since that happens to be how I usually check if the system is locked up I may not have noticed other times that I can launch other programs. I restarted Firefox just now and the keyboard works here. CAPS LOCK works here (even though the light doesn't change). However, it still doesn't work in the terminal. Also, the terminal won't close. This is the same thing pulse audio does if I try to restart the system. It warns me that PulseAudio is not responding. If I quit anyway then in the black shutdown sequence it hangs at "Sending processes the TERM signal..."

Comment 35 Sam Varshavchik 2010-02-20 22:16:53 UTC
Re comment 33: this is not a Thunderbird issue. This is 100% reproducible for me in Firefox, see dupe bug 564525.

Comment 36 Lennart Poettering 2010-02-22 21:13:17 UTC
(In reply to comment #34)
> comment 13 fix of commenting out the esound line makes no difference. 

Did you restart PA and have you verified that the esound modules are not loaded with "pacmd list-modules"?

I now managed to reproduce this issue with a simple "esdplay". Seems the fix for bug 553607 broke this.

And as mentioned, an easy temporary workaround to fix this is to disable the esd support in PA.

Comment 37 Lennart Poettering 2010-02-23 00:08:39 UTC
Fixed in this commit:

http://git.0pointer.de/?p=pulseaudio.git;a=patch;h=30f28ebf3619a86b49009e8dbce154233f597dbb

Will upload a new version of PA shortly.

Comment 38 Fedora Update System 2010-02-23 23:12:58 UTC
pulseaudio-0.9.21-6.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/pulseaudio-0.9.21-6.fc13

Comment 39 Fedora Update System 2010-02-23 23:13:25 UTC
pulseaudio-0.9.21-5.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/pulseaudio-0.9.21-5.fc12

Comment 40 udo 2010-02-24 16:38:44 UTC
Even though we did not get a chance to confirm the fixing of the bug before you closed it (!!), I want to confirm it appears to be fixed.

Comment 41 Lennart Poettering 2010-02-24 17:56:57 UTC
Well, please understand that I like to keep my list of open bugs minimal. If I believe I fixed a bug I close it. If it turns out that my fix was incorrect or incomplete then there's nothing that would stop you from reopening it then.

Comment 42 Fedora Update System 2010-03-05 03:41:10 UTC
pulseaudio-0.9.21-6.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 43 Fedora Update System 2010-03-06 03:41:57 UTC
pulseaudio-0.9.21-5.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.