Bug 521961

Summary: /usr/bin/pulseaudio hangs in loop, 100% user mode
Product: [Fedora] Fedora Reporter: Valdis Kletnieks <valdis.kletnieks>
Component: pulseaudioAssignee: Lennart Poettering <lpoetter>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: lkundrak, lpoetter, mschmidt, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-09-21 12:54:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
"thread apply all bt full" output of hung pulseaudio none

Description Valdis Kletnieks 2009-09-08 21:12:29 UTC
Description of problem:
Pulseaudio hangs in an infinite loop, 100% user mode.  GDB says it's here:

(gdb) where
#0  0x00007f4671fdb5d3 in pa_mainloop_iterate (m=0x11a5920, 
    block=<value optimized out>, retval=0x7fffe9810bbc) at pulse/mainloop.c:953
#1  0x00007f4671fdb680 in pa_mainloop_run (m=0x11a5920, retval=0x7fffe9810bbc)
    at pulse/mainloop.c:968
#2  0x0000000000409aa2 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at daemon/main.c:924
(gdb) info locals
r = <value optimized out>
__func__ = "pa_mainloop_iterate"
__PRETTY_FUNCTION__ = "pa_mainloop_iterate"
(gdb) up 
#1  0x00007f4671fdb680 in pa_mainloop_run (m=0x11a5920, retval=0x7fffe9810bbc)
    at pulse/mainloop.c:968
968         while ((r = pa_mainloop_iterate(m, 1, retval)) >= 0);
(gdb) info locals
r = <value optimized out>
(gdb) up
#2  0x0000000000409aa2 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at daemon/main.c:924
924         if (pa_mainloop_run(mainloop, &retval) < 0)
(gdb) info locals
c = <value optimized out>
buf = <value optimized out>
conf = <value optimized out>
mainloop = <value optimized out>
s = <value optimized out>
r = <value optimized out>
retval = 0
d = 3
valid_pid_file = true
ltdl_init = <value optimized out>
passed_fd = <value optimized out>
e = <value optimized out>
daemon_pipe = {-1, -1}
autospawn_fd = -1
autospawn_locked = false
dbus = <value optimized out>
__func__ = "main"
__PRETTY_FUNCTION__ = "main"
(gdb) quit

Version-Release number of selected component (if applicable):
pulseaudio-0.9.16-13.test7.fc12.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Lennart Poettering 2009-09-09 20:58:58 UTC
Hmpf. Could you make that a full bt over all threads? ('thread apply all bt full')

Also, what did you do to trigger this?

Can you also please provide the output of 'pulseaudio -vvvv' when this happens?

Comment 2 Valdis Kletnieks 2009-09-09 21:34:05 UTC
I've only had it bite a few times, and don't have a reliable reproducer. Will get that info next time I hit it.

Comment 3 Valdis Kletnieks 2009-09-14 18:12:26 UTC
Created attachment 360982 [details]
"thread apply all bt full" output of hung pulseaudio

For what it's worth, it appears based on the PID that pulseaudio had *already* been restarted at least once during this login session (for comparison, the window manager PID is 2619, so this pulseaudio started some 24,000 processes later. Checking the logs:

Sep 14 09:54:50 turing-police pulseaudio[13261]: pid.c: Stale PID file, overwriting.
Sep 14 15:44:34 turing-police rtkit-daemon[2336]: Sucessfully made thread 2334 of process 2334 ((unreachable)/usr/bin/pulseaudio) owned by 'valdis' high priority at nice level -11.
Sep 14 16:08:39 turing-police rtkit-daemon[2336]: Sucessfully made thread 26811 of process 26811 ((unreachable)/usr/bin/pulseaudio) owned by 'valdis' high priority at nice level -11.
Sep 14 12:08:39 turing-police pulseaudio[26811]: pid.c: Stale PID file, overwriting.

The first one is from when I logged in - no indication what the restart at 12:08 was. It wedged and I did the gdb at 12:55.

Comment 4 Michal Schmidt 2009-09-16 20:34:18 UTC
Today while I was away from my laptop for a minute pulseaudio got into a similar state, eating 100% CPU time making no syscalls.

With gdb I found that it looped in pa_mainloop_iterate(). In pa_mainloop_poll() m->n_enabled_defer_events always equaled 2, so no poll call was ever made. dispatch_defer() never dispatched anything, because the head of the list (e = m->defer_events) had e->enabled==false and e->next was NULL.

This is with pulseaudio-0.9.17-1.fc12.x86_64.
I don't know how to reproduce this. The only thing that could have produced any sound in that time was the Jabber client "jabbim" which spawns aplay when a message is received.

Comment 5 Michal Schmidt 2009-09-16 20:37:41 UTC
I forgot to add that m->defer_events->callback points to defer_callback (pulsecore/pstream.c:216)

Comment 6 Michal Schmidt 2009-09-21 10:12:20 UTC
Infinite loop in userspace was one of the symptoms I was able to reproduce using the testcase with many concurrent aplay processes in bug 523855, which is now fixed in 0.9.18. So perhaps we should close this bug unless it can be reproduced.

Comment 7 Valdis Kletnieks 2009-09-21 12:50:28 UTC
It's only been a day since I installed 0.9.18, but I've not seen a recurrence.  Let's go ahead and close this one as "fixed in 0.9.18", and I'll re-open it if I see it happen again...