Created attachment 368441 [details] Alsa Hang Reproducer Description of problem: Alsa hangs when seeking back and forth in an audio sample. Our sound tools allow users to scrub back and forth in a loaded sample. The tools play tiny bits of the sound as the user moves the current location around. This worked fine on FC5, but we've found that under F11 the tools hang in a poll() in snd1_pcm_wait_nocheck (backtrace listed below). I ran our test case through pasuspender (using "pasuspender -- ./hang") and the result is the same. Our machines are using on-board Intel HD Audio controllers: 00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller Subsystem: Hewlett-Packard Company Device 130b Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 11 Region 0: Memory at e3300000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: HDA Intel Kernel modules: snd-hda-intel Version-Release number of selected component (if applicable): How reproducible: Pretty reliably with the test case attached. Just let it run for several seconds and it should get stuck. Steps to Reproduce: 1. Run the test case. After several seconds it should get stuck. Note that the time delays it has coded in included three nearly-one-second-pauses, so some skipping is correct. When the program hangs it'll go quiet permently. 2. 3. Actual results: Test case stops playing audio. Expected results: Test case will continue looping its audio samples with embedded delays forever. Additional info: Here's the backtrace: #0 0x00007ffff70b8f73 in poll () from /lib64/libc.so.6 #1 0x00007ffff7b4e069 in snd1_pcm_wait_nocheck (pcm=0x61cf20, timeout=<value optimized out>) at pcm.c:2367 #2 0x00007ffff7b4e3dd in snd1_pcm_write_areas (pcm=0x61cf20, areas=0x7fffffffd1b0, offset=1074, size=760, func=0x7ffff7b8c8c0 <ioplug_priv_transfer_areas>) at pcm.c:6646 #3 0x00007ffff7b8cc4a in snd_pcm_ioplug_writei (pcm=0x61cf20, buffer=<value optimized out>, size=1834) at pcm_ioplug.c:561 #4 0x00000000004010ba in Write (buffer=0x601820, frameCount=1834) at hang.cpp:405 #5 0x0000000000400ce2 in main () at hang.cpp:287
This issue appears to be in F12 as well. Neil
Please, attach contents of /proc/asound/card0/pcm0p/* files when hangup occurs.
Here are those files: ./sub0/prealloc_max 32768 ./sub0/prealloc 4096 ./sub0/status state: RUNNING trigger_time: 20418.305887250 tstamp : 20440.925353343 delay : 1291 avail : 86901 avail_max : 86901 ----- hw_ptr : 997560 appl_ptr : 998851 ./sub0/sw_params tstamp_mode: ENABLE period_step: 1 avail_min: 87047 start_threshold: 4294967295 stop_threshold: 1444937728 silence_threshold: 0 silence_size: 0 boundary: 1444937728 ./sub0/hw_params access: MMAP_INTERLEAVED format: S16_LE subformat: STD channels: 2 rate: 44100 (44100/1) period_size: 44096 buffer_size: 88192 ./sub0/info card: 0 device: 0 subdevice: 0 stream: PLAYBACK id: AD198x Analog name: AD198x Analog subname: subdevice #0 class: 0 subclass: 0 subdevices_count: 1 subdevices_avail: 0 ./xrun_debug 0 ./info card: 0 device: 0 subdevice: 0 stream: PLAYBACK id: AD198x Analog name: AD198x Analog subname: subdevice #0 class: 0 subclass: 0 subdevices_count: 1 subdevices_avail: 0
(In reply to comment #3) > ./sub0/status > state: RUNNING > trigger_time: 20418.305887250 > tstamp : 20440.925353343 > delay : 1291 > avail : 86901 > avail_max : 86901 > ----- > hw_ptr : 997560 > appl_ptr : 998851 Could you check, if contents of this file changes in time when hangup occurs?
Jaroslav, Yes, the contents do change. Here is a before, during sound, during hang, and after contents of that file: # for i in $(seq 0 9); do cat /proc/asound/card0/pcm0p/sub0/status ; sleep 2; done closed closed state: RUNNING trigger_time: 39684.851169494 tstamp : 39686.222872794 delay : 4507 avail : 83685 avail_max : 83685 ----- hw_ptr : 60536 appl_ptr : 65043 state: RUNNING trigger_time: 39684.851169494 tstamp : 39688.227730141 delay : 4923 avail : 83269 avail_max : 83269 ----- hw_ptr : 148952 appl_ptr : 153875 state: RUNNING trigger_time: 39684.851169494 tstamp : 39690.232946416 delay : 4515 avail : 83677 avail_max : 83677 ----- hw_ptr : 237384 appl_ptr : 241899 ###################hung here################## state: RUNNING trigger_time: 39684.851169494 tstamp : 39692.238110098 delay : 4515 avail : 83677 avail_max : 83677 ----- hw_ptr : 325808 appl_ptr : 330323 state: RUNNING trigger_time: 39684.851169494 tstamp : 39694.244752310 delay : 5003 avail : 83189 avail_max : 83189 ----- hw_ptr : 414304 appl_ptr : 419307 ###################control+c after this################## state: RUNNING trigger_time: 39684.851169494 tstamp : 39696.249609966 delay : 4235 avail : 83957 avail_max : 83957 ----- hw_ptr : 502720 appl_ptr : 507835 state: RUNNING trigger_time: 39684.851169494 tstamp : 39698.254418594 delay : 15712 avail : 72480 avail_max : 72480 ----- hw_ptr : 591128 appl_ptr : 606840 state: RUNNING trigger_time: 39684.851169494 tstamp : 39700.260983083 delay : 11200 avail : 76992 avail_max : 76992 ----- hw_ptr : 679616 appl_ptr : 690816
OK, It appears like a user space issue. The audio DMA engine is running. Please, attach output from snd_pcm_dump() after snd_pcm_sw_params() to see what plugins are involved in alsa-lib.
Here is that info and more. main(): snd_pcm_dump ALSA <-> PulseAudio PCM I/O Plugin Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_BE subformat : STD channels : 1 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 22050 period_size : 64 period_time : 1451 tstamp_mode : NONE period_step : 1 avail_min : 64 period_event : 0 start_threshold : 735 stop_threshold : 22050 silence_threshold: 0 silence_size : 0 boundary : 1445068800 main(): snd_pcm_dump_setup stream : PLAYBACK access : RW_INTERLEAVED format : S16_BE subformat : STD channels : 1 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 22050 period_size : 64 period_time : 1451 tstamp_mode : NONE period_step : 1 avail_min : 64 period_event : 0 start_threshold : 735 stop_threshold : 22050 silence_threshold: 0 silence_size : 0 boundary : 1445068800 main(): snd_pcm_hw_params_dump ACCESS: RW_INTERLEAVED FORMAT: S16_BE SUBFORMAT: STD SAMPLE_BITS: 16 FRAME_BITS: 16 CHANNELS: 1 RATE: 44100 PERIOD_TIME: (1451 1452) PERIOD_SIZE: (64 65) PERIOD_BYTES: (128 130) PERIODS: [340 344] BUFFER_TIME: 500000 BUFFER_SIZE: 22050 BUFFER_BYTES: 44100 TICK_TIME: [0 0] main(): snd_pcm_sw_params_dump tstamp_mode: NONE period_step: 1 avail_min: 64 start_threshold: 735 stop_threshold: 22050 silence_threshold: 0 silence_size: 0 boundary: 1445068800 main(): snd_pcm_status_dump state : PREPARED trigger_time: 0.000000 tstamp : 0.000000 delay : 0 avail : 22050 avail_max : 0 ------ printf("main(): snd_pcm_dump\n"); snd_pcm_dump(_alsaHandle, _alsaOutput); printf("main(): snd_pcm_dump_setup\n"); snd_pcm_dump_setup(_alsaHandle, _alsaOutput); printf("main(): snd_pcm_hw_params_dump\n"); snd_pcm_hw_params_dump(hwparams, _alsaOutput); printf("main(): snd_pcm_sw_params_dump\n"); snd_pcm_sw_params_dump(swparams, _alsaOutput); printf("main(): snd_pcm_status_dump\n"); snd_pcm_status_t *status; snd_pcm_status_alloca(&status); if ((err = snd_pcm_status(_alsaHandle, status)) < 0) { printf("Stream status error: %s\n", snd_strerror(err)); exit(0); } snd_pcm_status_dump(status, _alsaOutput);
It looks like problem in PulseAudio plugin in alsa-plugins package. Reassigning to Lennart. For verification, you may try to use real raw ALSA device name like 'plughw:0' if things works correctly.
Lennart, It does indeed seem like a bug in pulseaudio, if I remove pulseaudio and use the hardware, then this hang does not happen. Below are my notes from a gdb session attached to the hung process. Thread 2 (Thread 0x7ffff00ef910 (LWP 5442)): #0 0x00007ffff70ba0d3 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=1499) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007ffff65a181b in poll_func (ufds=0x7fffe8001460, nfds=2, timeout=1499, userdata=0x619a00) at pulse/thread-mainloop.c:71 #2 0x00007ffff658f2d4 in pa_mainloop_poll (m=0x619ac0) at pulse/mainloop.c:857 #3 0x00007ffff658f683 in pa_mainloop_iterate (m=0x619ac0, block=1, retval=0x0) at pulse/mainloop.c:923 #4 0x00007ffff658f6f6 in pa_mainloop_run (m=0x619ac0, retval=0x0) at pulse/mainloop.c:941 #5 0x00007ffff65a1898 in thread (userdata=0x619a80) at pulse/thread-mainloop.c:90 #6 0x00007ffff6356c4f in internal_thread_func (userdata=0x61a960) at pulsecore/thread-posix.c:72 #7 0x00007ffff6bcb86a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #8 0x00007ffff70c33bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #9 0x0000000000000000 in ?? () Thread 1 (Thread 0x7ffff7fd3710 (LWP 5273)): #0 0x00007ffff70ba0d3 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007ffff7b4e069 in snd1_pcm_wait_nocheck (pcm=0x61cfd0, timeout=<value optimized out>) at pcm.c:2367 #2 0x00007ffff7b4e3dd in snd1_pcm_write_areas (pcm=0x61cfd0, areas=0x7fffffff70a0, offset=1426, size=408, func=0x7ffff7b8c8c0 <ioplug_priv_transfer_areas>) at pcm.c:6646 #3 0x00007ffff7b8cc4a in snd_pcm_ioplug_writei (pcm=0x61cfd0, buffer=<value optimized out>, size=1834) at pcm_ioplug.c:561 #4 0x0000000000401118 in Write (buffer=0x6018a0, frameCount=1834) at hang.cpp:407 #5 0x0000000000400d2a in main () at hang.cpp:289 (gdb) thread 2 [Switching to thread 2 (Thread 0x7ffff00ef910 (LWP 5442))]#0 0x00007ffff70ba0d3 in *__GI___poll ( fds=<value optimized out>, nfds=<value optimized out>, timeout=1499) at ../sysdeps/unix/sysv/linux/poll.c:87 87 int result = INLINE_SYSCALL (poll, 3, CHECK_N (fds, nfds), nfds, timeout); (gdb) p poll_func::ufds->fd $25 = 5 (gdb) p poll_func::ufds->events $26 = 1 (gdb) thread 1 [Switching to thread 1 (Thread 0x7ffff7fd3710 (LWP 5273))]#0 0x00007ffff70ba0d3 in *__GI___poll ( fds=<value optimized out>, nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87 87 int result = INLINE_SYSCALL (poll, 3, CHECK_N (fds, nfds), nfds, timeout); (gdb) p snd1_pcm_wait_nocheck::pcm->poll_fd $27 = 3 (gdb) p snd1_pcm_wait_nocheck::pcm->poll_events $28 = 1 avail = snd_pcm_avail_update(pcm); if (avail < 0) { err = avail; goto _end; } if ((state == SND_PCM_STATE_RUNNING && (snd_pcm_uframes_t)avail < pcm->avail_min && size > (snd_pcm_uframes_t)avail)) { if (pcm->mode & SND_PCM_NONBLOCK) { err = -EAGAIN; goto _end; } err = snd_pcm_wait(pcm, -1); if (err < 0) break; goto _again; }
Lennart, The bug happens in F11, but not in F12, the version of pulseaudio in F11 is 0.9.15-17.fc11, while F12 alpha has 0.9.16-5.test4.fc12. In addition, this appears similar to bz #496608, although I can not reproduce the problem with out pulseaudio.
I have managed to reproduce this on F12 btw. This seems to be a problem with a wakeup request being eaten at one of the layers in our stack. I haven't been able yet to figure out what is going here.
I think I have a pretty good idea what is going on now. The client's and the server's byte request counter get out of sync if the client sends more data than the server requested. I have a pretty good idea how to fix this. Will prep patch tomorrow.
Here's my patch: http://git.0pointer.de/?p=pulseaudio.git;a=patch;h=8d356659e69556fa25d0579a66084f820683e2b8 I am a bit afraid of moving this into our packages right-away since this modifies some logic that might break other code. Could you test this patch? Can you build test rpms yourself, or do you need rpms? This patch is for the F12 version of PA, it should apply to the F11 version, too. I have run your test tool for a couple of hours here, and everything behaved properly with this patch applied.
Hello Lennart, Thanks for the patch! I just built the RPMs for F11 and am about to start testing. cheers, -lars
Hi Lennart, So far so good. Our QA engineer is enthusiastic about the patch's behavior, and I'm recruiting more production users to give it a try. I'll keep you posted. cheers, -lars
(In reply to comment #18) > Hi Lennart, > > So far so good. Our QA engineer is enthusiastic about the patch's behavior, and > I'm recruiting more production users to give it a try. I'll keep you posted. Great. I will prepare a new upload for F12 today that will include the patch. Would be good if you could then go to bodhi and vote it up ;-)
pulseaudio-0.9.21-3.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/pulseaudio-0.9.21-3.fc12
Ok, the patch does break some things (seeking in mplayer). I have canceled the update now. Will try to fix this today.
The seeking problem is now fixed in git. The update will follow shortly.
pulseaudio-0.9.21-4.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/pulseaudio-0.9.21-4.fc12
pulseaudio-0.9.21-4.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update pulseaudio'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-0745
pulseaudio-0.9.21-4.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.