Bug 534130 - alsa tools hang in a poll() in snd1_pcm_wait_nocheck
alsa tools hang in a poll() in snd1_pcm_wait_nocheck
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: pulseaudio (Show other bugs)
11
x86_64 Linux
low Severity high
: ---
: ---
Assigned To: Lennart Poettering
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-11-10 12:00 EST by Neil Doane
Modified: 2010-10-23 09:11 EDT (History)
7 users (show)

See Also:
Fixed In Version: 0.9.21-4.fc12
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-01-25 20:04:51 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Alsa Hang Reproducer (6.51 KB, application/x-bzip2)
2009-11-10 12:00 EST, Neil Doane
no flags Details

  None (edit)
Description Neil Doane 2009-11-10 12:00:44 EST
Created attachment 368441 [details]
Alsa Hang Reproducer

Description of problem:
Alsa hangs when seeking back and forth in an audio sample.

Our sound tools allow users to scrub back and forth in a loaded sample. The
tools play tiny bits of the sound as the user moves the current location
around. This worked fine on FC5, but we've found that under F11 the tools hang
in a poll() in snd1_pcm_wait_nocheck (backtrace listed below).

I ran our test case through pasuspender (using "pasuspender -- ./hang") and
the result is the same.

Our machines are using on-board Intel HD Audio controllers:

   00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
       Subsystem: Hewlett-Packard Company Device 130b
       Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
       Latency: 0, Cache Line Size: 64 bytes
       Interrupt: pin A routed to IRQ 11
       Region 0: Memory at e3300000 (64-bit, non-prefetchable) [size=16K]
       Capabilities: <access denied>
       Kernel driver in use: HDA Intel
       Kernel modules: snd-hda-intel

Version-Release number of selected component (if applicable):


How reproducible:
Pretty reliably with the test case attached. Just let it run for several
seconds and it should get stuck.

Steps to Reproduce:
1. Run the test case. After several seconds it should get stuck. Note that the
time delays it has coded in included three nearly-one-second-pauses, so some
skipping is correct. When the program hangs it'll go quiet permently.
2.
3.
  
Actual results:
Test case stops playing audio.

Expected results:
Test case will continue looping its audio samples with embedded delays forever.

Additional info:
Here's the backtrace:

#0  0x00007ffff70b8f73 in poll () from /lib64/libc.so.6
#1  0x00007ffff7b4e069 in snd1_pcm_wait_nocheck (pcm=0x61cf20, timeout=<value optimized out>) at pcm.c:2367
#2  0x00007ffff7b4e3dd in snd1_pcm_write_areas (pcm=0x61cf20, areas=0x7fffffffd1b0, offset=1074, size=760, func=0x7ffff7b8c8c0 <ioplug_priv_transfer_areas>)
   at pcm.c:6646
#3  0x00007ffff7b8cc4a in snd_pcm_ioplug_writei (pcm=0x61cf20, buffer=<value optimized out>, size=1834) at pcm_ioplug.c:561
#4  0x00000000004010ba in Write (buffer=0x601820, frameCount=1834) at hang.cpp:405
#5  0x0000000000400ce2 in main () at hang.cpp:287
Comment 1 Neil Doane 2009-11-10 16:44:39 EST
This issue appears to be in F12 as well.


Neil
Comment 2 Jaroslav Kysela 2009-11-11 03:59:53 EST
Please, attach contents of /proc/asound/card0/pcm0p/* files when hangup occurs.
Comment 3 Vlady Zlatkin 2009-11-23 14:22:56 EST
Here are those files:

./sub0/prealloc_max
32768
./sub0/prealloc
4096
./sub0/status
state: RUNNING
trigger_time: 20418.305887250
tstamp      : 20440.925353343
delay       : 1291
avail       : 86901
avail_max   : 86901
-----
hw_ptr      : 997560
appl_ptr    : 998851
./sub0/sw_params
tstamp_mode: ENABLE
period_step: 1
avail_min: 87047
start_threshold: 4294967295
stop_threshold: 1444937728
silence_threshold: 0
silence_size: 0
boundary: 1444937728
./sub0/hw_params
access: MMAP_INTERLEAVED
format: S16_LE
subformat: STD
channels: 2
rate: 44100 (44100/1)
period_size: 44096
buffer_size: 88192
./sub0/info
card: 0
device: 0
subdevice: 0
stream: PLAYBACK
id: AD198x Analog
name: AD198x Analog
subname: subdevice #0
class: 0
subclass: 0
subdevices_count: 1
subdevices_avail: 0
./xrun_debug
0
./info
card: 0
device: 0
subdevice: 0
stream: PLAYBACK
id: AD198x Analog
name: AD198x Analog
subname: subdevice #0
class: 0
subclass: 0
subdevices_count: 1
subdevices_avail: 0
Comment 5 Jaroslav Kysela 2009-11-24 08:27:52 EST
(In reply to comment #3)
> ./sub0/status
> state: RUNNING
> trigger_time: 20418.305887250
> tstamp      : 20440.925353343
> delay       : 1291
> avail       : 86901
> avail_max   : 86901
> -----
> hw_ptr      : 997560
> appl_ptr    : 998851


Could you check, if contents of this file changes in time when hangup occurs?
Comment 6 Vlady Zlatkin 2009-11-24 13:16:11 EST
Jaroslav,

Yes, the contents do change.  Here is a before, during sound, during hang, and after contents of that file:

# for i in $(seq 0 9); do  cat /proc/asound/card0/pcm0p/sub0/status ; sleep 2; done
closed
closed
state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39686.222872794
delay       : 4507
avail       : 83685
avail_max   : 83685
-----
hw_ptr      : 60536
appl_ptr    : 65043
state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39688.227730141
delay       : 4923
avail       : 83269
avail_max   : 83269
-----
hw_ptr      : 148952
appl_ptr    : 153875
state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39690.232946416
delay       : 4515
avail       : 83677
avail_max   : 83677
-----
hw_ptr      : 237384
appl_ptr    : 241899

###################hung here##################


state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39692.238110098
delay       : 4515
avail       : 83677
avail_max   : 83677
-----
hw_ptr      : 325808
appl_ptr    : 330323
state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39694.244752310
delay       : 5003
avail       : 83189
avail_max   : 83189
-----
hw_ptr      : 414304
appl_ptr    : 419307

###################control+c after this##################

state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39696.249609966
delay       : 4235
avail       : 83957
avail_max   : 83957
-----
hw_ptr      : 502720
appl_ptr    : 507835
state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39698.254418594
delay       : 15712
avail       : 72480
avail_max   : 72480
-----
hw_ptr      : 591128
appl_ptr    : 606840
state: RUNNING
trigger_time: 39684.851169494
tstamp      : 39700.260983083
delay       : 11200
avail       : 76992
avail_max   : 76992
-----
hw_ptr      : 679616
appl_ptr    : 690816
Comment 7 Jaroslav Kysela 2009-11-24 13:32:39 EST
OK, It appears like a user space issue. The audio DMA engine is running.
Please, attach output from snd_pcm_dump() after snd_pcm_sw_params() to see what plugins are involved in alsa-lib.
Comment 8 Vlady Zlatkin 2009-11-24 14:10:52 EST
Here is that info and more.  

main(): snd_pcm_dump
ALSA <-> PulseAudio PCM I/O Plugin
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_BE
  subformat    : STD
  channels     : 1
  rate         : 44100
  exact rate   : 44100 (44100/1)
  msbits       : 16
  buffer_size  : 22050
  period_size  : 64
  period_time  : 1451
  tstamp_mode  : NONE
  period_step  : 1
  avail_min    : 64
  period_event : 0
  start_threshold  : 735
  stop_threshold   : 22050
  silence_threshold: 0
  silence_size : 0
  boundary     : 1445068800
main(): snd_pcm_dump_setup
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_BE
  subformat    : STD
  channels     : 1
  rate         : 44100
  exact rate   : 44100 (44100/1)
  msbits       : 16
  buffer_size  : 22050
  period_size  : 64
  period_time  : 1451
  tstamp_mode  : NONE
  period_step  : 1
  avail_min    : 64
  period_event : 0
  start_threshold  : 735
  stop_threshold   : 22050
  silence_threshold: 0
  silence_size : 0
  boundary     : 1445068800
main(): snd_pcm_hw_params_dump
ACCESS:  RW_INTERLEAVED
FORMAT:  S16_BE
SUBFORMAT:  STD
SAMPLE_BITS: 16
FRAME_BITS: 16
CHANNELS: 1
RATE: 44100
PERIOD_TIME: (1451 1452)
PERIOD_SIZE: (64 65)
PERIOD_BYTES: (128 130)
PERIODS: [340 344]
BUFFER_TIME: 500000
BUFFER_SIZE: 22050
BUFFER_BYTES: 44100
TICK_TIME: [0 0]
main(): snd_pcm_sw_params_dump
tstamp_mode: NONE
period_step: 1
avail_min: 64
start_threshold: 735
stop_threshold: 22050
silence_threshold: 0
silence_size: 0
boundary: 1445068800
main():  snd_pcm_status_dump
  state       : PREPARED
  trigger_time: 0.000000
  tstamp      : 0.000000
  delay       : 0
  avail       : 22050
  avail_max   : 0

------
    printf("main(): snd_pcm_dump\n");
    snd_pcm_dump(_alsaHandle, _alsaOutput);

    printf("main(): snd_pcm_dump_setup\n");
    snd_pcm_dump_setup(_alsaHandle, _alsaOutput);

    printf("main(): snd_pcm_hw_params_dump\n");
    snd_pcm_hw_params_dump(hwparams, _alsaOutput);

    printf("main(): snd_pcm_sw_params_dump\n");
    snd_pcm_sw_params_dump(swparams, _alsaOutput);

    printf("main():  snd_pcm_status_dump\n");
    snd_pcm_status_t *status;

    snd_pcm_status_alloca(&status);
    if ((err = snd_pcm_status(_alsaHandle, status)) < 0) {
            printf("Stream status error: %s\n", snd_strerror(err));
            exit(0);
    }
    snd_pcm_status_dump(status, _alsaOutput);
Comment 9 Jaroslav Kysela 2009-11-24 16:51:14 EST
It looks like problem in PulseAudio plugin in alsa-plugins package. Reassigning to Lennart.

For verification, you may try to use real raw ALSA device name like 'plughw:0' if things works correctly.
Comment 10 Vlady Zlatkin 2009-11-25 21:19:58 EST
Lennart,

It does indeed seem like a bug in pulseaudio, if I remove pulseaudio and use the hardware, then this hang does not happen.

Below are my notes from a gdb session attached to the hung process.

Thread 2 (Thread 0x7ffff00ef910 (LWP 5442)):
#0  0x00007ffff70ba0d3 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=1499)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007ffff65a181b in poll_func (ufds=0x7fffe8001460, nfds=2, timeout=1499, userdata=0x619a00)
    at pulse/thread-mainloop.c:71
#2  0x00007ffff658f2d4 in pa_mainloop_poll (m=0x619ac0) at pulse/mainloop.c:857
#3  0x00007ffff658f683 in pa_mainloop_iterate (m=0x619ac0, block=1, retval=0x0) at pulse/mainloop.c:923
#4  0x00007ffff658f6f6 in pa_mainloop_run (m=0x619ac0, retval=0x0) at pulse/mainloop.c:941
#5  0x00007ffff65a1898 in thread (userdata=0x619a80) at pulse/thread-mainloop.c:90
#6  0x00007ffff6356c4f in internal_thread_func (userdata=0x61a960) at pulsecore/thread-posix.c:72
#7  0x00007ffff6bcb86a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#8  0x00007ffff70c33bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fd3710 (LWP 5273)):
#0  0x00007ffff70ba0d3 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007ffff7b4e069 in snd1_pcm_wait_nocheck (pcm=0x61cfd0, timeout=<value optimized out>) at pcm.c:2367
#2  0x00007ffff7b4e3dd in snd1_pcm_write_areas (pcm=0x61cfd0, areas=0x7fffffff70a0, offset=1426, size=408, 
    func=0x7ffff7b8c8c0 <ioplug_priv_transfer_areas>) at pcm.c:6646
#3  0x00007ffff7b8cc4a in snd_pcm_ioplug_writei (pcm=0x61cfd0, buffer=<value optimized out>, size=1834)
    at pcm_ioplug.c:561
#4  0x0000000000401118 in Write (buffer=0x6018a0, frameCount=1834) at hang.cpp:407
#5  0x0000000000400d2a in main () at hang.cpp:289
(gdb) thread 2 
[Switching to thread 2 (Thread 0x7ffff00ef910 (LWP 5442))]#0  0x00007ffff70ba0d3 in *__GI___poll (
    fds=<value optimized out>, nfds=<value optimized out>, timeout=1499) at ../sysdeps/unix/sysv/linux/poll.c:87
87        int result = INLINE_SYSCALL (poll, 3, CHECK_N (fds, nfds), nfds, timeout);
(gdb) p poll_func::ufds->fd
$25 = 5
(gdb) p poll_func::ufds->events
$26 = 1
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fd3710 (LWP 5273))]#0  0x00007ffff70ba0d3 in *__GI___poll (
    fds=<value optimized out>, nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
87        int result = INLINE_SYSCALL (poll, 3, CHECK_N (fds, nfds), nfds, timeout);
(gdb) p snd1_pcm_wait_nocheck::pcm->poll_fd
$27 = 3
(gdb) p snd1_pcm_wait_nocheck::pcm->poll_events
$28 = 1

                avail = snd_pcm_avail_update(pcm);
                if (avail < 0) {
                        err = avail;
                        goto _end;
                }
                if ((state == SND_PCM_STATE_RUNNING &&
                     (snd_pcm_uframes_t)avail < pcm->avail_min &&
                     size > (snd_pcm_uframes_t)avail)) {
                        if (pcm->mode & SND_PCM_NONBLOCK) {
                                err = -EAGAIN;
                                goto _end;
                        }

                        err = snd_pcm_wait(pcm, -1);
                        if (err < 0)
                                break;
                        goto _again;
                }
Comment 11 Vlady Zlatkin 2009-11-26 06:39:43 EST
Lennart,

The bug happens in F11, but not in F12, the version of pulseaudio in F11 is 0.9.15-17.fc11, while F12 alpha has 0.9.16-5.test4.fc12. 

In addition, this appears similar to bz #496608, although I can not reproduce the problem with out pulseaudio.
Comment 13 Lennart Poettering 2010-01-06 13:39:01 EST
I have managed to reproduce this on F12 btw. This seems to be a problem with a wakeup request being eaten at one of the layers in our stack. I haven't been able yet to figure out what is going here.
Comment 15 Lennart Poettering 2010-01-07 19:19:45 EST
I think I have a pretty good idea what is going on now. The client's and the server's byte request counter get out of sync if the client sends more data than the server requested. I have a pretty good idea how to fix this. Will prep patch tomorrow.
Comment 16 Lennart Poettering 2010-01-08 14:22:27 EST
Here's my patch:

http://git.0pointer.de/?p=pulseaudio.git;a=patch;h=8d356659e69556fa25d0579a66084f820683e2b8

I am a bit afraid of moving this into our packages right-away since this modifies some logic that might break other code.

Could you test this patch? Can you build test rpms yourself, or do you need rpms?

This patch is for the F12 version of PA, it should apply to the F11 version, too.

I have run your test tool for a couple of hours here, and everything behaved properly with this patch applied.
Comment 17 Lars Damerow 2010-01-08 16:19:07 EST
Hello Lennart,

Thanks for the patch! I just built the RPMs for F11 and am about to start testing.

cheers,
-lars
Comment 18 Lars Damerow 2010-01-14 13:36:27 EST
Hi Lennart,

So far so good. Our QA engineer is enthusiastic about the patch's behavior, and I'm recruiting more production users to give it a try. I'll keep you posted.

cheers,
-lars
Comment 21 Lennart Poettering 2010-01-14 13:54:54 EST
(In reply to comment #18)
> Hi Lennart,
> 
> So far so good. Our QA engineer is enthusiastic about the patch's behavior, and
> I'm recruiting more production users to give it a try. I'll keep you posted.

Great. 

I will prepare a new upload for F12 today that will include the patch. Would be good if you could then go to bodhi and vote it up ;-)
Comment 22 Fedora Update System 2010-01-14 20:46:34 EST
pulseaudio-0.9.21-3.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/pulseaudio-0.9.21-3.fc12
Comment 23 Lennart Poettering 2010-01-15 09:19:40 EST
Ok, the patch does break some things (seeking in mplayer). I have canceled the update now. Will try to fix this today.
Comment 24 Lennart Poettering 2010-01-17 20:07:56 EST
The seeking problem is now fixed in git. The update will follow shortly.
Comment 25 Fedora Update System 2010-01-17 20:24:06 EST
pulseaudio-0.9.21-4.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/pulseaudio-0.9.21-4.fc12
Comment 26 Fedora Update System 2010-01-18 20:01:52 EST
pulseaudio-0.9.21-4.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update pulseaudio'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-0745
Comment 27 Fedora Update System 2010-01-22 17:38:46 EST
pulseaudio-0.9.21-4.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update pulseaudio'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-0745
Comment 28 Fedora Update System 2010-01-25 20:04:05 EST
pulseaudio-0.9.21-4.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.