Bug 1262957

Summary: rtl8821ae suspected of causing 100ms scheduling latency
Product: [Fedora] Fedora Reporter: James <james>
Component: kernelAssignee: Lennart Poettering <lpoetter>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 23CC: brom, gansalmon, itamar, jonathan, kernel-maint, lkundrak, lpoetter, madhu.chinakonda, mchehab, rdieter, wtaymans
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-25 20:20:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dmesg
none
alsa-info
none
pulseaudio -vvvv log none

Description James 2015-09-14 18:18:23 UTC
Created attachment 1073390 [details]
dmesg

Description of problem:
In general audio through the Realtek ALC282 works fine, but every so often (every few minutes) I hear a glitch. Then things are good again. The glitch is associated with messages from PA like:


I: [alsa-sink-ALC282 Analog] alsa-sink.c: Scheduling delay of 97.41 ms > 20.00 ms, you might want to investigate this to improve latency...
I: [alsa-sink-ALC282 Analog] alsa-sink.c: Underrun!
I: [alsa-sink-ALC282 Analog] alsa-sink.c: Increasing wakeup watermark to 30.00 ms
D: [alsa-sink-ALC282 Analog] protocol-native.c: Implicit underrun of 'AudioStream'
D: [alsa-sink-ALC282 Analog] sink.c: Found underrun 4760 bytes ago (3968 bytes ahead in playback buffer)
D: [alsa-sink-ALC282 Analog] sink.c: Found underrun 4760 bytes ago (3968 bytes ahead in playback buffer)
D: [alsa-sink-ALC282 Analog] protocol-native.c: Requesting rewind due to end of underrun.
D: [alsa-sink-ALC282 Analog] alsa-sink.c: Requested to rewind 4760 bytes.
D: [alsa-sink-ALC282 Analog] alsa-sink.c: Limited to 4760 bytes.
D: [alsa-sink-ALC282 Analog] alsa-sink.c: before: 1190
D: [alsa-sink-ALC282 Analog] alsa-sink.c: after: 1190
D: [alsa-sink-ALC282 Analog] alsa-sink.c: Rewound 4760 bytes.
D: [alsa-sink-ALC282 Analog] sink.c: Processing rewind...
D: [alsa-sink-ALC282 Analog] sink.c: latency = 21929
D: [alsa-sink-ALC282 Analog] sink-input.c: Have to rewind 4760 bytes on render memblockq.
D: [alsa-sink-ALC282 Analog] source.c: Processing rewind...


Note: I realise Pulse may not be ultimately to blame here. If it's ALSA or the kernel at fault, please let me know and what info I should collect for reporting that. I've attached the alsa-info and dmesg in case it's some other interrupt issue or something. There's nothing untoward that I can see in dmesg.

Version-Release number of selected component (if applicable):
kernel-4.1.6-201.fc22.x86_64
pulseaudio-6.0-8.fc22.x86_64
alsa-lib-1.0.29-1.fc22.x86_64

How reproducible:
Every few minutes or so.

Comment 1 James 2015-09-14 18:19:53 UTC
Created attachment 1073391 [details]
alsa-info

Comment 2 James 2015-09-25 19:29:34 UTC
Created attachment 1077224 [details]
pulseaudio -vvvv log

I've attached a pulseaudio -vvvv log. The scheduling delay message is towards the end. I'm currently testing with some usb-audio speakers to see if it happens with them, too.

Comment 3 James 2015-09-25 19:47:37 UTC
Hmm... I see (and hear) underruns on this machine with the usb-audio speakers, too:

( 139.902|   9.255) I: [pulseaudio] module-device-restore.c: Synced.
( 190.635|  50.732) I: [alsa-sink-USB Audio] ratelimit.c: 117 events suppressed
( 190.635|   0.000) I: [alsa-sink-USB Audio] alsa-sink.c: Underrun!
( 253.632|  62.997) I: [alsa-sink-USB Audio] alsa-sink.c: Underrun!
( 316.653|  63.020) I: [alsa-sink-USB Audio] alsa-sink.c: Underrun!

This kit works fine on another machine. Now running kernel-4.1.7-200.fc22.x86_64. i5-5200U processor. Machine's not under especialy heavy load. Anything I can try to find out where the CPU time is going or whether interrupts are going AWOL etc.?

Comment 4 James 2015-09-27 21:28:02 UTC
I had a play with perf sched. What's interesting there is that sometimes other threads show up with around a 100ms delay. I'm not very familiar with perf, so I might be barking up the wrong tree, but here's what I did.

1. Ran perf sched record for a while.

2. Checked perf sched latency to spot when the latency occurred.

3. Looked in perf sched script's output to see what was going on at the time.

Here's a 100ms scheduling delay in watchdog/2 at 170597.339822:


  wpa_supplicant  1168 [002] 170597.339814: sched:sched_wakeup: kworker/2:2:22637 [120] success=1 CPU:002
  wpa_supplicant  1168 [002] 170597.339820: sched:sched_stat_runtime: comm=wpa_supplicant pid=1168 runtime=692443 [ns] vruntime=4718
2887560 [ns]
  wpa_supplicant  1168 [002] 170597.339822: sched:sched_switch: wpa_supplicant:1168 [120] R ==> watchdog/2:20 [0]


Another in kworker/1 at 171226.630376:


         swapper     0 [003] 171226.629871: sched:sched_wakeup: skype:3592 [120] success=1 CPU:003
         swapper     0 [003] 171226.629874: sched:sched_switch: swapper/3:0 [120] R ==> skype:3592 [120]
  wpa_supplicant  1168 [001] 171226.630372: sched:sched_switch: wpa_supplicant:1168 [120] R ==> migration/1:14 [0]
     migration/1    14 [001] 171226.630376: sched:sched_switch: migration/1:14 [0] S ==> kworker/1:0:24376 [120]


And one more in  at 171604.146285 in watchdog/1:


  wpa_supplicant  1168 [001] 171604.146276: sched:sched_wakeup: kworker/1:0:24376 [120] success=1 CPU:001
  wpa_supplicant  1168 [001] 171604.146283: sched:sched_stat_runtime: comm=wpa_supplicant pid=1168 runtime=50022 [ns] vruntime=14207
4593534 [ns]
  wpa_supplicant  1168 [001] 171604.146284: sched:sched_switch: wpa_supplicant:1168 [120] R ==> watchdog/1:13 [0]
      watchdog/1    13 [001] 171604.146287: sched:sched_switch: watchdog/1:13 [0] S ==> kworker/1:0:24376 [120]


They all seem to follow wpa_supplicant. And I've not heard any glitches when the wireless network is turned off (using the GNOME network menu). The wireless driver is rtl8821ae, so I'm guessing that this is causing latency somewhere whenever wpa_supplicant does something.

Can any expert confirm (or at least concur) before I move this over to a kernel bug report against rtl8821ae?

Comment 5 James 2015-09-28 17:46:25 UTC
OK, I've not had any more glitches after switching off the wireless NIC. Changing to kernel.

Recap following change of summary and component: I found intermittent glitches in audio, which turned out to be ~100ms scheduling delays. Running perf showed other threads with similar delays, all preceded by wpa_supplicant. This seems to suggest that something's getting bogged down in rtl8821ae.

The wireless adapter is a:

01:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter

The latencies went away after turning off this device. I've not seen these latencies on any of my other F22 machines, and none of these use rtl8821ae.

This is on kernel-4.1.7-200.fc22.x86_64. Let me know what further debug info is needed and how to get it.

Comment 6 James 2015-10-01 07:39:29 UTC
Still present with kernel-4.2.2-300.fc23.x86_64.

Comment 7 Justin M. Forbes 2015-10-20 19:32:31 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 8 James 2015-10-20 20:19:51 UTC
Still present in kernel-4.2.3-200.fc22.x86_64

Comment 9 James 2015-11-15 19:04:30 UTC
Still present in F23, kernel-4.2.5-300.fc23.x86_64.

(Has this bug been routed correctly after the component change, or shall I move it to kernel.org?)

Comment 10 James 2015-11-25 20:20:48 UTC
Moved to upstream bugzilla.