Bug 460100 - RHEL 4u7: Keyboard not functional after plug out usb 1.1 speaker
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Pete Zaitcev
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: AMD4.8
Reported: 2008-08-26 07:31 UTC by Andiry
Modified: 2018-10-19 23:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-10 18:06:57 UTC
Target Upstream Version:
Embargoed:


Attachments
dmesg info taken after the issue occurs. (33.01 KB, text/plain); 2008-09-28 01:48 UTC, Andiry
test 1 - printk in keyboard.c (1.17 KB, patch); 2008-10-14 02:29 UTC, Pete Zaitcev
dmesg info with kbd patch. (65.14 KB, text/plain); 2008-10-14 07:17 UTC, Andiry
x.trace (336.26 KB, text/plain); 2008-10-15 02:31 UTC, Andiry
x.lsof (4.17 KB, text/plain); 2008-10-15 02:32 UTC, Andiry
ps -auxw (7.76 KB, text/plain); 2008-10-15 02:33 UTC, Andiry
test 2 - printk in tty work (2.42 KB, patch); 2008-10-16 03:35 UTC, Pete Zaitcev
dmesg with tty patch (47.66 KB, text/plain); 2008-10-16 04:43 UTC, Andiry
dmesg with tty revised (46.57 KB, text/plain); 2008-10-17 01:39 UTC, Andiry
test 3 - zoom in on kbd_keycode (4.17 KB, text/plain); 2008-10-17 02:11 UTC, Pete Zaitcev
dmesg with new patch (52.21 KB, text/plain); 2008-10-17 07:43 UTC, Andiry
test 4 - zoom in on shift state (4.33 KB, patch); 2008-10-18 02:11 UTC, Pete Zaitcev
dmesg-test4 (48.97 KB, text/plain); 2008-10-20 02:59 UTC, Andiry
test 5 - zoom at x86_keycodes and the flip (4.65 KB, application/octet-stream); 2008-10-21 00:28 UTC, Pete Zaitcev
dmesg-test5 (39.31 KB, text/plain); 2008-10-21 01:23 UTC, Andiry
dmesg w/ sysrq-t before pulling speaker cable (88.10 KB, application/octet-stream); 2008-10-21 15:16 UTC, Evan McNabb
dmesg w/ sysrq-t after pulling speaker cable (120.45 KB, application/octet-stream); 2008-10-21 15:17 UTC, Evan McNabb
dmesg w/ sysrq-t (90.25 KB, application/octet-stream); 2008-10-21 17:11 UTC, Evan McNabb
dmesg-SysRq-T (101.46 KB, text/plain); 2008-10-22 01:18 UTC, Andiry
A test helper (397 bytes, text/plain); 2009-01-05 18:16 UTC, Pete Zaitcev

Description Andiry 2008-08-26 07:31:42 UTC
Description of problem:
RHEL 4u7: Keyboard not functional after plug out usb 1.1 speaker

Version-Release number of selected component (if applicable):
RHEL 4u7

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 4u7 (32-bit or 64-bit).
2. Plug in a USB 1.1 speaker.
3. Reboot the system with the speaker plugged in.
4. Run "cat /proc/asound/cards" to make sure it has been detected.
5. Unplug the USB speaker after the OS has booted.
6. The keyboard stops working.
  
Actual results:
The keyboard stops responding.

Expected results:
The keyboard should function normally.

Additional info:
This issue occurs on AMD SB600, SB700 and Intel platforms.

Comment 1 Pete Zaitcev 2008-09-26 12:29:34 UTC
May I see dmesg, taken after the issue occurs (after step 6)? Please
attach it to the bug; don't paste it into the comment box.

Does this problem exist on RHEL 5?

Comment 2 Andiry 2008-09-28 01:48:26 UTC
Created attachment 317877 [details]
dmesg info taken after the issue occurs.

Comment 3 Andiry 2008-09-28 01:55:14 UTC
Yeah, dmesg attached, the issue occurs from line 658.

This problem has not been seen on RHEL 5.

Comment 4 Pete Zaitcev 2008-10-09 03:18:47 UTC
Sadly, in the captured dmesg there's not a peep after the 1-3 disconnect
from bus #4 where the keyboard is, although lots of very strange happenings
occur between #1 and #6. I suspect it's something in the usbhid. I'm going
to come up with some instrumentation to check what's going on with the
usbhid and bus #4.

Comment 5 Andiry 2008-10-09 03:33:00 UTC
Hey Pete, we've created an issue-tracker for this and emcnabb is also investigating it. Link below.

https://enterprise.redhat.com/issue-tracker/215951

FYI, this issue cannot be reproduced in console mode (init 3), with either SMP or EL.

Comment 7 Pete Zaitcev 2008-10-14 02:29:58 UTC
Created attachment 320245 [details]
test 1 - printk in keyboard.c

Reproducing this is next to impossible without suitably broken hardware.
Evan had no luck either, naturally. We must have these strange disconnects
at bus #1 (although the keyboard is at #4, maybe some lock gets stuck...)

So, the only hope is to run tests on site.

My first idea is very basic: split the stack into pieces horizontally
until the culprit is found. The attached trivial patch prints events
at the interface between input stack and keyboard driver. It would be
nice to apply it to 2.6.9-usb-dbg and see what happens after the
intentional disconnect.

Comment 8 Andiry 2008-10-14 03:25:53 UTC
So do we need to apply this patch, test it and provide dmesg info to you?

Comment 9 Pete Zaitcev 2008-10-14 04:04:49 UTC
Andiry, yes, please. Normally I should be building test kernels for you,
but since you already use a modified kernel, and since turning around patches
is quicker (and I expect several turns), let's do this.

The reason I start so high up the stack is the comment #5, that X gets
involved somehow. It may have something to do with X having open two ttyN,
not just one like console shell does.

Comment 10 Andiry 2008-10-14 07:17:54 UTC
Created attachment 320263 [details]
dmesg info with kbd patch.

This time we use a PS/2 keyboard, apply the patch and reproduce the issue.

Comment 11 Pete Zaitcev 2008-10-14 19:34:11 UTC
Thanks, I see, the events are coming. Let's cut even higher up the stack,
at the kernel boundary.

Please capture the strace output for the X process. Keep in mind, in
RHEL 4 the X server usually polls a lot, so the trace is going to be
very long. So, start "strace -o x.trace -p NNNN" (over ssh), pull the
cable of the speaker, verify that the X stopped reacting, hit ^C
on the strace. Also, capture the output of "lsof -p NNNN > x.lsof",
because I need to know which number is which in the x.trace, and
"ps -auxw" (so we can see if anything got stuck in D state).
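Checking the "ps -auxw" capture for processes stuck in D state can be done by eye, but a small filter helps with long listings. The following is a minimal Python sketch, assuming the usual "ps aux" column layout with a STAT column and COMMAND as the last column. The function name and the sample listing are made up for illustration and are not taken from the attachments.

```python
from typing import List, Tuple


def find_uninterruptible(ps_output: str) -> List[Tuple[int, str, str]]:
    """Return (pid, stat, command) for each process whose STAT starts with 'D'."""
    lines = ps_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_idx = header.index("PID")
    stat_idx = header.index("STAT")
    cmd_idx = len(header) - 1  # COMMAND is assumed to be the last column
    found = []
    for line in lines[1:]:
        # Split only up to the command column so its spaces are preserved.
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue  # short or malformed line
        if fields[stat_idx].startswith("D"):
            found.append((int(fields[pid_idx]), fields[stat_idx], fields[cmd_idx].strip()))
    return found


# Hypothetical listing, for illustration only.
SAMPLE_PS = """\
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  4752  552 ?        S    10:01   0:00 init [5]
root         9  0.0  0.0     0    0 ?        D    10:01   0:00 [events/0]
root      3310  1.2  2.1 90000 9000 ?        S    10:02   0:05 /usr/X11R6/bin/X :0
"""

stuck = find_uninterruptible(SAMPLE_PS)
```

On a live system the same function could be fed the text of the saved ps capture instead of the sample string.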

Comment 12 Andiry 2008-10-15 02:31:31 UTC
Created attachment 320378 [details]
x.trace

x.trace taken after removing the speaker.

Comment 13 Andiry 2008-10-15 02:32:45 UTC
Created attachment 320379 [details]
x.lsof

Comment 14 Andiry 2008-10-15 02:33:09 UTC
Created attachment 320380 [details]
ps -auxw

Comment 15 Pete Zaitcev 2008-10-15 04:55:29 UTC
Andiry, I have a follow-up question. If you look at x.trace, you can see
that the last read(3, ...) is at line 2312. I want to make sure that tty7
stops delivering keystrokes. So, if you re-run this test, unplug the speaker,
hit the keys, and watch the output of strace, does this read(3,) occur?
Presumably not, but I need to be sure. No need to re-attach new x.trace,
just watch the trace and let me know, please.
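Locating the last read on a given descriptor in a long strace log can also be scripted. This is a minimal Python sketch, assuming strace's usual "syscall(fd, ...) = result" line format with an optional leading PID column. The function name and the sample lines are hypothetical and do not come from x.trace.

```python
import re
from typing import Iterable, Optional


def last_call_on_fd(trace_lines: Iterable[str], syscall: str, fd: int) -> Optional[int]:
    """Return the 1-based line number of the last `syscall(fd, ...)` entry, or None."""
    # Optional PID prefix, then the syscall name, '(' and the exact descriptor number.
    pattern = re.compile(r"^(?:\d+\s+)?%s\(%d\b" % (re.escape(syscall), fd))
    last = None
    for number, line in enumerate(trace_lines, start=1):
        if pattern.match(line):
            last = number
    return last


# Made-up trace lines, for illustration only.
SAMPLE_TRACE = [
    'select(256, [1 3 4], NULL, NULL, {0, 0}) = 1 (in [3], left {0, 0})',
    'read(3, "a", 64) = 1',
    'select(256, [1 3 4], NULL, NULL, {0, 0}) = 0 (Timeout)',
    'read(31, "b", 64) = 1',
    'select(256, [1 3 4], NULL, NULL, {0, 0}) = 0 (Timeout)',
]

last_read = last_call_on_fd(SAMPLE_TRACE, "read", 3)
```

The word-boundary check keeps descriptor 3 from matching calls on descriptors such as 31.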

Also, does the mouse die together with the keyboard, or does it continue to work?

Comment 16 Andiry 2008-10-15 05:35:31 UTC
Well, it does not show any read(3,) after unplugging the speaker, no matter how many times I hit the keys.

The mouse still works after unplugging the speaker. Only the keyboard dies.

Comment 19 Evan McNabb 2008-10-15 14:31:56 UTC
I spent some more time testing this today and I am able to reproduce it with the Logitech V10 Notebook Speakers (which AMD is also using). It turns out I only see this behavior if I'm actually logged into Gnome, but not at the GDM login prompt. The keyboard only shows this disconnect behavior if the speakers are plugged in before the login occurs (if I log in first, plug in the speakers, and then disconnect them, everything works as expected).

Also, when it occurs the PS2 keyboard which is also attached no longer functions.

About the speakers themselves, there are no buttons on them besides the up/down volume control.

Comment 20 Pete Zaitcev 2008-10-15 22:51:21 UTC
Evan, does the SysRq work?

Comment 21 Pete Zaitcev 2008-10-16 03:35:50 UTC
Created attachment 320512 [details]
test 2 - printk in tty work

This is a very small step, but may I have this run?
Evan, if you can build kernels, it would be good too.
My own attempts to replicate the issue using the local equipment
were not successful so far.

Comment 22 Andiry 2008-10-16 04:43:25 UTC
Created attachment 320514 [details]
dmesg with tty patch

Comment 23 Pete Zaitcev 2008-10-16 22:24:40 UTC
Andiry, do you apply these patches with patch(1), or a text editor?
It looks like the elements of the old patch from comment #10 were
still in effect, e.g. there's no event_type. Please let me know immediately
if patch(1) fails with rejects, and do not try to fix rejects with vi.
If patch(1) fails, it means that your tree does not match mine and
we don't have a consistent base to work on.

Evan, please back Andiry up. If I attach a bad test patch, please let
me know immediately, so I can fix it before the next day in Shanghai.
(That said, I still think Andiry mixed up two patches today).

Comment 24 Andiry 2008-10-17 01:39:46 UTC
Created attachment 320625 [details]
dmesg with tty revised

Sorry, my mistake. I thought the first kbd patch was the same as part of the new patch.

Comment 25 Pete Zaitcev 2008-10-17 02:11:42 UTC
Created attachment 320629 [details]
test 3 - zoom in on kbd_keycode

Please try this.

Comment 26 Andiry 2008-10-17 07:43:44 UTC
Created attachment 320644 [details]
dmesg with new patch

Comment 27 Pete Zaitcev 2008-10-18 02:11:57 UTC
Created attachment 320735 [details]
test 4 - zoom in on shift state

Please try the attached.

I suspect that the keyboard driver itself may not be at fault.
Perhaps something in the hotplug loads something. But so far there's
no breakthrough, so let's continue the turtle steps.

Comment 28 Andiry 2008-10-20 02:59:29 UTC
Created attachment 320836 [details]
dmesg-test4

Comment 29 Pete Zaitcev 2008-10-21 00:28:54 UTC
Created attachment 320949 [details]
test 5 - zoom at x86_keycodes and the flip

I made a mistake when reading the previous dmesg and went down the wrong
road. The last dmesg makes it clear that there's no damage to shift states.

Please run this patch and capture dmesg for me.

Comment 30 Andiry 2008-10-21 01:23:56 UTC
Created attachment 320954 [details]
dmesg-test5

Comment 31 Pete Zaitcev 2008-10-21 03:10:46 UTC
Oh god, how could I miss this? The data was in the ps output attached
on 10/14. The disconnect killed keventd.

I need to see the output of SysRq-T. Please kill all unnecessary
processes before triggering it, in case dmesg overflows.

Comment 33 Evan McNabb 2008-10-21 15:15:55 UTC
Pete,

I'm attaching sysrq-t output from right before and after a pull. I put the delimiter "--- disconnect starts ---" in the file dmesg-postdisconnect to show the split between the first and second sysrq-t executions. Let me know if this isn't what you're looking for.

Note, on this run I pulled the speaker cable and the keyboard still worked. So I plugged the speaker back in and unplugged it a second time, which disabled the keyboard.

Comment 34 Evan McNabb 2008-10-21 15:16:42 UTC
Created attachment 321021 [details]
dmesg w/ sysrq-t before pulling speaker cable

Comment 35 Evan McNabb 2008-10-21 15:17:43 UTC
Created attachment 321022 [details]
dmesg w/ sysrq-t after pulling speaker cable

Comment 36 Pete Zaitcev 2008-10-21 15:47:29 UTC
Evan, the dmesg you attached wrapped, unfortunately. Please kill all
unnecessary processes (from an ssh connection, of course, since the
keyboard is dead) before triggering sysrq. I need to see the hanging
keventd (process name is "events/0").
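When the SysRq-T dump is long, the tasks in D state can be picked out with a short script. Below is a minimal Python sketch, assuming each task's header line in the dump starts with the command name followed by a one-letter state. The exact dump layout varies by kernel version and architecture, so the pattern, the function name and the sample text are assumptions for illustration only.

```python
import re
from typing import List

# Assumed header shape: "<comm> <state letter> <rest of line>".
TASK_LINE = re.compile(r"^(?P<comm>\S+)\s+(?P<state>[RSDTZWX])\s+\S")


def tasks_in_state(dump_text: str, state: str = "D") -> List[str]:
    """Return command names of tasks whose header line reports the given state."""
    names = []
    for line in dump_text.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group("state") == state:
            names.append(match.group("comm"))
    return names


# Made-up dump fragment, for illustration only.
SAMPLE_DUMP = """\
init          S C03B5A40     0     1      0     2               (NOTLB)
Call Trace:
events/0      S C03B5A40     0     4      1     5     3 (L-TLB)
Call Trace:
events/2      D C03B5A40     0     6      1     7     5 (L-TLB)
Call Trace:
"""

blocked = tasks_in_state(SAMPLE_DUMP)
```

Lines that do not have a single state letter as their second field, such as call-trace lines, are skipped.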

Comment 37 Evan McNabb 2008-10-21 17:11:04 UTC
Created attachment 321041 [details]
dmesg w/ sysrq-t

Here's round two. It looks like "events/2" is in D state.

Comment 38 Andiry 2008-10-22 01:18:01 UTC
Created attachment 321104 [details]
dmesg-SysRq-T

My SysRq-T.
Hope it helps.

Comment 39 Pete Zaitcev 2008-10-26 22:14:04 UTC
Thanks for the SysRq-T. I think we essentially are done with diagnosing
the problem (keventd is stuck in snd_card_free), now it's time to figure
out a fix.

Comment 40 Andiry 2008-11-04 01:29:22 UTC
Hi Pete,
Any update on this issue?

Comment 41 Pete Zaitcev 2008-11-07 17:31:47 UTC
After we identified ALSA as the culprit (with the help of SysRq-T),
I tried to get Jaroslav to look into this, since he wrote the whole
subsystem in the first place. He gave me some suggestions, but did not
want to take the bug. Therefore, I'll have to read up on that code and
come up with a solution. It looks like a ~3000-line patch, no kABI break.
But please don't hold me to this estimate. I'm not intimate
with ALSA, but there's nothing impossible for me in the kernel. Just takes
some time.

Currently I had to rush to work on some other bug for 4.8 release.
This bug needs to be scheduled for work since it's not as trivial
as I hoped. I'll mention this to PM & my manager.

Comment 45 Pete Zaitcev 2009-01-05 18:16:42 UTC
Created attachment 328220 [details]
A test helper

This is a program to help crash the OS. It holds onto the file descriptor
where normal ALSA programs exit.
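
The attachment itself is not reproduced in this report. As a rough illustration of the idea only, the following Python sketch opens a file and keeps the descriptor open for a while instead of closing it. The device path /dev/snd/controlC0, the function name and the timing are assumptions and are not taken from the attached helper.

```python
import os
import time

# Assumed ALSA control node for card 0; the real helper may open something else.
ASSUMED_DEVICE = "/dev/snd/controlC0"


def hold_open(path: str, seconds: float, poll: float = 0.05) -> bool:
    """Open `path` read-only and keep the descriptor open for `seconds`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            time.sleep(poll)  # keep holding the descriptor; do nothing else
        os.fstat(fd)  # raises OSError if the descriptor is no longer valid
        return True
    finally:
        os.close(fd)
```

For a manual test one might call hold_open(ASSUMED_DEVICE, 60) and unplug the speaker while it is waiting.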

Comment 48 Andrius Benokraitis 2009-01-10 18:06:57 UTC
After speaking to kernel engineering management, this patch looks to be very complex, breaks kABI, and could introduce regressions into a mature RHEL 4. We are finalizing what's going into RHEL 4.8 Beta on the kernel side and this just doesn't seem to be a candidate for inclusion.

