Description of problem:
RHEL 4u7: The keyboard stops working after a USB 1.1 speaker is unplugged.

Version-Release number of selected component (if applicable):
RHEL 4u7

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 4u7 (32-bit or 64-bit).
2. Plug in a USB 1.1 speaker.
3. Reboot the system with the speaker plugged in.
4. Run cat /proc/asound/cards to make sure the speaker has been detected.
5. Unplug the USB speaker after booting into the OS.
6. The keyboard no longer works.

Actual results:
The keyboard stops responding.

Expected results:
The keyboard should function normally.

Additional info:
This issue occurs on AMD SB600, SB700, and Intel platforms.
May I see the dmesg output, taken after the issue occurs (after step 6)? Please attach it to the bug; don't paste it into the comment box. Does this problem exist on RHEL 5?
Created attachment 317877 [details] dmesg info taken after the issue occurs.
Yeah, dmesg attached, the issue occurs from line 658. This problem has not been seen on RHEL 5.
Sadly, in the captured dmesg there's not a peep after the 1-3 disconnect from bus #4 where the keyboard is, although lots of very strange happenings occur between #1 and #6. I suspect it's something in the usbhid. I'm going to come up with some instrumentation to check what's going on with the usbhid and bus #4.
Hey Pete, we've created an issue-tracker for this and emcnabb is also investigating it. Link below. https://enterprise.redhat.com/issue-tracker/215951 Also, FYI, this issue cannot be reproduced in console mode (init 3), with either the SMP or the EL kernel.
Created attachment 320245 [details] test 1 - printk in keyboard.c
Reproducing this is next to impossible without suitably broken hardware. Evan had no luck either, naturally. We must have these strange disconnects at bus #1 (although the keyboard is at #4, maybe some lock gets stuck...). So, the only hope is to run tests on site. My first idea is very basic: split the stack into pieces horizontally until the culprit is found. The attached trivial patch prints events at the interface between the input stack and the keyboard driver. It would be nice to apply it to 2.6.9-usb-dbg and see what happens after the intentional disconnect.
So do we need to apply this patch, test it and provide dmesg info to you?
Andiry, yes, please. Normally I should be building test kernels for you, but since you already use a modified kernel, and since turning around patches is quicker (and I expect several turns), let's do this. The reason I start so high up the stack is comment #5, which suggests that X gets involved somehow. It may have something to do with X having two ttyN open, not just one as a console shell does.
Created attachment 320263 [details] dmesg info with kbd patch.
This time we used a PS/2 keyboard, applied the patch, and reproduced the issue.
Thanks, I see, the events are coming. Let's cut even higher up the stack, at the kernel boundary. Please capture the strace output for the X process. Keep in mind, in RHEL 4 the X server usually polls a lot, so the trace is going to be very long. So, start "strace -o x.trace -p NNNN" (over ssh), pull the cable of the speaker, verify that X stopped reacting, and hit ^C on the strace. Also, capture the output of "lsof -p NNNN > x.lsof", because I need to know which descriptor number is which in the x.trace, and "ps -auxw" (so we can see if anything got stuck in D state).
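For anyone sifting through a saved "ps -auxw" capture by hand, the check for processes stuck in D state can be sketched in a few lines of Python. This is only an illustration, not part of the requested test procedure: the function name find_uninterruptible is made up here, and it assumes the usual procps header with PID and STAT columns and COMMAND as the last column.

```python
def find_uninterruptible(ps_output):
    """Return (pid, stat, command) for each process whose STAT starts with 'D'.

    Assumes the first line is the ps header and that COMMAND is the last
    column, as in the usual "ps auxw" layout.
    """
    lines = ps_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = len(header) - 1
    stuck = []
    for line in lines[1:]:
        # Split at most cmd_col times so a command with spaces stays whole.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # short or malformed line, skip it
        if fields[stat_col].startswith("D"):
            stuck.append((int(fields[pid_col]), fields[stat_col],
                          fields[cmd_col].rstrip()))
    return stuck
```

Feeding it the text of the saved ps capture (for example, the contents of a file read with open(...).read()) gives back a list of candidates; kernel threads appear with their bracketed names in the COMMAND column.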
Created attachment 320378 [details] x.trace
x.trace taken after removing the speaker.
Created attachment 320379 [details] x.lsof
Created attachment 320380 [details] ps -auxw
Andiry, I have a follow-up question. If you look at x.trace, you can see that the last read(3, ...) is at line 2312. I want to make sure that tty7 stops delivering keystrokes. So, if you re-run this test, unplug the speaker, hit the keys, and watch the output of strace, does this read(3,) occur? Presumably not, but I need to be sure. No need to attach a new x.trace, just watch the trace and let me know, please. Also, does the mouse die together with the keyboard, or does it continue to work?
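Locating the last read on a given descriptor in a long strace log is easy to script. The sketch below is a hypothetical helper (the name last_read_on_fd is not from any tool in this bug); it assumes plain "strace -o" output where each call starts at the beginning of a line, optionally preceded by a PID column as produced with -f.

```python
import re

def last_read_on_fd(trace_lines, fd):
    """Return (line_number, text) of the last read() on fd, or None.

    Line numbers are 1-based. Only lines that begin with the call itself
    (or with a PID followed by the call) are considered.
    """
    pattern = re.compile(r"^(?:\d+\s+)?read\(%d," % fd)
    last = None
    for number, line in enumerate(trace_lines, start=1):
        if pattern.match(line):
            last = (number, line.rstrip("\n"))
    return last
```

For example, calling last_read_on_fd(open("x.trace"), 3) reports where descriptor 3 was last read; which device that descriptor refers to still has to be looked up in x.lsof.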
Well, it does not show any read(3,) after unplugging the speaker, no matter how many times I hit the keys. The mouse still works after unplugging the speaker. Only the keyboard dies.
I spent some more time testing this today and I am able to reproduce it with the Logitech V10 Notebook Speakers (which AMD is also using). It turns out I only see this behavior if I'm actually logged into Gnome, but not at the GDM login prompt. The keyboard only shows this disconnect behavior if the speakers are plugged in before the login occurs (if I log in first, plug in the speakers, and then disconnect them, everything works as expected). Also, when it occurs the PS2 keyboard which is also attached no longer functions. About the speakers themselves, there are no buttons on them besides the up/down volume control.
Evan, does the SysRq work?
Created attachment 320512 [details] test 2 - printk in tty work
This is a very small step, but may I have this run? Evan, if you can build kernels, it would be good too. My own attempts to replicate the issue using the local equipment have not been successful so far.
Created attachment 320514 [details] dmesg with tty patch
Andiry, do you apply these patches with patch(1), or a text editor? It looks like the elements of the old patch from comment #10 were still in effect, e.g. there's no event_type. Please let me know immediately if patch(1) fails with rejects, and do not try to fix rejects with vi. If patch(1) fails, it means that your tree does not match mine and we don't have a consistent base to work on. Evan, please back Andiry up. If I attach a bad test patch, please let me know immediately, so I can fix it before the next day in Shanghai. (That said, I still think Andiry mixed up two patches today.)
Created attachment 320625 [details] dmesg with tty revised
Sorry, my mistake. I thought the first kbd patch was the same as part of the new patch.
Created attachment 320629 [details] test 3 - zoom in on kbd_keycode
Please try this.
Created attachment 320644 [details] dmesg with new patch
Created attachment 320735 [details] test 4 - zoom in on shift state
Please try the attached. I suspect that the keyboard driver itself may not be at fault. Perhaps something in the hotplug loads something. But so far there's no breakthrough, so let's continue the turtle steps.
Created attachment 320836 [details] dmesg-test4
Created attachment 320949 [details] test 5 - zoom at x86_keycodes and the flip
I made a mistake when reading the previous dmesg and went down the wrong road. The last dmesg makes it clear that there's no damage to shift states. Please run this patch and capture dmesg for me.
Created attachment 320954 [details] dmesg-test5
Oh god, how could I miss this? The data was in the ps output attached on 10/14. The disconnect killed keventd. I need to see the output of SysRq-T. Please kill all unnecessary processes before triggering it, in case dmesg overflows.
Pete, I'm attaching sysrq-t output from right before and after a pull. I put the delimiter "--- disconnect starts ---" in the file dmesg-postdisconnect to show the split between the first and second sysrq-t executions. Let me know if this isn't what you're looking for. Note, on this run I pulled the speaker cable and the keyboard still worked. So I plugged the speaker back in and unplugged it a second time, which disabled the keyboard.
Created attachment 321021 [details] dmesg w/ sysrq-t before pulling speaker cable
Created attachment 321022 [details] dmesg w/ sysrq-t after pulling speaker cable
Evan, the dmesg you attached wrapped, unfortunately. Please kill all unnecessary processes (from an ssh connection, of course, since the keyboard is dead) before triggering sysrq. I need to see the hanging keventd (process name is "events/0").
Created attachment 321041 [details] dmesg w/ sysrq-t
Here's round two. It looks like "events/2" is in D state.
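A captured SysRq-T dump can be checked mechanically for two things raised in this thread: whether the buffer wrapped before the dump started, and which tasks are in D state. The Python sketch below is a rough aid under stated assumptions, not a verified parser: it assumes the dump begins with a "SysRq : Show State" line and that each task line has the form name, state letter, saved PC in hex, free stack, PID, parent PID. The helper name blocked_tasks is made up for this example.

```python
import re

# Assumed layout of a task line in the dump:
#   <name> <state> <saved PC, 8 or 16 hex digits> <free stack> <pid> <ppid> ...
# An optional "<N>" log-level prefix is tolerated.
TASK_D = re.compile(
    r"^(?:<\d>)?(.+?)\s+D\s+[0-9A-Fa-f]{8,16}\s+\d+\s+(\d+)\s+\d+"
)

def blocked_tasks(dmesg_text):
    """Return (saw_header, [(name, pid), ...]) for tasks in D state.

    saw_header is False when the "SysRq : Show State" line is missing,
    which suggests the kernel log buffer wrapped and the dump is partial.
    """
    saw_header = "SysRq : Show State" in dmesg_text
    tasks = []
    for line in dmesg_text.splitlines():
        match = TASK_D.match(line)
        if match:
            tasks.append((match.group(1).strip(), int(match.group(2))))
    return saw_header, tasks
```

If saw_header comes back False, the safer course is the one already requested above: kill unnecessary processes and trigger the dump again rather than trust a partial list.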
Created attachment 321104 [details] dmesg-SysRq-T
My SysRq-T. Hope it helps.
Thanks for the SysRq-T. I think we essentially are done with diagnosing the problem (keventd is stuck in snd_card_free), now it's time to figure out a fix.
Hi Pete, Any update on this issue?
After we identified ALSA as the culprit (with the help of SysRq-T), I tried to get Jaroslav to look into this, since he wrote the whole subsystem in the first place. He gave me some suggestions, but did not want to take the bug. Therefore, I'll have to read up on that code and come up with a solution. It looks like a ~3000-line patch, no kABI break. But please don't hold me to this estimate. I'm not intimate with ALSA, but there's nothing impossible for me in the kernel. It just takes some time. Currently I had to rush to work on another bug for the 4.8 release. This bug needs to be scheduled for work since it's not as trivial as I hoped. I'll mention this to PM and my manager.
Created attachment 328220 [details] A test helper
This is a program to help crash the OS. It holds onto the file descriptor where normal ALSA programs exit.
After speaking to kernel engineering management, this patch looks to be very complex, breaks kABI, and could introduce regressions into a mature RHEL 4. We are finalizing what's going into RHEL 4.8 Beta on the kernel side and this just doesn't seem to be a candidate for inclusion.