Bug 227808
| Summary: | [RHEL-4] Reboot failure on AMD64 when no keyboard attached | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Leonard den Ottolander (Locomo) <leonard> |
| Component: | kernel | Assignee: | Aristeu Rozanski <arozansk> |
| Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.4 | CC: | hiroto.shibuya, jbaron, jpyeron, leonard-rh-bugzilla |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-20 16:03:00 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Leonard den Ottolander (Locomo)
2007-02-08 11:43:05 UTC
Created attachment 147643 [details]
messages during reboot (no snips in between)
Created attachment 147644 [details]
dmesg first machine
dmesg for the machine that failed to reboot today because the keyboard emulator
was removed. Reboot failure has also been observed on the second machine, but not
for the 15 months since a keyboard emulator was attached (and gladly not removed again).
Created attachment 147645 [details]
hwconf first machine
Created attachment 147646 [details]
lspci first machine
Created attachment 147647 [details]
dmesg second machine
Created attachment 147649 [details]
hwconf second machine
Created attachment 147650 [details]
lspci second machine
Leonard, can you please try the kernel on http://people.redhat.com/arozansk/reboot/ and see if it solves your problem? Thanks,

I wonder if the patch I posted on the following bug is applicable here. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=231768

BTW, I tested http://people.redhat.com/arozansk/reboot/kernel-2.6.9-48.EL.227808.i686.rpm on the EPIA board I reported in Bug 231768 and it did not reboot without keyboard.

Aristeu, This problem occurs on a production system that is located 800km from here. Although I am usually quite willing to test these kinds of things I am not in this particular case, as you can imagine. It would mean I have to hire hands from my ISP and schedule downtime for this production server. I hope you could test it on one of your test machines with similar specifications, see if you can reproduce the hang with an old kernel, and see if the new kernel fixes it.

I suppose the issue Hiroto is having might well be the same as mine, as this is
the same chipset (VT8237), and according to my hosting provider this issue is
solved in kernel-2.6.13, which is consistent with what he reports.
However, the patch provided in attachment 149791 [details] is for i386, and mine
are x86_64 systems. (Maybe the subject should be changed to "Reboot failure on
VT8237 when no keyboard attached".)
Created attachment 149849 [details]
reboot without keyboard patch for x86_64
I found the exact same code in a different place for x86.
This is an untested patch for x86_64 applying the same diff
as I did for i386, although in a different place.
Any progress on this issue? As indicated earlier I am unable/unwilling to test this on the live boxes I am experiencing the issue on, but that should not stop you from finding an appropriate system/tester. The problem should be easily reproducible.

Leonard, I still haven't found any system around with this chipset. As for the patches posted here, none of them are upstream, so there's another fix for this issue in newer kernels. So, I'm afraid there hasn't been much progress on this issue yet.

Sadly the issue still exists in 2.6.9-67.0.4.EL. Although this is a long shot, perhaps it is possible for you to contact my provider Hetzner (http://hetzner.de) and see if they are willing to help out by temporarily providing you a system with this chipset. Otherwise someone in one of the other Red Hat offices might have such a system available.

This thing keeps biting me in the rear. The network card burned out a little while ago, so a LARA (remote Java app KVM) was attached. Of course the technician forgot to put the keyboard emulator back after removing the LARA. I just rebooted after the last kernel upgrade and of course the machine didn't come up, so as an FYI: the issue still exists in kernel-2.6.9-89.0.20.EL and presumably in kernel-2.6.9-89.0.23.EL. I suppose I'll just have to live with this because I don't see this issue being fixed before EOL of either RHEL4 or my box :s .

The proposed patch only fixed the issue in mach_reboot.c, which might be the issue with bug 231768 but maybe not with this bug. The issue with i8042_command() negating a register value and writing this invalid value back in i8042_controller_init() just before the reboot is not addressed by the previous patch.

Regarding the i8042, in http://groups.google.com/group/linux.kernel/browse_thread/thread/d2ccf3ce893561ea/0a28e581d1bd271d comment 26 Vojtech Pavlik concludes: "I suppose we can get rid of the checking of data source and negation and be done with it." This is what actually has happened in the main tree.
Patching the relevant section vs. the current tree (where i8042_command was renamed to __i8042_command) leads to:

--- i8042.c.000	2004-10-18 23:53:12.000000000 +0200
+++ i8042.c	2010-03-20 16:23:54.000000000 +0100
@@ -198,10 +198,12 @@ static int i8042_command(unsigned char *
 	if (!retval)
 		for (i = 0; i < ((command >> 8) & 0xf); i++) {
 			if ((retval = i8042_wait_read())) break;
-			if (i8042_read_status() & I8042_STR_AUXDATA)
-				param[i] = ~i8042_read_data();
-			else
-				param[i] = i8042_read_data();
+			if (command == I8042_CMD_AUX_LOOP &&
+			    !(i8042_read_status() & I8042_STR_AUXDATA)) {
+				dbg(" -- i8042 (auxerr)");
+			}
+
+			param[i] = i8042_read_data();
 			dbg("%02x <- i8042 (return)", param[i]);
 		}

As you can see an extra test for command == I8042_CMD_AUX_LOOP was added as well, but the main issue is the negation of param[i] being removed regardless of whether I8042_STR_AUXDATA is set. The crude version of this patch (without the extra test in HEAD) would read:

--- i8042.c.000	2004-10-18 23:53:12.000000000 +0200
+++ i8042.c	2010-03-20 16:30:02.000000000 +0100
@@ -198,10 +198,7 @@ static int i8042_command(unsigned char *
 	if (!retval)
 		for (i = 0; i < ((command >> 8) & 0xf); i++) {
 			if ((retval = i8042_wait_read())) break;
-			if (i8042_read_status() & I8042_STR_AUXDATA)
-				param[i] = ~i8042_read_data();
-			else
-				param[i] = i8042_read_data();
+			param[i] = i8042_read_data();
 			dbg("%02x <- i8042 (return)", param[i]);
 		}

Haven't tested this (not willing to take my production server down for this) but from what I've read I'm quite sure this is the solution for my problem. But like I said, probably beating a dead horse here, just gotta make sure those technicians don't remove that keyboard (emulator).

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.