Red Hat Bugzilla – Bug 468915
[Stratus/NEC 5.3 bug] System can crash when removing input device
Last modified: 2009-06-20 03:54:32 EDT
Description of problem:
Under certain circumstances, unplugging an input device can cause the system to crash. In particular, reading /proc/bus/input/devices while a device removal is in progress can cause a NULL pointer dereference.
This happens because there is no locking around the list of input devices (input_dev_list) that is traversed while reading /proc/bus/input/devices. The code even admits as much:
static void *input_devices_seq_start(struct seq_file *seq, loff_t *pos)
{
	/* acquire lock here ... Yes, we do need locking, I knowi, I know... */
	return list_get_nth_element(&input_dev_list, pos);
}
If a device is removed while such a traversal is in progress, one could hit a dangling pointer.
Version-Release number of selected component (if applicable):
This bug is present in all 5.2 kernels as of the date of this report.
How reproducible:
Easy, once you know how.
Steps to Reproduce:
We managed to reproduce this issue on a Stratus ftServer. It is particularly easy to reproduce on this system because the act of switching active consoles from one CRU to another removes and re-adds the input devices in very rapid succession.
1. Install RHEL5.2 and Stratus ftSSS on an ftServer.
2. In one session, start a tight loop that runs cat on /proc/bus/input/devices
3. In another session, start a tight loop that does /opt/ft/bin/ftsmaint acSwitch
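The reader side of step 2 can also be driven from a small C program instead of a shell loop. The following is only a minimal sketch: the helper names drain_file and hammer_file are made up for illustration and are not part of the original reproduction.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read `path` to EOF once, like a single `cat`.
 * Returns the number of bytes read, or -1 if the file cannot be opened. */
long drain_file(const char *path)
{
    char buf[4096];
    long total = 0;
    size_t n;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        total += (long)n;
    fclose(fp);
    return total;
}

/* Hypothetical helper: re-read `path` `iterations` times.
 * Returns how many of those reads managed to open the file. */
long hammer_file(const char *path, long iterations)
{
    long ok = 0;

    for (long i = 0; i < iterations; i++)
        if (drain_file(path) >= 0)
            ok++;
    return ok;
}
```

Calling hammer_file("/proc/bus/input/devices", n) with a large n from a main() function, while the acSwitch loop from step 3 runs in another session, should exercise the same traversal as the cat loop.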
Actual results:
Typically within five minutes, the system will crash as follows:
10-21 18:08:07 Unable to handle kernel NULL pointer dereference at 0000000000000
10-21 18:08:07 <6>EVLOG: INFORMATION - 10 is now STATE_DUPLEX / REASON_PRIMARY
10-21 18:08:07 [<ffffffff80057a74>] kobject_get_path+0x81/0xc1
10-21 18:08:07 PGD 1564d4067 PUD 159dea067 PMD 0
Expected results:
The system should switch the active console back and forth every few seconds without incident.
Dmitry Torokhov submitted a patch entitled "implement proper locking in input core" (git id: 8006479c9b75fb6594a7b746af3d7f1fbb68f18f) on 8/30/2007. I cherry-picked the portions of this patch that deal with input_dev_list. When I apply this patch I can run the above reproduction scenario for hours and the system stays up.
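The general idea behind that kind of fix is that readers and writers of the device list take the same lock, so a traversal can never observe a half-removed entry. The userspace sketch below illustrates the pattern with a pthread mutex. It is not the kernel patch: every name in it (dev_node, dev_list_lock, walk_start, walk_next, walk_stop) is hypothetical, and the start/next/stop split only mimics the shape of a seq_file iterator, where the lock is taken when the walk starts and released when it stops.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-in for an entry on a device list (hypothetical layout). */
struct dev_node {
    const char *name;
    struct dev_node *next;
};

static struct dev_node *dev_list;   /* list head, guarded by dev_list_lock */
static pthread_mutex_t dev_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: link a node in while holding the lock. */
void dev_add(struct dev_node *dev)
{
    assert(dev != NULL);
    pthread_mutex_lock(&dev_list_lock);
    dev->next = dev_list;
    dev_list = dev;
    pthread_mutex_unlock(&dev_list_lock);
}

/* Writer: unlink a node while holding the lock; returns 1 if it was found. */
int dev_remove(struct dev_node *dev)
{
    int found = 0;

    pthread_mutex_lock(&dev_list_lock);
    for (struct dev_node **pp = &dev_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == dev) {
            *pp = dev->next;
            dev->next = NULL;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&dev_list_lock);
    return found;
}

/* Reader: take the lock and return the pos-th node (or NULL).
 * The lock stays held until walk_stop() is called. */
struct dev_node *walk_start(size_t pos)
{
    struct dev_node *d;

    pthread_mutex_lock(&dev_list_lock);
    for (d = dev_list; d != NULL && pos > 0; d = d->next)
        pos--;
    return d;
}

/* Reader: advance to the next node; the caller must still hold the lock. */
struct dev_node *walk_next(struct dev_node *d)
{
    return d != NULL ? d->next : NULL;
}

/* Reader: end the traversal and release the lock. */
void walk_stop(void)
{
    pthread_mutex_unlock(&dev_list_lock);
}

/* Example traversal: count the nodes under the lock. */
size_t dev_count(void)
{
    size_t n = 0;

    for (struct dev_node *d = walk_start(0); d != NULL; d = walk_next(d))
        n++;
    walk_stop();
    return n;
}
```

Because dev_remove() blocks until any walk in progress has called walk_stop(), a reader never follows a pointer to a node that is being unlinked, which is the failure described in this report.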
Created attachment 322471
This patch fixes
You can download this test kernel from http://people.redhat.com/dzickus/el5
~~ Snapshot 3 is now available ~~
Snapshot 3 is now available for Partner Testing, which should contain a fix that resolves this bug. ISOs are available as usual at ftp://partners.redhat.com. Your testing feedback is vital! Please let us know if you encounter any NEW issues (file a new bug) or if you have VERIFIED the fix is present and functioning as expected (add PartnerVerified Keyword).
Ping your Partner Manager with any additional questions. Thanks!
~~ Attention ~~ Snapshot 4 is now available for testing @ partners.redhat.com ~~
Partners, it is vital that we get your testing feedback on this important bug fix / feature request. If you are unable to test, please clearly indicate this in a comment to this bug or directly with your partner manager. If we do not receive your test feedback, this bug is at risk of being dropped from the release.
If you have VERIFIED the fix, please add PartnerVerified to the Bugzilla Keywords field, along with a description of the test results.
If you encounter a new bug, CLONE this bug and ask your Partner Manager to review it. We are no longer accepting new bugs into the release, bar critical regressions.
I have verified this fix. My old test case would crash the system within five iterations; with the latest kernel I have gone 1000 iterations without incident.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.