Bug 436659
| Field | Value |
|---|---|
| Summary | oops in evdev device close |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | rawhide |
| Status | CLOSED RAWHIDE |
| Severity | low |
| Priority | low |
| Hardware | All |
| OS | Linux |
| Reporter | Adam Jackson <ajax> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | dwmw2, wtogami, wwoods |
| Doc Type | Bug Fix |
| Last Closed | 2008-04-08 15:28:33 UTC |
| Bug Blocks | 235705 |
Description
Adam Jackson
2008-03-08 20:54:41 UTC
Did it say "kernel bug at list_debug.h line NNN" above that?

I poked around with evtest a little, no luck. I modified Vojtech's evtest to perform grabbing (the backtrace could only happen if the device was grabbed). Tried to open the same device twice, close while getting events, disconnect. This does not seem trivially reproducible... Sorry.

Chuck: no, that was the entirety of the traceback. This is trivial to reproduce with X. Remove a plugged device while the server is running, then exit X.

This is making a lot of my F9Beta tests "crash" when I switch machines on the KVM. Makes it reeeally hard to test things. Strangely it only seems to happen while in anaconda.

Created attachment 297861 [details]
Hacked-up evtest with a grab and an exit
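
For readers without access to the attachment, the following is a minimal sketch of what "evtest with a grab and an exit" could look like. It is not the attached program: `grab_and_read` is a hypothetical helper, and the only kernel interface it relies on is the `EVIOCGRAB` ioctl from `<linux/input.h>`. It opens an evdev node, takes the exclusive grab, reads a few events, and closes the descriptor with the grab still held, which is the device-close situation named in the bug summary.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/input.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the code from attachment 297861.
 * Opens an evdev node, takes an exclusive grab, reads up to max_events
 * events, then closes the descriptor while the grab is still held.
 * Returns the number of events read, or -1 if open or grab failed.
 */
static int grab_and_read(const char *path, int max_events)
{
	struct input_event ev;
	int count = 0;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* A non-zero argument requests the exclusive grab. */
	if (ioctl(fd, EVIOCGRAB, 1) < 0) {
		perror("EVIOCGRAB");
		close(fd);
		return -1;
	}

	/* The loop ends on the first short or failed read, e.g. after unplug. */
	while (count < max_events &&
	       read(fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev)) {
		printf("type %u code %u value %d\n",
		       (unsigned)ev.type, (unsigned)ev.code, (int)ev.value);
		count++;
	}

	/* Close without an explicit ungrab, so the kernel has to drop it. */
	close(fd);
	return count;
}
```

To follow the recipe from the thread, one would call this in a loop on something like `/dev/input/eventN` (device path is an assumption) and unplug the device while it runs.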
Created attachment 297862 [details]
Full dmesg with crash
Looks like I'm reproducing something that is similar to Ajax's crash. The recipe is:
- Run 2 evtests (I used the same device; one grabs, one fails) in a "while true" loop.
- Disconnect several times, as seen in the dmesg.

Oopses here in lib/list_debug.c::__list_add():

    if (unlikely(prev->next != next)) {

Corrupted linked list in the mutex code...

Just before it oopsed it printed:

    WARNING: at kernel/mutex.c:134 mutex_lock_nested+0xca/0x295() (Not tainted)

which is:

    spin_lock_mutex(&lock->wait_lock, flags);

    #define spin_lock_mutex(lock, flags)                                    \
        do {                                                                \
            struct mutex *l = container_of(lock, struct mutex, wait_lock);  \
                                                                            \
            DEBUG_LOCKS_WARN_ON(in_interrupt());                            \
            local_irq_save(flags);                                          \
            __raw_spin_lock(&(lock)->raw_lock);                             \
            DEBUG_LOCKS_WARN_ON(l->magic != l);                             \
        } while (0)

We aren't in an interrupt, so it must be the lock magic that is bad.

Right, I too think it's some kind of use-after-free.

Oh god, drivers/input/input.c, I cannot unsee it.

This should do it:

    diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
    index 0727b0a..c0874a3 100644
    --- a/drivers/input/evdev.c
    +++ b/drivers/input/evdev.c
    @@ -155,7 +155,8 @@ static int evdev_ungrab(struct evdev *evdev, struct evdev_client *client)
     	rcu_assign_pointer(evdev->grab, NULL);
     	synchronize_rcu();
    -	input_release_device(&evdev->handle);
    +	if (evdev->exist)
    +		input_release_device(&evdev->handle);

     	return 0;
     }

We no longer use evdev for keyboards, are you still seeing this crash? Should we move this off the beta blocker list?

xserver-1.5.0-no-evdev-keyboards-kthnx.patch (in xorg-x11-server 1.4.99.901-10.20080314.fc9) works around the crash. Fix would still be nice to have, so I'm moving this to F9Target.

I'm communicating with upstream about this. Dmitry prefers evdev to pin down input_dev with its refcount instead of checking for existence. Regarding Jesse's comment #12, perhaps it would be better to throw my patch into the kernel instead of saddling X11 with workarounds. It's clear that the bug is in the kernel.
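
The reasoning in the thread (a bad lock magic implies use-after-free, and an `exist` check avoids touching the dead object) can be illustrated with a small userspace model. This is only a sketch under assumptions: every `toy_*` name is hypothetical, the model assumes the freed object is the input device that owns the mutex (the thread suggests this but does not confirm it), and freeing is simulated by overwriting the struct with 0x6b bytes instead of calling `free()`, so the example itself never reads freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for a debug mutex: magic points back at the object itself. */
struct toy_mutex {
	struct toy_mutex *magic;
};

static void toy_mutex_init(struct toy_mutex *m)
{
	m->magic = m;
}

/* Mirrors the sense of DEBUG_LOCKS_WARN_ON(l->magic != l). */
static bool toy_mutex_magic_bad(const struct toy_mutex *m)
{
	return m->magic != m;
}

/* Toy input device owning the mutex that the release path would lock. */
struct toy_input_dev {
	struct toy_mutex mutex;
	int release_calls;
};

/* Toy evdev: exist is cleared when the hardware goes away. */
struct toy_evdev {
	bool exist;
	struct toy_input_dev *dev;
};

/* Simulated disconnect: mark the device gone and scribble over it. */
static void toy_disconnect(struct toy_evdev *evdev)
{
	evdev->exist = false;
	memset(&evdev->dev->mutex, 0x6b, sizeof(evdev->dev->mutex));
}

/*
 * Guarded ungrab, following the shape of the proposed patch: the device
 * is only touched while exist is set. Returns true if release ran.
 */
static bool toy_ungrab(struct toy_evdev *evdev)
{
	if (!evdev->exist)
		return false;

	/* Without the guard above, this check would fire after disconnect. */
	assert(!toy_mutex_magic_bad(&evdev->dev->mutex));
	evdev->dev->release_calls++;
	return true;
}
```

In this model an ungrab before disconnect locks a healthy mutex, while an ungrab after disconnect returns early and never looks at the scribbled-over magic field.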
Dmitry said that he "does not oppose" my patch, but wants "to understand it better". If X11 people think that keyboard is ok with the old API, then sure, no problem. I still have my reproducer.

Jesse's comment #12 was only to justify moving this bug off the Beta blocker list. I won't consider the bug closed until we have this fix (or a similar one) in the kernel. IIRC we're not going to be using evdev keyboards for F9 final but we still hope to move to evdev-managed keyboards in F10 rawhide - that is to say, a few weeks from now. There are also a few circumstances where evdev will still claim keyboard devices in F9b. Avoiding oopses in those situations would be A Very Good Thing. But I don't believe it's *urgent* anymore; if you need more time to write a better patch that's fine.

We'll also still be using evdev for (most) mice. According to comment #2, this only happens when the device is grabbed, but the evdev driver will _always_ grab. Given that my reproducer in comment #3 was tested by removing a mouse, this still definitely needs a fix for F9. I'd be happy with carrying Pete's fix for F9 until an upstream fix happens.

I am unable to reproduce this with 2.6.25-0.204.rc8.git4.fc9. I don't see any evdev-related patches in our patchset, so I'll assume this was resolved upstream. Thanks all!

Dmitry used a different approach: he prefers pinning the parent device. But whatever works. Commit a7097ff89c3204737a07eecbc83f9ae6002cc534.
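
The "pin the parent device" approach mentioned above can be sketched in userspace with a plain reference count. This is not the code from the upstream commit: the `toy_dev_*` names are hypothetical, the counter is not atomic, and the `freed_flag` pointer exists only so the example can observe when the object is released. The idea it illustrates is that the handler takes its own reference, so a disconnect that drops the driver's reference cannot free the device while the handler still uses it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted device; freed_flag lets callers observe the free. */
struct toy_dev {
	int refcount;
	int *freed_flag;
};

/* Create a device holding one reference (the "driver" reference). */
static struct toy_dev *toy_dev_create(int *freed_flag)
{
	struct toy_dev *d = malloc(sizeof(*d));

	if (d == NULL)
		return NULL;
	d->refcount = 1;
	d->freed_flag = freed_flag;
	*freed_flag = 0;
	return d;
}

/* Take an extra reference, pinning the device for the caller. */
static struct toy_dev *toy_dev_get(struct toy_dev *d)
{
	d->refcount++;
	return d;
}

/* Drop a reference; the device is freed only when the last one goes. */
static void toy_dev_put(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		*d->freed_flag = 1;
		free(d);
	}
}
```

Compared with an existence check, this moves the safety guarantee from "do not touch the device after it is gone" to "the device cannot go away while someone holds it", which matches the preference attributed to Dmitry in the thread.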