Bug 722472
Summary: | [ INFO: possible recursive locking detected ] 3.0-0.rc7.git0.1.fc16.i686 | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | sd.domrep | ||||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 16 | CC: | andrewclarkii, aquini, awilliam, che666, dmach, gansalmon, gbauman, gtmkramer, itamar, jonathan, jwboyer, karo1170, kernel-maint, lpoetter, madhu.chinakonda, mads, mailings, mikko.tiihonen, pebolle, rjones, samuel-rhbugs, sankarshan, satellitgo, smooge, tflink, vonbrand | ||||||
Target Milestone: | --- | Keywords: | CommonBugs | ||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | https://fedoraproject.org/wiki/Common_F16_bugs#kernel-lockdep | ||||||||
Fixed In Version: | kernel-3.1.0-0.rc3.git0.0.fc16 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2011-08-18 02:33:01 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
sd.domrep
2011-07-15 13:10:43 UTC
*** Bug 722401 has been marked as a duplicate of this bug. *** I have been trying to track down a set of crashes with 3.0 systems... the system prints this and gets slower and slower until it "locks" up completely. Problem does not occur on 2.6.38.8-32 kernel. Not sure what it causing this between the two. Going through rawhide kernels from 2.6.39 onward.. a "deadlock" issue similar to this seems to start happening between 2.6.39-1 and 3.0.0-rc1. Not sure how to debug beyond that. 0) On 2.6.39 I got only a warning: ============================================= [ INFO: possible recursive locking detected ] 2.6.39-0.local9.fc16.i686 #1 --------------------------------------------- systemd-logind/807 is trying to acquire lock: (&ep->mtx){+.+.+.}, at: [<c0524a05>] ep_scan_ready_list+0x32/0x154 but task is already holding lock: (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481 other info that might help us debug this: 2 locks held by systemd-logind/807: #0: (epmutex){+.+.+.}, at: [<c0524e5f>] sys_epoll_ctl+0xbe/0x481 #1: (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481 stack backtrace: Pid: 807, comm: systemd-logind Not tainted 2.6.39-0.local9.fc16.i686 #1 Call Trace: [<c080af85>] ? printk+0x2d/0x2f [<c04690b7>] __lock_acquire+0x78f/0xae1 [<c040790c>] ? sched_clock+0x8/0xb [<c045b858>] ? sched_clock_local+0x10/0x18b [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c046981e>] lock_acquire+0xbc/0xdc [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c08127f3>] __mutex_lock_common+0x4a/0x2f0 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c0432502>] ? __might_sleep+0x29/0xfb [<c0466a50>] ? trace_hardirqs_off+0xb/0xd [<c0812b4e>] mutex_lock_nested+0x39/0x3e [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c0524a05>] ep_scan_ready_list+0x32/0x154 [<c05245df>] ? ep_remove+0x9b/0x9b [<c0524b3b>] ep_poll_readyevents_proc+0x14/0x16 [<c05247ea>] ep_call_nested.constprop.2+0x6d/0x9a [<c0524b27>] ? ep_scan_ready_list+0x154/0x154 [<c05248e6>] ep_eventpoll_poll+0x45/0x55 [<c0524fa0>] sys_epoll_ctl+0x1ff/0x481 [<c052470f>] ? ep_send_events_proc+0xd5/0xd5 [<c0819fdf>] sysenter_do_call+0x12/0x38 1) I'm not sure what causes the difference between an actual deadlock and (just) the above warning. Could it the inherent hit or miss aspect of a race condition? 2) Anyhow, what seems to trigger this is the recent upgrade to a new version of systemd (systemd-30-1.fc16.i686), as I do not recall seeing this before. I'm seeing this on x86_64. Unfortunately it makes the system unusable as I'm unable to log in or ssh to the machine. *** Bug 724952 has been marked as a duplicate of this bug. *** I verified that you can workaround this bug by downgrading to systemd-29 from: http://koji.fedoraproject.org/koji/buildinfo?buildID=247951 (It doesn't appear to be a systemd bug, this is just a workaround). (In reply to comment #7) > I verified that you can workaround this bug by downgrading > to systemd-29 from: 0) Thanks. > (It doesn't appear to be a systemd bug, this is just a workaround). 1) I've always thought that kernel bugs triggered by user space are still kernel bugs. Or is that too simplistic? Anyhow, perhaps the people working on systemd could somehow indicate what changed in their use of this epoll stuff (which I'm unfamiliar with). That might help in pinpointing the cause of this bug. No argument -- this is a kernel bug. It's just that systemd-29 doesn't happen to trigger it. CC-ing Lennart for any insight into what changed in the use of epoll between -29 and -30. 0) Trying to parse the code in fs/eventpoll.c I came up with this rough line of events that sort of matches the backtraces (all numbers are linenumbers in fs/eventpoll.c): 1347:sys_poll_ctl() 1412: mutex_lock() 1426: ep_insert() | 921: ep_insert() 957: tfile->f_op->poll() | 676: ep_eventpoll_poll() 690: ep_call_nested() 691: ep_poll_readyevents_proc() || 347: ep_call_nested() 671: ep_poll_readyevents_proc() 673: ep_scan_ready_list() | 470: ep_scan_ready_list() 484: mutex_lock() <= deadlock here 1) Does this make sense? 0) I've grepped through systemd's source a bit. 1) eg grep -n epoll_ctl src/logind* src/logind-session.c:819: if (epoll_ctl(s->manager->epoll_fd, EPOLL_CTL_ADD, s->fifo_fd, &ev) < 0) src/logind-session.c:836: assert_se(epoll_ctl(s->manager->epoll_fd, EPOLL_CTL_DEL, s->fifo_fd, NULL) == 0); src/logind.c:888: if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->bus_fd, &ev) < 0) src/logind.c:915: if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->console_active_fd, &ev) < 0) src/logind.c:954: if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_seat_fd, &ev) < 0) src/logind.c:975: if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_vcsa_fd, &ev) < 0) (Note that, for whatever reason, only s->fifo_fd will receive a EPOLL_CTL_DEL.) 2) s->fifo_fd seems to be an ordinary fd: src/logind-session.c:807: s->fifo_fd = open(s->fifo_path, O_RDONLY|O_CLOEXEC|O_NDELAY); 3) m->console_active_fd seems to be an ordinary fd too: src/logind.c:905: m->console_active_fd = open("/sys/class/tty/tty0/active", O_RDONLY|O_NOCTTY|O_CLOEXEC); 4) m->udev_seat_fd also seems to an ordinary fd (but for an udev related socket): src/logind.c:929: m->udev_seat_monitor = udev_monitor_new_from_netlink(m->udev, "udev"); src/logind.c:945: m->udev_seat_fd = udev_monitor_get_fd(m->udev_seat_monitor); 5) Ditto m->udev_vcsa_fd: src/logind.c:957: m->udev_vcsa_monitor = udev_monitor_new_from_netlink(m->udev, "udev"); src/logind.c:969: m->udev_vcsa_fd = udev_monitor_get_fd(m->udev_vcsa_monitor); 6) However, m->bus_fd seems to be a file descriptor referring to an epoll instance: src/logind.c:878: m->bus_fd = bus_loop_open(m->bus); Interesting is that bus_loop_open() returns this fd: src/dbus-loop.c:222: fd = epoll_create1(EPOLL_CLOEXEC); 7) It's not clear why systemd-logind wants to epoll on another epoll instance. Although I really know next to nothing about the epoll interface, that does look like something special to do. But, whatever are the reasons to do that, wouldn't epolling on another epoll instance generated by the same process walk into the deadlock I tried to describe in comment #10? (Since polling on an epoll instance seems to invoke ep_eventpoll_poll.) 8) Feedback from people familiar with epoll and/or systemd-logind is appreciated. 0) Note that the bus_fd code I described briefly in comment #11 was introduced in systemd's commit 2026308 logind: first version that compiles fine which was introduced after v29 (and so, was included in v30) which neatly fits the observation of comment #7 that a workaround for this issue is to downgrade to systemd-29. 1) Feedback from people familiar with systemd-logind would be very much appreciated. I'd really like to know whether I'm barking up the right tree here. (In reply to comment #11) > 7) It's not clear why systemd-logind wants to epoll on another epoll instance. > Although I really know next to nothing about the epoll interface, that does > look like something special to do. > > But, whatever are the reasons to do that, wouldn't epolling on another epoll > instance generated by the same process walk into the deadlock I tried to > describe in comment #10? (Since polling on an epoll instance seems to invoke > ep_eventpoll_poll.) Can this be escalated into some sort of DoS attack on the kernel? If so we may need to alert the security folk. Bit busy now but I'll play around with this on Monday. (In reply to comment #13) > Can this be escalated into some sort of DoS attack on > the kernel? 0) I wouldn't know. I haven't really been able to pinpoint the problem, let alone write the minimal case that reproduces this behaviour. 1) Currently I'm wondering whether src/dbus-loop.c:bus_loop_open() could be the source of all this: int bus_loop_open(DBusConnection *c) { int fd; [...]; fd = epoll_create1(EPOLL_CLOEXEC); if (fd < 0) return -errno; if (!dbus_connection_set_watch_functions(c, add_watch, remove_watch, toggle_watch, INT_TO_PTR(fd), NULL) || !dbus_connection_set_timeout_functions(c, add_timeout, remove_timeout, toggle_timeout, INT_TO_PTR(fd), NULL)) { close_nointr_nofail(fd); return -ENOMEM; } return fd; } The point is that add_watch(), remove_watch(), toggle_watch(), add_timeout(), and remove_timeout() here all are passed a pointer to "fd", which only is valid within bus_loop_open() itself. But these five functions all dereference that pointer to use it as the fd of an epoll instance in their calls of epoll_ctl(). Doesn't that risk using basically a random number instead of a valid fd of an epoll instance? 2) But I must admit that I hardly understand both systemd-logind and the epoll interface. 0) Public service announcement: just because the kernel prints *** DEADLOCK *** one shouldn't assume that logging in doesn't work any more (as it works just fine). That message is apparently added to stop easily fooled people, like me, from running that kernel. 1) Public service announcement: printf debugging of systemd-logind is easy. Basically one has to: - rebuild systemd's source rpm; - add log_info() calls wherever one likes (in our case mainly in src/loging* and src/dbus*); - do a make from the root of the build tree; - backup /lib/systemd/systemd-logind; - copy systemd-logind (from the root directory of the build tree) to /lib/systemd/systemd-logind. 2) Interesting part of the log: systemd-logind[771]: src/logind.c:1069: epoll_create1(EPOLL_CLOEXEC) systemd-logind[771]: src/logind.c:955: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_seat_fd, &ev) systemd-logind[771]: src/logind.c:977: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_vcsa_fd, &ev) systemd-logind[771]: src/logind.c:915: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->console_active_fd, &ev) systemd-logind[771]: src/dbus-loop.c:228: epoll_create1(EPOLL_CLOEXEC) systemd-logind[771]: src/dbus-loop.c:66: epoll_ctl(PTR_TO_INT(data), EPOLL_CTL_ADD, e->fd, &ev) systemd-logind[771]: src/dbus-loop.c:66: epoll_ctl(PTR_TO_INT(data), EPOLL_CTL_ADD, e->fd, &ev) systemd-logind[771]: src/dbus-loop.c:85: epoll_ctl(PTR_TO_INT(data), EPOLL_CTL_ADD, e->fd, &ev) systemd-logind[771]: src/logind.c:887: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->bus_fd, &ev) kernel: [ 46.645737] kernel: [ 46.645743] ============================================= kernel: [ 46.646034] [ INFO: possible recursive locking detected ] kernel: [ 46.646034] 3.0.0-1.fc16.i686 #1 kernel: [ 46.646034] --------------------------------------------- kernel: [ 46.646034] systemd-logind/771 is trying to acquire lock: kernel: [ 46.646034] (&ep->mtx){+.+.+.}, at: [<c052a355>] ep_scan_ready_list+0x32/0x154 kernel: [ 46.646034] kernel: [ 46.646034] but task is already holding lock: kernel: [ 46.646034] (&ep->mtx){+.+.+.}, at: [<c052a7f4>] sys_epoll_ctl+0x103/0x481 kernel: [ 46.646034] kernel: [ 46.646034] other info that might help us debug this: kernel: [ 46.646034] Possible unsafe locking scenario: kernel: [ 46.646034] kernel: [ 46.646034] CPU0 kernel: [ 46.646034] ---- kernel: [ 46.646034] lock(&ep->mtx); kernel: [ 46.646034] lock(&ep->mtx); kernel: [ 46.646034] kernel: [ 46.646034] *** DEADLOCK *** [...] Full log will be attached. 3) This seems to confirm my idea (in comment #11) of the code that triggers this issue. But fs/eventpoll.c has code to handle nested cases of epolling. Why does it fail here? Created attachment 514989 [details]
systemd-logind debugging with log_info()
This is an excerpt from /var/log/messages, containing:
- systemd-logind's normal output
- systemd-logind's log_info() messages of all epoll_* calls
- the kernel lockdep _warning_ that systemd-logind somehow triggers
*** Bug 725669 has been marked as a duplicate of this bug. *** *** Bug 725788 has been marked as a duplicate of this bug. *** 0) Summary: ftrace suggests (like the log_info() debugging attached also did) that this issue occurs after the last (the 7th) call op epoll_ctl() and before the first call op epoll_wait(). So the deadlock seems to occur somewhere in this call chain: sysenter_do_call sys_epoll_ctl ep_call_nested ep_loop_check_proc ep_eventpoll_poll ep_ptable_queue_proc ep_call_nested ep_poll_readyevents_proc ep_scan_ready_list ep_read_events_proc 1) I put some code (copied from Documentation/trace/ftrace.txt) at the start of systemd-logind. It traces all calls of functions starting with ep_* and sys_epoll_c*. Started systemd-logind manually. 2) trace output: # tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | systemd-logind-1304 [000] 97.715219: sys_epoll_create1 <-sysenter_do_call systemd-logind-1304 [000] 97.715965: sys_epoll_ctl <-sysenter_do_call systemd-logind-1304 [000] 97.715976: ep_ptable_queue_proc <-datagram_poll systemd-logind-1304 [000] 97.716238: sys_epoll_ctl <-sysenter_do_call systemd-logind-1304 [000] 97.716245: ep_ptable_queue_proc <-datagram_poll systemd-logind-1304 [000] 97.718532: sys_epoll_ctl <-sysenter_do_call systemd-logind-1304 [000] 97.718548: ep_ptable_queue_proc <-sysfs_poll systemd-logind-1304 [000] 97.730604: sys_epoll_create1 <-sysenter_do_call systemd-logind-1304 [000] 97.730798: sys_epoll_ctl <-sysenter_do_call systemd-logind-1304 [000] 97.730807: ep_ptable_queue_proc <-sock_poll_wait systemd-logind-1304 [000] 97.730972: sys_epoll_ctl <-sysenter_do_call systemd-logind-1304 [000] 97.731271: sys_epoll_ctl <-sysenter_do_call systemd-logind-1304 [000] 97.731279: ep_ptable_queue_proc <-sock_poll_wait systemd-logind-1304 [000] 97.731715: sys_epoll_ctl <-sysenter_do_call systemd-logind-1304 [000] 97.731720: ep_call_nested <-sys_epoll_ctl systemd-logind-1304 [000] 97.731992: ep_loop_check_proc <-ep_call_nested systemd-logind-1304 [000] 97.732006: ep_eventpoll_poll <-sys_epoll_ctl systemd-logind-1304 [000] 97.732006: ep_ptable_queue_proc <-ep_eventpoll_poll systemd-logind-1304 [000] 97.732058: ep_call_nested <-ep_eventpoll_poll systemd-logind-1304 [000] 97.732066: ep_poll_readyevents_proc <-ep_call_nested systemd-logind-1304 [000] 97.732067: ep_scan_ready_list <-ep_poll_readyevents_proc systemd-logind-1304 [000] 97.739796: ep_read_events_proc <-ep_scan_ready_list systemd-logind-1304 [000] 97.827249: ep_scan_ready_list <-sys_epoll_wait systemd-logind-1304 [000] 97.827253: ep_send_events_proc <-ep_scan_ready_list systemd-logind-1304 [000] 97.858937: ep_scan_ready_list <-sys_epoll_wait systemd-logind-1304 [000] 97.858941: ep_send_events_proc <-ep_scan_ready_list systemd-logind-1304 [000] 97.858941: ep_eventpoll_poll <-ep_send_events_proc systemd-logind-1304 [000] 97.858942: ep_call_nested <-ep_eventpoll_poll systemd-logind-1304 [000] 97.858943: ep_poll_readyevents_proc <-ep_call_nested systemd-logind-1304 [000] 97.858943: ep_scan_ready_list <-ep_poll_readyevents_proc systemd-logind-1304 [000] 97.858944: ep_read_events_proc <-ep_scan_ready_list systemd-logind-1304 [000] 97.858947: ep_poll_safewake <-ep_scan_ready_list systemd-logind-1304 [000] 97.858947: ep_call_nested <-ep_poll_safewake systemd-logind-1304 [000] 97.858948: ep_poll_wakeup_proc <-ep_call_nested systemd-logind-1304 [000] 97.858948: ep_poll_callback <-__wake_up_common systemd-logind-1304 [000] 97.859128: ep_scan_ready_list <-sys_epoll_wait systemd-logind-1304 [000] 97.859129: ep_send_events_proc <-ep_scan_ready_list systemd-logind-1304 [000] 97.859131: ep_poll_safewake <-ep_scan_ready_list systemd-logind-1304 [000] 97.859131: ep_call_nested <-ep_poll_safewake systemd-logind-1304 [000] 97.859132: ep_poll_wakeup_proc <-ep_call_nested systemd-logind-1304 [000] 97.859132: ep_poll_callback <-__wake_up_common systemd-logind-1304 [000] 97.859357: ep_scan_ready_list <-sys_epoll_wait systemd-logind-1304 [000] 97.859359: ep_send_events_proc <-ep_scan_ready_list systemd-logind-1304 [000] 97.859359: ep_eventpoll_poll <-ep_send_events_proc systemd-logind-1304 [000] 97.859360: ep_call_nested <-ep_eventpoll_poll systemd-logind-1304 [000] 97.859360: ep_poll_readyevents_proc <-ep_call_nested systemd-logind-1304 [000] 97.859361: ep_scan_ready_list <-ep_poll_readyevents_proc (Timestamps do not match dmesg's timestamps. Why?) 3) Reformatted output for clarity: sysenter_do_call sys_epoll_create1 sysenter_do_call sys_epoll_ctl datagram_poll ep_ptable_queue_proc sysenter_do_call sys_epoll_ctl datagram_poll ep_ptable_queue_proc sysenter_do_call sys_epoll_ctl sysfs_poll ep_ptable_queue_proc sysenter_do_call sys_epoll_create1 sysenter_do_call sys_epoll_ctl sock_poll_wait ep_ptable_queue_proc sysenter_do_call sys_epoll_ctl sysenter_do_call sys_epoll_ctl sock_poll_wait ep_ptable_queue_proc sysenter_do_call sys_epoll_ctl ep_call_nested ep_loop_check_proc ep_eventpoll_poll ep_ptable_queue_proc ep_call_nested ep_poll_readyevents_proc ep_scan_ready_list ep_read_events_proc [ *** DEADLOCK *** warning here? ] [sysenter_do_call] sys_epoll_wait ep_scan_ready_list ep_send_events_proc [sysenter_do_call] sys_epoll_wait ep_scan_ready_list ep_send_events_proc ep_eventpoll_poll ep_call_nested ep_poll_readyevents_proc ep_scan_ready_list ep_read_events_proc ep_poll_safewake ep_call_nested ep_poll_wakeup_proc __wake_up_common ep_poll_callback [...] 0) Well, my current feeling is that there's nothing wrong with the locking (otherwise we would have gotten an actual lockup, I suppose) but lockdep is only missing "annotations" to properly evaluate the locking here. 1) See, the flow of locking when going into a nested eventpoll situation is: sys_epoll_ctl <-sysenter_do_call mutex_lock_nested <-sys_epoll_ctl mutex_lock(&epmutex); ep_call_nested <-sys_epoll_ctl ep_loop_check_proc <-ep_call_nested mutex_lock_nested <-ep_loop_check_proc mutex_lock(&ep->mtx) mutex_unlock <-ep_loop_check_proc mutex_unlock(&ep->mtx) mutex_lock_nested <-sys_epoll_ctl mutex_lock(&ep->mtx) ep_eventpoll_poll <-sys_epoll_ctl ep_ptable_queue_proc <-ep_eventpoll_poll ep_call_nested <-ep_eventpoll_poll ep_poll_readyevents_proc <-ep_call_nested ep_scan_ready_list <-ep_poll_readyevents_proc mutex_lock_nested <-ep_scan_ready_list mutex_lock(&ep->mtx) ep_read_events_proc <-ep_scan_ready_list mutex_unlock <-ep_scan_ready_list mutex_unlock(&ep->mtx) mutex_unlock <-sys_epoll_ctl mutex_unlock(&ep->mtx); mutex_unlock <-sys_epoll_ctl mutex_unlock(&epmutex); So when ep_scan_ready_list() grabs &ep_>mtx for the second time (but now from the nested epoll instance we're adding) the code is holding three locks (epmutex once, ep->mtx twice) which apparently is enough to trigger all this lockdep noise. 2) Does anyone now how these lockdep annotations actually work? Created attachment 515683 [details]
Draft patch to add nesting level to mutex locking of "mtx"
0) This is not even compile tested!
1) Assuming this is indeed "only" a lockdep annotation problem, would a patch along these lines be acceptable? Other comments, etc.?
*** Bug 726458 has been marked as a duplicate of this bug. *** After the recent update to: kernel-3.0.0-2.fc16.i386 systemd-32-1.fc16 I do not have this locking message. I think it might be fixed. Thanks (In reply to comment #23) > After the recent update to: > kernel-3.0.0-2.fc16.i386 > systemd-32-1.fc16 > > I do not have this locking message. > I think it might be fixed. It's not fixed. The debug options were turned off starting with the -2.fc16 build, which means the mutex checking isn't done, and no trace is generated. The problem is just hidden now. (In reply to comment #24) > It's not fixed. The debug options were turned off starting with the -2.fc16 > build, which means the mutex checking isn't done, and no trace is generated. > The problem is just hidden now. Let's just record the (most) relevant kernel CONFIG options here: kernel-3.0-0.rc7.git0.1.fc16: $ rpm2cpio ~/Download/kernel-3.0-0.rc7.git0.1.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0-0.rc7.git0.1.fc16.i686 | grep PROVE_LOCKING CONFIG_PROVE_LOCKING=y $ rpm2cpio ~/Download/kernel-3.0-0.rc7.git0.1.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0-0.rc7.git0.1.fc16.i686 | grep DEBUG_LOCK_ALLOC CONFIG_DEBUG_LOCK_ALLOC=y kernel-3.0.0-2.fc16: $ rpm2cpio ~/Download/kernel-3.0.0-2.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0.0-2.fc16.i686 | grep PROVE_LOCKING # CONFIG_PROVE_LOCKING is not set $ rpm2cpio ~/Download/kernel-3.0.0-2.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0.0-2.fc16.i686 | grep DEBUG_LOCK_ALLOC # CONFIG_DEBUG_LOCK_ALLOC is not set Interesting ! And how do we suppose to check this bug? (In reply to comment #26) > Interesting ! > And how do we suppose to check this bug? It's been reported upstream and someone is theoretically working on a patch. When I see one come in, I'll do a local build with the debug options enabled again. It's easy to recreate, so I should be able to tell if the patch fixes the bug or not. there's also the kernel-debug package which still has them enabled. (In reply to comment #27) > It's been reported upstream and someone is theoretically working on a patch. > When I see one come in, I'll do a local build with the debug options enabled > again. 0) Perhaps I'm to blame here, as a few days ago I brought this issue back to LKML, as a result of which this patch was posted on linux-fsdevel: http://marc.info/?l=linux-fsdevel&m=131206504702437&w=2 1) That patch works for me. I suppose we're waiting for Al Viro's fiat or veto. (In reply to comment #29) > (In reply to comment #27) > > It's been reported upstream and someone is theoretically working on a patch. > > When I see one come in, I'll do a local build with the debug options enabled > > again. > > 0) Perhaps I'm to blame here, as a few days ago I brought this issue back to > LKML, as a result of which this patch was posted on linux-fsdevel: > > http://marc.info/?l=linux-fsdevel&m=131206504702437&w=2 > > 1) That patch works for me. I suppose we're waiting for Al Viro's fiat or veto. Ah. That's the patch I was watching for, I was just watching for it on lkml. kernel-3.0.0-3.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.0.0-3.fc16 *** Bug 727757 has been marked as a duplicate of this bug. *** Can confirm, that with kernel-3.0.0-3.fc16.i386 I do not have any locking messages. Thanks for the patch and debug: * Tue Aug 02 2011 Josh Boyer <jwboyer> - Fix epoll recursive lockdep warnings (rhbz 722472) - Turn debug builds back on Can confirm, that with kernel-3.0.0-3.fc16.i686 I do not have any locking messages. Thanks for the patch and debug: * Tue Aug 02 2011 Josh Boyer <jwboyer> - Fix epoll recursive lockdep warnings (rhbz 722472) - Turn debug builds back on Can confirm, that with kernel-3.0.0-3.fc16.i686 I do not have any locking messages. Thanks for the patch and debug: * Tue Aug 02 2011 Josh Boyer <jwboyer> - Fix epoll recursive lockdep warnings (rhbz 722472) - Turn debug builds back on Sorry for 33 and 34 comments. Do not know how to delete them. Package kernel-3.0.0-3.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.0.0-3.fc16' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/kernel-3.0.0-3.fc16 then log in and leave karma (feedback). (In reply to comment #35) > Can confirm, that with > kernel-3.0.0-3.fc16.i686 > > I do not have any locking messages. > > Thanks for the patch and debug: > * Tue Aug 02 2011 Josh Boyer <jwboyer> > - Fix epoll recursive lockdep warnings (rhbz 722472) > - Turn debug builds back on I get the locking messages with kernel-3.1.0-0.rc0.git17.1.fc17.x86_64 and systemd-32-1.fc17.x86_64 (Samsung netbook N210, dual-core atom). *** Bug 728005 has been marked as a duplicate of this bug. *** (In reply to comment #38) > (In reply to comment #35) > > Can confirm, that with > > kernel-3.0.0-3.fc16.i686 > > > > I do not have any locking messages. > > > > Thanks for the patch and debug: > > * Tue Aug 02 2011 Josh Boyer <jwboyer> > > - Fix epoll recursive lockdep warnings (rhbz 722472) > > - Turn debug builds back on > > I get the locking messages with kernel-3.1.0-0.rc0.git17.1.fc17.x86_64 and > systemd-32-1.fc17.x86_64 (Samsung netbook N210, dual-core atom). I can also confirm that the same setup but with kernel-2.6.40-4.fc15.x86_64 (never kernels didn't complete boot, had to resort to desperate measures...) the backtrace *does not* show up. (In reply to comment #38) > (In reply to comment #35) > > Can confirm, that with > > kernel-3.0.0-3.fc16.i686 > > > > I do not have any locking messages. > > > > Thanks for the patch and debug: > > * Tue Aug 02 2011 Josh Boyer <jwboyer> > > - Fix epoll recursive lockdep warnings (rhbz 722472) > > - Turn debug builds back on > > I get the locking messages with kernel-3.1.0-0.rc0.git17.1.fc17.x86_64 and > systemd-32-1.fc17.x86_64 (Samsung netbook N210, dual-core atom). This bug is for F16. You're using a rawhide kernel. The rawhide kernel should get the fix picked up soon. kernel-2.6.40-<whatever> won't show this bug because the debug options are turned off in that. You'd need to use kernel-debug-2.6.40-<whatever>. However, the same patch has been applied to F15, so it should be fixed there as well. kernel-2.6.40.3-0.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.40.3-0.fc15 *** Bug 731188 has been marked as a duplicate of this bug. *** *** Bug 731276 has been marked as a duplicate of this bug. *** I can confirm that on F16 (alpha rc5 with updates) with kernel-3.0.1-3 (x86_64) I do not see the lockdep issue any more. kernel-2.6.40.3-0.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report. kernel-3.1.0-0.rc3.git0.0.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc3.git0.0.fc16 kernel-3.1.0-0.rc3.git0.0.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report. I am seeing the warning under Fedora 16 with kernel 3.1.0-0.rc8.git0.0.fc16.x86_64, booting from EFI. This is a fresh installation of fc16 from livecd after running yum upgrade. Graphical environment doesn't start either, I'm stuck on the console. (In reply to comment #49) > I am seeing the warning under Fedora 16 with kernel > 3.1.0-0.rc8.git0.0.fc16.x86_64, booting from EFI. This is a fresh installation > of fc16 from livecd after running yum upgrade. > > Graphical environment doesn't start either, I'm stuck on the console. Please open a new bug and paste the full text output of what you're seeing. The patch applied to fix this should still be applied, so we want to be sure it is identical. |