Bug 722472

Summary: [ INFO: possible recursive locking detected ] 3.0-0.rc7.git0.1.fc16.i686
Product: Fedora
Reporter: sd.domrep
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 16
CC: andrewclarkii, aquini, awilliam, che666, dmach, gansalmon, gbauman, gtmkramer, itamar, jonathan, jwboyer, karo1170, kernel-maint, lpoetter, madhu.chinakonda, mads, mailings, mikko.tiihonen, pebolle, rjones, samuel-rhbugs, sankarshan, satellitgo, smooge, tflink, vonbrand
Target Milestone: ---
Keywords: CommonBugs
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: https://fedoraproject.org/wiki/Common_F16_bugs#kernel-lockdep
Fixed In Version: kernel-3.1.0-0.rc3.git0.0.fc16
Doc Type: Bug Fix
Last Closed: 2011-08-18 02:33:01 UTC
Attachments:
- systemd-logind debugging with log_info() (no flags)
- Draft patch to add nesting level to mutex locking of "mtx" (no flags)

Description sd.domrep 2011-07-15 13:10:43 UTC
Description of problem:

Updated and got the following in dmesg:

[   37.653381] =============================================
[   37.654015] [ INFO: possible recursive locking detected ]
[   37.654015] 3.0-0.rc7.git0.1.fc16.i686 #1
[   37.654015] ---------------------------------------------
[   37.654015] systemd-logind/651 is trying to acquire lock:
[   37.654015]  (&ep->mtx){+.+.+.}, at: [<c05285f1>] ep_scan_ready_list+0x32/0x154
[   37.654015]
[   37.654015] but task is already holding lock:
[   37.654015]  (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
[   37.654015]
[   37.654015] other info that might help us debug this:
[   37.654015]  Possible unsafe locking scenario:
[   37.654015]
[   37.654015]        CPU0
[   37.654015]        ----
[   37.654015]   lock(&ep->mtx);
[   37.654015]   lock(&ep->mtx);
[   37.654015]
[   37.654015]  *** DEADLOCK ***
[   37.654015]
[   37.654015]  May be due to missing lock nesting notation
[   37.654015]
[   37.654015] 2 locks held by systemd-logind/651:
[   37.654015]  #0:  (epmutex){+.+.+.}, at: [<c0528a4b>] sys_epoll_ctl+0xbe/0x481
[   37.654015]  #1:  (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
[   37.654015]
[   37.654015] stack backtrace:
[   37.654015] Pid: 651, comm: systemd-logind Not tainted 3.0-0.rc7.git0.1.fc16.i686 #1
[   37.654015] Call Trace:
[   37.654015]  [<c08490fe>] ? printk+0x2d/0x2f
[   37.654015]  [<c046b2ef>] __lock_acquire+0x811/0xb63
[   37.654015]  [<c0407c77>] ? sched_clock+0x8/0xb
[   37.654015]  [<c045d190>] ? sched_clock_local+0x10/0x18b
[   37.654015]  [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
[   37.654015]  [<c046ba5e>] lock_acquire+0xad/0xe4
[   37.654015]  [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
[   37.654015]  [<c08506bd>] __mutex_lock_common+0x49/0x2ee
[   37.654015]  [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
[   37.654015]  [<c04332e6>] ? __might_sleep+0x29/0xfb
[   37.654015]  [<c046a912>] ? mark_lock+0x26/0x1f2
[   37.654015]  [<c0850a7c>] mutex_lock_nested+0x43/0x49
[   37.654015]  [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
[   37.654015]  [<c05285f1>] ep_scan_ready_list+0x32/0x154
[   37.654015]  [<c05281cb>] ? ep_remove+0x9b/0x9b
[   37.654015]  [<c0528727>] ep_poll_readyevents_proc+0x14/0x16
[   37.654015]  [<c05283d6>] ep_call_nested.constprop.2+0x6d/0x9a
[   37.654015]  [<c0528713>] ? ep_scan_ready_list+0x154/0x154
[   37.654015]  [<c05284d2>] ep_eventpoll_poll+0x45/0x55
[   37.654015]  [<c0528b8c>] sys_epoll_ctl+0x1ff/0x481
[   37.654015]  [<c05282fb>] ? ep_send_events_proc+0xd5/0xd5
[   37.654015]  [<c08521ac>] syscall_call+0x7/0xb

Version-Release number of selected component (if applicable):

kernel 3.0-0.rc7.git0.1.fc16.i686 


Comment 1 Michal Schmidt 2011-07-18 11:53:19 UTC
*** Bug 722401 has been marked as a duplicate of this bug. ***

Comment 2 Stephen John Smoogen 2011-07-18 15:42:47 UTC
I have been trying to track down a set of crashes on 3.0 systems: the system prints this and gets slower and slower until it "locks up" completely. The problem does not occur on the 2.6.38.8-32 kernel. Not sure what is causing the difference between the two.

Comment 3 Stephen John Smoogen 2011-07-18 16:43:54 UTC
Going through rawhide kernels from 2.6.39 onward, a "deadlock" issue similar to this one seems to start happening between 2.6.39-1 and 3.0.0-rc1. Not sure how to debug beyond that.

Comment 4 Paul Bolle 2011-07-21 12:07:54 UTC
0) On 2.6.39 I got only a warning:

=============================================
[ INFO: possible recursive locking detected ]
2.6.39-0.local9.fc16.i686 #1
---------------------------------------------
systemd-logind/807 is trying to acquire lock:
 (&ep->mtx){+.+.+.}, at: [<c0524a05>] ep_scan_ready_list+0x32/0x154

but task is already holding lock:
 (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481

other info that might help us debug this:
2 locks held by systemd-logind/807:
 #0:  (epmutex){+.+.+.}, at: [<c0524e5f>] sys_epoll_ctl+0xbe/0x481
 #1:  (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481

stack backtrace:
Pid: 807, comm: systemd-logind Not tainted 2.6.39-0.local9.fc16.i686 #1
Call Trace:
 [<c080af85>] ? printk+0x2d/0x2f
 [<c04690b7>] __lock_acquire+0x78f/0xae1
 [<c040790c>] ? sched_clock+0x8/0xb
 [<c045b858>] ? sched_clock_local+0x10/0x18b
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c046981e>] lock_acquire+0xbc/0xdc
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c08127f3>] __mutex_lock_common+0x4a/0x2f0
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c0432502>] ? __might_sleep+0x29/0xfb
 [<c0466a50>] ? trace_hardirqs_off+0xb/0xd
 [<c0812b4e>] mutex_lock_nested+0x39/0x3e
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c0524a05>] ep_scan_ready_list+0x32/0x154
 [<c05245df>] ? ep_remove+0x9b/0x9b
 [<c0524b3b>] ep_poll_readyevents_proc+0x14/0x16
 [<c05247ea>] ep_call_nested.constprop.2+0x6d/0x9a
 [<c0524b27>] ? ep_scan_ready_list+0x154/0x154
 [<c05248e6>] ep_eventpoll_poll+0x45/0x55
 [<c0524fa0>] sys_epoll_ctl+0x1ff/0x481
 [<c052470f>] ? ep_send_events_proc+0xd5/0xd5
 [<c0819fdf>] sysenter_do_call+0x12/0x38

1) I'm not sure what causes the difference between an actual deadlock and (just) the above warning. Could it be the inherent hit-or-miss nature of a race condition?

2) Anyhow, what seems to trigger this is the recent upgrade to a new version
of systemd (systemd-30-1.fc16.i686), as I do not recall seeing this before.

Comment 5 Richard W.M. Jones 2011-07-22 12:29:07 UTC
I'm seeing this on x86_64.

Unfortunately it makes the system unusable as I'm unable to
log in or ssh to the machine.

Comment 6 Michal Schmidt 2011-07-22 15:06:28 UTC
*** Bug 724952 has been marked as a duplicate of this bug. ***

Comment 7 Richard W.M. Jones 2011-07-22 15:20:46 UTC
I verified that you can work around this bug by downgrading
to systemd-29 from:

http://koji.fedoraproject.org/koji/buildinfo?buildID=247951

(It doesn't appear to be a systemd bug; downgrading is just a workaround.)

Comment 8 Paul Bolle 2011-07-22 15:36:44 UTC
(In reply to comment #7)
> I verified that you can workaround this bug by downgrading
> to systemd-29 from:

0) Thanks.

> (It doesn't appear to be a systemd bug, this is just a workaround).

1) I've always thought that kernel bugs triggered by user space are still kernel bugs. Or is that too simplistic?

Anyhow, perhaps the people working on systemd could somehow indicate what changed in their use of this epoll stuff (which I'm unfamiliar with). That might help in pinpointing the cause of this bug.

Comment 9 Richard W.M. Jones 2011-07-22 15:57:05 UTC
No argument -- this is a kernel bug.  It's just that systemd-29
doesn't happen to trigger it.

CC-ing Lennart for any insight into what changed in the use
of epoll between -29 and -30.

Comment 10 Paul Bolle 2011-07-22 21:46:44 UTC
0) Trying to parse the code in fs/eventpoll.c I came up with this rough sequence of events, which seems to match the backtraces (all numbers are line numbers in fs/eventpoll.c):

1347:sys_epoll_ctl()
1412:    mutex_lock()
1426:    ep_insert()
             |
 921:    ep_insert()
 957:        tfile->f_op->poll()
                 |
 676:        ep_eventpoll_poll()
 690:            ep_call_nested()
 691:                ep_poll_readyevents_proc()
                     ||
 347:            ep_call_nested()
 671:                ep_poll_readyevents_proc()
 673:                    ep_scan_ready_list()
                             |
 470:                    ep_scan_ready_list()
 484:                        mutex_lock()       <= deadlock here

1) Does this make sense?

Comment 11 Paul Bolle 2011-07-23 18:53:11 UTC
0) I've grepped through systemd's source a bit.

1) eg grep -n epoll_ctl src/logind*
src/logind-session.c:819:                if (epoll_ctl(s->manager->epoll_fd, EPOLL_CTL_ADD, s->fifo_fd, &ev) < 0)
src/logind-session.c:836:                assert_se(epoll_ctl(s->manager->epoll_fd, EPOLL_CTL_DEL, s->fifo_fd, NULL) == 0);
src/logind.c:888:        if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->bus_fd, &ev) < 0)
src/logind.c:915:        if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->console_active_fd, &ev) < 0)
src/logind.c:954:        if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_seat_fd, &ev) < 0)
src/logind.c:975:        if (epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_vcsa_fd, &ev) < 0)

(Note that, for whatever reason, only s->fifo_fd will receive an EPOLL_CTL_DEL.)

2) s->fifo_fd seems to be an ordinary fd:
src/logind-session.c:807:                s->fifo_fd = open(s->fifo_path, O_RDONLY|O_CLOEXEC|O_NDELAY);

3) m->console_active_fd seems to be an ordinary fd too:
src/logind.c:905:        m->console_active_fd = open("/sys/class/tty/tty0/active", O_RDONLY|O_NOCTTY|O_CLOEXEC);

4) m->udev_seat_fd also seems to be an ordinary fd (though one for a udev-related socket):
src/logind.c:929:        m->udev_seat_monitor = udev_monitor_new_from_netlink(m->udev, "udev");
src/logind.c:945:        m->udev_seat_fd = udev_monitor_get_fd(m->udev_seat_monitor);

5) Ditto m->udev_vcsa_fd:
src/logind.c:957:        m->udev_vcsa_monitor = udev_monitor_new_from_netlink(m->udev, "udev");
src/logind.c:969:        m->udev_vcsa_fd = udev_monitor_get_fd(m->udev_vcsa_monitor);

6) However, m->bus_fd seems to be a file descriptor referring to an epoll instance:
src/logind.c:878:        m->bus_fd = bus_loop_open(m->bus);

Interesting is that bus_loop_open() returns this fd:
src/dbus-loop.c:222:        fd = epoll_create1(EPOLL_CLOEXEC);

7) It's not clear why systemd-logind wants to epoll on another epoll instance. Although I really know next to nothing about the epoll interface, that does look like an unusual thing to do.

But, whatever the reasons to do that, wouldn't epolling on another epoll instance created by the same process run into the deadlock I tried to describe in comment #10? (Since polling on an epoll instance seems to invoke ep_eventpoll_poll().)

8) Feedback from people familiar with epoll and/or systemd-logind is appreciated.

Comment 12 Paul Bolle 2011-07-23 19:35:09 UTC
0) Note that the bus_fd code I described briefly in comment #11 was introduced in systemd's commit
    2026308 logind: first version that compiles fine

which landed after v29 (and so was included in v30). That neatly fits the observation of comment #7 that downgrading to systemd-29 works around this issue.

1) Feedback from people familiar with systemd-logind would be very much appreciated. I'd really like to know whether I'm barking up the right tree here.

Comment 13 Richard W.M. Jones 2011-07-24 09:23:58 UTC
(In reply to comment #11)
> 7) It's not clear why systemd-logind wants to epoll on another epoll instance.
> Although I really know next to nothing about the epoll interface, that does
> look like something special to do.
> 
> But, whatever are the reasons to do that, wouldn't epolling on another epoll
> instance generated by the same process walk into the deadlock I tried to
> describe in comment #10? (Since polling on an epoll instance seems to invoke
> ep_eventpoll_poll.)

Can this be escalated into some sort of DoS attack on
the kernel?  If so we may need to alert the security folk.
Bit busy now but I'll play around with this on Monday.

Comment 14 Paul Bolle 2011-07-24 11:46:05 UTC
(In reply to comment #13)
> Can this be escalated into some sort of DoS attack on
> the kernel?

0) I wouldn't know. I haven't really been able to pinpoint the problem, let alone write the minimal case that reproduces this behaviour.

1) Currently I'm wondering whether src/dbus-loop.c:bus_loop_open() could be the source of all this:

int bus_loop_open(DBusConnection *c) {
        int fd;

        [...];

        fd = epoll_create1(EPOLL_CLOEXEC);
        if (fd < 0)
                return -errno;

        if (!dbus_connection_set_watch_functions(c, add_watch, remove_watch, toggle_watch, INT_TO_PTR(fd), NULL) ||
            !dbus_connection_set_timeout_functions(c, add_timeout, remove_timeout, toggle_timeout, INT_TO_PTR(fd), NULL)) {
                close_nointr_nofail(fd);
                return -ENOMEM;
        }

        return fd;
}

The point is that add_watch(), remove_watch(), toggle_watch(), add_timeout(), and remove_timeout() here are all passed a pointer to "fd", which is only valid within bus_loop_open() itself. But these five functions all dereference that pointer to use it as the fd of an epoll instance in their calls to epoll_ctl(). Doesn't that risk using what is basically a random number instead of a valid fd of an epoll instance?

2) But I must admit that I hardly understand either systemd-logind or the epoll interface.

Comment 15 Paul Bolle 2011-07-25 08:49:43 UTC
0) Public service announcement: just because the kernel prints
    *** DEADLOCK ***

one shouldn't assume that logging in no longer works (it works just fine). That message is apparently there to stop easily fooled people, like me, from running that kernel.

1) Public service announcement: printf debugging of systemd-logind is easy. Basically one has to:
- rebuild systemd's source rpm;
- add log_info() calls wherever one likes (in our case mainly in src/logind* and src/dbus*);
- do a make from the root of the build tree;
- backup /lib/systemd/systemd-logind;
- copy systemd-logind (from the root directory of the build tree) to /lib/systemd/systemd-logind.

2) Interesting part of the log:
systemd-logind[771]: src/logind.c:1069: epoll_create1(EPOLL_CLOEXEC)
systemd-logind[771]: src/logind.c:955: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_seat_fd, &ev)
systemd-logind[771]: src/logind.c:977: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->udev_vcsa_fd, &ev)
systemd-logind[771]: src/logind.c:915: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->console_active_fd, &ev)
systemd-logind[771]: src/dbus-loop.c:228: epoll_create1(EPOLL_CLOEXEC)
systemd-logind[771]: src/dbus-loop.c:66: epoll_ctl(PTR_TO_INT(data), EPOLL_CTL_ADD, e->fd, &ev)
systemd-logind[771]: src/dbus-loop.c:66: epoll_ctl(PTR_TO_INT(data), EPOLL_CTL_ADD, e->fd, &ev)
systemd-logind[771]: src/dbus-loop.c:85: epoll_ctl(PTR_TO_INT(data), EPOLL_CTL_ADD, e->fd, &ev)
systemd-logind[771]: src/logind.c:887: epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->bus_fd, &ev)
kernel: [   46.645737] 
kernel: [   46.645743] =============================================
kernel: [   46.646034] [ INFO: possible recursive locking detected ]
kernel: [   46.646034] 3.0.0-1.fc16.i686 #1
kernel: [   46.646034] ---------------------------------------------
kernel: [   46.646034] systemd-logind/771 is trying to acquire lock:
kernel: [   46.646034]  (&ep->mtx){+.+.+.}, at: [<c052a355>] ep_scan_ready_list+0x32/0x154
kernel: [   46.646034] 
kernel: [   46.646034] but task is already holding lock:
kernel: [   46.646034]  (&ep->mtx){+.+.+.}, at: [<c052a7f4>] sys_epoll_ctl+0x103/0x481
kernel: [   46.646034] 
kernel: [   46.646034] other info that might help us debug this:
kernel: [   46.646034]  Possible unsafe locking scenario:
kernel: [   46.646034] 
kernel: [   46.646034]        CPU0
kernel: [   46.646034]        ----
kernel: [   46.646034]   lock(&ep->mtx);
kernel: [   46.646034]   lock(&ep->mtx);
kernel: [   46.646034] 
kernel: [   46.646034]  *** DEADLOCK ***
[...]

Full log will be attached.

3) This seems to confirm my idea (from comment #11) about the code path that triggers this issue. But fs/eventpoll.c has code to handle nested cases of epolling. Why does it fail here?

Comment 16 Paul Bolle 2011-07-25 08:55:29 UTC
Created attachment 514989 [details]
systemd-logind debugging with log_info()

This is an excerpt from /var/log/messages, containing:
    - systemd-logind's normal output
    - systemd-logind's log_info() messages of all epoll_* calls
    - the kernel lockdep _warning_ that systemd-logind somehow triggers

Comment 17 Michal Schmidt 2011-07-26 09:01:13 UTC
*** Bug 725669 has been marked as a duplicate of this bug. ***

Comment 18 Harald Hoyer 2011-07-27 10:38:16 UTC
*** Bug 725788 has been marked as a duplicate of this bug. ***

Comment 19 Paul Bolle 2011-07-27 16:36:00 UTC
0) Summary: ftrace suggests (as the attached log_info() debugging also did) that this issue occurs after the last (the 7th) call of epoll_ctl() and before the first call of epoll_wait(). So the deadlock seems to occur somewhere in this call chain:

sysenter_do_call
    sys_epoll_ctl
        ep_call_nested
            ep_loop_check_proc
        ep_eventpoll_poll
            ep_ptable_queue_proc
            ep_call_nested
                ep_poll_readyevents_proc
                    ep_scan_ready_list
                        ep_read_events_proc

1) I put some code (copied from Documentation/trace/ftrace.txt) at the start of systemd-logind. It traces all calls of functions whose names start with ep_* or sys_epoll_c*. I then started systemd-logind manually.

2) trace output:

# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
  systemd-logind-1304  [000]    97.715219: sys_epoll_create1 <-sysenter_do_call
  systemd-logind-1304  [000]    97.715965: sys_epoll_ctl <-sysenter_do_call
  systemd-logind-1304  [000]    97.715976: ep_ptable_queue_proc <-datagram_poll
  systemd-logind-1304  [000]    97.716238: sys_epoll_ctl <-sysenter_do_call
  systemd-logind-1304  [000]    97.716245: ep_ptable_queue_proc <-datagram_poll
  systemd-logind-1304  [000]    97.718532: sys_epoll_ctl <-sysenter_do_call
  systemd-logind-1304  [000]    97.718548: ep_ptable_queue_proc <-sysfs_poll
  systemd-logind-1304  [000]    97.730604: sys_epoll_create1 <-sysenter_do_call
  systemd-logind-1304  [000]    97.730798: sys_epoll_ctl <-sysenter_do_call
  systemd-logind-1304  [000]    97.730807: ep_ptable_queue_proc <-sock_poll_wait
  systemd-logind-1304  [000]    97.730972: sys_epoll_ctl <-sysenter_do_call
  systemd-logind-1304  [000]    97.731271: sys_epoll_ctl <-sysenter_do_call
  systemd-logind-1304  [000]    97.731279: ep_ptable_queue_proc <-sock_poll_wait
  systemd-logind-1304  [000]    97.731715: sys_epoll_ctl <-sysenter_do_call
  systemd-logind-1304  [000]    97.731720: ep_call_nested <-sys_epoll_ctl
  systemd-logind-1304  [000]    97.731992: ep_loop_check_proc <-ep_call_nested
  systemd-logind-1304  [000]    97.732006: ep_eventpoll_poll <-sys_epoll_ctl
  systemd-logind-1304  [000]    97.732006: ep_ptable_queue_proc <-ep_eventpoll_poll
  systemd-logind-1304  [000]    97.732058: ep_call_nested <-ep_eventpoll_poll
  systemd-logind-1304  [000]    97.732066: ep_poll_readyevents_proc <-ep_call_nested
  systemd-logind-1304  [000]    97.732067: ep_scan_ready_list <-ep_poll_readyevents_proc
  systemd-logind-1304  [000]    97.739796: ep_read_events_proc <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.827249: ep_scan_ready_list <-sys_epoll_wait
  systemd-logind-1304  [000]    97.827253: ep_send_events_proc <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.858937: ep_scan_ready_list <-sys_epoll_wait
  systemd-logind-1304  [000]    97.858941: ep_send_events_proc <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.858941: ep_eventpoll_poll <-ep_send_events_proc
  systemd-logind-1304  [000]    97.858942: ep_call_nested <-ep_eventpoll_poll
  systemd-logind-1304  [000]    97.858943: ep_poll_readyevents_proc <-ep_call_nested
  systemd-logind-1304  [000]    97.858943: ep_scan_ready_list <-ep_poll_readyevents_proc
  systemd-logind-1304  [000]    97.858944: ep_read_events_proc <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.858947: ep_poll_safewake <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.858947: ep_call_nested <-ep_poll_safewake
  systemd-logind-1304  [000]    97.858948: ep_poll_wakeup_proc <-ep_call_nested
  systemd-logind-1304  [000]    97.858948: ep_poll_callback <-__wake_up_common
  systemd-logind-1304  [000]    97.859128: ep_scan_ready_list <-sys_epoll_wait
  systemd-logind-1304  [000]    97.859129: ep_send_events_proc <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.859131: ep_poll_safewake <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.859131: ep_call_nested <-ep_poll_safewake
  systemd-logind-1304  [000]    97.859132: ep_poll_wakeup_proc <-ep_call_nested
  systemd-logind-1304  [000]    97.859132: ep_poll_callback <-__wake_up_common
  systemd-logind-1304  [000]    97.859357: ep_scan_ready_list <-sys_epoll_wait
  systemd-logind-1304  [000]    97.859359: ep_send_events_proc <-ep_scan_ready_list
  systemd-logind-1304  [000]    97.859359: ep_eventpoll_poll <-ep_send_events_proc
  systemd-logind-1304  [000]    97.859360: ep_call_nested <-ep_eventpoll_poll
  systemd-logind-1304  [000]    97.859360: ep_poll_readyevents_proc <-ep_call_nested
  systemd-logind-1304  [000]    97.859361: ep_scan_ready_list <-ep_poll_readyevents_proc

(Timestamps do not match dmesg's timestamps. Why?)

3) Reformatted output for clarity:

sysenter_do_call
    sys_epoll_create1
sysenter_do_call
    sys_epoll_ctl
datagram_poll
    ep_ptable_queue_proc
sysenter_do_call
    sys_epoll_ctl
datagram_poll
    ep_ptable_queue_proc
sysenter_do_call
    sys_epoll_ctl
sysfs_poll
    ep_ptable_queue_proc
sysenter_do_call
    sys_epoll_create1
sysenter_do_call
    sys_epoll_ctl
sock_poll_wait
    ep_ptable_queue_proc
sysenter_do_call
    sys_epoll_ctl
sysenter_do_call
    sys_epoll_ctl
sock_poll_wait
    ep_ptable_queue_proc
sysenter_do_call
    sys_epoll_ctl
        ep_call_nested
            ep_loop_check_proc
        ep_eventpoll_poll
            ep_ptable_queue_proc
            ep_call_nested
                ep_poll_readyevents_proc
                    ep_scan_ready_list
                        ep_read_events_proc

[ *** DEADLOCK *** warning here? ] 

[sysenter_do_call]
    sys_epoll_wait
        ep_scan_ready_list
            ep_send_events_proc
[sysenter_do_call]
    sys_epoll_wait
        ep_scan_ready_list
            ep_send_events_proc
                ep_eventpoll_poll
                    ep_call_nested
                        ep_poll_readyevents_proc
                            ep_scan_ready_list
                                ep_read_events_proc
                                ep_poll_safewake
                                    ep_call_nested
                                        ep_poll_wakeup_proc
__wake_up_common
    ep_poll_callback
[...]

Comment 20 Paul Bolle 2011-07-27 22:13:59 UTC
0) Well, my current feeling is that there's nothing wrong with the locking itself (otherwise we would have hit an actual lockup, I suppose); lockdep is just missing the "annotations" it needs to properly evaluate the locking here.

1) See, the flow of locking when going into a nested eventpoll situation is:

sys_epoll_ctl <-sysenter_do_call
mutex_lock_nested <-sys_epoll_ctl            mutex_lock(&epmutex);
ep_call_nested <-sys_epoll_ctl
ep_loop_check_proc <-ep_call_nested
mutex_lock_nested <-ep_loop_check_proc           mutex_lock(&ep->mtx)
mutex_unlock <-ep_loop_check_proc                mutex_unlock(&ep->mtx)
mutex_lock_nested <-sys_epoll_ctl                mutex_lock(&ep->mtx)
ep_eventpoll_poll <-sys_epoll_ctl
ep_ptable_queue_proc <-ep_eventpoll_poll
ep_call_nested <-ep_eventpoll_poll
ep_poll_readyevents_proc <-ep_call_nested
ep_scan_ready_list <-ep_poll_readyevents_proc
mutex_lock_nested <-ep_scan_ready_list               mutex_lock(&ep->mtx)
ep_read_events_proc <-ep_scan_ready_list
mutex_unlock <-ep_scan_ready_list                    mutex_unlock(&ep->mtx)
mutex_unlock <-sys_epoll_ctl                     mutex_unlock(&ep->mtx);
mutex_unlock <-sys_epoll_ctl                 mutex_unlock(&epmutex);

So when ep_scan_ready_list() grabs &ep->mtx for the second time (but now from the nested epoll instance we're adding), the code is holding three locks (epmutex once, ep->mtx twice), which apparently is enough to trigger all this lockdep noise.

2) Does anyone know how these lockdep annotations actually work?

Comment 21 Paul Bolle 2011-07-28 10:28:50 UTC
Created attachment 515683 [details]
Draft patch to add nesting level to mutex locking of "mtx"

0) This is not even compile tested!

1) Assuming this is indeed "only" a lockdep annotation problem, would a patch along these lines be acceptable? Other comments, etc.?
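For readers without the attachment, the nesting-level idea could look roughly like this (a sketch only, not the actual draft patch; kernel-side code, signature simplified):

```c
/* Sketch: pass an explicit nesting depth down to ep_scan_ready_list()
 * so lockdep can tell the outer ep->mtx from the nested instance's
 * ep->mtx. mutex_lock_nested() is lockdep's annotation for exactly
 * this case: same lock class, different nesting subclass. */
static int ep_scan_ready_list(struct eventpoll *ep,
                              int (*sproc)(struct eventpoll *,
                                           struct list_head *, void *),
                              void *priv, int depth)
{
        mutex_lock_nested(&ep->mtx, depth);  /* was: mutex_lock(&ep->mtx) */
        /* ... walk the ready list as before ... */
        mutex_unlock(&ep->mtx);
        return 0;
}
```

Callers would pass depth 0 for the top-level instance and depth 1 when recursing via ep_poll_readyevents_proc(), which should silence the "possible recursive locking" report without changing the locking itself.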

Comment 22 Harald Hoyer 2011-07-29 10:34:11 UTC
*** Bug 726458 has been marked as a duplicate of this bug. ***

Comment 23 sd.domrep 2011-08-02 14:54:08 UTC
After the recent update to:
kernel-3.0.0-2.fc16.i386
systemd-32-1.fc16

I do not have this locking message.
I think it might be fixed.

Thanks

Comment 24 Josh Boyer 2011-08-02 16:08:43 UTC
(In reply to comment #23)
> After the recent update to:
> kernel-3.0.0-2.fc16.i386
> systemd-32-1.fc16
> 
> I do not have this locking message.
> I think it might be fixed.

It's not fixed.  The debug options were turned off starting with the -2.fc16 build, which means the mutex checking isn't done, and no trace is generated.  The problem is just hidden now.

Comment 25 Paul Bolle 2011-08-02 16:19:46 UTC
(In reply to comment #24)
> It's not fixed.  The debug options were turned off starting with the -2.fc16
> build, which means the mutex checking isn't done, and no trace is generated. 
> The problem is just hidden now.

Let's just record the (most) relevant kernel CONFIG options here:

kernel-3.0-0.rc7.git0.1.fc16:
$ rpm2cpio ~/Download/kernel-3.0-0.rc7.git0.1.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0-0.rc7.git0.1.fc16.i686 | grep PROVE_LOCKING
CONFIG_PROVE_LOCKING=y
$ rpm2cpio ~/Download/kernel-3.0-0.rc7.git0.1.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0-0.rc7.git0.1.fc16.i686 | grep DEBUG_LOCK_ALLOC
CONFIG_DEBUG_LOCK_ALLOC=y

kernel-3.0.0-2.fc16:
$ rpm2cpio ~/Download/kernel-3.0.0-2.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0.0-2.fc16.i686 | grep PROVE_LOCKING
# CONFIG_PROVE_LOCKING is not set
$ rpm2cpio ~/Download/kernel-3.0.0-2.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-3.0.0-2.fc16.i686 | grep DEBUG_LOCK_ALLOC
# CONFIG_DEBUG_LOCK_ALLOC is not set

Comment 26 sd.domrep 2011-08-02 16:40:40 UTC
Interesting !
And how are we supposed to check this bug?

Comment 27 Josh Boyer 2011-08-02 17:03:04 UTC
(In reply to comment #26)
> Interesting !
> And how do we suppose to check this bug?

It's been reported upstream and someone is theoretically working on a patch.  When I see one come in, I'll do a local build with the debug options enabled again.  It's easy to recreate, so I should be able to tell if the patch fixes the bug or not.

Comment 28 Dave Jones 2011-08-02 17:26:16 UTC
There's also the kernel-debug package, which still has them enabled.

Comment 29 Paul Bolle 2011-08-02 17:26:57 UTC
(In reply to comment #27)
> It's been reported upstream and someone is theoretically working on a patch. 
> When I see one come in, I'll do a local build with the debug options enabled
> again.  

0) Perhaps I'm to blame here: a few days ago I brought this issue back to LKML, as a result of which this patch was posted on linux-fsdevel:

http://marc.info/?l=linux-fsdevel&m=131206504702437&w=2

1) That patch works for me. I suppose we're waiting for Al Viro's fiat or veto.

Comment 30 Josh Boyer 2011-08-02 18:10:24 UTC
(In reply to comment #29)
> (In reply to comment #27)
> > It's been reported upstream and someone is theoretically working on a patch. 
> > When I see one come in, I'll do a local build with the debug options enabled
> > again.  
> 
> 0) Perhaps I'm to blame here, as a few days ago I brought this issue back to
> LKML, as a result of which this patch was posted on linux-fsdevel:
> 
> http://marc.info/?l=linux-fsdevel&m=131206504702437&w=2
> 
> 1) That patch works for me. I suppose we're waiting for Al Viro's fiat or veto.

Ah.  That's the patch I was waiting for; I was just watching for it on LKML.

Comment 31 Fedora Update System 2011-08-02 20:38:29 UTC
kernel-3.0.0-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.0.0-3.fc16

Comment 32 Josh Boyer 2011-08-03 12:35:42 UTC
*** Bug 727757 has been marked as a duplicate of this bug. ***

Comment 33 sd.domrep 2011-08-03 14:27:49 UTC
Can confirm, that with 
kernel-3.0.0-3.fc16.i386

I do not have any locking messages.

Thanks for the patch and debug:
* Tue Aug 02 2011 Josh Boyer <jwboyer>
- Fix epoll recursive lockdep warnings (rhbz 722472)
- Turn debug builds back on

Comment 34 sd.domrep 2011-08-03 14:29:08 UTC
Can confirm, that with 
kernel-3.0.0-3.fc16.i686

I do not have any locking messages.

Thanks for the patch and debug:
* Tue Aug 02 2011 Josh Boyer <jwboyer>
- Fix epoll recursive lockdep warnings (rhbz 722472)
- Turn debug builds back on

Comment 35 sd.domrep 2011-08-03 14:30:31 UTC
Can confirm, that with 
kernel-3.0.0-3.fc16.i686

I do not have any locking messages.

Thanks for the patch and debug:
* Tue Aug 02 2011 Josh Boyer <jwboyer>
- Fix epoll recursive lockdep warnings (rhbz 722472)
- Turn debug builds back on

Comment 36 sd.domrep 2011-08-03 14:32:49 UTC
Sorry for comments 33 and 34; I don't know how to delete them.

Comment 37 Fedora Update System 2011-08-03 19:13:49 UTC
Package kernel-3.0.0-3.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.0.0-3.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-3.0.0-3.fc16
then log in and leave karma (feedback).

Comment 38 Horst H. von Brand 2011-08-03 22:09:32 UTC
(In reply to comment #35)
> Can confirm, that with 
> kernel-3.0.0-3.fc16.i686
> 
> I do not have any locking messages.
> 
> Thanks for the patch and debug:
> * Tue Aug 02 2011 Josh Boyer <jwboyer>
> - Fix epoll recursive lockdep warnings (rhbz 722472)
> - Turn debug builds back on

I get the locking messages with kernel-3.1.0-0.rc0.git17.1.fc17.x86_64 and systemd-32-1.fc17.x86_64 (Samsung netbook N210, dual-core atom).

Comment 39 Horst H. von Brand 2011-08-03 22:10:30 UTC
*** Bug 728005 has been marked as a duplicate of this bug. ***

Comment 40 Horst H. von Brand 2011-08-03 22:14:36 UTC
(In reply to comment #38)
> (In reply to comment #35)
> > Can confirm, that with 
> > kernel-3.0.0-3.fc16.i686
> > 
> > I do not have any locking messages.
> > 
> > Thanks for the patch and debug:
> > * Tue Aug 02 2011 Josh Boyer <jwboyer>
> > - Fix epoll recursive lockdep warnings (rhbz 722472)
> > - Turn debug builds back on
> 
> I get the locking messages with kernel-3.1.0-0.rc0.git17.1.fc17.x86_64 and
> systemd-32-1.fc17.x86_64 (Samsung netbook N210, dual-core atom).

I can also confirm that with the same setup but with kernel-2.6.40-4.fc15.x86_64
(newer kernels didn't complete boot; I had to resort to desperate measures...)
the backtrace *does not* show up.

Comment 41 Josh Boyer 2011-08-04 01:03:54 UTC
(In reply to comment #38)
> (In reply to comment #35)
> > Can confirm, that with 
> > kernel-3.0.0-3.fc16.i686
> > 
> > I do not have any locking messages.
> > 
> > Thanks for the patch and debug:
> > * Tue Aug 02 2011 Josh Boyer <jwboyer>
> > - Fix epoll recursive lockdep warnings (rhbz 722472)
> > - Turn debug builds back on
> 
> I get the locking messages with kernel-3.1.0-0.rc0.git17.1.fc17.x86_64 and
> systemd-32-1.fc17.x86_64 (Samsung netbook N210, dual-core atom).

This bug is for F16, and you're using a rawhide kernel.  The rawhide kernel should pick up the fix soon.

kernel-2.6.40-<whatever> won't show this bug because the debug options are turned off in that.  You'd need to use kernel-debug-2.6.40-<whatever>.  However, the same patch has been applied to F15, so it should be fixed there as well.

Comment 42 Fedora Update System 2011-08-16 12:47:18 UTC
kernel-2.6.40.3-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.3-0.fc15

Comment 43 Josh Boyer 2011-08-16 23:02:26 UTC
*** Bug 731188 has been marked as a duplicate of this bug. ***

Comment 44 Josh Boyer 2011-08-17 11:59:29 UTC
*** Bug 731276 has been marked as a duplicate of this bug. ***

Comment 45 Jurgen Kramer 2011-08-17 13:11:09 UTC
I can confirm that on F16 (alpha rc5 with updates) with kernel-3.0.1-3 (x86_64) I do not see the lockdep issue any more.

Comment 46 Fedora Update System 2011-08-18 02:30:11 UTC
kernel-2.6.40.3-0.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 47 Fedora Update System 2011-08-23 11:05:15 UTC
kernel-3.1.0-0.rc3.git0.0.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc3.git0.0.fc16

Comment 48 Fedora Update System 2011-08-28 05:37:00 UTC
kernel-3.1.0-0.rc3.git0.0.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 49 Gabriel 2011-10-04 21:55:33 UTC
I am seeing the warning under Fedora 16 with kernel 3.1.0-0.rc8.git0.0.fc16.x86_64, booting from EFI. This is a fresh installation of fc16 from livecd after running yum upgrade.

The graphical environment doesn't start either; I'm stuck on the console.

Comment 50 Josh Boyer 2011-10-04 22:01:21 UTC
(In reply to comment #49)
> I am seeing the warning under Fedora 16 with kernel
> 3.1.0-0.rc8.git0.0.fc16.x86_64, booting from EFI. This is a fresh installation
> of fc16 from livecd after running yum upgrade.
> 
> Graphical environment doesn't start either, I'm stuck on the console.

Please open a new bug and paste the full text of what you're seeing.  The patch that fixed this should still be applied, so we want to be sure your issue is identical.