Description of problem:
This may be a libdbus issue, but I'm starting by reporting it against the crashing application.

Dec 2 00:27:12 aspen kernel: INFO: task nepomukservices:1926 blocked for more than 120 seconds.
Dec 2 00:27:12 aspen kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 2 00:27:12 aspen kernel: nepomukservic D e5955dd4 0 1926 1875 0x00000080
Dec 2 00:27:12 aspen kernel: e5955e0c 00000086 f572c880 e5955dd4 c0a7ed40 c0a7ed40 c0a7ed40 c0a7ed40
Dec 2 00:27:12 aspen kernel: eaa582ac c0a7ed40 c0a7ed40 12450d90 00006829 00000000 00006829 eaa58000
Dec 2 00:27:12 aspen kernel: 00000001 eaa58000 c2048d40 c1ed3ffc e5955e1c c07a1b66 e5955e4c 00000014
Dec 2 00:27:12 aspen kernel: Call Trace:
Dec 2 00:27:12 aspen kernel: [<c07a1b66>] io_schedule+0x5f/0x98
Dec 2 00:27:12 aspen kernel: [<c04a7f7f>] sync_page+0x3f/0x43
Dec 2 00:27:12 aspen kernel: [<c07a1f71>] __wait_on_bit_lock+0x39/0x75
Dec 2 00:27:12 aspen kernel: [<c04a7f40>] ? sync_page+0x0/0x43
Dec 2 00:27:12 aspen kernel: [<c04a7f27>] __lock_page+0x71/0x79
Dec 2 00:27:12 aspen kernel: [<c045473d>] ? wake_bit_function+0x0/0x3c
Dec 2 00:27:12 aspen kernel: [<c04a8b25>] lock_page+0x33/0x36
Dec 2 00:27:12 aspen kernel: [<c04a8b4b>] find_lock_page+0x23/0x3f
Dec 2 00:27:12 aspen kernel: [<c04a8f2e>] filemap_fault+0x186/0x2ea
Dec 2 00:27:12 aspen kernel: [<c04bc0f4>] __do_fault+0x4f/0x442
Dec 2 00:27:12 aspen kernel: [<c042c985>] ? kmap_atomic_prot+0x10f/0x111
Dec 2 00:27:12 aspen kernel: [<c04bd265>] handle_mm_fault+0x561/0xb48
Dec 2 00:27:12 aspen kernel: [<c04e1210>] ? path_put+0x1a/0x1d
Dec 2 00:27:12 aspen kernel: [<c07a5f31>] do_page_fault+0x2b4/0x2ca
Dec 2 00:27:12 aspen kernel: [<c07a5c7d>] ? do_page_fault+0x0/0x2ca
Dec 2 00:27:12 aspen kernel: [<c07a3a6f>] error_code+0x73/0x78
Dec 2 00:27:12 aspen kernel: [<c07a0000>] ? hrtimer_cpu_notify+0x3a/0x162
Dec 2 00:34:22 aspen sssd[be[default]]: Shutting down
Dec 2 00:34:29 aspen sssd[be[default]]: Starting up
Dec 2 00:40:02 aspen kernel: sssd_nss[1293]: segfault at 40 ip 00803c27 sp bf9369dc error 4 in libdbus-1.so.3.4.0[7e0000+45000]
Dec 2 00:40:04 aspen abrt[24042]: saved core dump of pid 1293 (/usr/libexec/sssd/sssd_nss) to /var/spool/abrt/ccpp-1291275603-1293.new/coredump (995328 bytes)
Dec 2 00:40:04 aspen abrtd: Directory 'ccpp-1291275603-1293' creation detected
Dec 2 00:40:06 aspen sssd[nss]: Starting up
Dec 2 00:40:08 aspen abrtd: New crash /var/spool/abrt/ccpp-1291275603-1293, processing

# grep '^Dec 2 04' /var/log/messages
Dec 2 04:26:32 aspen automount[1429]: key "man" not found in map source(s).
Dec 2 04:26:32 aspen automount[1429]: key "man1" not found in map source(s).
Dec 2 04:26:32 aspen automount[1429]: key "man8" not found in map source(s).
Dec 2 04:28:07 aspen kernel: usb 1-3: USB disconnect, address 3
Dec 2 04:29:56 aspen kernel: usb 1-3: new high speed USB device using ehci_hcd and address 4
Dec 2 04:29:56 aspen kernel: usb 1-3: New USB device found, idVendor=0409, idProduct=0058
Dec 2 04:29:56 aspen kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Dec 2 04:29:56 aspen kernel: usb 1-3: Product: USB2.0 Hub Controller
Dec 2 04:29:56 aspen kernel: usb 1-3: Manufacturer: NEC Corporation
Dec 2 04:29:56 aspen kernel: hub 1-3:1.0: USB hub found
Dec 2 04:29:56 aspen kernel: hub 1-3:1.0: 4 ports detected
Dec 2 04:30:07 aspen kernel: sssd_pam[1294]: segfault at c ip 007ef927 sp bfde7020 error 4 in libdbus-1.so.3.4.0[7e0000+45000]
Dec 2 04:30:08 aspen abrt[478]: saved core dump of pid 1294 (/usr/libexec/sssd/sssd_pam) to /var/spool/abrt/ccpp-1291289407-1294.new/coredump (974848 bytes)
Dec 2 04:30:08 aspen abrtd: Directory 'ccpp-1291289407-1294' creation detected
Dec 2 04:30:08 aspen kcheckpass[477]: Authentication failure for lindsey (invoked by uid 1040)
Dec 2 04:30:08 aspen abrtd: New crash /var/spool/abrt/ccpp-1291289407-1294, processing
Dec 2 04:30:08 aspen abrtd: RunApp('/var/spool/abrt/ccpp-1291289407-1294','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .')
Dec 2 04:30:09 aspen sssd[pam]: Starting up
Dec 2 04:35:38 aspen kernel: usb 1-3: USB disconnect, address 4

(gdb) bt
#0 _dbus_transport_get_is_connected (transport=0x0) at dbus-transport.c:511
#1 0x007ef944 in _dbus_connection_get_is_connected_unlocked (connection=0x93ae5c8, message=0x93ae940, pending_return=0xbf936a5c, timeout_milliseconds=150000) at dbus-connection.c:2855
#2 dbus_connection_send_with_reply (connection=0x93ae5c8, message=0x93ae940, pending_return=0xbf936a5c, timeout_milliseconds=150000) at dbus-connection.c:3234
#3 0x0806a501 in sbus_conn_send (conn=0x93ae4d0, msg=0x93ae940, timeout_ms=150000, reply_handler=0x80780d0 <sss_dp_send_acct_callback>, pvt=0x93bba38, pending=0xbf936b14) at sbus/sssd_dbus_connection.c:711
#4 0x08077c24 in sss_dp_send_acct_req_create (rctx=0x93aa5b8, callback_memctx=0x93c1280, callback=0x8055860 <nss_cmd_getgrnam_dp_callback>, callback_ctx=0x93c1470, timeout=150000, domain=0x93abda0 "default", fast_reply=true, type=2, opt_name=0x93ba630 "pulse-rt", opt_id=0) at responder/common/responder_dp.c:464
#5 sss_dp_send_acct_req (rctx=0x93aa5b8, callback_memctx=0x93c1280, callback=0x8055860 <nss_cmd_getgrnam_dp_callback>, callback_ctx=0x93c1470, timeout=150000, domain=0x93abda0 "default", fast_reply=true, type=2, opt_name=0x93ba630 "pulse-rt", opt_id=0) at responder/common/responder_dp.c:357
#6 0x0804d44c in check_cache (dctx=0x93c1470, nctx=0x93aa558, res=0x93c1340, req_type=2, opt_name=0x93ba630 "pulse-rt", opt_id=0, callback=0x8055860 <nss_cmd_getgrnam_dp_callback>) at responder/nss/nsssrv_cmd.c:468
#7 0x0804fdd3 in nss_cmd_getgrnam_search (dctx=0x93c1470) at responder/nss/nsssrv_cmd.c:1659
#8 0x0805576d in nss_cmd_getgrnam (cctx=0x93c15b8) at responder/nss/nsssrv_cmd.c:1784
#9 0x08074227 in client_recv (ev=0x93a9a48, fde=0x93c1970, flags=1, ptr=0x93c15b8) at responder/common/responder_common.c:154
#10 client_fd_handler (ev=0x93a9a48, fde=0x93c1970, flags=1, ptr=0x93c15b8) at responder/common/responder_common.c:192
#11 0x00c8c3d3 in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>) at tevent_standard.c:309
#12 std_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at tevent_standard.c:544
#13 0x00c88fb8 in _tevent_loop_once (ev=<value optimized out>, location=<value optimized out>) at tevent.c:490
#14 0x00c8907f in tevent_common_loop_wait (ev=<value optimized out>, location=<value optimized out>) at tevent.c:591
#15 0x00c88cf9 in _tevent_loop_wait (ev=<value optimized out>, location=<value optimized out>) at tevent.c:610
#16 0x0806e0bd in server_loop (main_ctx=0x93a9b00) at util/server.c:494
#17 0x0804ca92 in main (argc=4, argv=0xbf937094) at responder/nss/nsssrv.c:264
(gdb) up
#1 0x007ef944 in _dbus_connection_get_is_connected_unlocked (connection=0x93ae5c8, message=0x93ae940, pending_return=0xbf936a5c, timeout_milliseconds=150000) at dbus-connection.c:2855
2855 return _dbus_transport_get_is_connected (connection->transport);
(gdb) print connection
$1 = (DBusConnection *) 0x93ae5c8
(gdb) print *connection
$2 = {refcount = {value = 1732079203}, mutex = 0x70756f72, dispatch_mutex = 0x6e632c73, dispatch_cond = 0x6665643d, io_path_mutex = 0x746c7561, io_path_cond = 0x3d6e632c, outgoing_messages = 0x64737973, incoming_messages = 0x62, message_borrowed = 0x93c1938, n_outgoing = 97, n_incoming = 154935000, outgoing_counter = 0x93c17e8, transport = 0x0, watches = 0x0, timeouts = 0x0, filter_list = 0x0, slot_list = {slots = 0x1388e5, n_slots = 38}, pending_replies = 0xe8150c73, client_serial = 0, disconnect_message_link = 0x0, wakeup_main_function = 0, wakeup_main_data = 0x6f282628, free_wakeup_main_data = 0x63656a62, dispatch_status_function = 0x616c6374, dispatch_status_data = 0x673d7373, free_dispatch_status_data = 0x70756f72, last_dispatch_status = 1634609193, link_cache = 0x703d656d, objects = 0x65736c75, server_guid = 0x2974722d <Address 0x2974722d out of bounds>, dispatch_acquired = 41, io_path_acquired = 184, shareable = 0, exit_on_disconnect = 0, route_peer_messages = 0, disconnected_message_arrived = 0, disconnected_message_processed = 1, have_connection_lock = 1, generation = 12386352}

Version-Release number of selected component (if applicable):
sssd-1.3.0-38.fc13.i686
dbus-libs-1.2.24-1.fc13.i686

How reproducible:
This is the first time I've seen this. This is a desktop machine with an NFS-mounted home directory; there may have been some NFS issues leading to the "hung task" message.
Hmm, I'm not sure what could be causing this.

server_guid = 0x2974722d <Address 0x2974722d out of bounds>

That definitely looks suspect, but it would imply that our connection object had somehow been corrupted. Can you reproduce this issue? I'm not sure where to even begin: the related code has been stable since before SSSD 1.0 and hasn't been touched in all that time. The only explanation I can come up with is a buffer overrun or a premature free happening somewhere. Without being able to reproduce it, I doubt I'll be able to track this down.
At the moment I have no idea how to reproduce it either. I'll see if it happens again.
Could you share your (sanitized) sssd.conf?
Created attachment 464636 sssd.conf
I'm closing this bug for now until and unless you hit it again. Feel free to reopen it if you can reproduce the issue.