Bug 659774

Summary: sssd_nss and sssd_pam segfault in libdbus
Product: [Fedora] Fedora
Reporter: Orion Poplawski <orion>
Component: sssd
Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 13
CC: jhrozek, sbose, sgallagh, ssorce
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-01-27 20:34:21 UTC
Type: ---
Attachments:
sssd.conf (flags: none)

Description Orion Poplawski 2010-12-03 16:28:03 UTC
Description of problem:

This may be a libdbus issue, but I'm starting by reporting it against the crashing application.

Dec  2 00:27:12 aspen kernel: INFO: task nepomukservices:1926 blocked for more than 120 seconds.
Dec  2 00:27:12 aspen kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  2 00:27:12 aspen kernel: nepomukservic D e5955dd4     0  1926   1875 0x00000080
Dec  2 00:27:12 aspen kernel: e5955e0c 00000086 f572c880 e5955dd4 c0a7ed40 c0a7ed40 c0a7ed40 c0a7ed40
Dec  2 00:27:12 aspen kernel: eaa582ac c0a7ed40 c0a7ed40 12450d90 00006829 00000000 00006829 eaa58000
Dec  2 00:27:12 aspen kernel: 00000001 eaa58000 c2048d40 c1ed3ffc e5955e1c c07a1b66 e5955e4c 00000014
Dec  2 00:27:12 aspen kernel: Call Trace:
Dec  2 00:27:12 aspen kernel: [<c07a1b66>] io_schedule+0x5f/0x98
Dec  2 00:27:12 aspen kernel: [<c04a7f7f>] sync_page+0x3f/0x43
Dec  2 00:27:12 aspen kernel: [<c07a1f71>] __wait_on_bit_lock+0x39/0x75
Dec  2 00:27:12 aspen kernel: [<c04a7f40>] ? sync_page+0x0/0x43
Dec  2 00:27:12 aspen kernel: [<c04a7f27>] __lock_page+0x71/0x79
Dec  2 00:27:12 aspen kernel: [<c045473d>] ? wake_bit_function+0x0/0x3c
Dec  2 00:27:12 aspen kernel: [<c04a8b25>] lock_page+0x33/0x36
Dec  2 00:27:12 aspen kernel: [<c04a8b4b>] find_lock_page+0x23/0x3f
Dec  2 00:27:12 aspen kernel: [<c04a8f2e>] filemap_fault+0x186/0x2ea
Dec  2 00:27:12 aspen kernel: [<c04bc0f4>] __do_fault+0x4f/0x442
Dec  2 00:27:12 aspen kernel: [<c042c985>] ? kmap_atomic_prot+0x10f/0x111
Dec  2 00:27:12 aspen kernel: [<c04bd265>] handle_mm_fault+0x561/0xb48
Dec  2 00:27:12 aspen kernel: [<c04e1210>] ? path_put+0x1a/0x1d
Dec  2 00:27:12 aspen kernel: [<c07a5f31>] do_page_fault+0x2b4/0x2ca
Dec  2 00:27:12 aspen kernel: [<c07a5c7d>] ? do_page_fault+0x0/0x2ca
Dec  2 00:27:12 aspen kernel: [<c07a3a6f>] error_code+0x73/0x78
Dec  2 00:27:12 aspen kernel: [<c07a0000>] ? hrtimer_cpu_notify+0x3a/0x162
Dec  2 00:34:22 aspen sssd[be[default]]: Shutting down
Dec  2 00:34:29 aspen sssd[be[default]]: Starting up
Dec  2 00:40:02 aspen kernel: sssd_nss[1293]: segfault at 40 ip 00803c27 sp bf9369dc error 4 in libdbus-1.so.3.4.0[7e0000+45000]
Dec  2 00:40:04 aspen abrt[24042]: saved core dump of pid 1293 (/usr/libexec/sssd/sssd_nss) to /var/spool/abrt/ccpp-1291275603-1293.new/coredump (995328 bytes)
Dec  2 00:40:04 aspen abrtd: Directory 'ccpp-1291275603-1293' creation detected
Dec  2 00:40:06 aspen sssd[nss]: Starting up
Dec  2 00:40:08 aspen abrtd: New crash /var/spool/abrt/ccpp-1291275603-1293, processing

# grep '^Dec  2 04' /var/log/messages
Dec  2 04:26:32 aspen automount[1429]: key "man" not found in map source(s).
Dec  2 04:26:32 aspen automount[1429]: key "man1" not found in map source(s).
Dec  2 04:26:32 aspen automount[1429]: key "man8" not found in map source(s).
Dec  2 04:28:07 aspen kernel: usb 1-3: USB disconnect, address 3
Dec  2 04:29:56 aspen kernel: usb 1-3: new high speed USB device using ehci_hcd and address 4
Dec  2 04:29:56 aspen kernel: usb 1-3: New USB device found, idVendor=0409, idProduct=0058
Dec  2 04:29:56 aspen kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Dec  2 04:29:56 aspen kernel: usb 1-3: Product: USB2.0 Hub Controller
Dec  2 04:29:56 aspen kernel: usb 1-3: Manufacturer: NEC Corporation
Dec  2 04:29:56 aspen kernel: hub 1-3:1.0: USB hub found
Dec  2 04:29:56 aspen kernel: hub 1-3:1.0: 4 ports detected
Dec  2 04:30:07 aspen kernel: sssd_pam[1294]: segfault at c ip 007ef927 sp bfde7020 error 4 in libdbus-1.so.3.4.0[7e0000+45000]
Dec  2 04:30:08 aspen abrt[478]: saved core dump of pid 1294 (/usr/libexec/sssd/sssd_pam) to /var/spool/abrt/ccpp-1291289407-1294.new/coredump (974848 bytes)
Dec  2 04:30:08 aspen abrtd: Directory 'ccpp-1291289407-1294' creation detected
Dec  2 04:30:08 aspen kcheckpass[477]: Authentication failure for lindsey (invoked by uid 1040)
Dec  2 04:30:08 aspen abrtd: New crash /var/spool/abrt/ccpp-1291289407-1294, processing
Dec  2 04:30:08 aspen abrtd: RunApp('/var/spool/abrt/ccpp-1291289407-1294','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .')
Dec  2 04:30:09 aspen sssd[pam]: Starting up
Dec  2 04:35:38 aspen kernel: usb 1-3: USB disconnect, address 4
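
The two kernel "segfault" lines in the excerpts above can be decoded mechanically. The following is a minimal Python sketch, not part of the original report: the regular expression and the helper name decode_segfault are illustrative assumptions written to match only these two lines. It extracts the faulting address, computes the instruction offset inside the libdbus mapping (ip minus the mapping base), and interprets the x86 page-fault error code bits.

```python
import re

# Kernel "segfault" lines copied from the log excerpts above.
LINES = [
    "sssd_nss[1293]: segfault at 40 ip 00803c27 sp bf9369dc error 4 "
    "in libdbus-1.so.3.4.0[7e0000+45000]",
    "sssd_pam[1294]: segfault at c ip 007ef927 sp bfde7020 error 4 "
    "in libdbus-1.so.3.4.0[7e0000+45000]",
]

# Field layout as it appears in these two lines; all numbers are hexadecimal.
PATTERN = re.compile(
    r"(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the interesting fields of one kernel segfault line as a dict."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line: %r" % line)
    err = int(m.group("err"), 16)
    return {
        "process": m.group("proc"),
        "library": m.group("lib"),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped library.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        # x86 page-fault error code: bit 0 set = protection fault (clear =
        # page not present), bit 1 set = write (clear = read),
        # bit 2 set = user mode (clear = kernel mode).
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


for line in LINES:
    info = decode_segfault(line)
    print("%s: %s of address %#x in %s mode, %s offset %#x" % (
        info["process"],
        "write" if info["write"] else "read",
        info["fault_addr"],
        "user" if info["user_mode"] else "kernel",
        info["library"],
        info["offset"],
    ))
```

Both crashes decode as user-mode reads of an unmapped address very close to zero (0x40 and 0xc), which is consistent with reading a structure field through a NULL pointer, as in frame #0 of the backtrace below (transport=0x0). The computed offsets (0x23c27 and 0xf927) could be resolved to source lines with a tool such as addr2line, assuming debuginfo matching dbus-libs-1.2.24-1.fc13.i686 is installed.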


(gdb) bt
#0  _dbus_transport_get_is_connected (transport=0x0) at dbus-transport.c:511
#1  0x007ef944 in _dbus_connection_get_is_connected_unlocked (connection=0x93ae5c8, 
    message=0x93ae940, pending_return=0xbf936a5c, timeout_milliseconds=150000)
    at dbus-connection.c:2855
#2  dbus_connection_send_with_reply (connection=0x93ae5c8, message=0x93ae940, 
    pending_return=0xbf936a5c, timeout_milliseconds=150000) at dbus-connection.c:3234
#3  0x0806a501 in sbus_conn_send (conn=0x93ae4d0, msg=0x93ae940, timeout_ms=150000, 
    reply_handler=0x80780d0 <sss_dp_send_acct_callback>, pvt=0x93bba38, pending=0xbf936b14)
    at sbus/sssd_dbus_connection.c:711
#4  0x08077c24 in sss_dp_send_acct_req_create (rctx=0x93aa5b8, callback_memctx=0x93c1280, 
    callback=0x8055860 <nss_cmd_getgrnam_dp_callback>, callback_ctx=0x93c1470, timeout=150000, 
    domain=0x93abda0 "default", fast_reply=true, type=2, opt_name=0x93ba630 "pulse-rt", 
    opt_id=0) at responder/common/responder_dp.c:464
#5  sss_dp_send_acct_req (rctx=0x93aa5b8, callback_memctx=0x93c1280, 
    callback=0x8055860 <nss_cmd_getgrnam_dp_callback>, callback_ctx=0x93c1470, timeout=150000, 
    domain=0x93abda0 "default", fast_reply=true, type=2, opt_name=0x93ba630 "pulse-rt", 
    opt_id=0) at responder/common/responder_dp.c:357
#6  0x0804d44c in check_cache (dctx=0x93c1470, nctx=0x93aa558, res=0x93c1340, req_type=2, 
    opt_name=0x93ba630 "pulse-rt", opt_id=0, callback=0x8055860 <nss_cmd_getgrnam_dp_callback>)
    at responder/nss/nsssrv_cmd.c:468
#7  0x0804fdd3 in nss_cmd_getgrnam_search (dctx=0x93c1470) at responder/nss/nsssrv_cmd.c:1659
#8  0x0805576d in nss_cmd_getgrnam (cctx=0x93c15b8) at responder/nss/nsssrv_cmd.c:1784
#9  0x08074227 in client_recv (ev=0x93a9a48, fde=0x93c1970, flags=1, ptr=0x93c15b8)
    at responder/common/responder_common.c:154
#10 client_fd_handler (ev=0x93a9a48, fde=0x93c1970, flags=1, ptr=0x93c15b8)
    at responder/common/responder_common.c:192
#11 0x00c8c3d3 in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>)
    at tevent_standard.c:309
#12 std_event_loop_once (ev=<value optimized out>, location=<value optimized out>)
    at tevent_standard.c:544
#13 0x00c88fb8 in _tevent_loop_once (ev=<value optimized out>, location=<value optimized out>)
    at tevent.c:490
#14 0x00c8907f in tevent_common_loop_wait (ev=<value optimized out>, 
    location=<value optimized out>) at tevent.c:591
#15 0x00c88cf9 in _tevent_loop_wait (ev=<value optimized out>, location=<value optimized out>)
    at tevent.c:610
#16 0x0806e0bd in server_loop (main_ctx=0x93a9b00) at util/server.c:494
#17 0x0804ca92 in main (argc=4, argv=0xbf937094) at responder/nss/nsssrv.c:264
(gdb) up
#1  0x007ef944 in _dbus_connection_get_is_connected_unlocked (connection=0x93ae5c8, 
    message=0x93ae940, pending_return=0xbf936a5c, timeout_milliseconds=150000)
    at dbus-connection.c:2855
2855      return _dbus_transport_get_is_connected (connection->transport);

(gdb) print connection
$1 = (DBusConnection *) 0x93ae5c8

(gdb) print *connection
$2 = {refcount = {value = 1732079203}, mutex = 0x70756f72, dispatch_mutex = 0x6e632c73, 
  dispatch_cond = 0x6665643d, io_path_mutex = 0x746c7561, io_path_cond = 0x3d6e632c, 
  outgoing_messages = 0x64737973, incoming_messages = 0x62, message_borrowed = 0x93c1938, 
  n_outgoing = 97, n_incoming = 154935000, outgoing_counter = 0x93c17e8, transport = 0x0, 
  watches = 0x0, timeouts = 0x0, filter_list = 0x0, slot_list = {slots = 0x1388e5, 
    n_slots = 38}, pending_replies = 0xe8150c73, client_serial = 0, 
  disconnect_message_link = 0x0, wakeup_main_function = 0, wakeup_main_data = 0x6f282628, 
  free_wakeup_main_data = 0x63656a62, dispatch_status_function = 0x616c6374, 
  dispatch_status_data = 0x673d7373, free_dispatch_status_data = 0x70756f72, 
  last_dispatch_status = 1634609193, link_cache = 0x703d656d, objects = 0x65736c75, 
  server_guid = 0x2974722d <Address 0x2974722d out of bounds>, dispatch_acquired = 41, 
  io_path_acquired = 184, shareable = 0, exit_on_disconnect = 0, route_peer_messages = 0, 
  disconnected_message_arrived = 0, disconnected_message_processed = 1, 
  have_connection_lock = 1, generation = 12386352}

Version-Release number of selected component (if applicable):
sssd-1.3.0-38.fc13.i686
dbus-libs-1.2.24-1.fc13.i686


How reproducible:
First time I've seen this.

This is a desktop machine with an NFS-mounted home directory; there may have been some NFS issues leading to the "hung task" message.

Comment 1 Stephen Gallagher 2010-12-03 16:49:27 UTC
Hmm, I'm not sure what could be causing this.
server_guid = 0x2974722d <Address 0x2974722d out of bounds>

That definitely looks suspect. But it would imply that somehow our connection object had been corrupted.

Can you reproduce this issue? I'm not sure where to even begin. The related code here has been stable since SSSD pre-1.0 and hasn't been touched in all that time. The only thing I can come up with is that we have an overrun or an early free happening somewhere.

Without being able to reproduce it, I doubt I'll be able to track this down.
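
One way to probe the overrun or early-free hypothesis is to read the field values from the `print *connection` output in the description as raw bytes. The following is a minimal Python sketch, not part of the original discussion. It assumes the listed fields are contiguous 4-byte little-endian words (the build is i686), and the names FIELDS_HEAD, FIELDS_TAIL and as_text are illustrative.

```python
import struct

# Values copied in declaration order from the `print *connection` output.
# Assumption: each of these fields occupies one 4-byte little-endian word.

# refcount.value through incoming_messages
FIELDS_HEAD = [
    1732079203, 0x70756f72, 0x6e632c73, 0x6665643d,
    0x746c7561, 0x3d6e632c, 0x64737973, 0x62,
]

# wakeup_main_data through dispatch_acquired
FIELDS_TAIL = [
    0x6f282628, 0x63656a62, 0x616c6374, 0x673d7373, 0x70756f72,
    1634609193, 0x703d656d, 0x65736c75, 0x2974722d, 41,
]


def as_text(words):
    """Pack 32-bit words little-endian and return the text up to the first NUL."""
    raw = b"".join(struct.pack("<I", w) for w in words)
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")


print(as_text(FIELDS_HEAD))  # cn=groups,cn=default,cn=sysdb
print(as_text(FIELDS_TAIL))  # (&(objectclass=group)(name=pulse-rt))
```

Under those assumptions the "pointers" are ASCII text: a DN-like string and a search filter naming the same group ("pulse-rt") and domain ("default") that appear as arguments in frames #4 to #6 of the backtrace. That would be consistent with the connection's memory having been released and reused for string data while the responder still held the pointer, although it does not show where such a free would have happened.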

Comment 2 Orion Poplawski 2010-12-03 18:07:38 UTC
At the moment I have no idea how to reproduce it either.  I'll see if it happens again.

Comment 3 Stephen Gallagher 2010-12-03 18:59:20 UTC
Could you share your (sanitized) sssd.conf?

Comment 4 Orion Poplawski 2010-12-03 19:55:48 UTC
Created attachment 464636
sssd.conf

Comment 5 Stephen Gallagher 2011-01-27 20:34:21 UTC
I'm closing this bug for now until and unless you hit it again. Feel free to reopen it if you can reproduce the issue.