State the problem

1. Provide time and date of problem

Always.

2. Provide clear and concise problem description as it is understood at the time of escalation

In RHEL4U6 you can use the option nss_initgroups_ignoreusers in /etc/ldap.conf to skip group lookups for the specified users. For some reason, if you specify the root user in that list, the messagebus service neither starts nor fails. It hangs, which stops the boot process unless someone cancels it.

3. State specific action requested of SEG

The issue could be a deadlock or something similar. I haven't been able to determine the cause with gdb or sysrq-t. Please analyse it and see whether a workaround is possible.

4. State whether or not a defect in the product is suspected
* Provide Bugzilla if one already exists

There are many Bugzillas about this, and even a kbase article, but none of them applies to this particular problem. Here is a list:

https://bugzilla.redhat.com/show_bug.cgi?id=186448
https://bugzilla.redhat.com/show_bug.cgi?id=186527
http://kbase.redhat.com/faq/FAQ_91_10666.shtm

Provide supporting info

3. Attach other supporting data

Here is a sysrq-t:

Dec 12 12:00:20 dhcp-1-209 kernel: messagebus S ffffff801f1c0148 0 3286 3283 3289 (NOTLB)
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff80154e9eb8 0000000000000282 ffffff801e741a30 000000751d0de040
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801f1c0030 000000000000d8a5 0003a36af1c9f02d ffffff801ec5a7f0
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801f1c02c8 ffffff80154a1018
Dec 12 12:00:20 dhcp-1-209 kernel: Call Trace:<ffffffff8011979b>{do_page_fault+616} <ffffffff80133f95>{do_wait+3298}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8012b040>{default_wake_function+0} <ffffffff8013d451>{sys_rt_sigaction+133}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8012b040>{default_wake_function+0} <ffffffff8010d636>{system_call+134}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8010d5b0>{system_call+0}
Dec 12 12:00:20 dhcp-1-209 kernel: initlog X ffffff801ee2f640 0 3109 3103 (L-TLB)
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801b43fef8 0000000000000246 ffffff801ee2f640 000000751f398030
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801f398030 00000000000042c3 0003a3781ad5e6d4 ffffff801ed147f0
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801f3982c8 ffffff8000000009
Dec 12 12:00:20 dhcp-1-209 kernel: Call Trace:<ffffffff8010ba38>{exit_thread+30} <ffffffff80132efd>{do_exit+3172}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8013300e>{sys_exit_group+0} <ffffffff8010d824>{tracesys+167}
Dec 12 12:00:20 dhcp-1-209 kernel:
Dec 12 12:00:20 dhcp-1-209 kernel: initlog R running task 0 3289 3286 3290 (NOTLB)
Dec 12 12:00:20 dhcp-1-209 kernel: dbus-daemon-1 S ffffffffff5fd000 0 3290 3289 (NOTLB)
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff8016733d78 0000000000000282 0000000100000001 0000007400000001
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801efa47f0 0000000000002ca4 0003a43b0a0cc3e6 ffffff801e9a97f0
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801efa4a88 ffffffff8012b15b
Dec 12 12:00:20 dhcp-1-209 kernel: Call Trace:<ffffffff8012b15b>{__wake_up_sync+74} <ffffffff802947fc>{thread_return+130}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff80295226>{schedule_timeout+252} <ffffffff80167e4b>{find_extend_vma+22}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8012d8f0>{add_wait_queue+18} <ffffffff80145ea3>{do_futex+531}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8012b040>{default_wake_function+0} <ffffffff801462f9>{sys_futex+203}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8010d765>{sysret_signal+56} <ffffffff8010d636>{system_call+134}
Dec 12 12:00:20 dhcp-1-209 kernel: <ffffffff8010d5b0>{system_call+0}

Here is the process tree:

service(3283)---messagebus(3286)---initlog(3289)---dbus-daemon-1(3290)

4. Provide issue repro information:

1 - Install a RHEL4U6 system.
2 - Configure /etc/ldap.conf to use a valid LDAP server, for instance:

    host dhcp-1-45
    base dc=ver,dc=ifi,dc=cat
    nss_initgroups_ignoreusers ldap,named,avahi,haldaemon,dbus,root

3 - Configure /etc/nsswitch.conf to have:

    passwd: files ldap
    group: files ldap

4 - Start the messagebus service.

I have a machine with this config: dhcp-1-209.fab.redhat.com with root:redhat

I have attached a sosreport of this machine, and I also have the sosreport from my customer with the same issue if needed.

Thanks,
Ramon Acedo

This event sent from IssueTracker by jnansi [SEG - Base OS]
issue 142304
Hi,

After some analysis, the issue doesn't seem to reside in the dbus code but in nss_ldap. In fact, the code that was backported for the nss_ldap "nss_initgroups_ignoreusers" option doesn't seem to be complete.

This option just tells nss_ldap not to contact the LDAP server for the listed users and to return NSS_NOTFOUND instead. The problem arises when enumerating the groups of these users: we take a mutex lock, but we don't release it when returning. This means that with this option, in some specific cases such as the dbus system daemon, a later lookup will just wait for a lock that is never released. This can be seen in the following stack trace:

#1 0x005a53de in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x005a200b in _L_mutex_lock_35 () from /lib/tls/libpthread.so.0
#3 0xbfeec84c in ?? ()
#4 0x00bd4750 in ?? () from /lib/libnss_ldap.so.2
#5 0xbfeec848 in ?? ()
#6 0x094c1870 in ?? ()
#7 0xbfeec8e0 in ?? ()
#8 0x009c950d in _nss_ldap_enter () from /lib/libnss_ldap.so.2
#9 0x009c950d in _nss_ldap_enter () from /lib/libnss_ldap.so.2
#10 0x009ce0ed in _nss_ldap_initgroups_dyn () from /lib/libnss_ldap.so.2
#11 0x003b44ec in internal_getgrouplist () from /lib/tls/libc.so.6
#12 0x003b4695 in getgrouplist () from /lib/tls/libc.so.6
#13 0x080a0c43 in fill_user_info (info=0x94c2f80, uid=Variable "uid" is not available. ) at dbus-sysdeps.c:1511
#14 0x080a3147 in _dbus_user_database_lookup (db=0x94c38d8, uid=4294967295, username=0xbfeecba0, error=0x0) at dbus-userdb.c:135
#15 0x080a3520 in _dbus_user_database_get_username (db=0xfffffffc, username=0x5a9ff4, info=0x5a9ff4, error=0x5a9ff4) at dbus-userdb.c:859

This happens because dbus, together with hal, uses its policies to restrict the hal daemon to running as either "root" or "haldaemon".

If you remove these users from nss_initgroups_ignoreusers, or remove the hal.conf policies from the dbus configuration, dbus runs flawlessly.
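To exercise the same call path outside of dbus, a small test program can call getgrouplist() directly, which is the glibc entry point seen in frame #12. The following is only a sketch: the helper name count_groups and the 256-entry buffer are made up for illustration, and the expectation that a second call for an ignored user blocks on an affected system is an assumption drawn from the analysis above, not something verified here.

```c
#define _DEFAULT_SOURCE   /* expose getgrouplist() on current glibc */
#define _BSD_SOURCE       /* same, for older glibc releases */
#include <grp.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask NSS for the groups of `user`, with `primary`
 * as the primary gid. Returns the number of groups found, or -1 if the
 * fixed-size buffer was too small.
 */
static int count_groups(const char *user, gid_t primary)
{
    gid_t groups[256];
    int ngroups = 256;

    /* getgrouplist() walks every "group:" service in nsswitch.conf,
     * so with "files ldap" it also enters the nss_ldap module. */
    if (getgrouplist(user, primary, groups, &ngroups) == -1)
        return -1;
    return ngroups;
}
```

Calling count_groups("root", 0) twice in the same process should return both times on a healthy system. On a system showing this bug, with root listed in nss_initgroups_ignoreusers, the second call would be expected to hang in _nss_ldap_enter().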
The function that seems to fail is _nss_ldap_initgroups_dyn(), as it performs the following:

_nss_ldap_initgroups_dyn
  |
  |   _nss_ldap_enter();
  |   _nss_ldap_init();
  |
  |   if (_nss_ldap_test_initgroups_ignoreuser (user)) {
  ----->  _nss_ldap_leave();
  |       return NSS_STATUS_NOTFOUND;
  |   }

The lock wasn't released before leaving the function; the arrow marks the _nss_ldap_leave() call that was missing.

Please try the package and let me know...

Kind regards,

Jose

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by jplans
issue 142304
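The pattern described above can be reproduced in isolation. The sketch below is not the nss_ldap code: demo_lock, demo_ignored, initgroups_buggy, initgroups_fixed and demo_lock_is_held are made-up names, and a plain pthread mutex stands in for whatever _nss_ldap_enter() and _nss_ldap_leave() actually take and release. It only shows how an early return can leave a mutex held, and how releasing it before the return avoids that.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

enum demo_status { DEMO_SUCCESS, DEMO_NOTFOUND };

/* Stand-in for the lock taken by _nss_ldap_enter() (assumption). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the ignore-list check; only "root" is ignored here. */
static bool demo_ignored(const char *user)
{
    return strcmp(user, "root") == 0;
}

/* Early return without unlocking: the lock stays held afterwards. */
static enum demo_status initgroups_buggy(const char *user)
{
    pthread_mutex_lock(&demo_lock);
    if (demo_ignored(user))
        return DEMO_NOTFOUND;          /* lock is leaked here */
    pthread_mutex_unlock(&demo_lock);
    return DEMO_SUCCESS;
}

/* Same logic, but the lock is released on the early-return path too. */
static enum demo_status initgroups_fixed(const char *user)
{
    pthread_mutex_lock(&demo_lock);
    if (demo_ignored(user)) {
        pthread_mutex_unlock(&demo_lock);
        return DEMO_NOTFOUND;
    }
    pthread_mutex_unlock(&demo_lock);
    return DEMO_SUCCESS;
}

/* Probe the lock without blocking: true if it is currently held. */
static bool demo_lock_is_held(void)
{
    if (pthread_mutex_trylock(&demo_lock) == 0) {
        pthread_mutex_unlock(&demo_lock);
        return false;
    }
    return true;
}
```

After initgroups_fixed("root") the probe reports the lock as free, while after initgroups_buggy("root") it reports the lock as held. A second blocking pthread_mutex_lock() at that point would wait indefinitely, which matches the __lll_mutex_lock_wait frame in the trace.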
Created attachment 294564
nss_ldap-ignoregrp_lock.patch
http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/nss_ldap/nss_ldap/bz429101/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0715.html