Bug 429101 - dbus-daemon-1 hangs when using the option nss_initgroups_ignoreusers in /etc/ldap.conf with the user root
Summary: dbus-daemon-1 hangs when using the option nss_initgroups_ignoreusers in /etc/...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nss_ldap
Version: 4.6
Hardware: All
OS: Linux
urgent
high
Target Milestone: rc
: ---
Assignee: Nalin Dahyabhai
QA Contact:
URL:
Whiteboard: GSSApproved
Depends On:
Blocks: 439215
 
Reported: 2008-01-17 10:44 UTC by Issue Tracker
Modified: 2018-10-19 22:00 UTC
4 users

Fixed In Version: RHSA-2008-0715
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 19:55:49 UTC
Target Upstream Version:
Embargoed:


Attachments
nss_ldap-ignoregrp_lock.patch (519 bytes, patch)
2008-02-11 14:55 UTC, Jose Plans
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2008:0715 0 normal SHIPPED_LIVE Low: nss_ldap security and bug fix update 2008-07-24 16:57:16 UTC

Comment 1 Issue Tracker 2008-01-17 10:44:46 UTC
State the problem

   1. Provide time and date of problem

Always

   2. Provide clear and concise problem description as it is understood at the time of escalation

In RHEL4U6 you can use the option nss_initgroups_ignoreusers in /etc/ldap.conf so that group lookups are skipped for the specified users. For some reason, if you specify the root user in that list, the messagebus service neither starts nor fails, which stalls the boot process unless someone cancels it.


   3. State specific action requested of SEG

The issue could be a deadlock or something similar. I haven't been able to pin it down with gdb or sysrq-t; please analyse it and see whether any workaround is possible.

   4. State whether or not a defect in the product is suspected
          * Provide Bugzilla if one already exists

There are many bugzillas about similar problems, and even a kbase article, but none of them applies to this particular problem. Here is a list:

 https://bugzilla.redhat.com/show_bug.cgi?id=186448
 https://bugzilla.redhat.com/show_bug.cgi?id=186527
 http://kbase.redhat.com/faq/FAQ_91_10666.shtm

Provide supporting info

   3. Attach other supporting data

Here is the sysrq-t output:

Dec 12 12:00:20 dhcp-1-209 kernel: messagebus    S ffffff801f1c0148     0  3286   3283  3289               (NOTLB)
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff80154e9eb8 0000000000000282 ffffff801e741a30 000000751d0de040
Dec 12 12:00:20 dhcp-1-209 kernel:        ffffff801f1c0030 000000000000d8a5 0003a36af1c9f02d ffffff801ec5a7f0
Dec 12 12:00:20 dhcp-1-209 kernel:        ffffff801f1c02c8 ffffff80154a1018
Dec 12 12:00:20 dhcp-1-209 kernel: Call Trace:<ffffffff8011979b>{do_page_fault+616} <ffffffff80133f95>{do_wait+3298}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8012b040>{default_wake_function+0} <ffffffff8013d451>{sys_rt_sigaction+133}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8012b040>{default_wake_function+0} <ffffffff8010d636>{system_call+134}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8010d5b0>{system_call+0}
Dec 12 12:00:20 dhcp-1-209 kernel: initlog       X ffffff801ee2f640     0  3109   3103                     (L-TLB)
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff801b43fef8 0000000000000246 ffffff801ee2f640 000000751f398030
Dec 12 12:00:20 dhcp-1-209 kernel:        ffffff801f398030 00000000000042c3 0003a3781ad5e6d4 ffffff801ed147f0
Dec 12 12:00:20 dhcp-1-209 kernel:        ffffff801f3982c8 ffffff8000000009
Dec 12 12:00:20 dhcp-1-209 kernel: Call Trace:<ffffffff8010ba38>{exit_thread+30} <ffffffff80132efd>{do_exit+3172}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8013300e>{sys_exit_group+0} <ffffffff8010d824>{tracesys+167}
Dec 12 12:00:20 dhcp-1-209 kernel:
Dec 12 12:00:20 dhcp-1-209 kernel: initlog       R  running task       0  3289   3286  3290               (NOTLB)
Dec 12 12:00:20 dhcp-1-209 kernel: dbus-daemon-1 S ffffffffff5fd000     0  3290   3289                     (NOTLB)
Dec 12 12:00:20 dhcp-1-209 kernel: ffffff8016733d78 0000000000000282 0000000100000001 0000007400000001
Dec 12 12:00:20 dhcp-1-209 kernel:        ffffff801efa47f0 0000000000002ca4 0003a43b0a0cc3e6 ffffff801e9a97f0
Dec 12 12:00:20 dhcp-1-209 kernel:        ffffff801efa4a88 ffffffff8012b15b
Dec 12 12:00:20 dhcp-1-209 kernel: Call Trace:<ffffffff8012b15b>{__wake_up_sync+74} <ffffffff802947fc>{thread_return+130}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff80295226>{schedule_timeout+252} <ffffffff80167e4b>{find_extend_vma+22}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8012d8f0>{add_wait_queue+18} <ffffffff80145ea3>{do_futex+531}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8012b040>{default_wake_function+0} <ffffffff801462f9>{sys_futex+203}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8010d765>{sysret_signal+56} <ffffffff8010d636>{system_call+134}
Dec 12 12:00:20 dhcp-1-209 kernel:        <ffffffff8010d5b0>{system_call+0}

Here is the process tree:

 service(3283)---messagebus(3286)---initlog(3289)---dbus-daemon-1(3290)


   4. Provide issue repro information:

 1 - Install a RHEL4U6
 2 - Configure /etc/ldap.conf to use a valid ldap server, for instance:

        host dhcp-1-45
        base dc=ver,dc=ifi,dc=cat
        nss_initgroups_ignoreusers ldap,named,avahi,haldaemon,dbus,root

 3 - Configure the /etc/nsswitch.conf to have:

        passwd:     files ldap
        group:      files ldap

 4 - Start the service messagebus.

 I have a machine with this config: dhcp-1-209.fab.redhat.com with root:redhat

I have attached a sosreport of this machine; I also have the sosreport from my customer, who sees the same issue, if needed.

Thanks,

Ramon Acedo

This event sent from IssueTracker by jnansi  [SEG - Base OS]
 issue 142304

Comment 7 Issue Tracker 2008-02-11 14:54:51 UTC
Hi,

After some analysis, the issue doesn't seem to reside in the dbus code
but in nss_ldap. In fact, the code that was backported for the
"nss_initgroups_ignoreusers" option of nss_ldap seems to be
incomplete.
This option tells nss_ldap not to contact the LDAP server for the listed
users and to return NSS_NOTFOUND instead. The problem is that when
enumerating the groups of these users, we take a mutex lock but do not
release it before returning.
This means that when the option is used, in some specific cases such as
the dbus system bus, the next caller simply waits forever for the lock to
be released, as can be seen in the following stack trace:

#1  0x005a53de in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2  0x005a200b in _L_mutex_lock_35 () from /lib/tls/libpthread.so.0
#3  0xbfeec84c in ?? ()
#4  0x00bd4750 in ?? () from /lib/libnss_ldap.so.2
#5  0xbfeec848 in ?? ()
#6  0x094c1870 in ?? ()
#7  0xbfeec8e0 in ?? ()
#8  0x009c950d in _nss_ldap_enter () from /lib/libnss_ldap.so.2
#9  0x009c950d in _nss_ldap_enter () from /lib/libnss_ldap.so.2
#10 0x009ce0ed in _nss_ldap_initgroups_dyn () from /lib/libnss_ldap.so.2
#11 0x003b44ec in internal_getgrouplist () from /lib/tls/libc.so.6
#12 0x003b4695 in getgrouplist () from /lib/tls/libc.so.6
#13 0x080a0c43 in fill_user_info (info=0x94c2f80, uid=Variable "uid" is not available.
) at dbus-sysdeps.c:1511
#14 0x080a3147 in _dbus_user_database_lookup (db=0x94c38d8, uid=4294967295, username=0xbfeecba0, error=0x0) at dbus-userdb.c:135
#15 0x080a3520 in _dbus_user_database_get_username (db=0xfffffffc, username=0x5a9ff4, info=0x5a9ff4, error=0x5a9ff4)
    at dbus-userdb.c:859
    
This happens because the HAL policy in the dbus configuration (hal.conf)
restricts the HAL daemon to run as either "root" or "haldaemon", so dbus
has to look up these users, and their groups, when it starts.
If you remove these users from nss_initgroups_ignoreusers, or remove the
policies from hal.conf in the dbus configuration, dbus runs
flawlessly.

The function that seems to fail is _nss_ldap_initgroups_dyn(), which
performs the following:

    _nss_ldap_initgroups_dyn
      |
      | _nss_ldap_enter();
      | _nss_ldap_init();
      |
      | if (_nss_ldap_test_initgroups_ignoreuser (user)) {
----->      _nss_ldap_leave();
      |     return NSS_STATUS_NOTFOUND;
      | }

The lock was not released on this early-return path; the _nss_ldap_leave()
call marked with the arrow is what was missing.
Please give the package a try and let me know.

Kind regards,

     Jose

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by jplans 
 issue 142304

Comment 8 Jose Plans 2008-02-11 14:55:41 UTC
Created attachment 294564 [details]
nss_ldap-ignoregrp_lock.patch

Comment 27 errata-xmlrpc 2008-07-24 19:55:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0715.html

