Bug 426155

Summary: nscd takes a long time to decide
Product: Red Hat Enterprise Linux 4 Reporter: Kaj J. Niemi <kajtzu>
Component: nss_ldapAssignee: Nalin Dahyabhai <nalin>
Status: CLOSED WORKSFORME QA Contact:
Severity: medium Docs Contact:
Priority: low    
Version: 4.6CC: dpal, drepper, eijiro.sumii, jplans, mark, schlegel, tao, vanhoof
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-02-22 16:37:11 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Kaj J. Niemi 2007-12-18 21:27:42 UTC
Description of problem:
On RHEL 4.6, x86_64, nscd sometimes (frequently) locks and makes the user wait 5 seconds (first 
timeout) until returning. This does not seem to happen with glibc/nscd 2.3.4-2.36 (from RHEL 4.5).

nsswitch.conf has been configured to use ldap. There is nothing wrong with the ldap servers nor is 
there anything wrong with the connectivity to the same servers. Both have been ruled out as problems.


Version-Release number of selected component (if applicable):
nscd 2.3.4-2.39

How reproducible:
Semireliably

Steps to Reproduce:
1. start nscd
2. time id foo < 0.01s
3. wait until it decides to misbehave
4. time id foo > 5.0s always
  
Actual results:
% time id kajtzu
uid=501(kajtzu) gid=501(kajtzu) groups=500(basen),501(kajtzu)
id kajtzu  0.00s user 0.00s system 0% cpu 5.032 total

Expected results:
% time id kajtzu
uid=501(kajtzu) gid=501(kajtzu) groups=500(basen),501(kajtzu)
id kajtzu  0.00s user 0.00s system 0% cpu 0.003 total

Additional info:
nscd -d simply says:

13289: handle_request: request received (Version = 2) from PID 8071
13289:  GETFDPW
13289: provide access to FD 5, for passwd
13289: handle_request: request received (Version = 2) from PID 8071
13289:  GETPWBYNAME (kajtzu)
13289: Haven't found "kajtzu" in password cache!
13289: handle_request: request received (Version = 2) from PID 8071
13289:  GETFDHST   
13289: provide access to FD 9, for hosts
13289: handle_request: request received (Version = 2) from PID 8071
13289:  GETFDGR
13289: provide access to FD 7, for group

during that same time stracing it returns:

Process 13289 attached - interrupt to quit
time(NULL)                              = 1198012747
epoll_wait(11, {{EPOLLRDNORM, {u32=10, u64=10}}}, 100, 29988) = 1
time(NULL)                              = 1198012748
accept(10, 0, NULL)                     = 19
epoll_ctl(11, EPOLL_CTL_ADD, 19, {EPOLLRDNORM, {u32=19, u64=19}}) = 0
epoll_wait(11, {{EPOLLRDNORM, {u32=19, u64=19}}}, 100, 29976) = 1
time(NULL)                              = 1198012748
epoll_ctl(11, EPOLL_CTL_DEL, 19, {0, {u32=0, u64=0}}) = 0
futex(0x552abc25a4, FUTEX_WAKE, 1)      = 1
epoll_wait(11, {{EPOLLRDNORM, {u32=10, u64=10}}}, 100, 29988) = 1
time(NULL)                              = 1198012748
accept(10, 0, NULL)                     = 19
epoll_ctl(11, EPOLL_CTL_ADD, 19, {EPOLLRDNORM, {u32=19, u64=19}}) = 0
epoll_wait(11, {{EPOLLRDNORM, {u32=19, u64=19}}}, 100, 29976) = 1
time(NULL)                              = 1198012748
epoll_ctl(11, EPOLL_CTL_DEL, 19, {0, {u32=0, u64=0}}) = 0
futex(0x552abc25a4, FUTEX_WAKE, 1)      = 1
epoll_wait(11, {{EPOLLRDNORM, {u32=10, u64=10}}}, 100, 29988) = 1
time(NULL)                              = 1198012752
accept(10, 0, NULL)                     = 19
epoll_ctl(11, EPOLL_CTL_ADD, 19, {EPOLLRDNORM, {u32=19, u64=19}}) = 0
epoll_wait(11, {{EPOLLRDNORM, {u32=19, u64=19}}}, 100, 29976) = 1
time(NULL)                              = 1198012752
epoll_ctl(11, EPOLL_CTL_DEL, 19, {0, {u32=0, u64=0}}) = 0
futex(0x552abc25a4, FUTEX_WAKE, 1)      = 1
futex(0x552abc25a0, FUTEX_WAKE, 1)      = 1
epoll_wait(11, {{EPOLLRDNORM, {u32=10, u64=10}}}, 100, 29988) = 1
time(NULL)                              = 1198012752
accept(10, 0, NULL)                     = 19
epoll_ctl(11, EPOLL_CTL_ADD, 19, {EPOLLRDNORM, {u32=19, u64=19}}) = 0
epoll_wait(11, {{EPOLLRDNORM, {u32=19, u64=19}}}, 100, 29976) = 1
time(NULL)                              = 1198012752
epoll_ctl(11, EPOLL_CTL_DEL, 19, {0, {u32=0, u64=0}}) = 0
futex(0x552abc25a4, FUTEX_WAKE, 1)      = 1
futex(0x552abc25a0, FUTEX_WAKE, 1)      = 1
epoll_wait(11, {{EPOLLRDNORM, {u32=10, u64=10}}}, 100, 29988) = 1
time(NULL)                              = 1198012757
accept(10, 0, NULL)                     = 20
epoll_ctl(11, EPOLL_CTL_ADD, 20, {EPOLLRDNORM, {u32=20, u64=20}}) = 0
epoll_wait(11, {{EPOLLRDNORM, {u32=20, u64=20}}}, 100, 29976) = 1
time(NULL)                              = 1198012757
epoll_ctl(11, EPOLL_CTL_DEL, 20, {0, {u32=0, u64=0}}) = 0
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) 
= 0x41209000
mprotect(0x41209000, 4096, PROT_NONE)   = 0
clone(child_stack=0x41409270, 
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_S
ETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, 
parent_tidptr=0x414099f0, tls=0x41409960, child_tidptr=0x414099f0) = 8072
epoll_wait(11, {{EPOLLRDNORM, {u32=10, u64=10}}}, 100, 29988) = 1
time(NULL)                              = 1198012757
accept(10, 0, NULL)                     = 20
epoll_ctl(11, EPOLL_CTL_ADD, 20, {EPOLLRDNORM, {u32=20, u64=20}}) = 0
epoll_wait(11, {{EPOLLRDNORM, {u32=20, u64=20}}}, 100, 29976) = 1
time(NULL)                              = 1198012757
epoll_ctl(11, EPOLL_CTL_DEL, 20, {0, {u32=0, u64=0}}) = 0
futex(0x552abc25a4, FUTEX_WAKE, 1)      = 1
epoll_wait(11,  <unfinished ...>
Process 13289 detached

Comment 1 Eijiro Sumii 2008-01-02 05:02:21 UTC
Ping?  Exactly the same is happening on my RHEL 4.6, 32-bit x86 box.  It is also
running nscd-2.3.4-2.39 with nss_ldap-226-20.  The LDAP server is working
completely well.  Any idea?

Comment 2 Mark D. Foster 2008-01-15 23:37:05 UTC
This is also happening to us, on 32bit.
Please, update the hardware field?


Comment 3 Mark D. Foster 2008-01-17 16:01:50 UTC
We have identified this as a problem with nss_ldap-226-20 and a downgrade to
nss_ldap-226-18 fixes it for us.
It seems the newer version of nss_ldap has a problem vis-a-vis nscd and
nonexistent users/groups in LDAP.