Bug 173019 - Nscd invalidates INITGROUPS very quickly
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Keywords: Reopened
Blocks: 145044
Reported: 2005-11-12 11:33 EST by W. Michael Petullo
Modified: 2007-11-30 17:11 EST (History)
Fixed In Version: 2.4.90-17
Doc Type: Bug Fix
Last Closed: 2006-08-03 02:45:17 EDT

Attachments
Program to test initgroups() and nscd (117 bytes, text/x-csrc)
2005-11-27 20:23 EST, W. Michael Petullo
Backtrace of su during hang (6.32 KB, text/plain)
2005-11-29 20:17 EST, W. Michael Petullo
Backtrace of nscd threads during hang of su (11.68 KB, text/plain)
2005-11-29 20:19 EST, W. Michael Petullo


External Trackers
Tracker     ID    Priority  Status  Summary  Last Updated
Sourceware  2098  None      None    None     Never

Description W. Michael Petullo 2005-11-12 11:33:00 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.12) Gecko/20051018 Epiphany/1.8.2

Description of problem:
I have a test laptop that usually uses LDAP for NSS and Kerberos for authentication.  The laptop uses nscd and pam_ccreds to operate when disconnected from the network.  This setup has worked fine for quite some time.  Recently, I updated to the newest Rawhide, and nscd now seems to be broken.

I use a PowerPC kernel/glibc.

Version-Release number of selected component (if applicable):
nscd-2.3.90-15

How reproducible:
Always

Steps to Reproduce:
1.  Start nscd under gdb: gdb /usr/sbin/nscd
2.  At the gdb prompt: run -d
3.  Run: id -ng
4.  Disconnect the laptop from the LDAP server.
5.  Run: id -ng again.
  

Actual Results:  After step 3, the user's group name is printed.

After step 4, nscd crashes.

In step 5, the id command hangs because nscd is gone and the LDAP server is not available.

7226: handle_request: request received (Version = 2) from PID 7237
7226:   GETFDGR
7226: provide access to FD 12, for group
7226: handle_request: request received (Version = 2) from PID 7239
7226:   GETFDPW
7226: provide access to FD 10, for passwd

Program received signal SIGTERM, Terminated.
[Switching to Thread 805432992 (LWP 7226)]
0x07e64148 in epoll_wait () from /lib/libc.so.6
(gdb) ba
#0  0x07e64148 in epoll_wait () from /lib/libc.so.6
#1  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#2  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#3  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#4  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#5  0x08006d30 in sighup_handler () from /usr/sbin/nscd
Previous frame inner to this frame (corrupt stack?)


Expected Results:  Nscd should allow the id command to print even when the LDAP server is not available.

Additional info:

When disconnected:

Logins fail, hanging in the initgroups function.
The system message bus will not start; it hangs in the getgrouplist function.
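
The hang described above can be probed without blocking the calling shell. The following is a minimal sketch, not taken from the report: the helper name lookup_finishes_within and the 256-entry buffer are hypothetical, and it assumes a Linux/glibc system. It runs getgrouplist() in a child process and kills the child if the call does not return within a deadline.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <grp.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of the report or of glibc).
   Returns 1 if getgrouplist() for `user` returns within `timeout_sec`
   seconds, 0 if the child had to be killed, -1 on fork/wait failure. */
static int lookup_finishes_within(const char *user, gid_t primary,
                                  int timeout_sec)
{
    assert(user != NULL && timeout_sec > 0);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        gid_t groups[256];
        int ngroups = 256;
        /* The result is ignored; only whether the call returns matters. */
        (void) getgrouplist(user, primary, groups, &ngroups);
        _exit(0);
    }

    /* Poll every 100 ms until the child exits or the deadline passes. */
    const struct timespec tick = { 0, 100 * 1000 * 1000 };
    for (int waited = 0; waited < timeout_sec * 10; waited++) {
        pid_t r = waitpid(child, NULL, WNOHANG);
        if (r == child)
            return 1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: treat the lookup as hung and clean up the child. */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return 0;
}
```

Called with the affected user name and primary GID while the machine is disconnected, a return value of 0 would correspond to the hang reported here.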
Comment 1 W. Michael Petullo 2005-11-27 20:23:12 EST
Created attachment 121528 [details]
Program to test initgroups() and nscd

As of nscd-2.3.90-18, the daemon no longer crashes (as far as I have seen).
However, the original symptoms remain.

The attached program may be used to test nscd. Here are some scenarios:

1.  Execute the program while attached to the network/LDAP server; the nscd daemon
says:

31166: handle_request: request received (Version = 2) from PID 3904
31166:	GETFDGR
31166: provide access to FD 9, for group
31166: handle_request: request received (Version = 2) from PID 3904
31166:	INITGROUPS (mike)
31166: Haven't found "mike" in group cache!

2.  Wait 10 seconds; the nscd daemon then says (why is the entry removed so soon?):

31166: remove INITGROUPS entry "mike"

3.  Disconnect from the network and execute the program; the nscd daemon says:

31166: handle_request: request received (Version = 2) from PID 5090
31166:	GETFDGR
31166: provide access to FD 9, for group
31166: handle_request: request received (Version = 2) from PID 5090
31166:	INITGROUPS (mike)
31166: Haven't found "mike" in group cache!

The program hangs while trying to make an LDAP request.

NOTE: if you disconnect and execute the program before the "remove INITGROUPS"
message appears, the program will NOT hang.

I also see this message printed by the daemon: "31166: short write in
addinitgroupsX: Broken pipe."
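
The attachment itself is not reproduced in this report. As a rough stand-in, the sketch below times a single group-list lookup so that runs before and after the "remove INITGROUPS" message can be compared. It is an assumption-laden illustration: the function name timed_group_lookup is hypothetical, and it calls getgrouplist() rather than initgroups() so that it can run without root privileges.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <grp.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper (not the attached program).
   Looks up the group list for `user` and stores the wall-clock duration of
   the call in *elapsed_sec (if non-NULL).
   Returns the number of groups found, or -1 if more than 256 were needed. */
static int timed_group_lookup(const char *user, gid_t primary,
                              double *elapsed_sec)
{
    gid_t groups[256];
    int ngroups = 256;
    struct timespec start, end;

    assert(user != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = getgrouplist(user, primary, groups, &ngroups);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (elapsed_sec != NULL)
        *elapsed_sec = (double)(end.tv_sec - start.tv_sec)
                     + (double)(end.tv_nsec - start.tv_nsec) / 1e9;

    return rc < 0 ? -1 : ngroups;
}
```

Note that this call blocks for as long as the underlying lookup does, so in scenario 3 it would be expected to hang just like the attached program.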
Comment 2 Jakub Jelinek 2005-11-29 07:08:54 EST
Can you please:
1) install the glibc-debuginfo* packages corresponding to the glibc/nscd you have installed
2) when you reproduce the hang in some application, run as root
   gdb /usr/sbin/nscd `/sbin/pidof nscd`
   and get backtraces of all threads to see exactly where it hangs?
It might very well be an nss_ldap bug, which is a separate package.
Comment 3 W. Michael Petullo 2005-11-29 20:17:53 EST
Created attachment 121618 [details]
Backtrace of su during hang
Comment 4 W. Michael Petullo 2005-11-29 20:19:21 EST
Created attachment 121619 [details]
Backtrace of nscd threads during hang of su
Comment 5 W. Michael Petullo 2005-11-29 20:27:12 EST
It seems that nscd is prematurely invalidating its cache of initgroups data.
See comment #1: "31166: remove INITGROUPS entry 'mike'".

Why is nscd invalidating this cache entry so soon after it has been entered
(within seconds, according to comment #1)?
Comment 6 W. Michael Petullo 2005-12-27 23:02:17 EST
See also http://sources.redhat.com/bugzilla/show_bug.cgi?id=2098.
Comment 7 Ulrich Drepper 2006-08-01 21:59:38 EDT
You didn't explain what kind of entries are evicted too early.  I think it's an
entry without auxiliary groups.  For this I checked in a patch.  The entries are
now added with the usual timeout value.  This should be in the next rawhide build.
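
The behaviour change described in this comment can be pictured with a toy model. This is only a sketch under stated assumptions, not nscd's source: the struct ttl_policy, both function names, and the idea that pre-patch entries without auxiliary groups received a separate short timeout are all hypothetical, modelled on the diagnosis above and on the roughly 10-second removal seen in comment #1.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical policy description; the field names are illustrative. */
struct ttl_policy {
    long usual_ttl;   /* seconds a normal cache entry stays valid */
    long short_ttl;   /* seconds assumed for the prematurely expiring case */
    bool patched;     /* true models the behaviour after the fix */
};

/* Returns the assumed expiry time of an INITGROUPS entry added at `now`
   for a user with `aux_groups` auxiliary groups. */
static time_t initgroups_expiry(const struct ttl_policy *p, time_t now,
                                int aux_groups)
{
    assert(p != NULL && aux_groups >= 0);

    /* Assumed pre-patch behaviour: no auxiliary groups, short lifetime. */
    if (!p->patched && aux_groups == 0)
        return now + p->short_ttl;

    /* Otherwise, and always after the patch: the usual timeout. */
    return now + p->usual_ttl;
}

/* An entry can serve offline lookups only while it has not expired. */
static bool entry_is_live(time_t expires_at, time_t now)
{
    return now < expires_at;
}
```

With illustrative values of 3600 seconds for the usual timeout and 10 seconds for the short one, an entry without auxiliary groups added at t = 1000 is already dead at t = 1011 in the unpatched model but remains live in the patched one, which matches why disconnected lookups stopped hanging once the fix landed.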
Comment 8 Ulrich Drepper 2006-08-01 22:16:08 EDT
I don't know why Bugzilla closed the bug.  Until a new rawhide release is out, it
should remain open.
Comment 9 Jakub Jelinek 2006-08-03 02:45:17 EDT
The changes are in nscd-2.4.90-17 in rawhide.
Comment 10 W. Michael Petullo 2006-08-18 20:50:33 EDT
I tested nscd-2.4.90-21 and this seems fixed.  Thank you.
