Bug 173019 - Nscd invalidates INITGROUPS very quickly
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 145044
Reported: 2005-11-12 16:33 UTC by W. Michael Petullo
Modified: 2007-11-30 22:11 UTC

Fixed In Version: 2.4.90-17
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-03 06:45:17 UTC
Type: ---
Embargoed:


Attachments
Program to test initgroups() and nscd (117 bytes, text/x-csrc)
2005-11-28 01:23 UTC, W. Michael Petullo
Backtrace of su during hang (6.32 KB, text/plain)
2005-11-30 01:17 UTC, W. Michael Petullo
Backtrace of nscd threads during hang of su (11.68 KB, text/plain)
2005-11-30 01:19 UTC, W. Michael Petullo


Links
System ID Private Priority Status Summary Last Updated
Sourceware 2098 0 None None None Never

Description W. Michael Petullo 2005-11-12 16:33:00 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.12) Gecko/20051018 Epiphany/1.8.2

Description of problem:
I have a test laptop that usually uses LDAP for NSS and Kerberos for authentication.  The laptop uses nscd and pam_ccreds to operate when disconnected from the network.  This setup has worked fine for quite some time.  I recently updated to the newest Rawhide, and nscd now seems to be broken.

I use a PowerPC kernel/glibc.

Version-Release number of selected component (if applicable):
nscd-2.3.90-15

How reproducible:
Always

Steps to Reproduce:
1.  Start nscd under gdb.
2.  gdb> run -d
3.  id -ng
4.  Disconnect laptop from LDAP server.
5.  id -ng
  

Actual Results:  After step 3, the user's group name is printed.

After step 4, nscd crashes.

In step 5, the id command hangs because nscd is gone and the LDAP server is not available.

7226: handle_request: request received (Version = 2) from PID 7237
7226:   GETFDGR
7226: provide access to FD 12, for group
7226: handle_request: request received (Version = 2) from PID 7239
7226:   GETFDPW
7226: provide access to FD 10, for passwd

Program received signal SIGTERM, Terminated.
[Switching to Thread 805432992 (LWP 7226)]
0x07e64148 in epoll_wait () from /lib/libc.so.6
(gdb) ba
#0  0x07e64148 in epoll_wait () from /lib/libc.so.6
#1  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#2  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#3  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#4  0x08006d30 in sighup_handler () from /usr/sbin/nscd
#5  0x08006d30 in sighup_handler () from /usr/sbin/nscd
Previous frame inner to this frame (corrupt stack?)


Expected Results:  Nscd should allow the id command to print even when the LDAP server is not available.

Additional info:

When disconnected:

Logins fail, hanging in the initgroups function.
The system message bus will not start; it hangs in the getgrouplist function.

Comment 1 W. Michael Petullo 2005-11-28 01:23:12 UTC
Created attachment 121528 [details]
Program to test initgroups() and nscd

As of nscd-2.3.90-18, the daemon no longer crashes (as far as I have seen).
However, the original symptoms remain.

The attached program may be used to test nscd.  Here are some scenarios:

1.  Execute the program while attached to the network/LDAP server; the nscd
daemon says:

31166: handle_request: request received (Version = 2) from PID 3904
31166:	GETFDGR
31166: provide access to FD 9, for group
31166: handle_request: request received (Version = 2) from PID 3904
31166:	INITGROUPS (mike)
31166: Haven't found "mike" in group cache!

2.  Wait 10 seconds; the nscd daemon says (why is the entry removed so soon?):

31166: remove INITGROUPS entry "mike"

3.  Disconnect from the network and execute the program; the nscd daemon says:

31166: handle_request: request received (Version = 2) from PID 5090
31166:	GETFDGR
31166: provide access to FD 9, for group
31166: handle_request: request received (Version = 2) from PID 5090
31166:	INITGROUPS (mike)
31166: Haven't found "mike" in group cache!

The program hangs, trying to make an LDAP request.

NOTE: if you disconnect and execute the program before the "remove INITGROUPS"
message appears, the program will NOT hang.

I also see this message printed by the daemon: "31166: short write in
addinitgroupsX: Broken pipe."
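
The attachment itself is not reproduced on this page.  As a rough stand-in, here
is a minimal sketch of a lookup that can be run repeatedly while connecting and
disconnecting.  It is not the attached program: count_groups is a hypothetical
helper name, and it calls getgrouplist() rather than initgroups() so that it can
run without root, on the assumption (suggested by the message bus hang noted in
the description) that both calls go through the same group lookup.

```c
#define _DEFAULT_SOURCE        /* exposes getgrouplist() in glibc headers */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not the attached test program: resolve the group
 * list for `user` and return the number of groups found, or -1 on failure.
 * The call blocks for as long as the underlying NSS lookup blocks.
 */
int count_groups(const char *user)
{
    assert(user != NULL);

    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;                     /* unknown user or lookup error */

    int ngroups = 16;                  /* initial guess at the list size */
    gid_t *groups = malloc((size_t)ngroups * sizeof *groups);
    if (groups == NULL)
        return -1;

    if (getgrouplist(user, pw->pw_gid, groups, &ngroups) == -1) {
        /* Buffer too small: ngroups now holds the required count. */
        gid_t *bigger = realloc(groups, (size_t)ngroups * sizeof *bigger);
        if (bigger == NULL) {
            free(groups);
            return -1;
        }
        groups = bigger;
        if (getgrouplist(user, pw->pw_gid, groups, &ngroups) == -1) {
            free(groups);
            return -1;
        }
    }

    free(groups);
    return ngroups;
}
```

Calling count_groups("mike") from a small main() before and after the "remove
INITGROUPS" message should show whether the lookup still returns promptly when
the LDAP server is unreachable.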

Comment 2 Jakub Jelinek 2005-11-29 12:08:54 UTC
Can you please:
1) install glibc-debuginfo* corresponding to glibc/nscd you have installed
2) when you reproduce the hang in some application, as root
   gdb /usr/sbin/nscd `/sbin/pidof nscd`
   and get backtraces of all threads to see exactly where it hangs?
It might very well be a nss_ldap bug, which is a separate package.

Comment 3 W. Michael Petullo 2005-11-30 01:17:53 UTC
Created attachment 121618 [details]
Backtrace of su during hang

Comment 4 W. Michael Petullo 2005-11-30 01:19:21 UTC
Created attachment 121619 [details]
Backtrace of nscd threads during hang of su

Comment 5 W. Michael Petullo 2005-11-30 01:27:12 UTC
It seems that nscd is prematurely invalidating its cache of initgroups data. 
See in comment #1, "31166: remove INITGROUPS entry 'mike'."

Why is nscd invalidating this cache entry so soon after it has been entered
(within seconds, according to comment #1)?

Comment 6 W. Michael Petullo 2005-12-28 04:02:17 UTC
See also http://sources.redhat.com/bugzilla/show_bug.cgi?id=2098.

Comment 7 Ulrich Drepper 2006-08-02 01:59:38 UTC
You didn't explain what kind of entries are evacuated too early.  I think it's an
entry without auxiliary groups.  For this I checked in a patch.  The entries are
now added with the usual timeout value.  Should be in the next rawhide build.
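
To make the effect of the timeout concrete, here is a toy sketch of
time-to-live bookkeeping.  It is not nscd's code: the toy_entry type and the
toy_add and toy_is_live functions are hypothetical, the 10-second value is
taken from the observation in comment #1, and 3600 seconds is only an assumed
stand-in for "the usual timeout value".

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy cache entry; not nscd's real data structure. */
struct toy_entry {
    const char *key;         /* e.g. the user name of an INITGROUPS entry */
    time_t      expires_at;  /* absolute time after which it is pruned */
};

/* Create an entry at time `now` that stays valid for `ttl` seconds. */
struct toy_entry toy_add(const char *key, time_t now, time_t ttl)
{
    assert(key != NULL);
    struct toy_entry e = { key, now + ttl };
    return e;
}

/* True while the entry may still be served from the cache. */
bool toy_is_live(const struct toy_entry *e, time_t now)
{
    assert(e != NULL);
    return now < e->expires_at;
}
```

With an entry added at time t, a 10-second lifetime means a lookup at t + 11
misses the cache and must go back to the LDAP server, while the same lookup
against an entry with the longer lifetime is still answered locally.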

Comment 8 Ulrich Drepper 2006-08-02 02:16:08 UTC
I don't know why Bugzilla closed the bug.  Until a new rawhide release is out it
should remain open.

Comment 9 Jakub Jelinek 2006-08-03 06:45:17 UTC
The changes are in nscd-2.4.90-17 in rawhide.

Comment 10 W. Michael Petullo 2006-08-19 00:50:33 UTC
I tested nscd-2.4.90-21 and this seems fixed.  Thank you.