645434 – NSS responder dies if DP dies during a request

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	sssd
Sub Component:
Version:	14
Hardware:	All
OS:	All
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Stephen Gallagher
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	645437 645438

Reported:	2010-10-21 13:44 UTC by Sumit Bose
Modified:	2010-11-16 23:19 UTC
CC List:	4 users
Fixed In Version:	sssd-1.4.1-1.fc14
Clone Of:
Clones:	645437 645438
Environment:
Last Closed:	2010-11-16 23:19:42 UTC
Type:	---
Embargoed:
Dependent Products:

Description Sumit Bose 2010-10-21 13:44:33 UTC

Description of problem:
If a data provider (DP) dies during an NSS request, the NSS responder dies as well once the timeout for the open, unhandled requests is reached.

Version-Release number of selected component (if applicable):
At least sssd-1.2 and above

How reproducible:
There is no known error in the LDAP provider that can be used to trigger this issue, so the sssd_be process must be killed manually.

Steps to Reproduce:
1. Configure sssd with id_provider=ldap.
2. Choose a slow LDAP server and a very large group, or find some other way to make the LDAP request take a long time.
3. getent group very_large_group
4. Kill sssd_be immediately after calling getent.
5. Wait until the timeout is reached (a couple of minutes).
  
Actual results:
NSS responder dies.

Expected results:
NSS responder returns an error to the client.
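
To compare the actual and expected behaviour from the client side, it can help to record both the exit status of getent and how long the call took. The following is only a sketch, assuming a POSIX shell; the helper name timed_run is hypothetical and not part of sssd.

```shell
# Hypothetical helper: run a command, discard its output, and report
# its exit status and wall-clock duration in whole seconds.
timed_run() {
    tr_start=$(date +%s)
    "$@" > /dev/null 2>&1
    tr_status=$?
    tr_end=$(date +%s)
    echo "status=${tr_status} elapsed=$((tr_end - tr_start))s"
}

# Example, using the group name from step 3 (not run here):
# timed_run getent group very_large_group
```

Running the same lookup before and after the fix gives two lines that can be compared directly.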

Additional info:
The upstream bug can be found here: https://fedorahosted.org/sssd/ticket/654

Comment 1 Sumit Bose 2010-10-22 11:45:59 UTC

Based on an idea from Jan Zelený <jzeleny>, I found an easier way to reproduce this issue:

Steps to Reproduce:
1. Configure sssd with id_provider=ldap; any LDAP server will do.
2. Start sssd, preferably with an empty cache (rm -f /var/lib/sss/db/*)
3. Find the pid of sssd_nss
   pgrep sssd_nss
4. Define a delay on the interface that is used to contact the LDAP server. If the LDAP server runs locally, use lo, e.g.
   tc qdisc add dev lo root netem delay 3s
5. Run
   while /bin/true; do if pgrep getent 1> /dev/null; then killall -9 /usr/libexec/sssd/sssd_be; break; fi; sleep 1; done
   (this will kill sssd_be as soon as a getent command is running)
6. In a different shell, call
   getent group some_group_which_is_not_in_the_cache
7. Wait until the getent call returns; this call can take up to 5 minutes.
8. Call
   pgrep sssd_nss
   again
9. Remove the delay
   tc qdisc del dev lo root

Actual results:
The two PIDs differ, i.e. sssd_nss died and was restarted

Expected results:
The two PIDs are the same, i.e. sssd_nss didn't die
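
The PID comparison from steps 3 and 8 can be scripted. This is a minimal sketch, assuming a POSIX shell and a single sssd_nss process; the helper name check_responder is hypothetical.

```shell
# Hypothetical helper: classify the responder state from two PIDs.
# $1: PID recorded before the test (step 3)
# $2: PID recorded after the test (step 8); may be empty
check_responder() {
    if [ -z "$2" ]; then
        echo "not running"      # responder died and has not been restarted
    elif [ "$1" = "$2" ]; then
        echo "same"             # expected result: responder survived
    else
        echo "restarted"        # actual result: responder died and was restarted
    fi
}

# Sketch of use around steps 3 and 8 (not run here):
# before=$(pgrep sssd_nss)
# ... steps 4 to 7 ...
# after=$(pgrep sssd_nss)
# check_responder "$before" "$after"
```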

Comment 2 Fedora Update System 2010-11-05 18:34:42 UTC

sssd-1.4.1-1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/sssd-1.4.1-1.fc14

Comment 3 Fedora Update System 2010-11-06 23:40:59 UTC

sssd-1.4.1-1.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
   su -c 'yum --enablerepo=updates-testing update sssd'
You can provide feedback for this update here: https://admin.fedoraproject.org/updates/sssd-1.4.1-1.fc14

Comment 4 Fedora Update System 2010-11-16 23:19:33 UTC

sssd-1.4.1-1.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.
