+++ This bug was initially created as a clone of Bug #645434 +++

Description of problem:
If a data provider dies during an NSS request, the NSS responder dies when the timeout for the open, unhandled requests is reached.

Version-Release number of selected component (if applicable):
At least sssd-1.2 and above

How reproducible:
There is no known error in the LDAP provider that can be used to trigger this issue, so the sssd_be process must be killed manually.

Steps to Reproduce:
1. Configure sssd with id_provider=ldap.
2. Choose a slow LDAP server and a very large group, or find some other way to make the LDAP request take a long time.
3. getent group very_large_group
4. Kill sssd_be immediately after calling getent.
5. Wait until the timeout is reached (a couple of minutes).

Actual results:
NSS responder dies.

Expected results:
NSS responder returns an error to the client.

Additional info:
The upstream bug can be found here: https://fedorahosted.org/sssd/ticket/654
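Steps 3 to 5 can be scripted roughly as follows. This is only a sketch and not part of the original report: it assumes the responder process is named sssd_nss, that pgrep and pkill are available, and that a PID comparison is a fair way to detect a crash (a responder that was restarted would show a new PID). The function names and the kill_delay and wait_seconds parameters are hypothetical, and the 300 second wait is a guess at "a couple of minutes".

```python
import subprocess
import time


def reproduction_commands(group="very_large_group"):
    """Commands for steps 3 and 4, plus a PID lookup for the responder."""
    return {
        "lookup": ["getent", "group", group],
        "kill_backend": ["pkill", "sssd_be"],
        # Assumption: the NSS responder runs under the process name sssd_nss.
        "responder_pids": ["pgrep", "-x", "sssd_nss"],
    }


def parse_pids(pgrep_output):
    """Turn pgrep output (one PID per line) into a set of ints."""
    return {int(line) for line in pgrep_output.split()}


def responder_survived(before, after):
    """True only if every responder PID seen before is still present."""
    return bool(before) and before <= after


def current_pids(cmd):
    """Run pgrep and return the matching PIDs (empty set if none)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return parse_pids(result.stdout)


def reproduce(group="very_large_group", kill_delay=0.5, wait_seconds=300):
    """Run steps 3-5; needs root and a test machine with sssd configured."""
    cmds = reproduction_commands(group)
    before = current_pids(cmds["responder_pids"])
    # Step 3: start the slow lookup without waiting for it to finish.
    lookup = subprocess.Popen(cmds["lookup"], stdout=subprocess.DEVNULL)
    time.sleep(kill_delay)
    # Step 4: kill the data provider while the request is still open.
    subprocess.run(cmds["kill_backend"], check=False)
    # Step 5: wait for the request timeout (duration is an assumption).
    time.sleep(wait_seconds)
    after = current_pids(cmds["responder_pids"])
    # lookup.poll() is None if getent is still blocked, else its exit code.
    return responder_survived(before, after), lookup.poll()
```

Calling reproduce() returns a pair: whether the original responder PID survived, and the exit status of getent. With the bug present the first value would be expected to be False; with the fix, the responder should survive and getent should end with an error status. Because the failure is timing dependent, several runs may be needed.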
QE and Dev have spent days trying to reproduce and verify this bug, and it is extremely hard to do. Development has reproduced it a few times, but very inconsistently. Given the nature of the bug, the fix is not very complicated, and from an engineering perspective it is more or less obvious. Since this fix has not caused any regressions in any automated or manual testing, I will mark the bug verified.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2011-0044.html