Bug 1060380

Summary: sssd_nss segfaulting sssd-1.9.2-129
Product: Red Hat Enterprise Linux 6
Reporter: hgraham
Component: sssd
Assignee: Jakub Hrozek <jhrozek>
Status: CLOSED UPSTREAM
QA Contact: Kaushik Banerjee <kbanerje>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.5
CC: ddas, grajaiya, hgraham, jgalipea, lslebodn, mkosek, pbrezina, preichl
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-05-15 11:54:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1061410

Description hgraham 2014-02-01 00:11:22 UTC
Description of problem:
kernel: sssd_nss[15721]: segfault at 94 ip 0000003124024fbc sp 00007fff401c4070 error 4 in libdbus-1.so.3.4.0[3124000000+40000]
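
For readers unfamiliar with the kernel's segfault line, the following is a minimal sketch of how its fields can be decoded. It is not part of the original report. The helper name decode_segfault and the regular expression are illustrative assumptions, and the error-code interpretation assumes the standard x86 page-fault bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode).

```python
import re

# The kernel message quoted in the bug description.
LINE = ("kernel: sssd_nss[15721]: segfault at 94 ip 0000003124024fbc "
        "sp 00007fff401c4070 error 4 in libdbus-1.so.3.4.0[3124000000+40000]")

# Hypothetical pattern for this message layout; other kernels may format it differently.
PATTERN = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (illustrative helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m["err"])
    return {
        "process": m["proc"],
        "pid": int(m["pid"]),
        # Address whose access faulted; a small value suggests a NULL-based pointer.
        "fault_address": int(m["addr"], 16),
        "library": m["lib"],
        # Instruction pointer relative to the library's load address.
        "offset_in_library": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error-code bits (assumed layout, see lead-in).
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(LINE)
print(info["process"], info["library"], hex(info["offset_in_library"]))
```

Read this way, the line describes a user-mode read of an unmapped address (0x94) at offset 0x24fbc inside libdbus-1.so.3.4.0. That offset is the value one would look up against the library's debug symbols, alongside the coredump and backtrace.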

Version-Release number of selected component (if applicable):
sssd-1.9.2-129.el6_5.4.x86_64

How reproducible:
unknown

Steps to Reproduce:
1. N/A

Actual results:
sssd_nss segfaults

Expected results:
no segfaults

Additional info:
will also provide coredump, bt and sosreport

Comment 3 hgraham 2014-02-01 00:30:24 UTC
The customer said this began occurring after the server was patched on the 26th to the latest version of sssd.

Jan 26 09:08:25 Updated: sssd-1.9.2-129.el6_5.4.x86_64

Comment 4 Lukas Slebodnik 2014-02-03 09:08:32 UTC
According to the log files and the coredump, it looks like sssd_nss crashed at the same time as sssd_be was forced to restart because it did not respond to the main sssd process.

I can see that enumeration is enabled, which may be the root cause of sssd_be becoming unresponsive (sssd_be was restarted 50 times in 30 hours and sssd_nss crashed 8 times). Even if we fix the crash in sssd_nss, sssd will still not return correct responses. I would suggest either turning off enumeration or increasing the "timeout" option from its default of 10 seconds to 20 seconds.

Comment 5 hgraham 2014-02-03 17:03:12 UTC
Lukas, would that be the "timeout" option under the [domain/default] section of the configuration or under the [nss] ?

Comment 6 Jakub Hrozek 2014-02-03 17:10:09 UTC
(In reply to hgraham from comment #5)
> Lukas, would that be the "timeout" option under the [domain/default] section
> of the configuration or under the [nss] ?

Under [domain]

Comment 7 Lukas Slebodnik 2014-02-03 17:13:51 UTC
(In reply to hgraham from comment #5)
> Lukas, would that be the "timeout" option under the [domain/default] section
> of the configuration or under the [nss] ?

I should have been more specific in my previous comment.
They should put "timeout = 20" into the [domain/default] section.

I think 15 seconds may also be sufficient, and I would not recommend using a value higher than 20 seconds.
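
The workaround above can be sketched as a small script. This is an illustration only, not something taken from the customer's system: the SAMPLE configuration and the apply_workaround helper are hypothetical, and only the [domain/default] section name, the "timeout" option and the value 20 come from this thread.

```python
import configparser
import io

# Hypothetical sssd.conf content, used only to demonstrate the change.
SAMPLE = """\
[sssd]
services = nss, pam
domains = default

[domain/default]
id_provider = ldap
enumerate = true
"""


def apply_workaround(conf_text, domain="domain/default", timeout=20):
    """Return conf_text with 'timeout' set in the given domain section."""
    # RawConfigParser leaves '%' characters in values untouched (no interpolation).
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section(domain):
        raise KeyError("section [%s] not found" % domain)
    parser.set(domain, "timeout", str(timeout))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(apply_workaround(SAMPLE))
```

Note that configparser drops comments when it rewrites a file, so on a real system editing /etc/sssd/sssd.conf by hand is probably safer. sssd has to be restarted for the change to take effect. The alternative suggested in comment 4, turning off enumeration, would correspond to "enumerate = false" in the same section.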

Comment 8 Jakub Hrozek 2014-02-13 12:03:40 UTC
Henry, do you know if the workaround helped the customer?

Comment 9 Jakub Hrozek 2014-02-13 12:04:26 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/2245

Comment 10 Jakub Hrozek 2014-03-05 16:23:27 UTC
Ping, any news?

Comment 11 hgraham 2014-03-05 18:06:04 UTC
Nothing yet, Jakub. I asked the customer again whether the workarounds worked. Thanks.

Comment 12 RHEL Program Management 2014-03-19 08:37:00 UTC
Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 14 Jakub Hrozek 2014-05-15 11:52:42 UTC
The customer case is closed. Since the problem was caused by enumeration taking too long, which we already track in several other bugzillas, I'm going to close this BZ as UPSTREAM. We need to solve the enumeration performance problem rather than put band-aids all around.

Comment 15 Jakub Hrozek 2014-05-15 11:53:37 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/1729