Bug 1060380

Summary:	sssd_nss segfaulting sssd-1.9.2-129
Product:	Red Hat Enterprise Linux 6	Reporter:	hgraham
Component:	sssd	Assignee:	Jakub Hrozek <jhrozek>
Status:	CLOSED UPSTREAM	QA Contact:	Kaushik Banerjee <kbanerje>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	6.5	CC:	ddas, grajaiya, hgraham, jgalipea, lslebodn, mkosek, pbrezina, preichl
Target Milestone:	rc	Keywords:	Reopened
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-05-15 11:54:21 UTC	Type:	Bug
Bug Blocks:	1061410

Description hgraham 2014-02-01 00:11:22 UTC

Description of problem:
kernel: sssd_nss[15721]: segfault at 94 ip 0000003124024fbc sp 00007fff401c4070 error 4 in libdbus-1.so.3.4.0[3124000000+40000]
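
For readers triaging a line like the one above, the following is a minimal sketch of how its fields can be decoded. It assumes the usual x86 kernel segfault message layout and that the bracketed value is the start of the mapping that contains the instruction pointer. The function name `parse_segfault` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Kernel segfault line copied verbatim from the description above.
LINE = ("kernel: sssd_nss[15721]: segfault at 94 ip 0000003124024fbc "
        "sp 00007fff401c4070 error 4 in libdbus-1.so.3.4.0[3124000000+40000]")

# Assumed layout: comm[pid]: segfault at ADDR ip IP sp SP error N in OBJ[BASE+SIZE]
PATTERN = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Decode one kernel segfault line into a dictionary (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"])
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # x86 page-fault error code bits: 1 = page present, 2 = write, 4 = user mode.
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = parse_segfault(LINE)
print(info["object"], hex(info["offset_in_object"]), hex(info["fault_address"]))
```

Under those assumptions, "error 4" decodes to a user-mode read of a page that is not mapped, and the small fault address (0x94) would be consistent with reading a field through a NULL pointer. The offset within libdbus can be handed to a symbolizer together with the matching debuginfo, although the backtrace from the coredump remains the authoritative source.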

Version-Release number of selected component (if applicable):
sssd-1.9.2-129.el6_5.4.x86_64

How reproducible:
unknown

Steps to Reproduce:
1. N/A

Actual results:
sssd_nss segfaults

Expected results:
no segfaults

Additional info:
will also provide coredump, bt and sosreport

Comment 3 hgraham 2014-02-01 00:30:24 UTC

The customer said this began occurring after the server was patched on the 26th to the latest version of sssd.

Jan 26 09:08:25 Updated: sssd-1.9.2-129.el6_5.4.x86_64

Comment 4 Lukas Slebodnik 2014-02-03 09:08:32 UTC

According to the log files and the coredump, it looks like sssd_nss crashed at the same time that sssd_be was forced to restart because it did not respond to the main sssd process.

I can see that enumeration is enabled, which may be the root cause of sssd_be being unresponsive (sssd_be was restarted 50 times in 30 hours and sssd_nss crashed 8 times). Even if we fix the crash in sssd_nss, sssd will still not return a correct response. I would suggest either turning off enumeration or increasing the value of the "timeout" option from its default of 10 seconds to 20 seconds.

Comment 5 hgraham 2014-02-03 17:03:12 UTC

Lukas, would that be the "timeout" option under the [domain/default] section of the configuration or under the [nss] ?

Comment 6 Jakub Hrozek 2014-02-03 17:10:09 UTC

(In reply to hgraham from comment #5)
> Lukas, would that be the "timeout" option under the [domain/default] section
> of the configuration or under the [nss] ?

Under [domain]

Comment 7 Lukas Slebodnik 2014-02-03 17:13:51 UTC

(In reply to hgraham from comment #5)
> Lukas, would that be the "timeout" option under the [domain/default] section
> of the configuration or under the [nss] ?

I should have been more specific in the previous comment.
They should put "timeout = 20" into the [domain/default] section.

I think 15 seconds may also be sufficient, and I would not recommend using a value higher than 20 seconds.
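
As an illustration of the workaround discussed in this thread, here is a minimal sketch that applies it to the text of an sssd.conf file. The sample configuration, including `id_provider = ldap`, is an assumption and not taken from the customer's system. The helper name `apply_workaround` is hypothetical. Note that Python's `configparser` drops comments when it rewrites a file, so on a real system editing the file by hand may be preferable.

```python
import configparser
import io

# Hypothetical sample configuration; the real customer file is not shown in this bug.
SAMPLE = """\
[sssd]
services = nss, pam
domains = default

[domain/default]
id_provider = ldap
enumerate = true
"""


def apply_workaround(text, section="domain/default", timeout=20,
                     disable_enumeration=False):
    """Return config text with the heartbeat timeout raised in the domain section.

    Optionally also turns enumeration off, the other workaround suggested above.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(section)
    parser.set(section, "timeout", str(timeout))
    if disable_enumeration:
        parser.set(section, "enumerate", "false")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = apply_workaround(SAMPLE)
print(updated)
```

After the change, the [domain/default] section would contain `timeout = 20` next to the existing options. SSSD would need to be restarted to pick up the new value.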

Comment 8 Jakub Hrozek 2014-02-13 12:03:40 UTC

Henry, do you know if the workaround helped the customer?

Comment 9 Jakub Hrozek 2014-02-13 12:04:26 UTC

Upstream ticket:
https://fedorahosted.org/sssd/ticket/2245

Comment 10 Jakub Hrozek 2014-03-05 16:23:27 UTC

Ping, any news?

Comment 11 hgraham 2014-03-05 18:06:04 UTC

Nothing yet, Jakub. I asked the customer again whether the workarounds worked. Thanks.

Comment 12 RHEL Program Management 2014-03-19 08:37:00 UTC

Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 14 Jakub Hrozek 2014-05-15 11:52:42 UTC

The customer case is closed. Since the problem was caused by enumeration taking too long, which we already track in several other Bugzillas, I'm going to close this BZ as UPSTREAM. We need to solve the enumeration performance problem rather than put band-aids all around.

Comment 15 Jakub Hrozek 2014-05-15 11:53:37 UTC

Upstream ticket:
https://fedorahosted.org/sssd/ticket/1729