Bug 1285066

Summary: pam_sss.so event causing delayed response after received result from idm server.
Product: Red Hat Enterprise Linux 6 Reporter: jdang <jdang>
Component: sssd Assignee: SSSD Maintainers <sssd-maint>
Status: CLOSED WORKSFORME QA Contact: Namita Soman <nsoman>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.6 CC: grajaiya, jdang, jhrozek, jstephen, lslebodn, mkosek, mzidek, pbrezina, pwayper, rharwood, ssekidde, tmraz
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-10 12:21:00 UTC Type: Bug
Bug Blocks: 1269194    
Attachments:
Description Flags
krb5kdc log none

Description jdang 2015-11-24 19:33:08 UTC
Description of problem:
 
Delay issues on pam_sss.so.
 "Postgresql is calling pam_sss.so via pam stack. Sometimes, response time is longer than 3s. It happened randomly. But happened on both RHEL5 and RHEL6. On both OS, we are using service record for load balancing.

Verified in logs on idm server side, server returns result in subsecond.
Something happened inside pam_sss.so that delayed response after received result from idm server"

Version-Release number of selected component (if applicable):
RHEL5 and RHEL6

How reproducible:
Yes
- It appears it is reproducible on the customer side.  On the Red Hat side, per comment #31 in the case (Justin Stephenson  (11/17/2015 2:29 PM))

Steps to Reproduce:
Unsure

Actual results:
Unsure

Expected results:
Unsure

Additional info:
We're attempting to debug pam_sss callouts at a granular level. For something like this, it could help to have a stap (SystemTap) script in place. The likely next step is to look at the callouts coming from the PAM libraries and see where the delays occur.

Comment 2 Lukas Slebodnik 2015-11-25 08:45:43 UTC
Could you provide log files from sssd? 
We would need to increase debug_level in the domain and pam sections of sssd.conf.
https://fedorahosted.org/sssd/wiki/Troubleshooting#SSSDdebuglogs

Could you also provide log file /var/log/secure?

You might also use the tips for troubleshooting authentication.
https://fedorahosted.org/sssd/wiki/Troubleshooting#TroubleshootingAuthenticationPasswordChangeandAccessControl
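
Once the debug logs are collected, one way to narrow down where the 3s goes is to look for large pauses between consecutive log entries. The following is only a rough sketch, not an SSSD tool: the function name find_gaps is made up for illustration, and it assumes each log line starts with a timestamp of the form "(Tue Nov 24 19:33:08 2015)". Lines in any other format (for example with microsecond timestamps enabled) are skipped.

```python
import re
from datetime import datetime

# Assumed line prefix: "(Tue Nov 24 19:33:08 2015) [sssd[pam]] ..."
# This format is an assumption; adjust the pattern if the logs differ.
STAMP = re.compile(r"^\((\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2}) (\d{4})\)")


def find_gaps(lines, threshold=3.0):
    """Return (gap_seconds, previous_line, current_line) for each pause
    of at least `threshold` seconds between consecutive timestamped lines."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            # No recognizable timestamp: ignore the line entirely.
            continue
        when = datetime.strptime(
            f"{match.group(1)} {match.group(2)}", "%a %b %d %H:%M:%S %Y"
        )
        if prev_time is not None:
            delta = (when - prev_time).total_seconds()
            if delta >= threshold:
                gaps.append((delta, prev_line.rstrip(), line.rstrip()))
        prev_time, prev_line = when, line
    return gaps
```

To use it, pass an open log file, e.g. find_gaps(open(path)) where path points at one of the files under /var/log/sssd, and print the returned tuples. With one-second timestamps the result is coarse, so it can only point at the pair of log entries around a delay, not measure it precisely.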

Comment 3 Jakub Hrozek 2015-11-25 08:52:53 UTC
When you attach those logs, please also make sure they are from a RHEL-6 machine because a) this is a performance issue and in RHEL-5 we no longer fix those and b) RHEL-6 would have Kerberos tracing info in krb5_child.log.

Comment 8 jstephen 2015-12-07 22:42:56 UTC
Created attachment 1103387 [details]
krb5kdc log