Bug 729905

Summary: SSSD hangs at 99% cpu
Product: Red Hat Enterprise Linux 6
Reporter: Kemot1000 <kemot1000>
Component: sssd
Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Chandrasekar Kannan <ckannan>
Severity: high
Priority: unspecified
Version: 6.1
CC: benl, grajaiya, jgalipea, jhrozek, kbanerje, prc
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-08-25 11:26:10 UTC
Attachments: sssd_pam.jpg (flags: none)

Description Kemot1000 2011-08-11 08:20:04 UTC
Created attachment 517753: sssd_pam.jpg

I found bug 661163 (https://bugzilla.redhat.com/show_bug.cgi?id=661163), and from its description it looks like I have the same problem on one of my servers.

For no apparent reason, every few hours (sometimes twice within one hour), authentication to my server stops working. When I log in as a local user and check top, I see that sssd_pam is using 99% of the CPU. When I restart SSSD, everything goes back to normal.


I set the debug level to 9 to collect some data, but at this point I do not have any yet.

Attached is a screenshot of the top command showing sssd_pam at 99% CPU usage.

Is there any way to enable debugging only for sssd_pam and keep sssd_nss information out of the debug file?

Version-Release number of selected component (if applicable):
rpm -aq |grep sssd
sssd-1.5.1-34.el6_1.2.x86_64
sssd-client-1.5.1-34.el6_1.2.x86_64

How reproducible:
N/A

Steps to Reproduce:
N/A
  
Additional info: 

This is what I get in /var/log/secure when I restart SSSD:

Aug  9 11:21:12 PROD db2ckpwd 0[3623]: pam_sss(db2:auth): Request to sssd failed. Bad address

Comment 2 Jakub Hrozek 2011-08-11 09:34:26 UTC
(In reply to comment #0)
> Is there any way to enable debugging only for sssd_pam and keep sssd_nss
> information out of the debug file?
> 

Yes, open the file /etc/sssd/sssd.conf and put "debug_level = 9" into the "[pam]" section. Then restart SSSD. The log file we're interested in is /var/log/sssd/sssd_pam.log.
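
A minimal sketch of that change, assuming the "[pam]" section already exists and none of its other options need to change:

```ini
# /etc/sssd/sssd.conf (fragment; all other sections stay as they are)
[pam]
debug_level = 9
```

On RHEL 6 the restart would typically be "service sssd restart", run as root.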

This may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=727041 -- I just ran a quick test and saw sssd_pam running at 99% when it depleted the file descriptors as well. The logs would tell us for certain (or you can grab the released RPMs that fix #727041 and check if that fixes the issue for you).

Comment 3 Kemot1000 2011-08-11 10:49:36 UTC
This is the only thing I got in sssd_pam.log (about 20 identical lines):

(Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not reconnect to DOMAINNAME provider.


I saw the command check_user2 mentioned in one of the posts. Where can I get it, so I can test whether this is also a problem on my system?

Comment 4 Jakub Hrozek 2011-08-11 11:38:08 UTC
(In reply to comment #3)
> This is the only thing I got in sssd_pam.log (about 20 identical lines):
> 
> (Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not
> reconnect to DOMAINNAME provider.
> 
> 

OK, this is a new bug, different from the PAM file descriptor leak. That code is only reachable when the back end crashes and the pam process is trying to reconnect.

Can you check /var/log/messages for signs of sssd_be crashing, or, if you use abrt, try running "abrt -l" ?
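
One way to run the /var/log/messages check is sketched below. The function name and the pattern are illustrative assumptions, modeled on how the kernel reports a segfault ("name[pid]: segfault at ..."); a back end crash could also show up in some other form.

```shell
# Hypothetical helper: print kernel segfault reports for SSSD processes
# (sssd, sssd_be, sssd_pam, ...) found in a syslog-style file.
# The file defaults to /var/log/messages when no argument is given.
find_sssd_segfaults() {
    log_file="${1:-/var/log/messages}"
    grep -E 'sssd(_[a-z]+)?\[[0-9]+\]: segfault' "$log_file"
}
```

Running "find_sssd_segfaults /var/log/messages" as root prints the matching lines, or nothing if no such crash was logged.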

Comment 5 Kemot1000 2011-08-11 11:47:32 UTC
This is from /var/log/messages. It is the only entry of this kind:

Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in libtevent.so.0.9.8[7fe71fb64000+9000]

This is the only time it crashed. I think it is related to the debug level being set to 9.

Comment 6 Kemot1000 2011-08-11 11:51:33 UTC
FYI

I tested the file descriptor problem on QA and I had that issue as well. I authenticated a user in a loop, and sssd_pam crashed when the number of open file descriptors reached 1024.

With the 1.3 patch, the number of file descriptors stays the same all the time (around 25).
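
A count like this can be read from /proc. A minimal sketch, assuming Linux; the function name is hypothetical:

```shell
# Hypothetical helper: count the open file descriptors of a process
# by listing the entries in /proc/<pid>/fd (Linux only).
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}
```

For example, running count_open_fds "$(pidof sssd_pam)" as root during the authentication loop shows whether the number keeps climbing or stays flat. The value 1024 matches the common default per-process limit that "ulimit -n" reports.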

Comment 7 Jakub Hrozek 2011-08-11 15:00:11 UTC
(In reply to comment #5)
> This is from /var/log/messages. It is the only entry of this kind:
> 
> Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip
> 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in
> libtevent.so.0.9.8[7fe71fb64000+9000]
> 
> This is the only time it crashed. I think it is related to the debug level
> being set to 9.

Can you get a core dump from that crash and attach it here?

Comment 8 Kemot1000 2011-08-11 19:48:12 UTC
I see no core dump under the root folder.

Comment 9 Jakub Hrozek 2011-08-12 10:38:37 UTC
(In reply to comment #8)
> I see no core dump under the root folder.

Core dumps are probably not enabled on your machine. On RHEL6, the easiest way to collect one is to install and start abrt:

https://access.redhat.com/kb/docs/DOC-5353

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt
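
Whether core dumps are limited for a particular running process can also be checked from /proc. A minimal sketch, assuming a Linux kernel that exposes /proc/<pid>/limits; the function name is hypothetical:

```shell
# Hypothetical helper: print the soft "Max core file size" limit of a
# running process, as reported by /proc/<pid>/limits (Linux only).
# The fifth whitespace-separated field of that line is the soft limit.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}
```

For example, core_soft_limit "$(pidof sssd_pam)" prints either a size or "unlimited". A value of 0 typically means no plain core file is written when that process crashes.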

Comment 10 Jakub Hrozek 2011-08-16 07:31:14 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > I see no core dump under the root folder.
> 
> Core dumps are probably not enabled on your machine. On RHEL6, the easiest way
> to collect one is to install and start abrt:
> 
> https://access.redhat.com/kb/docs/DOC-5353
> 
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt

Hello,

did you have any luck getting the core dump? Do you need additional assistance getting it?

Comment 11 Kemot1000 2011-08-19 10:05:04 UTC
Hi,
Sorry, I was on vacation.

After installing the 1.3 patch I have had no problems with sssd, so I have no way to get this dump.

I think the best bet for reproducing this error would be to run the 1.2 version with the debug level set to 9, and then hang sssd by using up all file descriptors. Did you try something like this?

T.

Comment 12 Jakub Hrozek 2011-08-25 11:26:10 UTC
(In reply to comment #11)
> Hi,
> Sorry, I was on vacation.
> 
> After installing the 1.3 patch I have had no problems with sssd, so I have no
> way to get this dump.
> 
> I think the best bet for reproducing this error would be to run the 1.2 version
> with the debug level set to 9, and then hang sssd by using up all file
> descriptors. Did you try something like this?
> 
> T.

Yes, and I could not reproduce the crash.

There is not much we can do without either a way to reproduce the crash or at least a core dump.

I'm going to close this bug with the INSUFFICIENT_DATA resolution. Kindly reopen it if there is more data to help us with debugging.