Bug 729905 - SSSD hangs at 99% cpu
Summary: SSSD hangs at 99% cpu
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sssd
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stephen Gallagher
QA Contact: Chandrasekar Kannan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-11 08:20 UTC by Kemot1000
Modified: 2015-01-04 23:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-25 11:26:10 UTC


Attachments:
sssd_pam.jpg (61.45 KB, image/jpeg)
2011-08-11 08:20 UTC, Kemot1000

Description Kemot1000 2011-08-11 08:20:04 UTC
Created attachment 517753
sssd_pam.jpg

I found bug 661163 (https://bugzilla.redhat.com/show_bug.cgi?id=661163), and from its description it looks like I have the same problem on one of my servers.

For no apparent reason, every few hours (sometimes twice within one hour), authentication to my server stops working. When I log in as a local user and check top, I see that sssd_pam is using 99% of the CPU. When I restart SSSD, everything goes back to normal.

I set the debug level to 9 to collect some data, but at this point I do not have any yet.

Attached is a screenshot of the top command showing sssd_pam at 99% CPU usage.

Is there any way to enable debugging only for sssd_pam and leave sssd_nss information out of the debug file?

Version-Release number of selected component (if applicable):
rpm -aq |grep sssd
sssd-1.5.1-34.el6_1.2.x86_64
sssd-client-1.5.1-34.el6_1.2.x86_64

How reproducible:
N/A

Steps to Reproduce:
N/A
  
Additional info: 

This is what I get in /var/log/secure when I restart SSSD:

Aug  9 11:21:12 PROD db2ckpwd 0[3623]: pam_sss(db2:auth): Request to sssd failed. Bad address

Comment 2 Jakub Hrozek 2011-08-11 09:34:26 UTC
(In reply to comment #0)
> Is there any way to enable debugging only for sssd_pam and leave sssd_nss
> information out of the debug file?
> 

Yes, open the file /etc/sssd/sssd.conf and put "debug_level = 9" into the "[pam]" section. Then restart SSSD. The log file we're interested in is /var/log/sssd/sssd_pam.log.
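
To illustrate where the setting goes, here is a minimal sketch that adds "debug_level = 9" to the "[pam]" section only. The sample file contents are hypothetical, and the helper name enable_pam_debug is an assumption made for this example; it is not part of SSSD. Note that Python's configparser drops comments when it writes a file back, so editing the real /etc/sssd/sssd.conf by hand is probably the safer route.

```python
import configparser
import io

# Hypothetical sssd.conf contents, for illustration only.
SAMPLE = """\
[sssd]
services = nss, pam
domains = DOMAINNAME

[nss]

[pam]
"""


def enable_pam_debug(conf_text, level=9):
    """Return conf_text with debug_level set in the [pam] section only."""
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("pam"):
        parser.add_section("pam")
    # The [nss] section is left untouched, so only sssd_pam logs verbosely.
    parser.set("pam", "debug_level", str(level))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(enable_pam_debug(SAMPLE))
```

SSSD has to be restarted after the change for the new debug level to take effect, as described above.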

This may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=727041 -- I just ran a quick test and saw sssd_pam running at 99% when it depleted the file descriptors as well. The logs would tell us for certain (or you can grab the released RPMs that fix #727041 and check if that fixes the issue for you).

Comment 3 Kemot1000 2011-08-11 10:49:36 UTC
This is the only thing I got in sssd_pam.log (about 20 identical lines):

(Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not reconnect to DOMAINNAME provider.


I saw this command in one of the posts:
check_user2
Where can I get it, so I can test whether this is also a problem on my system?

Comment 4 Jakub Hrozek 2011-08-11 11:38:08 UTC
(In reply to comment #3)
> This is the only thing I got in sssd_pam.log (about 20 identical lines):
> 
> (Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not
> reconnect to DOMAINNAME provider.
> 
> 

OK, this is a new bug, different from PAM leaking file descriptors. That code is only reachable when the back end crashes and the pam process is trying to reconnect.

Can you check /var/log/messages for signs of sssd_be crashing, or, if you use abrt, try running "abrt-cli --list"?

Comment 5 Kemot1000 2011-08-11 11:47:32 UTC
This is from /var/log/messages. It is the only log entry like that:

Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in libtevent.so.0.9.8[7fe71fb64000+9000]

This is the only time it crashed. I think this is related to the debug level being set to 9.
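
For reference, the kernel segfault line above can be decoded mechanically. The sketch below is an illustration and was not posted in the original thread: it parses the line, computes the offset of the faulting instruction inside the mapped library (ip minus the mapping base), and decodes the x86 page-fault error code bits. The function names parse_segfault and decode_error are made up for this example.

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error N in OBJ[BASE+SIZE]" line.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\d+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

LINE = (
    "Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 "
    "ip 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in "
    "libtevent.so.0.9.8[7fe71fb64000+9000]"
)


def parse_segfault(line):
    """Return a dict describing the segfault line, or None if it does not match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    if info["base"] is not None:
        # Offset of the faulting instruction relative to the start of the mapping.
        info["offset"] = hex(int(info["ip"], 16) - int(info["base"], 16))
    return info


def decode_error(code):
    """Decode the low three bits of the x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0 means the page was not mapped
        "write": bool(code & 2),         # 0 means the access was a read
        "user_mode": bool(code & 4),     # 1 means the fault came from user space
    }


if __name__ == "__main__":
    info = parse_segfault(LINE)
    print(info["comm"], info["obj"], info["offset"])
    print(decode_error(int(info["error"])))
```

For the line above, the offset works out to 0x247d inside libtevent.so.0.9.8, and error 4 means a user-mode read of an unmapped address. With the matching debuginfo installed, that offset could be resolved to a function name, but a core dump with a full backtrace is still far more useful.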

Comment 6 Kemot1000 2011-08-11 11:51:33 UTC
FYI

I tested this file descriptor problem on QA, and I had the issue there as well. I authenticated a user in a loop, and sssd_pam crashed when the number of file descriptors reached 1024.

With the 1.3 patch I have the same number of file descriptors all the time (around 25).
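
One way to watch the descriptor count described above is to read /proc directly. This is a rough sketch and not the method used in the thread: it assumes a Linux /proc filesystem, Python 3, and root privileges (needed to read another user's /proc/<pid>/fd). All function names are illustrative.

```python
import os


def process_name(pid):
    """Return the command name from /proc/<pid>/stat (the text in parentheses)."""
    with open(f"/proc/{pid}/stat") as fh:
        stat = fh.read()
    return stat[stat.index("(") + 1 : stat.rindex(")")]


def find_pids(name):
    """Return the PIDs of all processes whose command name equals name."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if process_name(entry) == name:
                pids.append(int(entry))
        except (OSError, ValueError):
            continue  # the process exited or is not readable
    return pids


def count_fds(pid):
    """Return the number of file descriptors the process currently has open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits as a string."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                return line.split()[3]
    return None


if __name__ == "__main__":
    for pid in find_pids("sssd_pam"):
        print(pid, count_fds(pid), "of", soft_nofile_limit(pid))
```

Run repeatedly while authenticating in a loop, a count that climbs steadily toward the limit (1024 in the test above) would point at the descriptor leak, while a stable count of around 25 matches the behaviour reported with the fix installed.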

Comment 7 Jakub Hrozek 2011-08-11 15:00:11 UTC
(In reply to comment #5)
> This is from /var/log/messages. It is the only log entry like that:
> 
> Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip
> 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in
> libtevent.so.0.9.8[7fe71fb64000+9000]
> 
> This is the only time it crashed. I think this is related to the debug level
> being set to 9.

Can you get a core dump from that crash and attach it here?

Comment 8 Kemot1000 2011-08-11 19:48:12 UTC
I see no core dump in the root folder.

Comment 9 Jakub Hrozek 2011-08-12 10:38:37 UTC
(In reply to comment #8)
> I see no core dump in the root folder.

Core dumps are probably not enabled on your machine. On RHEL6, the easiest way to get them is to install and start abrt:

https://access.redhat.com/kb/docs/DOC-5353

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt
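
As a quick sanity check of whether core dumps can be produced at all, the sketch below reads the core size limit and the kernel's core_pattern. It is only an illustration under stated assumptions: it reports the limit of the process running the script, not of the sssd daemon (the daemon's own limit is listed under "Max core file size" in /proc/<pid>/limits), and the function name core_dump_status is made up for this example.

```python
import resource


def core_dump_status():
    """Report the core size limit of this process and the kernel core_pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        with open("/proc/sys/kernel/core_pattern") as fh:
            pattern = fh.read().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not mounted
    return {
        "soft_limit": soft,
        "hard_limit": hard,
        # A soft limit of 0 disables core files written by the kernel.
        "enabled": soft != 0,
        "core_pattern": pattern,
        # A pattern starting with "|" pipes the dump to a helper program.
        "piped_to_helper": bool(pattern) and pattern.startswith("|"),
    }


if __name__ == "__main__":
    for key, value in core_dump_status().items():
        print(f"{key}: {value}")
```

A core_pattern that starts with "|" means the kernel hands the dump to a helper program rather than writing a core file into the working directory, which is how crash collectors such as abrt typically hook in.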

Comment 10 Jakub Hrozek 2011-08-16 07:31:14 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > I see no core dump in the root folder.
> 
> Core dumps are probably not enabled on your machine. On RHEL6, the easiest way
> to get them is to install and start abrt:
> 
> https://access.redhat.com/kb/docs/DOC-5353
> 
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt

Hello,

did you have any luck getting the core dump? Do you need additional assistance getting it?

Comment 11 Kemot1000 2011-08-19 10:05:04 UTC
Hi,
Sorry, I was on vacation.

After installing patch 1.3 I have had no problems with sssd, so I have no way to get this dump.

I think the best bet for reproducing this error would be to run the 1.2 version with the debug level set to 9, and then hang sssd by using up all file descriptors. Did you try something like this?

T.

Comment 12 Jakub Hrozek 2011-08-25 11:26:10 UTC
(In reply to comment #11)
> Hi,
> Sorry, I was on vacation.
> 
> After installing patch 1.3 I have had no problems with sssd, so I have no way
> to get this dump.
> 
> I think the best bet for reproducing this error would be to run the 1.2
> version with the debug level set to 9, and then hang sssd by using up all
> file descriptors. Did you try something like this?
> 
> T.

Yes, and I could not reproduce the crash.

There is not much we can do without either a way to reproduce the crash or at least a core dump.

I'm going to close this bug with the INSUFFICIENT_DATA resolution. Kindly reopen it if there is more data to help us with debugging.

