
Bug 729905

Summary: SSSD hangs at 99% cpu
Product: Red Hat Enterprise Linux 6
Component: sssd
Version: 6.1
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: Kemot1000 <kemot1000>
Assignee: Stephen Gallagher <sgallagh>
QA Contact: Chandrasekar Kannan <ckannan>
CC: benl, grajaiya, jgalipea, jhrozek, kbanerje, prc
Doc Type: Bug Fix
Last Closed: 2011-08-25 11:26:10 UTC
Attachments:
sssd_pam.jpg (no flags)

Description Kemot1000 2011-08-11 08:20:04 UTC
Created attachment 517753 [details]
sssd_pam.jpg

I found this bug (https://bugzilla.redhat.com/show_bug.cgi?id=661163), and from its description it looks like I have the same problem on one of my servers.

Every few hours, for no apparent reason (it can happen twice in one hour or only every few hours), authentication to my server stops working. When I log in with a local user and check top, I see that sssd_pam is using 99% of the CPU. When I restart SSSD, everything goes back to normal.


I set the debug level to 9 to collect some data, but at this point I do not have any yet.

Attached is a screenshot of the top command showing CPU usage at 99% for the sssd_pam module.

Is there any way to enable debugging only for sssd_pam and leave sssd_nss information out of the debug file?

Version-Release number of selected component (if applicable):
rpm -aq |grep sssd
sssd-1.5.1-34.el6_1.2.x86_64
sssd-client-1.5.1-34.el6_1.2.x86_64

How reproducible:
N/A

Steps to Reproduce:
N/A
  
Additional info: 

This is what I get in /var/log/secure when I restart SSSD:

Aug  9 11:21:12 PROD db2ckpwd 0[3623]: pam_sss(db2:auth): Request to sssd failed. Bad address

Comment 2 Jakub Hrozek 2011-08-11 09:34:26 UTC
(In reply to comment #0)
> Is there any way to enable debugging only for sssd_pam and leave sssd_nss
> information out of the debug file?
> 

Yes, open the file /etc/sssd/sssd.conf and put "debug_level = 9" into the "[pam]" section. Then restart SSSD. The log file we're interested in is /var/log/sssd/sssd_pam.log
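
[Editor's note] Jakub's instruction amounts to a change like the following fragment of /etc/sssd/sssd.conf (a sketch only; the [pam] and [nss] section names are standard SSSD ones, but the rest of the file will differ per system):

```ini
# /etc/sssd/sssd.conf -- only the relevant fragment is shown.
# debug_level = 9 is the most verbose setting in sssd 1.5.x.
[pam]
debug_level = 9

# Leaving debug_level unset (or low) in [nss] keeps sssd_nss chatter
# out of the logs, as the reporter asked.
[nss]
```

On RHEL 6 this is followed by "service sssd restart"; the output then lands in /var/log/sssd/sssd_pam.log.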

This may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=727041 -- I just ran a quick test and saw sssd_pam running at 99% when it depleted the file descriptors as well. The logs would tell us for certain (or you can grab the released RPMs that fix #727041 and check if that fixes the issue for you).

Comment 3 Kemot1000 2011-08-11 10:49:36 UTC
This is the only thing I got in sssd_pam.log (there are about 20 identical lines).

(Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not reconnect to DOMAINNAME provider.


I saw the command check_user2 mentioned in one of the posts. Where can I get it, to test whether this is also a problem on my system?

Comment 4 Jakub Hrozek 2011-08-11 11:38:08 UTC
(In reply to comment #3)
> This is the only thing I got in sssd_pam.log (there are about 20 identical
> lines).
> 
> (Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not
> reconnect to DOMAINNAME provider.
> 
> 

OK, this is a new bug, different from the PAM file descriptor leak. That code is only reachable when the back end crashes and the pam process is trying to reconnect.

Can you check /var/log/messages for signs of sssd_be crashing, or, if you use abrt, try running "abrt -l"?

Comment 5 Kemot1000 2011-08-11 11:47:32 UTC
This is from /var/log/messages; it is the only log entry like that:

Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in libtevent.so.0.9.8[7fe71fb64000+9000]

This is the only time it crashed. I think this is related to the debug level being set to 9.

Comment 6 Kemot1000 2011-08-11 11:51:33 UTC
FYI

I tested the file descriptor problem on our QA system and hit the same issue there. I authenticated a user in a loop, and sssd_pam crashed when the number of open file descriptors reached 1024.

With the 1.3 patch I have the same number of file descriptors all the time (around 25).
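
[Editor's note] A quick way to watch for this kind of descriptor leak is to count the entries under /proc/<pid>/fd. A minimal sketch (shown here against the current shell, $$, purely for demonstration; on an affected host you would substitute the PID reported by pidof sssd_pam):

```shell
# Count the open file descriptors of a process via /proc.
# $$ (the shell's own PID) is used only so the example is runnable;
# replace it with $(pidof sssd_pam) in practice.
pid=$$
count=$(ls "/proc/${pid}/fd" | wc -l)
echo "open fds for pid ${pid}: ${count}"
```

Run in a loop (e.g. with watch), a steadily climbing count that approaches the default 1024 soft limit would match the behavior described above.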

Comment 7 Jakub Hrozek 2011-08-11 15:00:11 UTC
(In reply to comment #5)
> This is from /var/log/messages; it is the only log entry like that:
> 
> Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip
> 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in
> libtevent.so.0.9.8[7fe71fb64000+9000]
> 
> This is the only time it crashed. I think this is related to the debug level
> being set to 9.

Can you get a core dump from that crash and attach it here?

Comment 8 Kemot1000 2011-08-11 19:48:12 UTC
I see no core dump under the root folder.

Comment 9 Jakub Hrozek 2011-08-12 10:38:37 UTC
(In reply to comment #8)
> I see no core dump under the root folder.

Core dumps are probably not enabled on your machine. On RHEL 6, the easiest way is to install and start abrt:

https://access.redhat.com/kb/docs/DOC-5353

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt
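
[Editor's note] If installing abrt is not an option, core dumps can also be enabled by hand. A minimal sketch for the current shell session (assumption: the soft core-size limit is what is suppressing the dump; paths and limits vary by system):

```shell
# Raise the soft core file size limit for this shell and its children
# (a default of 0, common on RHEL 6, suppresses core files entirely).
ulimit -c unlimited
ulimit -c                      # should now print "unlimited"

# Show where the kernel writes core files; when abrt is running this
# is typically a pipe to its ccpp hook rather than a plain file name.
cat /proc/sys/kernel/core_pattern
```

Note that ulimit only affects processes started from this shell, so for a daemon like sssd started by init, abrt (as linked above) is the more practical route.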

Comment 10 Jakub Hrozek 2011-08-16 07:31:14 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > I see no core dump under root folder.
> 
> Core dumps are probably not enabled on your machine. On RHEL 6, the easiest
> way is to install and start abrt:
> 
> https://access.redhat.com/kb/docs/DOC-5353
> 
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt

Hello,

did you have any luck getting the core dump? Do you need additional assistance getting it?

Comment 11 Kemot1000 2011-08-19 10:05:04 UTC
Hi,
Sorry, I was on vacation.

After installing the 1.3 patch I have had no problems with sssd, so I have no way to get this dump.

I think the best bet to reproduce this error would be to run the 1.2 version with the debug level set to 9, and then hang sssd by exhausting all of its file descriptors. Did you try something like that?

T.

Comment 12 Jakub Hrozek 2011-08-25 11:26:10 UTC
(In reply to comment #11)
> Hi,
> Sorry, I was on vacation.
> 
> After installing the 1.3 patch I have had no problems with sssd, so I have no
> way to get this dump.
> 
> I think the best bet to reproduce this error would be to run the 1.2 version
> with the debug level set to 9, and then hang sssd by exhausting all of its
> file descriptors. Did you try something like that?
> 
> T.

Yes, and I could not reproduce the crash.

There is not much we can do without either a way to reproduce the crash or at least a core dump.

I'm going to close this bug with the INSUFFICIENT_DATA resolution; kindly reopen it if there is more data to help us with debugging.