Created attachment 517753 sssd_pam.jpg

I found bug 661163 (https://bugzilla.redhat.com/show_bug.cgi?id=661163), and from its description it looks like I have the same problem on one of my servers.

For no apparent reason, authentication to my server stops working at irregular intervals (sometimes twice in one hour, sometimes only every few hours). When I log in as a local user and check top, I see that sssd_pam is using 99% of the CPU. When I restart SSSD, everything goes back to normal.

I set debug to 9 to collect some data, but I do not have any yet. Attached is a screenshot of the top command showing sssd_pam at 99% CPU usage.

Is there any way to set debug only for sssd_pam and leave sssd_nss information out of the debug file?

Version-Release number of selected component (if applicable):
rpm -aq |grep sssd
sssd-1.5.1-34.el6_1.2.x86_64
sssd-client-1.5.1-34.el6_1.2.x86_64

How reproducible: N/A

Steps to Reproduce: N/A

Additional info:
This is what I get in /var/log/secure when I restart SSSD:
Aug 9 11:21:12 PROD db2ckpwd 0[3623]: pam_sss(db2:auth): Request to sssd failed. Bad address
(In reply to comment #0)
> Is there any way to setup debug only for sssd_pam and leave sssd_nss
> information out of the debug file ?
>

Yes, open the file /etc/sssd/sssd.conf and put "debug_level = 9" into the "[pam]" section. Then restart SSSD. The log file we're interested in is /var/log/sssd/sssd_pam.log

This may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=727041 -- I just ran a quick test and saw sssd_pam running at 99% when it depleted the file descriptors as well. The logs would tell us for certain (or you can grab the released RPMs that fix #727041 and check if that fixes the issue for you).
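For reference, the change described above would look roughly like the fragment below. Only the section name and the debug_level line come from this comment; any other options already present in the [pam] section of your sssd.conf stay as they are.

```ini
# /etc/sssd/sssd.conf (fragment)
[pam]
# Verbose logging for the PAM responder only; output goes to
# /var/log/sssd/sssd_pam.log after SSSD is restarted.
debug_level = 9
```

Leaving debug_level unset in the [nss] section keeps sssd_nss output out of the debug logs.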
This is the only thing I got in sssd_pam.log (about 20 identical lines):

(Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not reconnect to DOMAINNAME provider.

I saw the check_user2 command in one of the posts. Where can I get it to test whether this is also a problem on my system?
(In reply to comment #3)
> this is the only I got in sssd_pam.log (its like 20 the same lines) there.
>
> (Thu Aug 11 10:45:42 2011) [sssd[pam]] [pam_dp_reconnect_init] (0): Could not
> reconnect to DOMAINNAME provider.
>
>

OK, this is a new bug, different from PAM leaking file descriptors. That code is only reachable when the back end crashes and the pam process is trying to reconnect. Can you check /var/log/messages for signs of sssd_be crashing or, if you use abrt, try running "abrt-cli -l"?
This is from /var/log/messages. It is the only entry like that:

Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in libtevent.so.0.9.8[7fe71fb64000+9000]

This is the only time it crashed. I think this is related to debug being set to 9.
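As a side note, the kernel segfault line above already carries enough information to compute where inside libtevent the fault happened. The sketch below is not part of SSSD or abrt; parse_segfault is a hypothetical helper, and it assumes the usual x86 kernel message layout "segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]".

```python
import re

# Hypothetical helper: parses an x86 kernel "segfault at ..." message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return {
        "lib": match.group("lib"),
        "fault_address": int(match.group("addr"), 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        "error": int(match.group("error"), 16),
    }


if __name__ == "__main__":
    line = (
        "Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at "
        "7fe700000028 ip 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in "
        "libtevent.so.0.9.8[7fe71fb64000+9000]"
    )
    info = parse_segfault(line)
    print(info["lib"], hex(info["offset"]))  # libtevent.so.0.9.8 0x247d
```

With matching debuginfo installed, an offset like this can usually be mapped to a function with a symbolizer, although a core dump remains far more useful. Error code 4 indicates a user-mode read of an unmapped address.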
FYI, I tested the file descriptor problem on QA, and I had that issue as well. I tried to authenticate a user in a loop, and sssd_pam crashed when the number of open file descriptors reached 1024.

With the 1.3 patch I have the same number of file descriptors all the time (around 25).
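One way to watch for this kind of leak is to count the entries under /proc/<pid>/fd for the sssd_pam process. The following is a minimal sketch, assuming a Linux /proc filesystem; find_pids and count_open_fds are illustrative helper names, not SSSD tools, and reading another user's process normally requires root.

```python
import os


def count_open_fds(pid):
    """Return the number of open file descriptors of a process (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def find_pids(name):
    """Return the PIDs whose /proc/<pid>/comm equals the given process name."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as comm_file:
                if comm_file.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited or is not readable; skip it.
            continue
    return pids


if __name__ == "__main__":
    for pid in find_pids("sssd_pam"):
        try:
            print(pid, count_open_fds(pid))
        except PermissionError:
            print(pid, "permission denied (try running as root)")
```

Running this repeatedly during an authentication loop should show a steadily climbing count on an affected build and a roughly constant one on a fixed build.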
(In reply to comment #5)
> This is from /var/log/messages This is the only log like that
>
> Aug 11 10:45:52 HOSTNAME kernel: sssd_pam[7766]: segfault at 7fe700000028 ip
> 00007fe71fb6647d sp 00007fff75d7dc78 error 4 in
> libtevent.so.0.9.8[7fe71fb64000+9000]
>
> This is the only time when it crashed. I think this is related to debug being
> set on 9.

Can you get a core dump from that crash and attach it here?
I see no core dump under the root folder.
(In reply to comment #8)
> I see no core dump under root folder.

Core dumps are probably not enabled on your machine. On RHEL6, the easiest way is installing and starting abrt:

https://access.redhat.com/kb/docs/DOC-5353

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt
(In reply to comment #9)
> (In reply to comment #8)
> > I see no core dump under root folder.
>
> Core dumps are probably not enabled on your machine. On RHEL6, the easiest way
> is installing and starting abrt:
>
> https://access.redhat.com/kb/docs/DOC-5353
>
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Deployment_Guide/index.html#ch-abrt

Hello, did you have any luck getting the core dump? Do you need additional assistance getting it?
Hi,
Sorry, I was on vacation.

After installing patch 1.3 I have had no problems with sssd, so I have no way to get this dump.

I think the best bet for reproducing this error would be to run the 1.2 version with debug level set to 9, then hang sssd by using up all the file descriptors. Did you try something like this?

T.
(In reply to comment #11)
> Hi,
> Sorry was on vacation.
>
> After installing patch 1.3 I had no problems with sssd so I have no option to
> get this dump.
>
> I think that to get this error the best bet would be running 1.2 version and
> setup debug level to 9. Then hang sssd using all File descriptors. Did you try
> something like this ?
>
> T.

Yes, and I could not reproduce the crash.

There is not much we can do without either a way to reproduce the crash or at least a core dump. I'm going to close this bug with the INSUFFICIENT_DATA resolution. Kindly reopen it if there is more data to help us with debugging.