Bug 1261186
| Summary: | sssd_be general protection in libsss_idmap.so.0.4.0. | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Todd H. Poole <toddhpoolework> |
| Component: | sssd | Assignee: | Jakub Hrozek <jhrozek> |
| Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.1 | CC: | abokovoy, grajaiya, jgalipea, jhrozek, lslebodn, mkosek, mzidek, pbrezina, preichl, rharwood, sbose, ssorce, toddhpoolework |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-01-07 16:38:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Todd H. Poole, 2015-09-08 19:59:34 UTC
(In reply to Todd H. Poole from comment #0)

> Description of problem:
>
> kernel: traps: sssd_be general protection in libsss_idmap.so.0.4.0.
>
> After several days of light use (30 or so users logging in and out several times a day), sssd_be begins misbehaving, with the following messages showing up in /var/log/messages immediately after each login attempt:
>
> Sep 8 10:59:48 redacted systemd: Starting Session 698 of user toddhpoole.
> Sep 8 10:59:48 redacted systemd: Started Session 698 of user toddhpoole.
> Sep 8 10:59:48 redacted systemd-logind: New session 698 of user toddhpoole.
> Sep 8 10:59:48 redacted systemd-logind: Removed session 698.
> Sep 8 10:59:48 redacted kernel: traps: sssd_be[32615] general protection ip:7f97fd56d331 sp:7fff0906baf0 error:0 in libsss_idmap.so.0.4.0[7f97fd56a000+5000]
>
> This prevents users from successfully logging in.
>
> Neither restarting the sssd service (via 'systemctl restart sssd') nor purging the sss cache (via 'sss_cache -E') and then trying to reconnect via SSH appears to resolve the issue:
>
> Sep 8 11:15:35 redacted systemd: Started System Security Services Daemon.
> Sep 8 11:15:51 redacted kernel: traps: sssd_be[792] general protection ip:7fac644d2331 sp:7fff195e5c30 error:0 in libsss_idmap.so.0.4.0[7fac644cf000+5000]
> Sep 8 11:15:51 redacted sssd[be[redacted.domain.local]]: Starting up
> Sep 8 11:17:17 redacted kernel: traps: sssd_be[800] general protection ip:7ffa2b176331 sp:7fff60e65370 error:0 in libsss_idmap.so.0.4.0[7ffa2b173000+5000]
> Sep 8 11:17:17 redacted sssd[be[redacted.domain.local]]: Starting up
> Sep 8 11:17:53 redacted kernel: traps: sssd_be[814] general protection ip:7fb30dab4331 sp:7fff9038f100 error:0 in libsss_idmap.so.0.4.0[7fb30dab1000+5000]
> Sep 8 11:17:53 redacted sssd[be[redacted.domain.local]]: Starting up
> Sep 8 11:19:06 redacted kernel: traps: sssd_be[822] general protection ip:7f4030cc0331 sp:7fff4a949aa0 error:0 in libsss_idmap.so.0.4.0[7f4030cbd000+5000]
>
> Deleting the cache database file (via 'rm -f /var/lib/sss/db/cache_DOMAIN_redacted.ldb') and then restarting sssd (via 'systemctl restart sssd') does appear to resolve the issue.
>
> Version-Release number of selected component (if applicable):
>
> [root@redacted ~]# sssd --version
> 1.12.2
> [root@redacted ~]# rpm -qa | grep "sssd"
> sssd-ldap-1.12.2-58.el7_1.6.x86_64
> sssd-common-pac-1.12.2-58.el7_1.6.x86_64
> sssd-ad-1.12.2-58.el7_1.6.x86_64
> python-sssdconfig-1.12.2-58.el7_1.6.noarch
> sssd-krb5-common-1.12.2-58.el7_1.6.x86_64
> sssd-krb5-1.12.2-58.el7_1.6.x86_64
> sssd-ipa-1.12.2-58.el7_1.6.x86_64
> sssd-1.12.2-58.el7_1.6.x86_64
> sssd-debuginfo-1.12.2-58.el7_1.6.x86_64
> sssd-client-1.12.2-58.el7_1.6.x86_64
> sssd-common-1.12.2-58.el7_1.6.x86_64
> sssd-proxy-1.12.2-58.el7_1.6.x86_64

This BZ is filed against Fedora 22, which has sssd-1.12.5, so please test with that version. If you intended to file this against el7.1, it would be better to use a different component in Bugzilla. By the way, there have been some updates in el7; you might try to reproduce with 1.12.2-58.el7_1.14.
If the problem is still there, you might want to test with the version backported from Fedora 22: https://copr.fedoraproject.org/coprs/lslebodn/sssd-1-12/

---

Todd H. Poole:

Thank you for the feedback, Lukas; I'll give those a shot. Unfortunately, given the relatively infrequent nature of these failures, it'll be several days (if not weeks) before I'm able to report back. Expect an update no later than the 24th of September. For the sake of correctness, I've also updated this report's components to more accurately reflect the actual environment and installed packages.

---

Thank you for changing the bug version. I'll set the needinfo flag to make it clear that we need the corefile or logs (ideally from a fully up-to-date version) to help with the issue.

---

Todd H. Poole (comment #5):

It's been 15 days since the latest patch/bugfix release was applied to one of our test/staging clusters, and I'm pleased to report that we've not seen the issue return.

We've begun deploying these changes to our production environment, which will have a significantly higher number of users exercising these services. If these failures do not return after a few weeks of heavy load in our production environment, then I think we can consider this issue resolved.

I'll report back if anything changes. Thanks, gentlemen.

---

(In reply to Todd H. Poole from comment #5)

Thank you very much for coming back. Please just close this bugzilla if the issue doesn't hit you in the production environment.
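The workaround the reporter describes (removing the per-domain cache database, then restarting sssd) can be sketched as a short shell script. This is a dry run that only prints the commands, since the real ones require root and a live sssd installation; the domain name is a placeholder taken from the redacted logs, and in practice it should match the `[domain/...]` section name in sssd.conf.

```shell
#!/bin/sh
# Dry-run sketch of the cache-reset workaround from this report.
# SSSD_DOMAIN is a placeholder -- substitute your own domain name.
SSSD_DOMAIN="redacted.domain.local"
CACHE_DB="/var/lib/sss/db/cache_${SSSD_DOMAIN}.ldb"

# Each command is echoed rather than executed; drop the echoes
# (and run as root) to actually reset the cache.
echo "systemctl stop sssd"
echo "rm -f ${CACHE_DB}"
echo "systemctl start sssd"
```

Note that deleting this database also discards cached credentials, so offline authentication will not work until the backend is reachable again.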
We haven't heard back on this issue for quite some time; closing.
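For reference, when maintainers ask for logs as they do above, sssd verbosity is raised per section in sssd.conf. A minimal fragment might look like the following; the domain name is a placeholder, and the backend log for that domain then appears under /var/log/sssd/ after sssd is restarted:

```ini
# /etc/sssd/sssd.conf -- "example.com" is a placeholder domain name
[domain/example.com]
# 9 is the most verbose level; output goes to
# /var/log/sssd/sssd_example.com.log
debug_level = 9
```

A corefile for the crashing sssd_be process can usually be captured alongside the logs, via abrt on RHEL 7 or coredumpctl, depending on how core dumps are configured on the host.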