Bug 1631813
| Summary: | LDAP back end: if parsing an entry fails, the whole back end goes offline | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Dave <dsimes> |
| Component: | sssd | Assignee: | Paweł Poławski <ppolawsk> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Jakub Vavra <jvavra> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.2 | CC: | aboscatt, atikhono, dsimes, ekeck, grajaiya, jhrozek, lslebodn, mkosek, mzidek, pbrezina, pkettman, thalman, tscherf |
| Target Milestone: | rc | Keywords: | Reopened, Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-21 20:30:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Dave
2018-09-21 15:38:18 UTC
Please provide logs and steps to reproduce. See https://docs.pagure.org/SSSD.sssd/users/troubleshooting.html and https://docs.pagure.org/SSSD.sssd/users/reporting_bugs.html

https://access.redhat.com/support/cases/#/case/02107258

has sssd debug logs & sosreport from the customer incident

(In reply to Dave from comment #3)
> https://access.redhat.com/support/cases/#/case/02107258
>
> has sssd debug logs & sosreport from the customer incident

The latest debug logs in the case are from May. I briefly checked all three of them, but couldn't find any crash in the logs. Could you point me to the exact tarball and ideally also the timestamp of the crash?

Since there was no reply for the needinfo in about a month and no new logs etc in the case, I'm closing the bug as INSUFFICIENT_DATA. Please reopen if you can provide the information requested in comment #4.

additionally, if you look on page 2 of the case in the private messages/discussion, there is some further detail/analysis of the crash.. specifically these posts:

PRIVATE MESSAGE (ASSOCIATE) Heverley, Alan on May 29 2018 at 04:42 PM -04:00
PRIVATE MESSAGE (ASSOCIATE) Stephenson, Justin on May 29 2018 at 05:00 PM -04:00

(they are rather long log output, or I would have posted them here)

if 'something' is not working for the customer, then these messages are more indicative of a problem:

(Fri May 25 11:00:04 2018) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]

But with bug report as vague as this, I don't know how to help further, sorry. It would be best to start with what is the problem the customer is seeing and document the problem with configuration and log files.
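For anyone sifting through long debug logs for the message quoted above, a minimal Python sketch follows. It assumes every line has the same layout as the single `sssd[nss]` line in this report (timestamp, process, function, hex debug level, message); other SSSD versions or logging settings may format lines differently. The names `parse_line`, `offline_reports` and `LogEntry` are hypothetical helpers, not part of SSSD.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Error name copied from the log line quoted in this bug report.
OFFLINE_ERROR = "org.freedesktop.sssd.Error.DataProvider.Offline"

# Assumed layout, inferred from that one line:
# (timestamp) [process] [function] (0xLEVEL): message
LOG_LINE = re.compile(
    r"^\((?P<timestamp>[^)]*)\)\s+"
    r"\[(?P<process>.+?)\]\s+"          # process may itself contain brackets, e.g. sssd[nss]
    r"\[(?P<function>[^\]]+)\]\s+"
    r"\((?P<level>0x[0-9a-fA-F]+)\):\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    process: str
    function: str
    level: int
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one debug-log line; return None if it does not fit the assumed layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        timestamp=match["timestamp"],
        process=match["process"],
        function=match["function"],
        level=int(match["level"], 16),
        message=match["message"],
    )


def offline_reports(lines: Iterable[str]) -> List[LogEntry]:
    """Return the parsed entries whose message mentions the Offline error."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and OFFLINE_ERROR in e.message]


SAMPLE = (
    "(Fri May 25 11:00:04 2018) [sssd[nss]] [sss_dp_get_reply] (0x0010): "
    "The Data Provider returned an error "
    "[org.freedesktop.sssd.Error.DataProvider.Offline]"
)

if __name__ == "__main__":
    for entry in offline_reports([SAMPLE]):
        print(entry.timestamp, entry.process, entry.function)
```

Collecting the timestamps of these entries would give the "exact timestamp" asked for earlier in the thread, which can then be matched against the back end's own log for the same period.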
(In reply to Jakub Hrozek from comment #9)
> if 'something' is not working for the customer, then these messages are more
> indicative of a problem:
>
> (Fri May 25 11:00:04 2018) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data
> Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
>
> But with bug report as vague as this, I don't know how to help further,
> sorry. It would be best to start with what is the problem the customer is
> seeing and document the problem with configuration and log files.

The Red Hat customer was asking about the constant sssd_be crashes/restarts (as mentioned in the case), which were not seen in their other production env (that one was under 7.4; this new/problematic env was under 7.5). The issue was seen while attempting to troubleshoot Kerberos logins (also in the case). Sorry I keep referring to the case; there is just so much info/detail there that it is a lot to attempt to repaste into here (the May 29 timeframe is a good source for all this detail/answers).

> This is not a crash, but a graceful shutdown, SIGTERM is used as a signal from the main sssd process to any of the worker processes.

so why is sssd constantly restarting sssd_be?

but as mentioned in the case description, this bug had boiled down to this (not sure if this helps point closer to the issue - the customer pointed this out: did not happen in existing envs on 7.4, why does 7.5 have processes that keep crashing under a new/clean 7.5 install):

Description of problem:
sssd_be periodically crashing on 7.5 (IdM w/AD trust), did not occur on 7.4 (same IdM w/AD trust configuration)

Actual results:
periodic crashes from sssd_be

Expected results:
sssd_be does not crash

Additional info:
we later found out that by adding these settings in sssd.conf, sssd_be would no longer crash (which is explained in the SSS performance tuning doc), but the customer did not have performance issues under 7.5, so these performance settings were never considered
subdomain_inherit = ignore_group_members
ignore_group_members = True

I suppose we could have the customer bring up a 7.6 env (or upgrade existing 7.5 to 7.6?) and see if the issue still occurs? if there is no issue under 7.6, I would think we could mark it as 'resolved in a newer ver' kinda thing

(In reply to Dave from comment #11)
> but as mentioned in the case description, this bug had boiled down to this
> (not sure if this helps point closer to the issue - the customer pointed
> this out: did not happen in existing envs on 7.4, why does 7.5 have
> processes that keep crashing under a new/clean 7.5 install):
>
> Description of problem:
> sssd_be periodically crashing on 7.5 (IdM w/AD trust), did not occur on 7.4
> (same IdM w/AD trust configuration)
>
> Actual results:
> periodic crashes from sssd_be
>
> Expected results:
> sssd_be does not crash
>
> Additional info:
> we later found out, that by adding these settings in sssd.conf, sssd_be
> would no longer crash (which is explained in the SSS performance tuning
> doc), but the customer did not have performance issues under 7.5, so these
> performance settings were never considered
> subdomain_inherit = ignore_group_members
> ignore_group_members = True
>

This and the parsing issue lead me to believe that there really is no crash, but a combination of timeouts mitigated by the tuning (the timeout would also mark the domain as 'inactive') and the earlier parsing issue. Both would mark the domain as inactive, and unless there are pre-cached credentials, logins would subsequently fail.

> I suppose we could have the customer bring up a 7.6 env (or upgrade existing
> 7.5 to 7.6?) and see if the issue still occurs? if there is not issue under
> 7.6, I would think we could mark it as 'resolved in a newer ver' kinda thing

Yes, it would be very nice to reproduce the issue and provide fresh logs. We could also see if the issue with the parsing is still there and, if yes, we could provide a patch I guess.

upstream is still open:
https://pagure.io/SSSD/sssd/issue/3908

customer is tracking this bug via a support case

Updated ticket to reflect the following:

(In reply to Dave from comment #14)
> upstream is still open:
> https://pagure.io/SSSD/sssd/issue/3908
>
> customer is tracking this bug via a support case

I have evaluated this bug multiple times with no positive results. The amount of logs provided in this BZ is very limited. Because of this I decided to close this bug for now. If you disagree with the decision, please reopen or open a new support case and create a new BZ.

New upstream PR addressing this bug has been submitted: https://github.com/SSSD/sssd/pull/5881
|
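For reference, the two workaround settings from comment #11 could be applied along the lines of the Python sketch below. It is an illustration only: it assumes the options belong in the `[domain/...]` section of sssd.conf, the domain name `example.test` and the sample configuration are made up, and `apply_workaround` is a hypothetical helper. Note that Python's `configparser` drops comments when it rewrites a file, so treat the output as a draft to review rather than something to write over a production sssd.conf.

```python
import configparser
import io

# Made-up minimal configuration; a real sssd.conf will contain more options.
SAMPLE_CONF = """\
[sssd]
domains = example.test
services = nss, pam

[domain/example.test]
id_provider = ipa
"""


def apply_workaround(conf_text: str, domain: str) -> str:
    """Return conf_text with the two settings from comment #11 added to the domain section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)

    section = f"domain/{domain}"
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section in the configuration")

    # Settings quoted in this bug report as stopping the sssd_be restarts.
    parser[section]["subdomain_inherit"] = "ignore_group_members"
    parser[section]["ignore_group_members"] = "True"

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(apply_workaround(SAMPLE_CONF, "example.test"))
```

As noted in the thread, these settings come from the performance tuning guidance and only mitigate the symptom; they do not address the parsing issue tracked upstream.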