
Bug 1631813

Summary: LDAP back end: if parsing an entry fails, the whole back end goes offline
Product: Red Hat Enterprise Linux 8
Reporter: Dave <dsimes>
Component: sssd
Assignee: Paweł Poławski <ppolawsk>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Jakub Vavra <jvavra>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.2
CC: aboscatt, atikhono, dsimes, ekeck, grajaiya, jhrozek, lslebodn, mkosek, mzidek, pbrezina, pkettman, thalman, tscherf
Target Milestone: rc
Keywords: Reopened, Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-21 20:30:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Dave 2018-09-21 15:38:18 UTC
Description of problem:
sssd_be periodically crashing on 7.5 (IdM w/AD trust), did not occur on 7.4 (same IdM w/AD trust configuration)

Version-Release number of selected component (if applicable):
packages have been upgraded several times since the initial issue was reported, currently they are at 7.5 latest:
kernel-3.10.0-862.11.6.el7.x86_64
sssd-1.16.0-19.el7_5.5.x86_64
ipa-server-4.5.4-10.el7_5.3.x86_64

How reproducible:
always

Steps to Reproduce:
1. install 7.5 system
2. install & configure ipa-server
3. set up AD trust

Actual results:
periodic crashes from sssd_be

Expected results:
sssd_be does not crash

Additional info:
we later found out, that by adding these settings in sssd.conf, sssd_be would no longer crash, but the customer did not have performance issues under 7.5, so these performance settings were never considered
subdomain_inherit = ignore_group_members
ignore_group_members = True
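
For illustration only, a minimal Python sketch of where the two settings above would sit in sssd.conf. The domain name `example.com`, the starting configuration, and the helper name `add_group_member_tuning` are placeholders and are not taken from the customer's system; the sketch only assumes that both options go into the `[domain/...]` section.

```python
import configparser
import io

# Hypothetical starting configuration; the domain name is a placeholder.
EXISTING = """\
[sssd]
services = nss, pam
domains = example.com

[domain/example.com]
id_provider = ipa
"""


def add_group_member_tuning(conf_text, domain):
    """Return conf_text with the two workaround options set for `domain`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)

    section = f"domain/{domain}"
    # The two settings quoted in this report.
    parser[section]["ignore_group_members"] = "True"
    parser[section]["subdomain_inherit"] = "ignore_group_members"

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = add_group_member_tuning(EXISTING, "example.com")
print(updated)
```

The sketch only rewrites text in memory; editing the real file by hand achieves the same result.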

Comment 2 Jakub Hrozek 2018-09-21 15:49:17 UTC
Please provide logs and steps to reproduce.

See https://docs.pagure.org/SSSD.sssd/users/troubleshooting.html and https://docs.pagure.org/SSSD.sssd/users/reporting_bugs.html

Comment 3 Dave 2018-09-22 02:46:16 UTC
https://access.redhat.com/support/cases/#/case/02107258

has sssd debug logs & sosreport from the customer incident

Comment 4 Jakub Hrozek 2018-09-24 08:37:34 UTC
(In reply to Dave from comment #3)
> https://access.redhat.com/support/cases/#/case/02107258
> 
> has sssd debug logs & sosreport from the customer incident

The latest debug logs in the case are from May. I briefly checked all three of them, but couldn't find any crash in the logs.

Could you point me to the exact tarball and ideally also the timestamp of the crash?

Comment 5 Jakub Hrozek 2018-10-25 12:05:39 UTC
Since there was no reply for the needinfo in about a month and no new logs etc in the case, I'm closing the bug as INSUFFICIENT_DATA. Please reopen if you can provide the information requested in comment #4.

Comment 7 Dave 2018-12-07 18:54:01 UTC
additionally, if you look on page 2 of the case in the private messages/discussion, there is some further detail/analysis of the crash.. specifically these posts:

PRIVATE MESSAGE (ASSOCIATE)
Heverley, Alan on May 29 2018 at 04:42 PM -04:00

PRIVATE MESSAGE (ASSOCIATE)
Stephenson, Justin on May 29 2018 at 05:00 PM -04:00

(they are rather long log output, or I would have posted them here)

Comment 9 Jakub Hrozek 2018-12-10 09:04:49 UTC
if 'something' is not working for the customer, then these messages are more indicative of a problem:

(Fri May 25 11:00:04 2018) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]

But with bug report as vague as this, I don't know how to help further, sorry. It would be best to start with what is the problem the customer is seeing and document the problem with configuration and log files.
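
To help pin down timestamps for messages like the one above, here is a rough Python sketch that scans debug-log lines for the Offline error. The regular expression is derived only from the single log line quoted in this comment, so the layout may differ in other SSSD versions; the helper name `find_offline_events` and the second sample line are made up for the example.

```python
import re
from datetime import datetime

# Layout assumed from the quoted line:
# (timestamp) [process] [function] (level): message
LINE_RE = re.compile(
    r"^\((?P<ts>[^)]+)\) "
    r"\[(?P<process>.+?)\] "
    r"\[(?P<function>[^\]]+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): "
    r"(?P<message>.*)$"
)

OFFLINE_MARKER = "org.freedesktop.sssd.Error.DataProvider.Offline"


def find_offline_events(lines):
    """Return (timestamp, process, function) for each Offline error line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and OFFLINE_MARKER in match.group("message"):
            when = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
            events.append((when, match.group("process"), match.group("function")))
    return events


sample = [
    # Line quoted in this comment.
    "(Fri May 25 11:00:04 2018) [sssd[nss]] [sss_dp_get_reply] (0x0010): "
    "The Data Provider returned an error "
    "[org.freedesktop.sssd.Error.DataProvider.Offline]",
    # Filler line that does not follow the layout, to show it is skipped.
    "some other line that does not match",
]

events = find_offline_events(sample)
for when, process, function in events:
    print(when.isoformat(), process, function)
```

Running it over the files under the SSSD log directory would give a list of timestamps that can be lined up against the reported login failures.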

Comment 10 Dave 2018-12-17 14:20:31 UTC
(In reply to Jakub Hrozek from comment #9)
> if 'something' is not working for the customer, then these messages are more
> indicative of a problem:
> 
> (Fri May 25 11:00:04 2018) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data
> Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
> 
> But with bug report as vague as this, I don't know how to help further,
> sorry. It would be best to start with what is the problem the customer is
> seeing and document the problem with configuration and log files.

The Red Hat customer was asking about the constant sssd_be crashes/restarts (as mentioned in the case), which were not seen in their other production env (that one is on 7.4; this new/problematic env is on 7.5). The issue was seen while attempting to troubleshoot kerberos logins (also in the case). Sorry I keep referring to the case; there is just so much info/detail there that it is a lot to repaste here (the May 29 timeframe is a good source for all this detail/answers).

> This is not a crash, but a graceful shutdown, SIGTERM is used as a signal from the main sssd process to any of the worker processes.

so why is sssd constantly restarting sssd_be?

Comment 11 Dave 2018-12-17 14:28:12 UTC
but as mentioned in the case description, this bug had boiled down to this (not sure if this helps point closer to the issue - the customer pointed this out: did not happen in existing envs on 7.4, why does 7.5 have processes that keep crashing under a new/clean 7.5 install):

Description of problem:
sssd_be periodically crashing on 7.5 (IdM w/AD trust), did not occur on 7.4 (same IdM w/AD trust configuration)

Actual results:
periodic crashes from sssd_be

Expected results:
sssd_be does not crash

Additional info:
we later found out, that by adding these settings in sssd.conf, sssd_be would no longer crash (which is explained in the SSS performance tuning doc), but the customer did not have performance issues under 7.5, so these performance settings were never considered
subdomain_inherit = ignore_group_members
ignore_group_members = True


I suppose we could have the customer bring up a 7.6 env (or upgrade existing 7.5 to 7.6?) and see if the issue still occurs? if there is not issue under 7.6, I would think we could mark it as 'resolved in a newer ver' kinda thing

Comment 13 Jakub Hrozek 2018-12-19 09:13:11 UTC
(In reply to Dave from comment #11)
> but as mentioned in the case description, this bug had boiled down to this
> (not sure if this helps point closer to the issue - the customer pointed
> this out: did not happen in existing envs on 7.4, why does 7.5 have
> processes that keep crashing under a new/clean 7.5 install):
> 
> Description of problem:
> sssd_be periodically crashing on 7.5 (IdM w/AD trust), did not occur on 7.4
> (same IdM w/AD trust configuration)
> 
> Actual results:
> periodic crashes from sssd_be
> 
> Expected results:
> sssd_be does not crash
> 
> Additional info:
> we later found out, that by adding these settings in sssd.conf, sssd_be
> would no longer crash (which is explained in the SSS performance tuning
> doc), but the customer did not have performance issues under 7.5, so these
> performance settings were never considered
> subdomain_inherit = ignore_group_members
> ignore_group_members = True
> 
> 

This and the parsing issue lead me to believe that there really is no crash, but a combination of timeouts mitigated by the tuning (a timeout would also mark the domain as 'inactive') and the earlier parsing issue. Both would mark the domain as inactive, and unless there are pre-cached credentials, logins would subsequently fail.

> I suppose we could have the customer bring up a 7.6 env (or upgrade existing
> 7.5 to 7.6?) and see if the issue still occurs? if there is not issue under
> 7.6, I would think we could mark it as 'resolved in a newer ver' kinda thing

Yes, it would be very nice to reproduce the issue and provide fresh logs. We could also see if the issue with the parsing is still there and if yes, we could provide a patch I guess.

Comment 14 Dave 2019-12-17 16:11:28 UTC
upstream is still open:
https://pagure.io/SSSD/sssd/issue/3908

customer is tracking this bug via a support case

Comment 15 Alexey Tikhonov 2019-12-20 15:41:39 UTC
Updated ticket to reflect the following:

(In reply to Dave from comment #14)
> upstream is still open:
> https://pagure.io/SSSD/sssd/issue/3908
> 
> customer is tracking this bug via a support case

Comment 19 Paweł Poławski 2021-09-21 20:30:08 UTC
I have evaluated this bug multiple times with no positive results.
The amount of logs provided in this BZ is very limited.
Because of this I decided to close this bug for now.

If you disagree with the decision please reopen or open a new support case and create a new BZ.

Comment 20 Paweł Poławski 2021-11-18 10:11:47 UTC
New upstream PR addressing this bug has been submitted: https://github.com/SSSD/sssd/pull/5881