RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1625937 - sssd restarts unexpectedly when enumeration is set to true unless the timeout is raised from its default value
Summary: sssd restarts unexpectedly when enumeration is set to true unless the timeout...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.2
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: rc
: ---
Assignee: Alexey Tikhonov
QA Contact: Jakub Vavra
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-09-06 09:13 UTC by German Parente
Modified: 2022-05-09 19:43 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-09 19:43:33 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description German Parente 2018-09-06 09:13:04 UTC
Description of problem:

we have a customer case with sssd-ad integration with enumeration set to true.
The process is restarted unexpectedly when performing enumeration after the whole set of users is saved into the cache and before processing the groups.

The process does not restart when enumeration is set to false, or with a greater timeout.

I will attach internally the logs.



Version-Release number of selected component (if applicable): sssd-1.16.0-19.el7_5.5.x86_64



How reproducible: always (customer site)

Comment 11 Alexey Tikhonov 2019-09-27 17:57:18 UTC
There are two issues can be seen in those logs:


(I) Performance degradation of enumeration task in RHEL7 vs RHEL6 SSSD version (if customer claim of the same env is valid)

(1) Fetching data:

RHEL6:
> (Wed Sep  5 12:42:02 2018) [sssd[be[AUDI.VWG]]] [be_ptask_execute] (0x0400): Task [enumeration]: executing task, timeout 300 seconds
> ...
> (Wed Sep  5 12:42:22 2018) [sssd[be[AUDI.VWG]]] [sdap_search_user_process] (0x0400): Search for users, returned 12012 results.
  --  fetching data takes 20 seconds

RHEL7:
> (Wed Sep  5 12:42:30 2018) [sssd[be[AUDI.VWG]]] [be_ptask_execute] (0x0400): Task [enumeration]: executing task, timeout 300 seconds
> ...
> (Wed Sep  5 12:42:56 2018) [sssd[be[AUDI.VWG]]] [sdap_search_user_process] (0x0400): Search for users, returned 12012 results.
  --  fetching data takes 26 seconds

So RHEL7 is 30% slower than RHEL6 version. That not that terrible yet.


(2) Saving data:

RHEL6:
> (Wed Sep  5 12:42:22 2018) [sssd[be[AUDI.VWG]]] [sdap_save_users] (0x4000): User 0 processed!
> ...
> (Wed Sep  5 12:42:38 2018) [sssd[be[AUDI.VWG]]] [sdap_get_users_done] (0x4000): Saving 12012 Users - Done
  --  16 seconds

RHEL7:
> (Wed Sep  5 12:42:56 2018) [sssd[be[AUDI.VWG]]] [sdap_save_users] (0x4000): User 0 processed!
> ...
> (Wed Sep  5 12:43:42 2018) [sssd[be[AUDI.VWG]]] [sdap_get_users_done] (0x4000): Saving 12012 Users - Done
  --  46 seconds.

RHEL7 is 2.8 times slower that RHEL6 version. And that *is* terrible.


I do not think reading two additional attributes from AD (that are not set anyway) and updating 6 additional search indexes (in addition to existing 16) explain this huge difference. But not sure.
In RHEL7 there is additionally update of timestamp ldb. But again this should not be that slow as this ldb is NOSYNC.

It would be good to have logs with milliseconds (RHEL7 version spends about 4 ms for every user).

I will keep this ticket open to track investigation of this performance degradation. But customer case is closed (so investigation would require setting up own reproducer), so it seems to be very low prio.



(II) Watchdog doesn't work properly.

> (Wed Sep  5 12:42:56 2018) [sssd[be[AUDI.VWG]]] [sdap_save_users] (0x4000): User 0 processed!
> ...
> (Wed Sep  5 12:43:42 2018) [sssd[be[AUDI.VWG]]] [sdap_get_users_done] (0x4000): Saving 12012 Users - Done
  --  46 seconds.

With default watchdog timeout 10 secs be-process should be terminated during execution of sdap_save_users()  --  not later than in (10+1)*3 seconds. But process is terminated only after task execution.


This appears to be a bug and I opened upstream tickets: https://pagure.io/SSSD/sssd/issue/4089
(and somewhat related https://pagure.io/SSSD/sssd/issue/4088)

I will not link this bz with those upstream tickets explicitly as those bugs was not cause of customer case (logs just helped to spot it indirectly).

Comment 16 Alexey Tikhonov 2022-05-09 19:43:33 UTC
(In reply to Alexey Tikhonov from comment #11)
> 
> (2) Saving data:
> 
> RHEL6:
> > (Wed Sep  5 12:42:22 2018) [sssd[be[AUDI.VWG]]] [sdap_save_users] (0x4000): User 0 processed!
> > ...
> > (Wed Sep  5 12:42:38 2018) [sssd[be[AUDI.VWG]]] [sdap_get_users_done] (0x4000): Saving 12012 Users - Done
>   --  16 seconds
> 
> RHEL7:
> > (Wed Sep  5 12:42:56 2018) [sssd[be[AUDI.VWG]]] [sdap_save_users] (0x4000): User 0 processed!
> > ...
> > (Wed Sep  5 12:43:42 2018) [sssd[be[AUDI.VWG]]] [sdap_get_users_done] (0x4000): Saving 12012 Users - Done
>   --  46 seconds.
> 
> RHEL7 is 2.8 times slower that RHEL6 version. And that *is* terrible.
> 
> 
> I do not think reading two additional attributes from AD (that are not set
> anyway) and updating 6 additional search indexes (in addition to existing
> 16) explain this huge difference. But not sure.
> In RHEL7 there is additionally update of timestamp ldb. But again this
> should not be that slow as this ldb is NOSYNC.


Looking at the logs, this - update of timestamp ldb - is a most significant difference.
Another (I guess, minor) things are:
 - `[sss_domain_get_state] (0x1000): Domain ... is Active`
 - `[sdap_attrs_add_ldap_attr] (0x2000): authorizedRHost is not available for [...].`
 - `[sdap_attrs_add_ldap_attr] (0x2000): mail is not available for [...].`
 - `Removing attribute [userCertificate]`

I think further comparison of RHEL7 with RHEL6 doesn't make sense.
What make sense is to profile current implementation of `sdap_save_user()`.


Note You need to log in before you can comment on or make changes to this bug.