Bug 1925608 - [RFE] make 'random_offset' addon to 'offline_timeout' option configurable
Summary: [RFE] make 'random_offset' addon to 'offline_timeout' option configurable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.4
Hardware: All
OS: Linux
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Paweł Poławski
QA Contact: Madhuri
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-05 16:36 UTC by Antonio Romito
Modified: 2021-11-10 09:08 UTC (History)

Fixed In Version: sssd-2.5.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-09 19:47:00 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 6043631 0 None None None 2021-07-29 13:35:58 UTC
Red Hat Product Errata RHBA-2021:4435 0 None Closed Server Panic - help with reason why it crashed please 2022-05-20 20:51:54 UTC

Description Antonio Romito 2021-02-05 16:36:26 UTC
Description of problem:

offline_timeout specifies the number of seconds before the backend (e.g. LDAP) must be reconnected.
A further random delay of 0 to 30 seconds is added so that sssd clients do not all reconnect at the same time.

However, setting offline_timeout to 900 (or any other high number) still keeps the random offset between 0 and 30 seconds.

The idea would be to have the following formula to calculate the default random offset:

random_offset = max(offline_timestamp / 2)

By default, this would therefore remain 0..30.

The customer also suggested adding a parameter to sssd.conf to customize the maximum random value.
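
A minimal sketch of the proposal above, for illustration only. It reads the formula as "the upper bound of the random offset is half of offline_timeout", which is an assumption, and it assumes offline_timeout = 60 as the value that yields the 0..30 range. The function names and the `configured_max` parameter are hypothetical and do not come from sssd.

```python
import random
from typing import Optional


def max_random_offset(offline_timeout: int,
                      configured_max: Optional[int] = None) -> int:
    """Upper bound of the random offset, in seconds.

    Hypothetical helper: an explicitly configured value wins,
    otherwise the bound is half of offline_timeout as proposed.
    """
    if configured_max is not None:
        return configured_max
    return offline_timeout // 2


def reconnect_delay(offline_timeout: int,
                    configured_max: Optional[int] = None) -> int:
    """Seconds to wait before the next reconnection attempt."""
    bound = max_random_offset(offline_timeout, configured_max)
    return offline_timeout + random.randint(0, bound)


print(max_random_offset(60))        # 30
print(max_random_offset(1800))      # 900
print(max_random_offset(1800, 30))  # 30 (explicit setting overrides the formula)
```

With offline_timeout = 1800, this reading would spread reconnections over 1800..2700 seconds instead of 1800..1830.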


Version-Release number of selected component (if applicable):
Any

How reproducible:


Steps to Reproduce:

1. change the offline_timeout to a large number (eg 1800)
2. force a disconnection from LDAP
3. observe that the reconnection will happen between 1800 and 1830 seconds

Actual results:

The random time is always 0..30

Expected results:

The random value should follow the offline_timeout with a formula.

Additional info:

Comment 2 Alexey Tikhonov 2021-02-05 16:51:33 UTC
(In reply to Antonio Romito from comment #0)
> The idea would be to have the following formula to calculate the default
> random offset:
> 
> random_offset = max(offline_timestamp / 2)

max of "offline_timestamp / 2" and ..?

"offline_timestamp" should mean "offline_timeout"?


> Steps to Reproduce:
> 
> 1. change the offline_timeout to a large number (eg 1800)
> 2. force a disconnection from LDAP
> 3. observe that the reconnection will happen between 1800 and 1830 seconds
> 
> Actual results:
> 
> The random time is always 0..30

What's wrong with it?

What is the real life scenario where this causes issues?

Comment 3 Daniele Palumbo 2021-02-06 04:24:45 UTC
> "offline_timestamp" should mean "offline_timeout"?

Yes, sorry

>> The random time is always 0..30

> What's wrong with it?

If someone sets offline_timeout to 1800, the idea behind it is that they don't want quick reconnects but rather want to relax the LDAP reconnections.
Once the LDAP offline_timeout expires, the clients will still storm the LDAP backend within a short time span (if no other action triggers a reconnection first), putting pressure on it.

> What is the real life scenario where this causes issues?

This was spotted during a review with Pavel, as part of the storm of LDAP queries that was bringing down the LDAP backend every 15 minutes.

Comment 4 Pavel Březina 2021-02-09 12:11:51 UTC
The main idea behind this request is that it is perfectly fine to have offline_timeout = 60 and randomize it with <0, 30>, but if you need to increase the offline timeout to a larger value to relax the reconnections, then 30 seconds does not make sense. This RFE is about making the random offset configurable/adjustable as well.

Comment 5 Alexey Tikhonov 2021-02-09 12:29:55 UTC
(In reply to Pavel Březina from comment #4)
> The main idea behind this request is that it is perfectly fine to have
> offline_timeout = 60 and randomize it with <0, 30> but if you need to
> increase the offline timeout to larger value to relax the reconnections,
> then 30 seconds do not make sense.

30 seconds is either enough to accommodate all reconnecting clients or not. The `offline_timeout` value doesn't matter.

I can imagine a case where 10k clients reconnecting within 30 seconds (i.e. 3 msec per client on average) can be a problem.

But it doesn't matter whether this happens after a 60-second pause or after a 900-second pause, so I don't think there should be any dependency as stated in the description: "random_offset = max(offline_timestamp / 2)".
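
The arithmetic in this comment can be checked with a short sketch. It assumes reconnections are spread uniformly over the random window; the 450 and 900 second windows are illustrative values only and are not taken from the report.

```python
def mean_spacing_ms(clients: int, window_seconds: float) -> float:
    """Average gap between reconnections, in milliseconds,
    if `clients` reconnect uniformly within `window_seconds`."""
    return window_seconds * 1000.0 / clients


# 10k clients, as in the comment; only the 30 s window comes from the thread.
for window in (30, 450, 900):
    spacing = mean_spacing_ms(10_000, window)
    print(f"{window:>4} s window: {spacing:.0f} ms per client on average")
```

The average gap depends only on the width of the random window and the number of clients, not on how long the clients waited before the window opened.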

Comment 6 Pavel Březina 2021-02-09 12:51:28 UTC
Let me rephrase it: since the random offset is not configurable, the customer's expectation is that increasing offline_timeout would also increase the random offset, which currently does not happen. The RFE is about making it configurable.

Comment 12 Paweł Poławski 2021-03-23 22:53:54 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/5549

Comment 13 Paweł Poławski 2021-03-26 15:43:58 UTC
Upstream ticket:
https://github.com/SSSD/sssd/issues/5556

Comment 14 Pavel Březina 2021-04-15 08:28:44 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5549

* `master`
    * 191b53529700f5d92f3db37b270ed624c53cbaa7 - data_provider: Configure backend probing interval

Comment 19 Steeve Goveas 2021-06-14 04:34:26 UTC
Case 1
[root@auto-hv-01-guest01 ~]# rpm -q sssd
sssd-2.5.0-1.el8.x86_64

[root@auto-hv-01-guest01 ~]# grep offline /etc/sssd/sssd.conf
offline_timeout = 30

/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 40 seconds from now [1623601833]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Full Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 87 seconds from last execution time [1623601920]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 142 seconds from last execution time [1623602062]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 265 seconds from last execution time [1623602327]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 499 seconds from last execution time [1623602826]


Case 2
[root@auto-hv-01-guest01 ~]# grep offline /etc/sssd/sssd.conf
offline_timeout = 40
offline_timeout_random_offset = 10

[root@auto-hv-01-guest01 ~]# grep 'ptask'  -ir /var/log/sssd
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:01:34): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [SUDO Full Refresh]: scheduling task 21620 seconds from last execution time [1623625313]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 42 seconds from now [1623603825]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Full Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 83 seconds from last execution time [1623603908]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 169 seconds from last execution time [1623604077]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 329 seconds from last execution time [1623604406]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 645 seconds from last execution time [1623605051]


Case 3
[root@auto-hv-01-guest01 ~]# grep offline /etc/sssd/sssd.conf
offline_timeout = 30
offline_timeout_random_offset = 20
offline_timeout_max = 200

[root@auto-hv-01-guest01 ~]# grep 'ptask'  -ir /var/log/sssd
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:21:21): [be[LDAP]] [be_ptask_done] (0x0400): Task [SUDO Full Refresh]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:21:21): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [SUDO Full Refresh]: scheduling task 21606 seconds from last execution time [1623626486]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 45 seconds from now [1623604969]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Full Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 70 seconds from last execution time [1623605039]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 122 seconds from last execution time [1623605161]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 210 seconds from last execution time [1623605371]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 200 seconds from last execution time [1623605571]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 214 seconds from last execution time [1623605785]
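
The delays in the three cases fit a simple pattern: the base interval starts at offline_timeout, doubles after every failed check, is capped at offline_timeout_max, and gets a random value between 0 and offline_timeout_random_offset added on top. The sketch below models that pattern as inferred from the log lines only; it is not taken from the sssd source, and the function name is hypothetical.

```python
from typing import List, Tuple


def delay_windows(offline_timeout: int,
                  random_offset: int,
                  timeout_max: int,
                  attempts: int) -> List[Tuple[int, int]]:
    """Return the (lowest, highest) delay expected for each attempt.

    Model inferred from the logs: the base interval doubles each time,
    is capped at timeout_max, and a random 0..random_offset is added.
    """
    windows = []
    base = offline_timeout
    for _ in range(attempts):
        capped = min(base, timeout_max)
        windows.append((capped, capped + random_offset))
        base = capped * 2
    return windows


# Case 3: offline_timeout = 30, offline_timeout_random_offset = 20,
# offline_timeout_max = 200; delays copied from the log lines.
observed = [45, 70, 122, 210, 200, 214]
for delay, (low, high) in zip(observed, delay_windows(30, 20, 200, len(observed))):
    print(f"observed {delay:>3} s, expected {low}..{high} s, fits: {low <= delay <= high}")
```

Under this model, every delay logged in Case 3 falls inside its expected window, including the three attempts after the base interval reaches the 200-second cap.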

Comment 21 errata-xmlrpc 2021-11-09 19:47:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4435

