Bug 1925608
| Summary: | [RFE] make 'random_offset' addon to 'offline_timeout' option configurable | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Antonio Romito <aromito> |
| Component: | sssd | Assignee: | Paweł Poławski <ppolawsk> |
| Status: | CLOSED ERRATA | QA Contact: | Madhuri <mupadhye> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.4 | CC: | atikhono, daniele, dlavu, grajaiya, jhrozek, lslebodn, mupadhye, mzidek, pbrezina, ppolawsk, sgoveas, thalman, tscherf |
| Target Milestone: | rc | Keywords: | FutureFeature, Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | sssd-2.5.0-1.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-09 19:47:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Antonio Romito
2021-02-05 16:36:26 UTC
(In reply to Antonio Romito from comment #0)
> The idea would be to have the following formula to calculate the default
> random offset:
>
> random_offset = max(offline_timestamp / 2)

max of "offline_timestamp / 2" and ..?
"offline_timestamp" should means "offline_timeout"?

> Steps to Reproduce:
>
> 1. change the offline_timeout to a large number (eg 1800)
> 2. force a disconnection from LDAP
> 3. observe that the reconnection will happen between 1800 and 1830 seconds
>
> Actual results:
>
> The random time is always 0..30

What's wrong with it?
What is the real life scenario where this causes issues?

> "offline_timestamp" should means "offline_timeout"?

Yes, sorry

>> The random time is always 0..30
> What's wrong with it?

If someone set the offline_timeout to be 1800, the idea behind is that they don’t want quick reconnect but rather relax the ldap reconnections. In case of ldap offline_timeout is met, the clients will anyway storm (if no other actions trigger a reconnection) the ldap backend in a short time span, putting pressure on it.

> What is the real life scenario where this causes issues?

This was spotted during a review with Pavel, as part of the ldap storm of queries which was triggering down the ldap backend every 15 minutes.

The main idea behind this request is that it is perfectly fine to have offline_timeout = 60 and randomize it with <0, 30> but if you need to increase the offline timeout to larger value to relax the reconnections, then 30 seconds do not make sense. This RFE is about making also the random offset configurable/adjustable.

(In reply to Pavel Březina from comment #4)
> The main idea behind this request is that it is perfectly fine to have
> offline_timeout = 60 and randomize it with <0, 30> but if you need to
> increase the offline timeout to larger value to relax the reconnections,
> then 30 seconds do not make sense.

30 seconds is either enough to accommodate all reconnecting clients or not. `offline_timeout` value doesn't matter.
I can imagine a case where 10k clients reconnecting within 30 seconds (i.e. 3 msec per client on average) can be a trouble. But it doesn't matter if this happens after 60 secs pause or after 900 secs pause, thus I don't think there should be any dependency as stated in the description "random_offset = max(offline_timestamp / 2)"

Let me rephrase it:
The customer expectation is that, given the random offset is not configurable, increasing the offline_timeout would also increase the random offset, which does not currently happen. The RFE is about making it configurable.

Upstream PR: https://github.com/SSSD/sssd/pull/5549

Upstream ticket: https://github.com/SSSD/sssd/issues/5556

Pushed PR: https://github.com/SSSD/sssd/pull/5549

* `master`
    * 191b53529700f5d92f3db37b270ed624c53cbaa7 - data_provider: Configure backend probing interval

[root@auto-hv-01-guest01 ~]# rpm -q sssd
sssd-2.5.0-1.el8.x86_64

Case 1

[root@auto-hv-01-guest01 ~]# grep offline /etc/sssd/sssd.conf
offline_timeout = 30

/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 40 seconds from now [1623601833]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:29:53): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Full Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:30:33): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 87 seconds from last execution time [1623601920]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:32:00): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 142 seconds from last execution time [1623602062]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:34:22): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 265 seconds from last execution time [1623602327]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 12:38:47): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 499 seconds from last execution time [1623602826]

Case 2

[root@auto-hv-01-guest01 ~]# grep offline /etc/sssd/sssd.conf
offline_timeout = 40
offline_timeout_random_offset = 10

[root@auto-hv-01-guest01 ~]# grep 'ptask' -ir /var/log/sssd
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:01:34): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [SUDO Full Refresh]: scheduling task 21620 seconds from last execution time [1623625313]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 42 seconds from now [1623603825]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:03): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Full Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:03:45): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 83 seconds from last execution time [1623603908]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:05:08): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 169 seconds from last execution time [1623604077]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:07:57): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 329 seconds from last execution time [1623604406]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 40 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:13:26): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 645 seconds from last execution time [1623605051]

Case 3

[root@auto-hv-01-guest01 ~]# grep offline /etc/sssd/sssd.conf
offline_timeout = 30
offline_timeout_random_offset = 20
offline_timeout_max = 200

[root@auto-hv-01-guest01 ~]# grep 'ptask' -ir /var/log/sssd
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:21:21): [be[LDAP]] [be_ptask_done] (0x0400): Task [SUDO Full Refresh]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:21:21): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [SUDO Full Refresh]: scheduling task 21606 seconds from last execution time [1623626486]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 45 seconds from now [1623604969]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_offline_cb] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:04): [be[LDAP]] [be_ptask_disable] (0x0400): Task [SUDO Full Refresh]: disabling task
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:22:49): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 70 seconds from last execution time [1623605039]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:23:59): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 122 seconds from last execution time [1623605161]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:26:01): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 210 seconds from last execution time [1623605371]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:29:31): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 200 seconds from last execution time [1623605571]
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_execute] (0x0400): Back end is offline
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_execute] (0x0400): Task [Check if online (periodic)]: executing task, timeout 30 seconds
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_done] (0x0400): Task [Check if online (periodic)]: finished successfully
/var/log/sssd/sssd_LDAP.log:(2021-06-13 13:32:51): [be[LDAP]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 214 seconds from last execution time [1623605785]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4435
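
For readers who want to sanity-check the verification logs above, here is a minimal Python sketch of the delay model the logged numbers appear to follow. It is an assumption inferred from the "scheduling task N seconds" lines, not code taken from SSSD: the base interval starts at `offline_timeout`, doubles after each failed online check, is capped at `offline_timeout_max` when that option is set, and a random value in 0..`offline_timeout_random_offset` is added on top. The function names `next_check_delay` and `implied_offsets` are hypothetical helpers for this illustration only.

```python
import random


def next_check_delay(attempt, offline_timeout, random_offset, timeout_max=None, rng=random):
    """Seconds to wait before online check number `attempt` (0-based).

    Assumed model (inferred from the logs, not from SSSD source): the base
    interval doubles after every failed check, is optionally capped at
    `timeout_max`, and a uniform random offset in [0, random_offset] is added.
    """
    base = offline_timeout * 2 ** attempt
    if timeout_max is not None:
        base = min(base, timeout_max)
    return base + rng.randint(0, random_offset)


def implied_offsets(delays, offline_timeout, timeout_max=None):
    """Strip the deterministic base from logged delays, leaving the random part."""
    offsets = []
    for attempt, delay in enumerate(delays):
        base = offline_timeout * 2 ** attempt
        if timeout_max is not None:
            base = min(base, timeout_max)
        offsets.append(delay - base)
    return offsets


# Delays copied from the "scheduling task N seconds" lines for
# [Check if online (periodic)] in the three verification cases.
case1 = implied_offsets([40, 87, 142, 265, 499], offline_timeout=30)
case2 = implied_offsets([42, 83, 169, 329, 645], offline_timeout=40)
case3 = implied_offsets([45, 70, 122, 210, 200, 214], offline_timeout=30, timeout_max=200)

print(case1)  # random parts for case 1 (no offset configured)
print(case2)  # random parts for case 2 (offline_timeout_random_offset = 10)
print(case3)  # random parts for case 3 (offset 20, offline_timeout_max = 200)
```

Under this model the leftover random parts are `[10, 27, 22, 25, 19]`, `[2, 3, 9, 9, 5]` and `[15, 10, 2, 10, 0, 14]`. They stay within 0..30 (the fixed range discussed in the report), 0..10 and 0..20 respectively, which is what the new option is expected to control, and in case 3 the base stops growing once it reaches 200.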