Bug 1435013
Summary: | [RFE] Randomize and/or Distribute the execution of rhsmcertd over a large Satellite 6 Deployment | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jason Dickerson <jdickers> |
Component: | subscription-manager | Assignee: | Chris Snyder <csnyder> |
Status: | CLOSED ERRATA | QA Contact: | John Sefler <jsefler> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.3 | CC: | bkearney, csnyder, khowell, redakkan, skallesh, wpinheir |
Target Milestone: | rc | Keywords: | FutureFeature |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | subscription-manager-1.19.6-1.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-01 19:21:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1430554 |
Description
Jason Dickerson
2017-03-22 21:57:58 UTC
I have added an external tracker to a PR against upstream rhsmcertd for an implementation of this feature.

Below are the test scenarios used to verify the bug on the latest subscription-manager build:

subscription-manager: 1.19.12-1.el7
python-rhsm: 1.19.6-1.el7

1) Demonstrates how the initial auto-heal and cert checks are randomized between two guest machines with the new configuration parameter "splay" set to 1 (ON)
2) Demonstrates the original behaviour (i.e., the initial check happens after the default configured 2-minute interval) when "splay" is set to 0 (OFF)

1) Scenario 1: Demonstrates how the initial auto-heal and cert checks are randomized between two guest machines with the new configuration parameter "splay" left at its default of 1 (ON)

On machine 1:
--------------------
1: Register guest machine 1 to the server.
2: Make sure auto-attach, cert-check and splay have their default values:

[rhsmcertd]
autoattachinterval = [1440]
certcheckinterval = [240]
splay = 1

3: Restart rhsmcertd and check rhsmcertd.log:

[root@dhcp151-211 ~]# service rhsmcertd restart
Redirecting to /bin/systemctl restart rhsmcertd.service
[root@dhcp151-211 ~]# tail -f /var/log/rhsm/rhsmcertd.log
Tue May 9 20:12:27 2017 [INFO] (Cert Check) Certificates updated.
Wed May 10 00:12:31 2017 [INFO] (Cert Check) Certificates updated.
Wed May 10 04:12:29 2017 [INFO] (Cert Check) Certificates updated.
Wed May 10 05:35:57 2017 [WARN] (Auto-attach) Update failed (255), retry will occur on next run.
Wed May 10 06:44:02 2017 [INFO] rhsmcertd is shutting down...
Wed May 10 06:44:02 2017 [INFO] Starting rhsmcertd...
Wed May 10 06:44:02 2017 [INFO] Auto-attach interval: 1440.0 minutes [86400 seconds]
Wed May 10 06:44:02 2017 [INFO] Cert check interval: 240.0 minutes [14400 seconds]
Wed May 10 06:44:02 2017 [INFO] Waiting 2.0 minutes plus 28937 splay seconds [29057 seconds total] before performing first auto-attach.
Wed May 10 06:44:02 2017 [INFO] Waiting 2.0 minutes plus 10726 splay seconds [10846 seconds total] before performing first cert check.

^^ Notice the random splay seconds on guest machine 1: the first auto-attach on this machine will be performed after 29057 seconds and the first cert check after 10846 seconds.

On machine 2:
---------------------
1: Register guest machine 2 to the server.
2: Make sure auto-attach, cert-check and splay have their default values:

[rhsmcertd]
autoattachinterval = [1440]
certcheckinterval = [240]
splay = 1

3: Restart rhsmcertd and check rhsmcertd.log:

[root@dhcp35-238 ~]# service rhsmcertd restart
Redirecting to /bin/systemctl restart rhsmcertd.service
[root@dhcp35-238 ~]# tail -f /var/log/rhsm/rhsmcertd.log
Wed May 10 15:47:39 2017 [INFO] Auto-attach interval: 1440.0 minutes [86400 seconds]
Wed May 10 15:47:39 2017 [INFO] Cert check interval: 240.0 minutes [14400 seconds]
Wed May 10 15:47:39 2017 [INFO] Waiting 2.0 minutes plus 46554 splay seconds [46674 seconds total] before performing first auto-attach.
Wed May 10 15:47:39 2017 [INFO] Waiting 2.0 minutes plus 9072 splay seconds [9192 seconds total] before performing first cert check.
Wed May 10 16:14:16 2017 [INFO] rhsmcertd is shutting down...
Wed May 10 16:14:16 2017 [INFO] Starting rhsmcertd...
Wed May 10 16:14:16 2017 [INFO] Auto-attach interval: 1440.0 minutes [86400 seconds]
Wed May 10 16:14:16 2017 [INFO] Cert check interval: 240.0 minutes [14400 seconds]
Wed May 10 16:14:16 2017 [INFO] Waiting 2.0 minutes plus 62750 splay seconds [62870 seconds total] before performing first auto-attach.
Wed May 10 16:14:16 2017 [INFO] Waiting 2.0 minutes plus 3288 splay seconds [3408 seconds total] before performing first cert check.
^^ Notice the random splay seconds on guest machine 2: after the latest restart, the first auto-attach on this machine will be performed after 62870 seconds and the first cert check after 3408 seconds.

Thus, with the new rhsm config parameter "splay" set to 1, the machines run rhsmcertd at slightly different times, thereby reducing the load when a large number of machines restart simultaneously.

2) Scenario 2: Demonstrates the original behaviour (i.e., the initial check happens after the default configured 2-minute interval) when "splay" is set to 0 (OFF)

[root@dhcp35-238 ~]# subscription-manager config --rhsmcertd.splay 0
[root@dhcp35-238 ~]# service rhsmcertd restart
Redirecting to /bin/systemctl restart rhsmcertd.service
[root@dhcp35-238 ~]# tail -f /var/log/rhsm/rhsmcertd.log
Wed May 10 16:14:16 2017 [INFO] Auto-attach interval: 1440.0 minutes [86400 seconds]
Wed May 10 16:14:16 2017 [INFO] Cert check interval: 240.0 minutes [14400 seconds]
Wed May 10 16:14:16 2017 [INFO] Waiting 2.0 minutes plus 62750 splay seconds [62870 seconds total] before performing first auto-attach.
Wed May 10 16:14:16 2017 [INFO] Waiting 2.0 minutes plus 3288 splay seconds [3408 seconds total] before performing first cert check.
Wed May 10 16:43:57 2017 [INFO] rhsmcertd is shutting down...
Wed May 10 16:43:57 2017 [INFO] Starting rhsmcertd...
Wed May 10 16:43:57 2017 [INFO] Auto-attach interval: 1440.0 minutes [86400 seconds]
Wed May 10 16:43:57 2017 [INFO] Cert check interval: 240.0 minutes [14400 seconds]
Wed May 10 16:43:57 2017 [INFO] Waiting 2.0 minutes plus 0 splay seconds [120 seconds total] before performing first auto-attach.
Wed May 10 16:43:57 2017 [INFO] Waiting 2.0 minutes plus 0 splay seconds [120 seconds total] before performing first cert check.
^^ Notice that the random splay seconds are no longer applied, so the initial checks default to running after 2 minutes.

Conclusion:
===========
When splay is set to 1, the randomized splay value will always be between 0 and the interval being randomized. For example, the auto-attach splay amount should be between 0 and 86400 seconds (with the default autoattachinterval of 1440 minutes, i.e. 86400 seconds).
When splay is set to 0, rhsmcertd defaults to the 2-minute initial check.

Based on the above verification, moving this bug to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2083
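
The arithmetic in the conclusion above can be sketched in a few lines of Python. This is only an illustration of the behaviour visible in the logs (a fixed 2-minute wait plus a random splay no larger than the interval), not the actual rhsmcertd code. The function name `first_run_delay`, the constant `INITIAL_DELAY_SECONDS`, and the choice of an inclusive uniform range are assumptions made for this example.

```python
import random

# Fixed initial wait reported in the logs as "2.0 minutes".
INITIAL_DELAY_SECONDS = 120


def first_run_delay(interval_minutes, splay_enabled, rng=random):
    """Return (splay_seconds, total_seconds) to wait before the first run.

    Hypothetical helper: assumes the splay is drawn uniformly from
    0..interval (inclusive) when enabled, and is 0 when disabled.
    """
    interval_seconds = int(interval_minutes * 60)
    splay_seconds = rng.randint(0, interval_seconds) if splay_enabled else 0
    return splay_seconds, INITIAL_DELAY_SECONDS + splay_seconds


if __name__ == "__main__":
    # Default intervals from the report: 1440 min auto-attach, 240 min cert check.
    for name, minutes in (("auto-attach", 1440), ("cert check", 240)):
        splay, total = first_run_delay(minutes, splay_enabled=True)
        print(f"Waiting 2.0 minutes plus {splay} splay seconds "
              f"[{total} seconds total] before performing first {name}.")
```

With `splay_enabled=False` the helper returns a total of 120 seconds, matching the "plus 0 splay seconds [120 seconds total]" lines in scenario 2. With it enabled, the total is always 120 plus a value no larger than the interval in seconds, as in the 29057 = 120 + 28937 line from machine 1.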