Bug 1883662
Summary: | [sbdb][raft] Tune out of the box timer to be 16sec | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Joe Talerico <jtaleric> |
Component: | Networking | Assignee: | Anil Vishnoi <avishnoi> |
Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | avishnoi, dblack, dcbw, fiezzi, mifiedle, rbrattai |
Version: | 4.6 | ||
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:46:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Joe Talerico
2020-09-29 19:43:30 UTC
Timer is set correctly on 4.6.0-0.nightly-2020-10-05-234751:

ovnkube-master-lrrw2
734: - name: OVN_NB_RAFT_ELECTION_TIMER
735- value: "10000"
--
1070: - name: OVN_SB_RAFT_ELECTION_TIMER
1071- value: "16000"

ovnkube-master-thrh2
734: - name: OVN_NB_RAFT_ELECTION_TIMER
735- value: "10000"
--
1070: - name: OVN_SB_RAFT_ELECTION_TIMER
1071- value: "16000"

ovnkube-master-9t7wm
734: - name: OVN_NB_RAFT_ELECTION_TIMER
735- value: "10000"
--
1070: - name: OVN_SB_RAFT_ELECTION_TIMER
1071- value: "16000"

log_ovnkube-master-thrh2
108:2020-10-06T12:52:06Z|00005|raft|INFO|Election timer changed from 1000 to 2000
109:2020-10-06T12:52:06Z|00006|raft|INFO|Election timer changed from 2000 to 4000
110:2020-10-06T12:52:06Z|00007|raft|INFO|Election timer changed from 4000 to 8000
111:2020-10-06T12:52:06Z|00008|raft|INFO|Election timer changed from 8000 to 16000
2903:2020-10-06T12:51:34Z|00005|raft|INFO|Election timer changed from 1000 to 2000
2904:2020-10-06T12:51:34Z|00006|raft|INFO|Election timer changed from 2000 to 4000
2905:2020-10-06T12:51:34Z|00007|raft|INFO|Election timer changed from 4000 to 8000
2906:2020-10-06T12:51:34Z|00008|raft|INFO|Election timer changed from 8000 to 10000

log_ovnkube-master-lrrw2
229:2020-10-06T12:51:36Z|00022|raft|INFO|Election timer changed from 10000 to 2000
230:2020-10-06T12:51:36Z|00023|raft|INFO|Election timer changed from 2000 to 4000
231:2020-10-06T12:51:36Z|00024|raft|INFO|Election timer changed from 4000 to 8000
232:2020-10-06T12:51:36Z|00025|raft|INFO|Election timer changed from 8000 to 10000
2128:2020-10-06T12:52:09Z|00022|raft|INFO|Election timer changed from 16000 to 2000
2129:2020-10-06T12:52:09Z|00023|raft|INFO|Election timer changed from 2000 to 4000
2130:2020-10-06T12:52:09Z|00024|raft|INFO|Election timer changed from 4000 to 8000
2131:2020-10-06T12:52:09Z|00025|raft|INFO|Election timer changed from 8000 to 16000

log_ovnkube-master-9t7wm
125:2020-10-06T12:52:12Z|00023|raft|INFO|Election timer changed from 16000 to 2000
126:2020-10-06T12:52:12Z|00024|raft|INFO|Election timer changed from 2000 to 4000
127:2020-10-06T12:52:12Z|00025|raft|INFO|Election timer changed from 4000 to 8000
128:2020-10-06T12:52:12Z|00026|raft|INFO|Election timer changed from 8000 to 16000
3335:2020-10-06T12:51:39Z|00023|raft|INFO|Election timer changed from 10000 to 2000
3336:2020-10-06T12:51:39Z|00024|raft|INFO|Election timer changed from 2000 to 4000
3337:2020-10-06T12:51:39Z|00025|raft|INFO|Election timer changed from 4000 to 8000
3338:2020-10-06T12:51:39Z|00026|raft|INFO|Election timer changed from 8000 to 10000

@Joe, how many nodes were involved in your scale test? I guess we might need around the same number to verify this. Functionally, though, it seems okay as per Ross's comments above.

(In reply to Anurag saxena from comment #5)
> @Joe, how many nodes were involved in your scale test? I guess we might need
> around the same number to verify this. Functionally, though, it seems okay
> as per Ross's comments above.

So, the bummer here is that I was able to cause many leader elections even with the 16-second timer... I am not sure this is the silver bullet for the issue. See https://bugzilla.redhat.com/show_bug.cgi?id=1855408#c30

Marking verified. The timer has been changed, but there may still be additional issues, per comment 6, tracked in bug 1855408. No need to keep this one open.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
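The ovnkube-master logs above show the SB election timer reaching 16000 ms through a series of doublings (1000 to 2000, 4000, 8000, then 16000), while the NB timer ends with a short final step from 8000 to 10000. The Python sketch below reproduces that pattern under two stated assumptions: that the starting value is the 1000 ms OVSDB Raft default, and that each election-timer change may at most double the current value. The helper name `election_timer_steps` is hypothetical and is not part of ovn-kubernetes. It models only the ramp-up from the default. It does not explain the entries that drop from 10000 or 16000 back to 2000.

```python
from typing import List, Tuple

# Assumption: the OVSDB Raft default election timer is 1000 ms, which
# matches the "changed from 1000" entries in the logs.
DEFAULT_ELECTION_TIMER_MS = 1000


def election_timer_steps(target_ms: int,
                         current_ms: int = DEFAULT_ELECTION_TIMER_MS) -> List[Tuple[int, int]]:
    """Return the (old, new) changes needed to raise current_ms to target_ms.

    Hypothetical helper. Assumes each change may at most double the current
    value, so the timer is doubled until one more doubling would overshoot
    the target, and the last step lands exactly on the target.
    """
    if current_ms <= 0:
        raise ValueError("current_ms must be positive")
    if target_ms < current_ms:
        raise ValueError("this sketch only models increasing the timer")
    steps: List[Tuple[int, int]] = []
    while current_ms < target_ms:
        # Never more than double per change, and never past the target.
        new_ms = min(current_ms * 2, target_ms)
        steps.append((current_ms, new_ms))
        current_ms = new_ms
    return steps


if __name__ == "__main__":
    # Targets taken from OVN_SB_RAFT_ELECTION_TIMER and OVN_NB_RAFT_ELECTION_TIMER.
    for name, target in (("SB", 16000), ("NB", 10000)):
        for old, new in election_timer_steps(target):
            print(f"{name}: Election timer changed from {old} to {new}")
```

Running it prints the same four-step sequences that appear in the logs: the SB sequence ends with 8000 to 16000 and the NB sequence ends with 8000 to 10000.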