Bug 2041546 - ovnkube: set election timer at RAFT cluster creation time
Summary: ovnkube: set election timer at RAFT cluster creation time
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.10.0
Assignee: Dan Williams
QA Contact: Anurag Saxena
Depends On:
Reported: 2022-01-17 16:46 UTC by Dan Williams
Modified: 2022-03-10 16:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2022-03-10 16:40:08 UTC
Target Upstream Version:

System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 1282 0 None open Bug 2041546: ovn-kubernetes: set RAFT election timer at RAFT cluster creation time 2022-01-17 16:46:35 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:40:32 UTC

Description Dan Williams 2022-01-17 16:46:03 UTC
OVN support added via https://bugzilla.redhat.com/show_bug.cgi?id=1831778

This change ensures that the DBs have the correct RAFT election timer from the moment the cluster is created. It closes a race in which DB leadership changes before the CNO container script is able to set the timer. (The dbchecker fixes the timer later, but we shouldn't have to rely on that.)
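
A minimal sketch of the feature detection that the container script appears to perform, reconstructed from the shell trace in this report. It is not the actual CNO script: the function name election_timer_flag and its arguments are hypothetical, and the sb variant of the flag is assumed to mirror the nb one shown in the trace.

```shell
#!/bin/bash
# Sketch only: mirrors the shell trace in this report, not the real CNO script.

# Hypothetical helper: print the election-timer flag if the given ovn-ctl
# help text advertises it, otherwise print nothing.
#   $1 = output of "ovn-ctl --help"
#   $2 = database, "nb" or "sb" (the "sb" flag name is an assumption)
#   $3 = election timer in milliseconds
election_timer_flag() {
    local help_text="$1" db="$2" timer_ms="$3"
    # "--" stops grep from parsing the pattern as an option.
    if printf '%s\n' "$help_text" | grep -q -- "--db-${db}-election-timer"; then
        printf '%s\n' "--db-${db}-election-timer=${timer_ms}"
    fi
}
```

Called as election_timer_flag "$(/usr/share/ovn/scripts/ovn-ctl --help)" nb 10000, the result can be passed to ovn-ctl unquoted so that an empty result adds no argument on older ovn-ctl versions.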

QE verification: the DB container logs should show the election timer being set to the specified value immediately, like so:

++ /usr/share/ovn/scripts/ovn-ctl --help
++ grep '\--db-nb-election-timer'
+ test -n '  --db-nb-election-timer=MS OVN Northbound RAFT db election timer to use on db creation (in milliseconds)'
+ election_timer=--db-nb-election-timer=10000
+ wait 130
+ exec /usr/share/ovn/scripts/ovn-ctl --db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr= --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '--ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' --db-nb-election-timer=10000 run_nb_ovsdb
Creating cluster database /etc/ovn/ovnnb_db.db.
2022-01-17T16:52:28.726Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2022-01-17T16:52:28.728Z|00002|raft|INFO|term 2: 109914 ms timeout expired, starting election
2022-01-17T16:52:28.728Z|00003|raft|INFO|term 2: elected leader by 1+ of 1 servers
2022-01-17T16:52:28.728Z|00004|raft|INFO|local server ID is 5df2
ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
2022-01-17T16:52:28.732Z|00005|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.16.2
2022-01-17T16:52:28.732Z|00006|raft|INFO|Election timer changed from 1000 to 10000

That is, in the "changed from X to Y" line, Y should match the OVN_[N|S]B_RAFT_ELECTION_TIMER env var specified in the CNO's manifests/0000_70_cluster-network-operator_03_deployment.yaml, times 1000 (the log reports milliseconds).
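
The check above could be scripted roughly as follows. This is a sketch under assumptions: check_election_timer is a hypothetical helper, the log is supplied on stdin, and the env var value is passed in by hand as the first argument.

```shell
#!/bin/bash
# Sketch only: a hypothetical QE helper, not part of CNO or ovn-kubernetes.

# Succeeds if the first "Election timer changed from X to Y" line on stdin
# has Y equal to the given env var value times 1000.
#   $1 = value of OVN_NB_RAFT_ELECTION_TIMER or OVN_SB_RAFT_ELECTION_TIMER
check_election_timer() {
    local timer_value="$1"
    local expected_ms=$(( timer_value * 1000 ))
    local actual_ms
    # Extract Y from the first matching log line; empty if no line matches.
    actual_ms="$(sed -n 's/.*Election timer changed from [0-9][0-9]* to \([0-9][0-9]*\).*/\1/p' | head -n 1)"
    [ "$actual_ms" = "$expected_ms" ]
}
```

For example, piping the saved DB container log into check_election_timer 10 would succeed for the log shown above, since the timer changes to 10000.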

Comment 2 Surya Seetharaman 2022-01-27 15:13:20 UTC
*** Bug 2033514 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2022-03-10 16:40:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.
