On all OVN clusters we expect the Raft election to converge within 30 seconds of the deletion of all ovnkube-master pods. The OVN StatefulSet does not meet this condition.

[09:19:00] INFO> Shell Commands:
oc delete pod -l app\=ovnkube-master --kubeconfig=ocp4_admin.kubeconfig
pod "ovnkube-master-0" deleted
pod "ovnkube-master-1" deleted
pod "ovnkube-master-2" deleted
[09:19:03] INFO> Exit Status: 0
[09:19:12] INFO> Exit Status: 1

[09:19:17] INFO> Shell Commands:
oc exec ovnkube-master-0 --kubeconfig=ocp4_admin.kubeconfig -n clusters-hypershift-ci-15114 --container=northd -i -- ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
aee7
Name: OVN_Northbound
Cluster ID: 1a8d (1a8dd9f1-9866-42cc-9ccd-33953fcc3df7)
Server ID: aee7 (aee71e75-9e8c-4ea8-868f-9e7abc1be4ce)
Address: ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643
Status: cluster member
Role: follower
Term: 63
Leader: unknown
Vote: self
Election timer: 10000
Log: [2, 3002]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->e451) (->358a)
Disconnections: 0
Servers:
    aee7 (aee7 at ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643) (self)
    e451 (e451 at ssl:ovnkube-master-1.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643)
    358a (358a at ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643)
STDERR: E0610 05:19:19.141910 950758 v2.go:105] read /dev/stdin: resource temporarily unavailable
[09:19:19] INFO> Exit Status: 0

[09:19:24] INFO> Shell Commands:
oc exec ovnkube-master-0 --kubeconfig=ocp4_admin.kubeconfig -n clusters-hypershift-ci-15114 --container=northd -i -- ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
aee7
Name: OVN_Northbound
Cluster ID: 1a8d (1a8dd9f1-9866-42cc-9ccd-33953fcc3df7)
Server ID: aee7 (aee71e75-9e8c-4ea8-868f-9e7abc1be4ce)
Address: ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643
Status: disconnected from the cluster (election timeout)
Role: candidate
Term: 64
Leader: unknown
Vote: self
Last Election started 6120 ms ago, reason: timeout
Election timer: 10000
Log: [2, 3002]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->e451) (->358a)
Disconnections: 0
Servers:
    aee7 (aee7 at ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643) (self) (voted for aee7)
    e451 (e451 at ssl:ovnkube-master-1.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643)
    358a (358a at ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643)
STDERR: E0610 05:19:26.148518 950936 v2.go:105] read /dev/stdin: resource temporarily unavailable
[09:19:26] INFO> Exit Status: 0

[09:19:31] INFO> Shell Commands:
oc exec ovnkube-master-0 --kubeconfig=ocp4_admin.kubeconfig -n clusters-hypershift-ci-15114 --container=northd -i -- ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
aee7
Name: OVN_Northbound
Cluster ID: 1a8d (1a8dd9f1-9866-42cc-9ccd-33953fcc3df7)
Server ID: aee7 (aee71e75-9e8c-4ea8-868f-9e7abc1be4ce)
Address: ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643
Status: disconnected from the cluster (election timeout)
Role: candidate
Term: 65
Leader: unknown
Vote: self
Last Election started 2856 ms ago, reason: timeout
Election timer: 10000
Log: [2, 3002]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->e451) (->358a)
Disconnections: 0
Servers:
    aee7 (aee7 at ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643) (self) (voted for aee7)
    e451 (e451 at ssl:ovnkube-master-1.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643)
    358a (358a at ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hypershift-ci-15114.svc.cluster.local:9643)
STDERR: E0610 05:19:33.424861 951120 v2.go:105] read /dev/stdin: resource temporarily unavailable
[09:19:33] INFO> Exit Status: 0

Unknown leader (RuntimeError)
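For reference, the check the test automates can be approximated by hand by polling cluster/status on a surviving member until a leader other than "unknown" is reported. A minimal sketch, assuming the same namespace and kubeconfig as in the log above (dropping -i should also avoid the "read /dev/stdin" noise seen in the STDERR lines):

# Poll the NB Raft status for up to 30 seconds (the expectation stated in this bug)
# and stop as soon as a leader is reported.
for i in $(seq 1 30); do
  leader=$(oc exec ovnkube-master-0 --kubeconfig=ocp4_admin.kubeconfig \
      -n clusters-hypershift-ci-15114 --container=northd -- \
      ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound \
      | awk '/Leader:/ {print $2}')
  [ -n "$leader" ] && [ "$leader" != "unknown" ] && echo "leader: $leader" && break
  sleep 1
done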
I tried to reproduce this issue locally; some findings:

1) It takes ~1m for ovnkube-master-0 in the management cluster to get past ContainerCreating.

oc describe pod ovnkube-master-0 doesn't show any event explaining why nbdb takes 40s to start. Looking at the nbdb container log, it doesn't show any abnormal behavior: it detects the OVN DB file and starts without delay. Not sure if it needs time to mount the storage PV where the database file is saved (we don't use a PV in a normal OpenShift deployment, only in HyperShift).

2) ovnkube-master containers are crashlooping in both the ovnkube-master-1 and ovnkube-master-2 pods.

### ovnkube-master container log ###

F0616 05:10:26.486395       1 ovnkube.go:133] error when trying to initialize libovsdb SB client: unable to connect to any endpoints: failed to connect to ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9642: failed to open connection: dial tcp: lookup ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local: no such host. failed to connect to ssl:ovnkube-master-1.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9642: endpoint is not leader. failed to connect to ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9642: endpoint is not leader

In the above ovnkube-master container log, it fails to find the endpoint leader, which causes the ovnkube-master container to crash continuously. I didn't find any leader election configmap in the hosted cluster namespace, but there is one in the openshift-ovn-kubernetes namespace created by the management ovnk deployment. According to the ovn-config-namespace setting in the ovnkube-config configmap, the HyperShift deployment still uses the openshift-ovn-kubernetes namespace, which is wrong since it is supposed to use the hostedcluster namespace on the management cluster. We would need to change ovn-config-namespace to the hostedcluster namespace so it creates its own election lock.

PS: this ovn-config-namespace config is also used in the ovnk metrics, dbmanager and topology-version code and could potentially cause other issues.
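As a quick cross-check, the rendered setting can be read straight out of the ConfigMap. This is only a sketch: it assumes the setting appears as plain text under the ovnkube-config ConfigMap in the hosted control plane namespace ("clusters-hostedovn" is the namespace from the logs above); the exact key layout can differ between releases.

# Show how ovn-config-namespace is rendered for the hosted control plane.
oc get configmap ovnkube-config -n clusters-hostedovn -o yaml | grep -i ovn-config-namespace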
(In reply to zenghui.shi from comment #2)
> I tried to reproduce this issue locally; some findings:
>
> 1) It takes ~1m for ovnkube-master-0 in the management cluster to get past
> ContainerCreating.
>
> oc describe pod ovnkube-master-0 doesn't show any event explaining why nbdb
> takes 40s to start. Looking at the nbdb container log, it doesn't show any
> abnormal behavior: it detects the OVN DB file and starts without delay. Not
> sure if it needs time to mount the storage PV where the database file is
> saved (we don't use a PV in a normal OpenShift deployment, only in HyperShift).
>
> 2) ovnkube-master containers are crashlooping in both the ovnkube-master-1
> and ovnkube-master-2 pods.
>
> ### ovnkube-master container log ###
>
> F0616 05:10:26.486395       1 ovnkube.go:133] error when trying to
> initialize libovsdb SB client: unable to connect to any endpoints:
> failed to connect to ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9642:
> failed to open connection: dial tcp: lookup
> ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local: no such host.
> failed to connect to ssl:ovnkube-master-1.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9642:
> endpoint is not leader.
> failed to connect to ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9642:
> endpoint is not leader
>
> In the above ovnkube-master container log, it fails to find the endpoint
> leader, which causes the ovnkube-master container to crash continuously.
> I didn't find any leader election configmap in the hosted cluster namespace,
> but there is one in the openshift-ovn-kubernetes namespace created by the
> management ovnk deployment.
> According to the ovn-config-namespace setting in the ovnkube-config
> configmap, the HyperShift deployment still uses the openshift-ovn-kubernetes
> namespace, which is wrong since it is supposed to use the hostedcluster
> namespace on the management cluster. We would need to change
> ovn-config-namespace to the hostedcluster namespace so it creates its own
> election lock.

Ignore the above paragraph: the election lock is created in the guest cluster's openshift-ovn-kubernetes namespace and it works fine.
Log from one of the nbdb containers (ovnkube-master-1):

-> nbdb starts at 06:49:50

2022-06-16T06:49:50+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local, K8S_NODE_IP=10.0.135.118
+ echo '2022-06-16T06:49:50+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local, K8S_NODE_IP=10.0.135.118'
+ initial_raft_create=true
+ initialize=false
+ [[ ! -e /etc/ovn/ovnnb_db.db ]]
+ [[ false == \t\r\u\e ]]
+ wait 10
+ exec /usr/share/ovn/scripts/ovn-ctl --db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=ovnkube-master-1.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '--ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_nb_ovsdb
[...]
ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
2022-06-16T06:49:50.246Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2022-06-16T06:49:50Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2022-06-16T06:49:50Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2022-06-16T06:49:50.267Z|00002|dns_resolve|WARN|ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local: failed to resolve
2022-06-16T06:49:50.267Z|00003|raft|INFO|local server ID is 08ee
2022-06-16T06:49:50.284Z|00004|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.17.2

-> failed to resolve addresses of the other 2 members

2022-06-16T06:49:50.288Z|00005|stream_ssl|ERR|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol
2022-06-16T06:49:50.288Z|00006|reconnect|INFO|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connecting...
2022-06-16T06:49:50.288Z|00007|reconnect|INFO|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connection attempt failed (Address family not supported by protocol)
2022-06-16T06:49:50.288Z|00008|stream_ssl|ERR|ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol
2022-06-16T06:49:50.288Z|00009|reconnect|INFO|ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connecting...
2022-06-16T06:49:50.288Z|00010|reconnect|INFO|ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connection attempt failed (Address family not supported by protocol)

-> self connected

2022-06-16T06:49:51Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2022-06-16T06:49:51Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connected

-> continues connecting to the other members in the background

2022-06-16T06:49:57.288Z|00030|reconnect|INFO|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: continuing to reconnect in the background but suppressing further logging
2022-06-16T06:49:57.289Z|00034|reconnect|INFO|ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: continuing to reconnect in the background but suppressing further logging
2022-06-16T06:50:00.292Z|00035|memory|INFO|38604 kB peak resident set size after 10.0 seconds
2022-06-16T06:50:00.292Z|00036|memory|INFO|atoms:3237 cells:3235 monitors:0 raft-connections:2 raft-log:1103 sessions:1 triggers:1 txn-history:55 txn-history-atoms:3200

-> election expired several times

2022-06-16T06:50:00.374Z|00037|raft|INFO|term 43: 10107 ms timeout expired, starting election
2022-06-16T06:50:11.144Z|00040|raft|INFO|term 44: 10770 ms timeout expired, starting election
2022-06-16T06:50:13.290Z|00041|stream_ssl|ERR|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol
2022-06-16T06:50:13.290Z|00042|stream_ssl|ERR|ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol

-> alarm

2022-06-16T06:50:20Z|00005|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
/usr/share/openvswitch/scripts/ovs-lib: line 109:    89 Alarm clock "$@"
Waiting for OVN_Northbound to come up ... failed!

-> continues connecting after the alarm

2022-06-16T06:50:21.290Z|00043|stream_ssl|ERR|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol
2022-06-16T06:50:21.290Z|00044|stream_ssl|ERR|ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol

-> election expired several times

2022-06-16T06:50:21.531Z|00045|raft|INFO|term 45: 10386 ms timeout expired, starting election
2022-06-16T06:50:32.517Z|00048|raft|INFO|term 46: 10986 ms timeout expired, starting election
2022-06-16T06:50:42.996Z|00051|raft|INFO|term 47: 10479 ms timeout expired, starting election

-> connected to leader master-2 at 06:50:53, total time spent (06:49:50 ~ 06:50:53 = 63s)

2022-06-16T06:50:53.294Z|00055|dns_resolve|WARN|ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local: failed to resolve
2022-06-16T06:50:53.294Z|00056|stream_ssl|ERR|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol
2022-06-16T06:50:53.299Z|00057|reconnect|INFO|ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connected
2022-06-16T06:50:53.316Z|00058|raft|INFO|ssl:10.129.2.3:50674: learned server ID 2e5d
2022-06-16T06:50:53.316Z|00059|raft|INFO|ssl:10.129.2.3:50674: learned remote address ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643
2022-06-16T06:50:54.475Z|00060|raft|INFO|ssl:10.131.0.3:59782: learned server ID d65b
2022-06-16T06:50:54.475Z|00061|raft|INFO|ssl:10.131.0.3:59782: learned remote address ssl:ovnkube-master-2.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643
2022-06-16T06:50:55.103Z|00062|raft|INFO|server d65b is leader for term 49
2022-06-16T06:51:01.295Z|00063|stream_ssl|ERR|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol
2022-06-16T06:51:09.296Z|00064|stream_ssl|ERR|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connect: Address family not supported by protocol
2022-06-16T06:51:17.303Z|00065|reconnect|INFO|ssl:ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local:9643: connected

-> connected to master-0 at 06:51:17, total time spent (06:49:50 ~ 06:51:17 = 87s)

It took ~1m to resolve and connect to the pod DNS names of the other members, which included one connection alarm and several election expirations. The "Address family not supported by protocol" message seems to indicate that the ovnk pod IP is not resolvable. Given that ovnk pods are not host-networked in HyperShift, their IPs change every time they are recreated; I wonder whether the delay is caused by the pod DNS records not being populated immediately.
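One way to test that hypothesis directly would be to time, right after a restart, how long a peer's headless-service record takes to become resolvable from inside one of the pods. A rough sketch only; it reuses the pod, container and namespace names from the logs above and assumes the northd container image ships getent:

# Measure how long the peer's DNS record takes to appear after recreation.
oc exec ovnkube-master-1 -n clusters-hostedovn --container=northd -- sh -c '
  peer=ovnkube-master-0.ovnkube-master-internal.clusters-hostedovn.svc.cluster.local
  start=$(date +%s)
  until getent hosts "$peer" > /dev/null; do sleep 1; done
  echo "resolved $peer after $(( $(date +%s) - start ))s: $(getent hosts "$peer")"
'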
> Given that ovnk pods are not host-networked in HyperShift, their IPs change
> every time they are recreated; I wonder whether the delay is caused by the
> pod DNS records not being populated immediately.

The max TTL in the CoreDNS cache config is set to 900, which may result in a long resolve time for the DB headless service name:

# oc get configmap/dns-default -n openshift-dns -o yaml | grep cache -A3
        cache 900 {
            denial 9984 30
        }
        reload

Can we try and see if it can be improved by using a shorter cache time, with the following steps?

1) Set the DNS operator managementState to Unmanaged:

# oc patch dns.operator.openshift.io default --type merge --patch '{"spec":{"managementState":"Unmanaged"}}'

2) Edit the CoreDNS configmap to replace 900 with 30, 10 or 5:

# oc edit configmap/dns-default -n openshift-dns

3) Rerun the DB HA tests.
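If editing interactively is awkward in CI, the same change can be scripted. A rough sketch, assuming the literal "cache 900 {" only appears once in the Corefile (worth eyeballing first):

# 1) Stop the DNS operator from reconciling the configmap.
oc patch dns.operator.openshift.io default --type merge \
  --patch '{"spec":{"managementState":"Unmanaged"}}'

# 2) Lower the positive-cache TTL from 900s to 5s in the Corefile and re-apply.
oc get configmap/dns-default -n openshift-dns -o yaml \
  | sed 's/cache 900 {/cache 5 {/' \
  | oc apply -f -

# 3) The "reload" plugin shown in the Corefile above should pick up the change after a
#    short delay; if not, the dns-default pods can be deleted to force a restart.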
> In the above ovnkube-master container log, it fails to find the endpoint
> leader, which causes the ovnkube-master container to crash continuously.
> I didn't find any leader election configmap in the hosted cluster namespace,
> but there is one in the openshift-ovn-kubernetes namespace created by the
> management ovnk deployment.
> According to the ovn-config-namespace setting in the ovnkube-config
> configmap, the HyperShift deployment still uses the openshift-ovn-kubernetes
> namespace, which is wrong since it is supposed to use the hostedcluster
> namespace on the management cluster. We would need to change
> ovn-config-namespace to the hostedcluster namespace so it creates its own
> election lock.

This looks like an additional issue that should be fixed; we have had issues when the leader election lock is not released properly, BZ 2089807 and BZ 1944180.
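For anyone checking this, a rough way to see which identity currently holds the ovnkube-master election lock (a sketch only; it assumes the standard client-go leader-election annotation on a ConfigMap or a Lease's holderIdentity field, and the lock object's name varies by release):

# Grep the leader-election records in the guest cluster's openshift-ovn-kubernetes namespace;
# the holderIdentity names the current leader.
oc get configmap,lease -n openshift-ovn-kubernetes -o yaml \
  | grep -iE 'control-plane.alpha.kubernetes.io/leader|holderIdentity'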
With cache 5, election finished in ~1m06s.
With cache 10, election finished in ~1m05s.
With cache 60, election finished in ~1m15s.

This is better, but still not as good as regular OVN. Not sure what the ultimate consequences will be.
Tested on 4.11.0-0.ci.test-2022-07-19-205901-ci-ln-rqvyry2-latest

pod "ovnkube-master-0" deleted
pod "ovnkube-master-1" deleted
pod "ovnkube-master-2" deleted
[00:08:28] INFO> Exit Status: 0
[00:08:53] INFO> cb.new_north_leader.name = ovnkube-master-1
~ 25 seconds

pod "ovnkube-master-0" deleted
pod "ovnkube-master-1" deleted
pod "ovnkube-master-2" deleted
[00:12:11] INFO> Exit Status: 0
[00:12:43] INFO> cb.new_north_leader.name = ovnkube-master-2
~ 32 seconds

Pod election logs:

# rg -e 'server \w+ is leader for term \d+' -e 'local server ID is \w+' -e 'elected leader by' -e 'learned server ID' -e '^\d+[^s]*starting [ns]bdb CLUSTER_INITIATOR_IP' logs-1/

logs-1/log_ovnkube-master-2
118:2022-07-20T00:08:40+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local, K8S_NODE_IP=10.0.137.178
167:2022-07-20T00:08:40.860Z|00002|raft|INFO|local server ID is 09c9
179:2022-07-20T00:08:47.894Z|00010|raft|INFO|server e67c is leader for term 4
183:2022-07-20T00:08:52.660Z|00013|raft|INFO|ssl:10.128.2.39:59954: learned server ID e67c
185:2022-07-20T00:09:00.204Z|00015|raft|INFO|ssl:10.129.2.43:50862: learned server ID 3924
201:2022-07-20T00:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local
264:2022-07-20T00:08:49.325Z|00002|raft|INFO|local server ID is 54ef
270:2022-07-20T00:08:49.649Z|00008|raft|INFO|ssl:10.129.2.43:52048: learned server ID bb47
272:2022-07-20T00:08:50.219Z|00010|raft|INFO|ssl:10.128.2.39:42294: learned server ID 5db2
284:2022-07-20T00:09:05.991Z|00020|raft|INFO|term 5: elected leader by 2+ of 3 servers

logs-1/log_ovnkube-master-0
112:2022-07-20T00:08:37+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local, K8S_NODE_IP=10.0.185.106
161:2022-07-20T00:08:37.178Z|00002|raft|INFO|local server ID is 3924
172:2022-07-20T00:08:37.659Z|00011|raft|INFO|ssl:10.128.2.39:37260: learned server ID e67c
192:2022-07-20T00:08:40.882Z|00029|raft|INFO|ssl:10.131.0.54:51198: learned server ID 09c9
206:2022-07-20T00:08:47.894Z|00043|raft|INFO|server e67c is leader for term 4
226:2022-07-20T00:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local
289:2022-07-20T00:08:49.587Z|00002|raft|INFO|local server ID is bb47
295:2022-07-20T00:08:50.218Z|00008|raft|INFO|ssl:10.128.2.39:44708: learned server ID 5db2
297:2022-07-20T00:08:50.377Z|00010|raft|INFO|ssl:10.131.0.54:38266: learned server ID 54ef
305:2022-07-20T00:09:05.992Z|00016|raft|INFO|server 54ef is leader for term 5

logs-1/log_ovnkube-master-1
125:2022-07-20T00:08:37+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local, K8S_NODE_IP=10.0.207.171
174:2022-07-20T00:08:37.636Z|00002|raft|INFO|local server ID is e67c
194:2022-07-20T00:08:40.884Z|00018|raft|INFO|ssl:10.131.0.54:56182: learned server ID 09c9
204:2022-07-20T00:08:47.893Z|00028|raft|INFO|term 4: elected leader by 2+ of 3 servers
218:2022-07-20T00:08:52.203Z|00041|raft|INFO|ssl:10.129.2.43:45900: learned server ID 3924
235:2022-07-20T00:08:50+00:00 - starting sbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local
298:2022-07-20T00:08:50.157Z|00002|raft|INFO|local server ID is 5db2
304:2022-07-20T00:08:50.378Z|00008|raft|INFO|ssl:10.131.0.54:50894: learned server ID 54ef
306:2022-07-20T00:08:50.672Z|00010|raft|INFO|ssl:10.129.2.43:35428: learned server ID bb47
313:2022-07-20T00:09:05.992Z|00015|raft|INFO|server 54ef is leader for term 5

logs-2/log_ovnkube-master-1
095:2022-07-20T00:12:22+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local, K8S_NODE_IP=10.0.207.171
147:2022-07-20T00:12:22.259Z|00002|raft|INFO|local server ID is e67c
156:2022-07-20T00:12:22.778Z|00011|raft|INFO|ssl:10.129.2.44:32956: learned server ID 3924
176:2022-07-20T00:12:26.632Z|00029|raft|INFO|ssl:10.131.0.55:58540: learned server ID 09c9
192:2022-07-20T00:12:37.478Z|00045|raft|INFO|server 09c9 is leader for term 6
209:2022-07-20T00:12:38+00:00 - starting sbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local
272:2022-07-20T00:12:38.755Z|00002|raft|INFO|local server ID is 5db2
278:2022-07-20T00:12:39.225Z|00008|raft|INFO|ssl:10.129.2.44:58672: learned server ID bb47
280:2022-07-20T00:12:39.262Z|00010|raft|INFO|ssl:10.131.0.55:45126: learned server ID 54ef
293:2022-07-20T00:12:55.301Z|00021|raft|INFO|server bb47 is leader for term 6

logs-2/log_ovnkube-master-0
097:2022-07-20T00:12:22+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local, K8S_NODE_IP=10.0.185.106
146:2022-07-20T00:12:22.755Z|00002|raft|INFO|local server ID is 3924
166:2022-07-20T00:12:26.631Z|00018|raft|INFO|ssl:10.131.0.55:45784: learned server ID 09c9
176:2022-07-20T00:12:37.478Z|00028|raft|INFO|server 09c9 is leader for term 6
179:2022-07-20T00:12:45.283Z|00030|raft|INFO|ssl:10.128.2.40:54150: learned server ID e67c
196:2022-07-20T00:12:39+00:00 - starting sbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local
259:2022-07-20T00:12:39.173Z|00002|raft|INFO|local server ID is bb47
265:2022-07-20T00:12:39.260Z|00008|raft|INFO|ssl:10.131.0.55:51526: learned server ID 54ef
267:2022-07-20T00:12:39.812Z|00010|raft|INFO|ssl:10.128.2.40:54192: learned server ID 5db2
277:2022-07-20T00:12:55.300Z|00018|raft|INFO|term 6: elected leader by 2+ of 3 servers

logs-2/log_ovnkube-master-2
1639:2022-07-20T00:12:26+00:00 - starting nbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local, K8S_NODE_IP=10.0.137.178
1688:2022-07-20T00:12:26.608Z|00002|raft|INFO|local server ID is 09c9
1702:2022-07-20T00:12:37.283Z|00012|raft|INFO|ssl:10.128.2.40:38020: learned server ID e67c
1705:2022-07-20T00:12:37.477Z|00015|raft|INFO|term 6: elected leader by 2+ of 3 servers
1719:2022-07-20T00:12:45.781Z|00028|raft|INFO|ssl:10.129.2.44:41054: learned server ID 3924
1735:2022-07-20T00:12:39+00:00 - starting sbdb CLUSTER_INITIATOR_IP=ovnkube-master-0.ovnkube-master-internal.clusters-hypershift-ci-25142.svc.cluster.local
1798:2022-07-20T00:12:39.208Z|00002|raft|INFO|local server ID is 54ef
1804:2022-07-20T00:12:39.813Z|00008|raft|INFO|ssl:10.128.2.40:52762: learned server ID 5db2
1808:2022-07-20T00:12:40.225Z|00010|raft|INFO|ssl:10.129.2.44:47806: learned server ID bb47
1813:2022-07-20T00:12:55.301Z|00015|raft|INFO|server bb47 is leader for term 6
Fix in 4.11.0-0.nightly-2022-08-18-223628 pre-verified.
*** Bug 2093057 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.11.3 packages and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6287
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days