Description of problem:
Running cluster-density 1000 successfully at 120-node scale is one of the scale targets for OVN as the default SDN. Currently the cluster is not stable when we run this test. At this point we cannot get a must-gather pod running either.

-------------------------------------------------------------------------
ovnkube-master-5scpt   5/6   CrashLoopBackOff   11 (80s ago)   5h41m
ovnkube-master-h2p9p   6/6   Running            5 (99m ago)    5h44m
ovnkube-master-lpv5m   6/6   Running            7 (94m ago)    5h46m
-------------------------------------------------------------------------

-------------------------------------------------------------------------
Warning  Unhealthy  88m  kubelet  Readiness probe failed:
++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
++ grep 'Leader: unknown'
++ true
+ leader_status=
Warning  Unhealthy  88m (x2 over 88m)  kubelet  Readiness probe failed: NB DB Raft leader is unknown to the cluster node.
++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
++ grep 'Leader: unknown'
+ leader_status='Leader: unknown'
+ [[ ! -z Leader: unknown ]]
+ echo 'NB DB Raft leader is unknown to the cluster node.'
+ exit 1
-------------------------------------------------------------------------

-------------------------------------------------------------------------
F0922 19:01:19.303715 1 ovnkube.go:130] error when trying to initialize go-ovn NB client: couldn't initialize NBDB client: error creating SSL OVNDBClient for database OVN_Northbound at address ssl:10.0.144.235:9641,ssl:10.0.164.234:9641,ssl:10.0.197.135:9641: failed to connec
-------------------------------------------------------------------------

Version-Release number of selected component (if applicable):
4.9
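
For reference, the readiness check visible in the xtrace output above can be sketched roughly as follows. This is a reconstruction from the trace only, not the actual probe script shipped in the ovnkube-master pod. The function name `check_leader_known` is hypothetical, and the function reads the `cluster/status` output on stdin so that it can be exercised without a running NB DB.

```shell
# Sketch of the NB DB readiness check, reconstructed from the probe trace.
# Assumption: the function name and the stdin-based interface are illustrative;
# the real probe calls ovn-appctl directly and exits instead of returning.

# Reads `cluster/status` output on stdin.
# Returns 1 (not ready) if the Raft leader is reported as unknown, else 0.
check_leader_known() {
  # `|| true` keeps a non-matching grep from aborting the script under `set -e`.
  leader_status=$(grep 'Leader: unknown' || true)
  if [ -n "${leader_status}" ]; then
    echo 'NB DB Raft leader is unknown to the cluster node.'
    return 1
  fi
  return 0
}

# Usage, mirroring the command seen in the trace:
#   /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 \
#       cluster/status OVN_Northbound | check_leader_known
```

With this shape, the probe fails on any node whose local NB DB server does not currently know the Raft leader, which matches the failure message in the events above.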
In the latest scale run, the cluster was stable after the 120-node cluster-density test. The fixes in 1959352 will resolve this issue.

*** This bug has been marked as a duplicate of bug 1959352 ***