Bug 2007009 - 120 node OVNK cluster is not stable after cluster-density 1000 projects
Summary: 120 node OVNK cluster is not stable after cluster-density 1000 projects
Keywords:
Status: CLOSED DUPLICATE of bug 1959352
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Tim Rozet
QA Contact: Anurag Saxena
URL:
Whiteboard: perfscale-ovn
Depends On:
Blocks:
Reported: 2021-09-22 19:36 UTC by Mohit Sheth
Modified: 2022-11-16 00:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 15:30:23 UTC
Target Upstream Version:
Embargoed:



Description Mohit Sheth 2021-09-22 19:36:32 UTC
Description of problem:
Running cluster-density with 1000 projects successfully at 120-node scale is one of the scale targets for OVN as the default SDN.
Currently the cluster is not stable when we run this test. At this point we cannot even get a must-gather pod running.

-------------------------------------------------------------------------
ovnkube-master-5scpt   5/6     CrashLoopBackOff   11 (80s ago)    5h41m
ovnkube-master-h2p9p   6/6     Running            5 (99m ago)     5h44m
ovnkube-master-lpv5m   6/6     Running            7 (94m ago)     5h46m
-------------------------------------------------------------------------

-------------------------------------------------------------------------
  Warning  Unhealthy     88m                  kubelet          Readiness probe failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
++ grep 'Leader: unknown'
++ true
+ leader_status=
  Warning  Unhealthy  88m (x2 over 88m)  kubelet  Readiness probe failed: NB DB Raft leader is unknown to the cluster node.
++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
++ grep 'Leader: unknown'
+ leader_status='Leader: unknown'
+ [[ ! -z Leader: unknown ]]
+ echo 'NB DB Raft leader is unknown to the cluster node.'
+ exit 1
-------------------------------------------------------------------------
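
For readers unfamiliar with the probe, the leader check visible in the trace above can be sketched roughly as follows. This is a minimal Python illustration, not the actual probe script: the function names and the sample status text are hypothetical, and only the 'Leader: unknown' match and the exit code 1 are taken from the trace. It takes the `cluster/status` output as a string rather than calling `ovn-appctl` itself.

```python
# Sketch of the leader check seen in the readiness probe trace.
# Function names are hypothetical; only the 'Leader: unknown' match
# and the exit code of 1 come from the trace in this report.

def raft_leader_unknown(cluster_status: str) -> bool:
    """Return True if any line of the status output reports an unknown leader."""
    return any("Leader: unknown" in line for line in cluster_status.splitlines())


def leader_check_exit_code(cluster_status: str) -> int:
    """Mirror the trace: exit 1 when the Raft leader is unknown, otherwise 0."""
    if raft_leader_unknown(cluster_status):
        print("NB DB Raft leader is unknown to the cluster node.")
        return 1
    return 0


if __name__ == "__main__":
    # Illustrative status text only; real cluster/status output has more fields.
    sample = "Role: follower\nLeader: unknown\n"
    raise SystemExit(leader_check_exit_code(sample))
```

The sketch covers only the leader check; the real probe may run further checks that are not shown in the trace.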

-------------------------------------------------------------------------
F0922 19:01:19.303715       1 ovnkube.go:130] error when trying to initialize go-ovn NB client: couldn't initialize NBDB client: error creating SSL OVNDBClient for database OVN_Northbound at address ssl:10.0.144.235:9641,ssl:10.0.164.234:9641,ssl:10.0.197.135:9641: failed to connec
-------------------------------------------------------------------------


Version-Release number of selected component (if applicable):
4.9

Comment 2 Tim Rozet 2021-09-29 15:30:23 UTC
In the latest scale run, the cluster was stable after the 120-node cluster-density test. The fixes in bug 1959352 will resolve this issue.

*** This bug has been marked as a duplicate of bug 1959352 ***

