
Bug 2141066

Summary: Possible OVN scale issue - help needed determining the issue
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Andreas Karis <akaris>
Component: ovn22.12
Assignee: OVN Team <ovnteam>
Status: CLOSED WONTFIX
QA Contact: Jianlin Shi <jishi>
Severity: high
Docs Contact:
Priority: high
Version: FDP 22.L
CC: ctrautma, jiji, mmichels, pjagtap, skharat
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-07-28 17:51:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andreas Karis 2022-11-08 16:38:49 UTC
Description of problem:
This is an OVN clone/fork of https://issues.redhat.com/browse/OCPBUGS-3020
The customer case is 03334226 (data is attached there and can be retrieved with support-shell)

On the customer case, the ovn-dbs can be found in:
tmp.lxlYQSQfbN.tar.gz: latest dbs
ovnb_db.rar: dbs before the last db rebuild

The databases can be extracted from the archives and must then be converted with dos2unix before they can be analyzed locally, e.g.:
~~~
dos2unix ovnsb_db.db.ovnkube-master-vnlj5
~~~
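When several database files were extracted, the conversion can be looped. The following is only a sketch: the function name `convert_dbs` and the `*_db.db*` glob are assumptions based on the file name shown above, and the `sed` fallback (used only when dos2unix is not installed) simply strips a trailing carriage return from each line.

```shell
# Sketch only: convert_dbs and the *_db.db* glob are hypothetical,
# chosen to match names like ovnsb_db.db.ovnkube-master-vnlj5.
convert_dbs() {
    dir="$1"
    cr="$(printf '\r')"
    for f in "$dir"/*_db.db*; do
        [ -f "$f" ] || continue
        if command -v dos2unix >/dev/null 2>&1; then
            # Same conversion as the manual command above
            dos2unix "$f"
        else
            # Fallback: remove a trailing CR from every line, then
            # write the result back in place (keeps file permissions)
            tmp="$(mktemp)" || return 1
            sed "s/${cr}\$//" "$f" > "$tmp" && cat "$tmp" > "$f"
            rm -f "$tmp"
        fi
    done
}
```

Usage would be something like `convert_dbs ./extracted`, where `./extracted` stands for whatever directory the archives were unpacked into.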

After converting the dbs and loading them into a container that runs the databases (https://github.com/andreaskaris/ovn-trace-container), I can then inspect them:
~~~
[root@c767284545bf /]# ovn-sbctl list Logical_Flow | grep uuid | wc -l
2022-11-08T16:35:36Z|00001|ovsdb_idl|WARN|Logical_Flow table in OVN_Southbound database lacks controller_meter column (database needs upgrade?)
2022-11-08T16:35:36Z|00002|ovsdb_idl|WARN|Logical_Flow table in OVN_Southbound database lacks tags column (database needs upgrade?)
74647
[root@c767284545bf /]# ovn-sbctl list Port_Binding | grep uuid | wc -l
2022-11-08T16:35:42Z|00001|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_chassis column (database needs upgrade?)
2022-11-08T16:35:42Z|00002|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_encap column (database needs upgrade?)
2022-11-08T16:35:42Z|00003|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks port_security column (database needs upgrade?)
2022-11-08T16:35:42Z|00004|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks requested_additional_chassis column (database needs upgrade?)
2022-11-08T16:35:42Z|00005|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks requested_chassis column (database needs upgrade?)
3839
[root@c767284545bf /]# ovn-sbctl list Logical_DP_Group | grep uuid | wc -l
125
[root@c767284545bf /]# ovn-sbctl list Port_Group | grep uuid | wc -l
618
[root@c767284545bf /]# 
[root@c767284545bf /]# 
[root@c767284545bf /]# ovn-sbctl list Load_Balancer | grep uuid | wc -l
933
[root@c767284545bf /]# ovn-sbctl dump-flows | wc -l
246758
~~~
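The per-table counts above can be collected in one pass. This is a minimal sketch, not part of the original analysis: the helper names `count_records` and `count_sb_tables` are made up here, and it counts lines that start with `_uuid` (the first line of each record in `ovn-sbctl list` output) instead of any line containing `uuid`, on the assumption that this is slightly stricter. Note that stderr is discarded, so the schema warnings seen above are hidden.

```shell
# Sketch only: helper names are hypothetical.
# Count records in `ovn-sbctl list <table>` output read from stdin;
# each record begins with a line starting with "_uuid".
count_records() {
    grep -c '^_uuid'
}

# Print "<table> <record count>" for the Southbound tables checked above.
count_sb_tables() {
    for table in Logical_Flow Port_Binding Logical_DP_Group Port_Group Load_Balancer; do
        printf '%s %s\n' "$table" "$(ovn-sbctl list "$table" 2>/dev/null | count_records)"
    done
}

# Example (inside the container that runs the databases):
#   count_sb_tables
```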


Comment 1 Andreas Karis 2022-11-08 16:41:06 UTC
I went through the case history. On October 14th, the cluster had only about 130 nodes; at that point, RH support recommended an upgrade of the master nodes, and they also defragmented the etcd db.
On October 20th, we see that the cluster is in much better shape.
Around that time (October 19th / 20th), a massive addition of nodes happened, more than doubling the cluster's size from ~130 nodes to over 300 nodes. Ever since then, the cluster could not be recovered; we therefore suspect that this is a scale issue.

Comment 4 Andreas Karis 2022-11-08 17:09:50 UTC
$ omg get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version  4.8.23   True       False        9m40s  Error while reconciling 4.8.23: some cluster operators have not yet rolled out

https://github.com/openshift/ovn-kubernetes/blob/b5183e8b7b7b9551600dea317bf5c212db0cf4e6/Dockerfile#L36
ARG ovnver=20.12.0-183.el8fdp

Comment 10 Mark Michelson 2023-07-28 17:51:18 UTC
I'm closing this since there has been no activity for over 8 months and the customer issue is closed.