Bug 2222543

Summary: Replacing controller node leads to ovn database partition
Product: Red Hat OpenStack
Reporter: ldenny
Component: openstack-tripleo
Assignee: Terry Wilson <twilson>
Status: ON_DEV
Resolution: ---
QA Contact: Joe H. Rahme <jhakimra>
Severity: urgent
Docs Contact:
Priority: high
Version: 17.1 (Wallaby)
CC: astillma, bcafarel, chrisw, drosenfe, gregraka, jamsmith, jelynch, knakai, lsvaty, mariel, mblue, mburns, pgrist, scohen, skaplons, twilson, ykarel
Target Milestone: z1
Keywords: Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Currently, when a bootstrap Controller node is replaced, the OVN database cluster is partitioned: two database clusters exist for both the northbound and southbound databases. This situation makes instances unusable.
+
To find the name of the bootstrap Controller node, run the following command:
+
----
ssh tripleo-admin@CONTROLLER_IP "sudo hiera -c /etc/puppet/hiera.yaml pacemaker_short_bootstrap_node_name"
----
+
Workaround: Perform the steps described in Red Hat KCS solution 7024434: link:https://access.redhat.com/solutions/7024434[Recover from partitioned clustered OVN database].
Story Points: ---
Clone Of:
: 2228451 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description ldenny 2023-07-13 05:06:23 UTC
Description of problem:
When a bootstrap Controller node is replaced, the OVN database cluster is left in a partitioned state. Using controller-0 (c0) as an example (full output at the bottom; IPs are from a lab system):
c0:
Cluster ID: 0eea (0eea95da-5ff3-4719-9475-7bb479e7e07b)
Servers:
    1143 (1143 at tcp:172.17.1.89:6643) (self) next_index=3 match_index=3

c1:
Cluster ID: b2c9 (b2c9d033-f770-4763-998d-f0d7a819fd6c)
Servers:
    a7ec (a7ec at tcp:172.17.1.61:6643) next_index=71 match_index=70 last msg 3122 ms ago
    ba93 (ba93 at tcp:172.17.1.89:6643) next_index=71 match_index=70 last msg 79802 ms ago
    ce2b (ce2b at tcp:172.17.1.10:6643) (self) next_index=64 match_index=70

c2:
Cluster ID: b2c9 (b2c9d033-f770-4763-998d-f0d7a819fd6c)
Servers:
    a7ec (a7ec at tcp:172.17.1.61:6643) (self)
    ba93 (ba93 at tcp:172.17.1.89:6643)
    ce2b (ce2b at tcp:172.17.1.10:6643) last msg 796 ms ago

What we can see is that c0 is in a single-node cluster of its own, while c1 and c2 remain in the old cluster with a stale entry for c0.

I'm using the northbound database as an example, but this affects both the northbound and southbound databases in the same way.
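
A quick way to spot the partition is to compare the Cluster ID reported by each controller; a healthy deployment shows the same ID on all three. A sketch, assuming the lab controller IPs shown above and SSH access as tripleo-admin:

```bash
# Compare the NB cluster ID on each controller (lab IPs; adjust for your environment).
for ip in 172.17.1.89 172.17.1.10 172.17.1.61; do
  echo "== $ip =="
  ssh tripleo-admin@"$ip" sudo podman exec ovn_cluster_north_db_server \
    ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound \
    | grep 'Cluster ID'
done
```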

We configure all clients (ovn_controller, ovn_meta, neutron_api, etc.) to connect via an array of all three servers. With this partition, some requests are routed to the empty, partitioned cluster and fail in odd ways: the Neutron server can fail to find NB/SB resources if it connects to the new single-node NB/SB cluster, and failures during VM create/shelve/migration can be seen when ovn-controller on a Compute node connects to this single-node cluster (this reconnection generally happens after an ovn-controller restart or a Compute node reboot).
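
For reference, the client-side connection settings can be checked as follows; the paths below are the typical TripleO locations on OSP 17 and may differ in your deployment:

```bash
# On a controller: the Neutron ML2/OVN NB and SB connection strings
# (each should list all three DB servers).
sudo grep -E 'ovn_(nb|sb)_connection' \
  /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/ml2_conf.ini

# On a compute node: the SB remote that ovn-controller connects to.
sudo ovs-vsctl get Open_vSwitch . external_ids:ovn-remote
```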

We need to avoid bootstrapping a new cluster during controller replacement. One approach would be for the replacement node to first check whether an existing OVN raft cluster is present and, if so, join it rather than bootstrapping its own.
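
A minimal sketch of that idea, assuming the lab addresses above and the usual OVN database and schema paths (the real fix would live in the TripleO OVN DB server startup logic, not in a standalone script):

```bash
# Sketch only: decide between bootstrapping a new NB cluster and joining
# the existing one. Addresses are the lab values from this report.
DB=/var/lib/ovn/ovnnb_db.db
SCHEMA=/usr/share/ovn/ovn-nb.ovsschema
LOCAL=tcp:172.17.1.89:6643                            # this (replacement) node
PEERS="tcp:172.17.1.10:6643 tcp:172.17.1.61:6643"     # surviving members

if [ ! -e "$DB" ]; then
  join_to=""
  for peer in $PEERS; do
    hostport=${peer#tcp:}            # e.g. 172.17.1.10:6643
    host=${hostport%:*}
    port=${hostport##*:}
    # A peer answering on the raft port means a cluster already exists.
    if timeout 3 bash -c ">/dev/tcp/$host/$port" 2>/dev/null; then
      join_to=$peer
      break
    fi
  done
  if [ -n "$join_to" ]; then
    # Join the existing cluster instead of creating a new one.
    ovsdb-tool join-cluster "$DB" OVN_Northbound "$LOCAL" $PEERS
  else
    # No peer reachable: bootstrap a fresh cluster.
    ovsdb-tool create-cluster "$DB" "$SCHEMA" "$LOCAL"
  fi
fi
```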

The behavior for non-bootstrap controller (e.g., c1/c2) replacement has not been verified yet, but it seems that just running "cluster/kick" before deploying the new controller node might be enough. If so, this would only require documentation changes. This needs verification to confirm whether that alone is sufficient or additional steps are required.
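
For that non-bootstrap case, the check/kick would look roughly like this, run on one of the surviving controllers before redeploying:

```bash
# STALE_SID is the short server ID of the removed node, taken from the
# "Servers:" list in the cluster/status output below.
podman exec ovn_cluster_north_db_server \
  ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
podman exec ovn_cluster_north_db_server \
  ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound "$STALE_SID"
```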
   
Version-Release number of selected component (if applicable):
podman exec ovn_cluster_north_db_server rpm -qa | egrep 'ovn|ovs|neutron'
ovn22.03-22.03.0-69.el9fdp.x86_64
rhosp-ovn-22.03-5.el9ost.noarch
ovn22.03-central-22.03.0-69.el9fdp.x86_64
rhosp-ovn-central-22.03-5.el9ost.noarch

[root@controller-0 ~]# rpm -qa | egrep 'ovn|ovs|neutron'
python3-neutronclient-7.3.0-0.20220707060727.4963c7a.el9ost.noarch
puppet-ovn-18.5.0-0.20220218021734.d496e5a.el9ost.noarch
puppet-neutron-18.5.1-0.20220714150330.3bdf311.el9ost.noarch
[root@controller-0 ~]# cat /etc/rhosp-release
Red Hat OpenStack Platform release 17.0.0 (Wallaby)

[root@controller-0 ~]# podman images |egrep 'ovn|ovs|neutron'
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-server      17.0_20220908.1  
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-nova-novncproxy     17.0_20220908.1  
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ovn-nb-db-server    17.0_20220908.1  
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ovn-sb-db-server    17.0_20220908.1  
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ovn-controller      17.0_20220908.1  
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-ovn-northd          17.0_20220908.1  

How reproducible:
Every time


Steps to Reproduce:
To reproduce a single-node cluster on c0 and a three-node cluster on c1 and c2 with a stale entry for c0, do the following:
On c0:

```bash
podman exec ovn_cluster_north_db_server rm /var/run/ovn/ovnnb_db.db /var/lib/ovn/.ovnnb_db.db.~lock~
systemctl restart tripleo_ovn_cluster_north_db_server
sleep 3
podman exec ovn_cluster_north_db_server ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
```

To fix:

On c1 or c2, run:
```bash
podman exec ovn_cluster_north_db_server ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound <server id of c0>
```
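
To find `<server id of c0>`, the short ID can be pulled from the Servers: list on c1 or c2 by matching c0's raft address (172.17.1.89:6643 in this lab):

```bash
# Print the short server ID of the entry pointing at c0's raft address.
podman exec ovn_cluster_north_db_server \
  ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound \
  | awk '/at tcp:172.17.1.89:6643/ {print $1; exit}'
```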

On c0, run:
```bash
podman exec ovn_cluster_north_db_server rm /var/run/ovn/ovnnb_db.db /var/lib/ovn/.ovnnb_db.db.~lock~
podman exec ovn_cluster_north_db_server ovsdb-tool join-cluster /var/lib/ovn/ovnnb_db.db OVN_Northbound tcp:172.17.1.89:6643 tcp:172.17.1.10:6643
systemctl restart tripleo_ovn_cluster_north_db_server
```
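
After the restart, the rejoin can be verified on c0; the Cluster ID should now match c1/c2 (b2c9 in this lab) and the Servers: list should show all three members:

```bash
# Confirm c0 rejoined the original NB cluster.
podman exec ovn_cluster_north_db_server \
  ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound \
  | grep -E 'Cluster ID|Status|at tcp'
```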

Actual results:
The bootstrap Controller is replaced and creates its own single-node cluster, partitioning itself from the original.

Expected results:
The bootstrap Controller is replaced and joins the existing OVN database cluster.


Additional info:
[root@controller-0 ~]# podman exec ovn_cluster_north_db_server rm /var/run/ovn/ovnnb_db.db /var/lib/ovn/.ovnnb_db.db.~lock~
systemctl restart tripleo_ovn_cluster_north_db_server
sleep 3
podman exec ovn_cluster_north_db_server ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
1143
Name: OVN_Northbound
Cluster ID: 0eea (0eea95da-5ff3-4719-9475-7bb479e7e07b)
Server ID: 1143 (1143c224-6a6b-4e0e-9ca1-b24390e44555)
Address: tcp:172.17.1.89:6643
Status: cluster member
Role: leader
Term: 2
Leader: self
Vote: self

Last Election started 2908 ms ago, reason: timeout
Last Election won: 2908 ms ago
Election timer: 10000
Log: [2, 4]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-0000 <-0000
Disconnections: 0
Servers:
    1143 (1143 at tcp:172.17.1.89:6643) (self) next_index=3 match_index=3

[root@controller-1 ~]# podman exec ovn_cluster_north_db_server ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
ce2b
Name: OVN_Northbound
Cluster ID: b2c9 (b2c9d033-f770-4763-998d-f0d7a819fd6c)
Server ID: ce2b (ce2be8c7-5536-4c0f-a3a8-4ee17fc202f8)
Address: tcp:172.17.1.10:6643
Status: cluster member
Role: leader
Term: 33
Leader: self
Vote: self

Last Election started 1565452 ms ago, reason: timeout
Last Election won: 1565448 ms ago
Election timer: 10000
Log: [59, 71]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->a7ec <-a7ec ->ba93
Disconnections: 11
Servers:
    a7ec (a7ec at tcp:172.17.1.61:6643) next_index=71 match_index=70 last msg 3122 ms ago
    ba93 (ba93 at tcp:172.17.1.89:6643) next_index=71 match_index=70 last msg 79802 ms ago
    ce2b (ce2b at tcp:172.17.1.10:6643) (self) next_index=64 match_index=70


[root@controller-2 ~]# podman exec ovn_cluster_north_db_server ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
a7ec
Name: OVN_Northbound
Cluster ID: b2c9 (b2c9d033-f770-4763-998d-f0d7a819fd6c)
Server ID: a7ec (a7ec9d49-2281-4811-89e1-ac50f534ad56)
Address: tcp:172.17.1.61:6643
Status: cluster member
Role: follower
Term: 33
Leader: ce2b
Vote: ce2b

Last Election started 98864902 ms ago, reason: leadership_transfer
Last Election won: 98864894 ms ago
Election timer: 10000
Log: [63, 71]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->ce2b <-ce2b ->ba93
Disconnections: 15
Servers:
    a7ec (a7ec at tcp:172.17.1.61:6643) (self)
    ba93 (ba93 at tcp:172.17.1.89:6643)
    ce2b (ce2b at tcp:172.17.1.10:6643) last msg 796 ms ago