
Bug 1762777

Summary: [OVN] HA Chassis failover won't work if there's stale chassis entries
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: ovn2.11
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: urgent
Priority: unspecified
Version: FDP 20.A
CC: 33186108, ctrautma, fhallal, jishi, kfida, nusiddiq, qding
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-01-21 17:02:44 UTC
Type: Bug

Description Daniel Alvarez Sanchez 2019-10-17 13:09:34 UTC
We observed that when there is an HA Chassis entry with an empty chassis column like:

# ovn-sbctl list ha_chassis
_uuid               : 4b114ad4-5425-4b66-8644-7d3d2ff3176b
chassis             : []
external_ids        : {chassis-name="f69c10c5-1f4b-4112-a429-d318a058a17f"}
priority            : 1


All the ports that were bound to that chassis ("f69c10c5-1f4b-4112-a429-d318a058a17f" in the example) will not be claimed by any other HA Chassis in the HA Chassis group they belong to.
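
Such entries can be spotted by filtering the listing for rows whose chassis column is empty. This is only a minimal sketch, assuming the default record layout of `ovn-sbctl list` shown above; `stale_ha_chassis` is a made-up helper name, not an OVN command.

```shell
#!/usr/bin/env bash
# stale_ha_chassis: hypothetical helper, not part of OVN.
# Reads `ovn-sbctl list ha_chassis` output on stdin and prints the
# chassis-name of every row whose chassis column is empty ([]).
stale_ha_chassis() {
    awk '
        { sub(/[ \t]+$/, "") }                  # drop trailing padding
        $1 == "_uuid"   { empty = 0 }           # a new record starts here
        $1 == "chassis" { empty = ($3 == "[]") }
        $1 == "external_ids" && empty {
            # external_ids looks like {chassis-name="hv0"}
            if (match($0, /chassis-name="[^"]*"/)) {
                s = substr($0, RSTART, RLENGTH)
                sub(/^chassis-name="/, "", s)
                sub(/"$/, "", s)
                print s
            }
        }
    '
}
```

Usage would be `ovn-sbctl list ha_chassis | stale_ha_chassis`; for the entry above it prints the chassis name stored in external_ids.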
 
This situation can easily happen if the CMS doesn't remove the chassis from the HA Chassis group when ovn-controller shuts down and deletes its chassis entry on some node.
If the CMS is down and doesn't process the chassis removal, HA won't kick in.

Moreover, if ovn-controller dies ungracefully and the chassis entry is left stale in the SB database, the CMS won't detect anything either, and even though the Port_Binding entries will now get their chassis column set to empty, they won't move to the next highest-priority HA Chassis.
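
The expected outcome can be illustrated with a small filter over the same listing: among the rows that still reference a Chassis, the one with the highest priority should end up hosting the ports. This is a rough sketch of that ordering only, not of how ovn-controller decides (liveness in a real deployment also depends on BFD between the chassis); `expected_active_chassis` is a hypothetical name.

```shell
#!/usr/bin/env bash
# expected_active_chassis: hypothetical helper, not part of OVN.
# Reads `ovn-sbctl list ha_chassis` output on stdin and prints the
# chassis-name of the highest-priority row whose chassis column is set.
# Assumes all rows on stdin belong to the same HA Chassis group.
expected_active_chassis() {
    awk '
        { sub(/[ \t]+$/, "") }                   # drop trailing padding
        $1 == "_uuid"   { live = 0; name = "" }  # new record
        $1 == "chassis" { live = ($3 != "[]") }  # [] means no Chassis row
        $1 == "external_ids" && match($0, /chassis-name="[^"]*"/) {
            # skip the 14 characters of chassis-name=" and the closing quote
            name = substr($0, RSTART + 14, RLENGTH - 15)
        }
        $1 == "priority" && live && name != "" {
            prio = $3 + 0
            if (!found || prio > best) { best = prio; winner = name; found = 1 }
        }
        END { if (found) print winner }
    '
}
```

Run as `ovn-sbctl list ha_chassis | expected_active_chassis`. If the name it prints differs from the chassis that the chassisredirect Port_Binding is actually bound to, failover has not happened.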

Comment 1 Numan Siddique 2019-12-05 07:17:02 UTC
The fix is available in OVN2.11-2.11.1-19

Comment 3 Jianlin Shi 2019-12-20 07:00:03 UTC
reproduced on ovn2.11-2.11.1-8.el7fdp.x86_64:

[root@dell-per740-12 bz1762777]# rpm -qa | grep -E "openvswitch|ovn"                                  
openvswitch2.11-2.11.0-35.el7fdp.x86_64                                                               
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch                                                 
ovn2.11-2.11.1-8.el7fdp.x86_64                                                                        
ovn2.11-host-2.11.1-8.el7fdp.x86_64                                                                   
ovn2.11-central-2.11.1-8.el7fdp.x86_64

start ovn on server:

[root@dell-per740-12 bz1762777]# bash -x rep.sh
+ systemctl start openvswitch                                                                         
+ systemctl start ovn-northd
+ ovn-nbctl set-connection ptcp:6641
+ ovn-sbctl set-connection ptcp:6642                                                                  
+ ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.30.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.30.25
+ systemctl restart ovn-controller                                                                    
+ ovn-nbctl lr-add lr1                                                                                
+ ovn-nbctl lrp-add lr1 lr1-ls1 00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64                       
+ ovn-nbctl ha-chassis-group-add hagrp1                                                               
+ ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv0 30                                                
+ ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv1 20                                                
++ ovn-nbctl --bare --columns _uuid find ha_chassis_group name=hagrp1                                 
+ hagrp1_uuid=45bc12c5-75f4-4de3-8425-883bb990874e                                                    
+ ovn-nbctl set Logical_Router_Port lr1-ls1 ha-chassis-group=45bc12c5-75f4-4de3-8425-883bb990874e     

start on client:
[root@hp-dl380pg8-12 bz1762777]# bash -x rep.sh                                                       
+ systemctl start openvswitch
+ ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:20.0.30.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.30.26
+ systemctl restart ovn-controller

[root@hp-dl380pg8-12 bz1762777]# ovs-vsctl show                                                       
90e4c87f-34d3-46b4-96e0-f852798d29cb                                                                  
    Bridge br-int
        fail_mode: secure                                                                             
        Port br-int
            Interface br-int
                type: internal                                                                        
        Port "ovn-hv1-0"
            Interface "ovn-hv1-0"                                                                     
                type: geneve
                options: {csum="true", key=flow, remote_ip="20.0.30.25"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
    ovs_version: "2.11.0"

on server:

[root@dell-per740-12 bz1762777]# ovn-sbctl list ha_chassis                                            
_uuid               : 968086f6-c440-459b-b57f-a01a42b2b679                                            
chassis             : 49d3fa31-7392-47f0-832a-bb15b8df3504                                            
external_ids        : {chassis-name="hv1"}                                                            
priority            : 20                                                                              

_uuid               : d5deba9e-12a7-4b61-92f8-60f6991bc8eb                                            
chassis             : 3c99aea8-c155-41ef-a39a-b2748c21c27c                                            
external_ids        : {chassis-name="hv0"}                                                            
priority            : 30
[root@dell-per740-12 bz1762777]# ovn-sbctl list port_binding
_uuid               : e60747ac-18bc-4c3b-bbaf-19e47d973a7d
chassis             : 3c99aea8-c155-41ef-a39a-b2748c21c27c   <=== the port is bound to chassis hv0 (client)
datapath            : 66b56d95-62b7-4174-91fe-762459b1c833                                            
encap               : []
external_ids        : {}                                                                              
gateway_chassis     : []                                                                              
ha_chassis_group    : c49b48c8-c695-4f11-a0d4-b1dba7307853                                            
logical_port        : "cr-lr1-ls1"                                                                    
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []                                                                              
options             : {distributed-port="lr1-ls1"}                                                    
parent_port         : []                                                                              
tag                 : []                                                                              
tunnel_key          : 2                                                                               
type                : chassisredirect                                                                 
virtual_parent      : []                                                                              
                                                                                                      
_uuid               : 5319836c-4b19-42a9-9437-9b4b7a3d5d31                                            
chassis             : []                                                                              
datapath            : 66b56d95-62b7-4174-91fe-762459b1c833                                            
encap               : []                                                                              
external_ids        : {}                                                                              
gateway_chassis     : []                                                                              
ha_chassis_group    : []                                                                              
logical_port        : "lr1-ls1"                                                                       
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []                                                                              
options             : {}                                                                              
parent_port         : []                                                                              
tag                 : []                                                                              
tunnel_key          : 1                                                                               
type                : patch                                                                           
virtual_parent      : [] 

stop ovn-controller on client:
[root@hp-dl380pg8-12 bz1762777]# systemctl stop ovn-controller 

on server:
[root@dell-per740-12 bz1762777]# ovn-sbctl list ha_chassis                                            
_uuid               : 968086f6-c440-459b-b57f-a01a42b2b679                                            
chassis             : 49d3fa31-7392-47f0-832a-bb15b8df3504
external_ids        : {chassis-name="hv1"}                                                            
priority            : 20

_uuid               : d5deba9e-12a7-4b61-92f8-60f6991bc8eb                                            
chassis             : []                                                                              
external_ids        : {chassis-name="hv0"}
priority            : 30

<=== chassis of hv0 is null

[root@dell-per740-12 bz1762777]# ovn-sbctl list port_binding
_uuid               : e60747ac-18bc-4c3b-bbaf-19e47d973a7d
chassis             : []   <==== chassis becomes null, not bound to hv1, fail over doesn't work
datapath            : 66b56d95-62b7-4174-91fe-762459b1c833                                            
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : c49b48c8-c695-4f11-a0d4-b1dba7307853                                            
logical_port        : "cr-lr1-ls1"                                                                    
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []
options             : {distributed-port="lr1-ls1"}                                                    
parent_port         : []                                                                              
tag                 : []
tunnel_key          : 2
type                : chassisredirect
virtual_parent      : []

_uuid               : 5319836c-4b19-42a9-9437-9b4b7a3d5d31                                            
chassis             : []
datapath            : 66b56d95-62b7-4174-91fe-762459b1c833                                            
encap               : []                                                                              
external_ids        : {}                                                                              
gateway_chassis     : []                                                                              
ha_chassis_group    : []                                                                              
logical_port        : "lr1-ls1"                                                                       
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []                                                                              
options             : {}                                                                              
parent_port         : []                                                                              
tag                 : []                                                                              
tunnel_key          : 1                                                                               
type                : patch                                                                           
virtual_parent      : []
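
The failure shown above (a chassisredirect port left with an empty chassis column) can be checked for mechanically. A minimal sketch, assuming the `ovn-sbctl list port_binding` layout from this comment; `unbound_redirect_ports` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# unbound_redirect_ports: hypothetical helper, not part of OVN.
# Reads `ovn-sbctl list port_binding` output on stdin and prints the
# logical_port of every chassisredirect binding whose chassis is empty.
unbound_redirect_ports() {
    awk '
        { sub(/[ \t]+$/, "") }                       # drop trailing padding
        $1 == "_uuid"        { chassis = ""; port = "" }
        $1 == "chassis"      { chassis = $3 }
        $1 == "logical_port" { port = $3; gsub(/"/, "", port) }
        # type comes after chassis and logical_port in each record
        $1 == "type" && $3 == "chassisredirect" && chassis == "[]" { print port }
    '
}
```

With the output above, `ovn-sbctl list port_binding | unbound_redirect_ports` would print cr-lr1-ls1; the patch port lr1-ls1 is ignored because an empty chassis is normal for it.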

Verified on ovn2.11-2.11.1-24.el7fdp.x86_64:

[root@dell-per740-12 ovn]# rpm -qa | grep -E "openvswitch|ovn"                                        
openvswitch2.11-2.11.0-35.el7fdp.x86_64                                                               
ovn2.11-2.11.1-24.el7fdp.x86_64                                                                       
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch                                                 
ovn2.11-central-2.11.1-24.el7fdp.x86_64                                                               
ovn2.11-host-2.11.1-24.el7fdp.x86_64

[root@dell-per740-12 bz1762777]# ovn-sbctl list ha_chassis
_uuid               : 7f693870-01ab-474c-9574-cdcb35cce22f
chassis             : 814b3332-a72c-408f-86b3-6e0de0863dee
external_ids        : {chassis-name="hv1"}
priority            : 20                                  
                                  
_uuid               : 3c121701-7a67-4cf3-aa2c-476d0a90af4d               
chassis             : 0e926138-e86c-427c-87f7-39c2417bcb73
external_ids        : {chassis-name="hv0"}        
priority            : 30
[root@dell-per740-12 bz1762777]# ovn-sbctl list port_binding
_uuid               : bb61994c-7482-48d6-8da0-0de71513e24e
chassis             : []             
datapath            : 94e30e1f-20aa-4d95-9678-4182184e4089
encap               : []        
external_ids        : {}                                  
gateway_chassis     : []                                  
ha_chassis_group    : []                                  
logical_port        : "lr1-ls1"                           
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]
nat_addresses       : []                  
options             : {}                                  
parent_port         : []          
tag                 : []                                                 
tunnel_key          : 1 
type                : patch                       
virtual_parent      : []
                                                            
_uuid               : 4472beb7-70cd-47b4-8aee-7159e26b11fb
chassis             : 0e926138-e86c-427c-87f7-39c2417bcb73   <=== port bound to hv0
datapath            : 94e30e1f-20aa-4d95-9678-4182184e4089
encap               : []        
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : aece3c53-c5c4-46ef-8473-f8927e0d7ddf
logical_port        : "cr-lr1-ls1"
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]
nat_addresses       : []
options             : {distributed-port="lr1-ls1"}
parent_port         : []
tag                 : []
tunnel_key          : 2
type                : chassisredirect
virtual_parent      : []

stop ovn-controller on hv0:
[root@hp-dl380pg8-12 bz1762777]# systemctl stop ovn-controller

[root@dell-per740-12 bz1762777]# ovn-sbctl list ha_chassis
_uuid               : 7f693870-01ab-474c-9574-cdcb35cce22f
chassis             : 814b3332-a72c-408f-86b3-6e0de0863dee
external_ids        : {chassis-name="hv1"}
priority            : 20

_uuid               : 3c121701-7a67-4cf3-aa2c-476d0a90af4d
chassis             : []   <=== chassis for hv0 is null
external_ids        : {chassis-name="hv0"}
priority            : 30
[root@dell-per740-12 bz1762777]# ovn-sbctl list port_binding
_uuid               : bb61994c-7482-48d6-8da0-0de71513e24e
chassis             : []
datapath            : 94e30e1f-20aa-4d95-9678-4182184e4089
encap               : []                                                                              
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []                                                                              
logical_port        : "lr1-ls1"                                                                       
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []                                                                              
options             : {}                                                                              
parent_port         : []
tag                 : []                                                                              
tunnel_key          : 1                                                                               
type                : patch
virtual_parent      : []

_uuid               : 4472beb7-70cd-47b4-8aee-7159e26b11fb                                            
chassis             : 814b3332-a72c-408f-86b3-6e0de0863dee   <=== port bound to hv1, fail over works
datapath            : 94e30e1f-20aa-4d95-9678-4182184e4089                                            
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : aece3c53-c5c4-46ef-8473-f8927e0d7ddf                                            
logical_port        : "cr-lr1-ls1"
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []                                                                              
options             : {distributed-port="lr1-ls1"}                                                    
parent_port         : []
tag                 : []                                                                              
tunnel_key          : 2
type                : chassisredirect                                                                 
virtual_parent      : []
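
The check done by eye above (matching the chassis UUID of cr-lr1-ls1 against the ha_chassis rows) could be scripted roughly as follows. This is a sketch under the assumption that both listings were saved to files first; `bound_chassis_name` and the file arguments are hypothetical.

```shell
#!/usr/bin/env bash
# bound_chassis_name: hypothetical helper, not part of OVN.
#   $1: file holding `ovn-sbctl list ha_chassis` output (must not be empty)
#   $2: file holding `ovn-sbctl list port_binding` output
#   $3: logical port name, e.g. cr-lr1-ls1
# Prints the chassis-name the port is bound to, "unbound" if its chassis
# column is empty, or the raw UUID if no ha_chassis row references it.
bound_chassis_name() {
    awk -v port="$3" '
        { sub(/[ \t]+$/, "") }                   # drop trailing padding
        # First file: map Chassis row UUID to chassis-name.
        FNR == NR {
            if ($1 == "chassis") uuid = $3
            if ($1 == "external_ids" && uuid != "[]" &&
                match($0, /chassis-name="[^"]*"/))
                name[uuid] = substr($0, RSTART + 14, RLENGTH - 15)
            next
        }
        # Second file: chassis precedes logical_port in each record.
        $1 == "chassis" { bound = $3 }
        $1 == "logical_port" && $3 == ("\"" port "\"") {
            if (bound in name)      result = name[bound]
            else if (bound == "[]") result = "unbound"
            else                    result = bound
            print result
        }
    ' "$1" "$2"
}
```

For example, after `ovn-sbctl list ha_chassis > ha.txt` and `ovn-sbctl list port_binding > pb.txt`, calling `bound_chassis_name ha.txt pb.txt cr-lr1-ls1` on the fixed build should report hv1 once ovn-controller on hv0 has been stopped.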

Comment 4 Jianlin Shi 2019-12-20 07:04:29 UTC
also verified on rhel8 version:

[root@hp-dl380pg8-12 bz1762777]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-basic-1.0-14.noarch                                          
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch                                                 
ovn2.11-host-2.11.1-24.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-6.noarch
ovn2.11-2.11.1-24.el8fdp.x86_64
ovn2.11-central-2.11.1-24.el8fdp.x86_64
openvswitch2.11-2.11.0-35.el8fdp.x86_64

[root@hp-dl380pg8-12 bz1762777]# ovn-sbctl list ha_chassis  
_uuid               : 5bb2e325-d37a-404a-91af-be61b129f61c
chassis             : 5b48c1e4-41c1-42f1-a451-c064e396653f
external_ids        : {chassis-name="hv0"}
priority            : 30                        
                             
_uuid               : a2d63741-ac8b-4be3-9ef9-da186a0e8d6d
chassis             : e00214c3-9d46-4d4b-9901-9b4cfbd11352
external_ids        : {chassis-name="hv1"}
priority            : 20                                                                                                                                                                                   
[root@hp-dl380pg8-12 bz1762777]# ovn-sbctl list port_binding
_uuid               : 8ef09c14-c7ee-41c1-ad8e-0bc8046b9e6f
chassis             : []                                                       
datapath            : 23d40b53-6bdc-49e3-b270-84125fe5733e
encap               : []                              
external_ids        : {}                              
gateway_chassis     : []                                             
ha_chassis_group    : []                          
logical_port        : "lr1-ls1"                                                                  
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]
nat_addresses       : []                                  
options             : {}                                  
parent_port         : []                  
tag                 : []
tunnel_key          : 1
type                : patch                               
virtual_parent      : []                                  
                                          
_uuid               : 43b0083f-b4db-4471-ae38-c07c5aa1ecc9
chassis             : 5b48c1e4-41c1-42f1-a451-c064e396653f   <==== port bound to hv0
datapath            : 23d40b53-6bdc-49e3-b270-84125fe5733e
encap               : []
external_ids        : {}                                  
gateway_chassis     : []
ha_chassis_group    : 815ee734-34f4-49a9-8689-a626c0a60940
logical_port        : "cr-lr1-ls1"
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]
nat_addresses       : []       
options             : {distributed-port="lr1-ls1"}                       
parent_port         : []
tag                 : []
tunnel_key          : 2 
type                : chassisredirect
virtual_parent      : []

stop ovn-controller on hv0:

[root@dell-per740-12 bz1762777]# systemctl stop ovn-controller

[root@hp-dl380pg8-12 bz1762777]# ovn-sbctl list ha_chassis
_uuid               : 5bb2e325-d37a-404a-91af-be61b129f61c                                            
chassis             : []                                                                              
external_ids        : {chassis-name="hv0"}                                                            
priority            : 30                                                                              
                                                                                                      
_uuid               : a2d63741-ac8b-4be3-9ef9-da186a0e8d6d                                            
chassis             : e00214c3-9d46-4d4b-9901-9b4cfbd11352                                            
external_ids        : {chassis-name="hv1"}                                                            
priority            : 20                                                                              
[root@hp-dl380pg8-12 bz1762777]# ovn-sbctl list port_binding                                          
_uuid               : 8ef09c14-c7ee-41c1-ad8e-0bc8046b9e6f                                            
chassis             : []                                                                              
datapath            : 23d40b53-6bdc-49e3-b270-84125fe5733e                                            
encap               : []                                                                              
external_ids        : {}                                                                              
gateway_chassis     : []                                                                              
ha_chassis_group    : []                                                                              
logical_port        : "lr1-ls1"                                                                       
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []                                                                              
options             : {}                                                                              
parent_port         : []                                                                              
tag                 : []                                                                              
tunnel_key          : 1                                                                               
type                : patch                                                                           
virtual_parent      : []                                                                              
                                                                                                      
_uuid               : 43b0083f-b4db-4471-ae38-c07c5aa1ecc9                                            
chassis             : e00214c3-9d46-4d4b-9901-9b4cfbd11352   <=== port bound to hv1, fail over works
datapath            : 23d40b53-6bdc-49e3-b270-84125fe5733e                                            
encap               : []                                                                              
external_ids        : {}                                                                              
gateway_chassis     : []                                                                              
ha_chassis_group    : 815ee734-34f4-49a9-8689-a626c0a60940                                            
logical_port        : "cr-lr1-ls1"                                                                    
mac                 : ["00:de:ad:ff:01:03 192.168.111.254/24 3000::a/64"]                             
nat_addresses       : []                                                                              
options             : {distributed-port="lr1-ls1"}                                                    
parent_port         : []                                                                              
tag                 : []                                                                              
tunnel_key          : 2                                                                               
type                : chassisredirect                                                                 
virtual_parent      : []

Comment 6 errata-xmlrpc 2020-01-21 17:02:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0190