
Bug 1761370

Summary: [ovn_cluster][RHEL 8] master node can't be up after restart openvswitch
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Numan Siddique <nusiddiq>
Component: OVN
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: FDP 19.G
CC: ctrautma, jiji, jishi, kfida, mmichels
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-12-11 12:19:53 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Numan Siddique 2019-10-14 09:31:29 UTC
This bug was initially created as a copy of Bug #1684419

I am copying this bug because: 



Description of problem:
This bug is similar to an OVS 2.9 bug on RHEL 7, bz1684363.

Version-Release number of selected component (if applicable):
[root@hp-dl388g8-02 ovn_ha]# uname -a
Linux hp-dl388g8-02.rhts.eng.pek2.redhat.com 4.18.0-64.el8.x86_64 #1 SMP Wed Jan 23 20:50:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl388g8-02 ovn_ha]# rpm -qa | grep openvswitch
openvswitch2.11-ovn-common-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-30.noarch
openvswitch2.11-ovn-central-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64
openvswitch-selinux-extra-policy-1.0-10.el8fdp.noarch
openvswitch2.11-ovn-host-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64
openvswitch2.11-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Set up a cluster with 3 nodes running ovndb_servers (a sketch of the setup commands follows).
2. Restart openvswitch on the master node.
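
A minimal sketch of the setup in step 1, assuming the node addresses 70.0.0.2/70.0.0.12/70.0.0.20 and the VIP 70.0.0.50 that appear in the output below; exact pcs syntax differs between releases, and the ovndb-servers agent parameters (master_ip, manage_northd) should be checked against the installed agent:

# Authenticate the nodes and create the three-node cluster (pcs 0.10 / RHEL 8 syntax).
pcs host auth 70.0.0.2 70.0.0.12 70.0.0.20 -u hacluster
pcs cluster setup my_cluster 70.0.0.2 70.0.0.12 70.0.0.20 --start --enable

# Virtual IP that should follow the OVN DB master.
pcs resource create ip-70.0.0.50 ocf:heartbeat:IPaddr2 ip=70.0.0.50 op monitor interval=30s

# Promotable clone of the OVN DB servers; pacemaker promotes one node to master.
pcs resource create ovndb_servers ocf:ovn:ovndb-servers master_ip=70.0.0.50 manage_northd=yes promotable

# Keep the VIP on whichever node holds the master role.
pcs constraint order promote ovndb_servers-clone then start ip-70.0.0.50
pcs constraint colocation add ip-70.0.0.50 with master ovndb_servers-clone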

[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:40:50 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     Masters: [ 70.0.0.2 ]
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# systemctl restart openvswitch
[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:41:21 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     Masters: [ 70.0.0.2 ]
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:41:25 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     ovndb_servers	(ocf::ovn:ovndb-servers):	FAILED 70.0.0.2
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Failed Resource Actions:
* ovndb_servers_demote_0 on 70.0.0.2 'not running' (7): call=17, status=complete, exitreason='',
    last-rc-change='Fri Mar  1 03:41:22 2019', queued=0ms, exec=97ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:41:27 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     ovndb_servers	(ocf::ovn:ovndb-servers):	FAILED 70.0.0.2
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Failed Resource Actions:
* ovndb_servers_demote_0 on 70.0.0.2 'not running' (7): call=17, status=complete, exitreason='',
    last-rc-change='Fri Mar  1 03:41:22 2019', queued=0ms, exec=97ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# 
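
A hedged diagnostic sketch for the FAILED state above (the demote action failed with 'not running'); it assumes the ovn-ctl script location used by these builds (/usr/share/openvswitch/scripts/) and standard pcs/journalctl behavior:

# Did the NB/SB database servers survive the openvswitch restart?
/usr/share/openvswitch/scripts/ovn-ctl status_ovnnb
/usr/share/openvswitch/scripts/ovn-ctl status_ovnsb

# What did the resource agent report when the demote failed?
journalctl -u pacemaker | grep -i ovndb_servers | tail

# Clear the failed action so pacemaker retries once the cause is addressed.
pcs resource cleanup ovndb_servers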



Actual results:
The master node does not come back up after openvswitch is restarted; the ovndb_servers resource stays FAILED on it.

Expected results:
The master node comes back up after openvswitch is restarted.
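
A small scripted version of this expectation, for anyone re-running the test (the 60-second polling window is an arbitrary choice, not part of the original report):

systemctl restart openvswitch
# Poll until pcs reports a promoted master again, or give up after ~60s.
for i in $(seq 1 30); do
    pcs status | grep -q 'Masters:' && { echo "master is back"; break; }
    sleep 2
done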

Additional info:

Comment 3 haidong li 2019-11-08 03:03:34 UTC
This issue is blocked by bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1769202

Comment 4 Jianlin Shi 2019-11-09 07:38:06 UTC
Verified on ovn2.12.0-7:

[root@ibm-x3650m5-03 ovn_ha]# ip addr sh eno3                                                         
4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000         
    link/ether 08:94:ef:04:78:5a brd ff:ff:ff:ff:ff:ff                                                
    inet 70.11.0.2/24 scope global eno3
       valid_lft forever preferred_lft forever                                                        
    inet 70.11.0.50/24 scope global secondary eno3
       valid_lft forever preferred_lft forever                                                        
    inet6 2001::a94:efff:fe04:785a/64 scope global dynamic mngtmpaddr
       valid_lft 86359sec preferred_lft 14359sec
[root@ibm-x3650m5-03 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.11.0.12 (version 2.0.2-3.el8-744a30d655) - partition with quorum                       
Last updated: Sat Nov  9 02:36:03 2019                                                                
Last change: Sat Nov  9 02:31:49 2019 by root via crm_attribute on 70.11.0.2                          

3 nodes configured
4 resources configured

Online: [ 70.11.0.2 70.11.0.12 70.11.0.20 ]                                                           

Full list of resources:

 ip-70.11.0.50  (ocf::heartbeat:IPaddr2):       Started 70.11.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)                                          
     Masters: [ 70.11.0.2 ]
     Slaves: [ 70.11.0.12 70.11.0.20 ]                                                                

Daemon Status:                                                                                        
  corosync: active/enabled                                                                            
  pacemaker: active/enabled                                                                           
  pcsd: active/enabled                                                                                
[root@ibm-x3650m5-03 ovn_ha]# systemctl restart openvswitch

<==== restart openvswitch on master 70.11.0.2

[root@ibm-x3650m5-03 ovn_ha]# pcs status
Cluster name: my_cluster                                                                              
Stack: corosync                                                                                       
Current DC: 70.11.0.12 (version 2.0.2-3.el8-744a30d655) - partition with quorum
Last updated: Sat Nov  9 02:36:23 2019                                                                
Last change: Sat Nov  9 02:31:49 2019 by root via crm_attribute on 70.11.0.2
                                                                                                      
3 nodes configured                                                                                    
4 resources configured                                                                                
                                                                                                      
Online: [ 70.11.0.2 70.11.0.12 70.11.0.20 ]                                                           
                                                                                                      
Full list of resources:                                                                               
                                                                                                      
 ip-70.11.0.50  (ocf::heartbeat:IPaddr2):       Started 70.11.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     Masters: [ 70.11.0.2 ]                                                                           
     Slaves: [ 70.11.0.12 70.11.0.20 ]                                                                
                                                                                                      
Daemon Status:                                                                                        
  corosync: active/enabled                                                                            
  pacemaker: active/enabled                                                                           
  pcsd: active/enabled

<=== master is up


[root@ibm-x3650m5-03 ovn_ha]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
ovn2.12-2.12.0-7.el8fdp.x86_64
ovn2.12-central-2.12.0-7.el8fdp.x86_64
ovn2.12-host-2.12.0-7.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-43.noarch
openvswitch2.12-2.12.0-4.el8fdp.x86_64
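
A follow-up check one could add to confirm the promoted DB is actually serving after the restart (assumes the agent's default NB port 6641 behind the VIP 70.11.0.50 shown above):

# The NB database should answer through the master VIP.
ovn-nbctl --db=tcp:70.11.0.50:6641 show
# And pacemaker should still report a master.
pcs status | grep -A 2 'Clone Set'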

Comment 6 errata-xmlrpc 2019-12-11 12:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:4209