Bug 1761370 - [ovn_cluster][RHEL 8] master node can't be up after restart openvswitch
Summary: [ovn_cluster][RHEL 8] master node can't be up after restart openvswitch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 19.G
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-14 09:31 UTC by Numan Siddique
Modified: 2020-01-14 21:29 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-11 12:19:53 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHEA-2019:4209 (last updated 2019-12-11 12:20:02 UTC)

Description Numan Siddique 2019-10-14 09:31:29 UTC
This bug was initially created as a copy of Bug #1684419

I am copying this bug because: 



Description of problem:
This bug is similar to an OVS 2.9 bug on RHEL 7 (bz1684363).

Version-Release number of selected component (if applicable):
[root@hp-dl388g8-02 ovn_ha]# uname -a
Linux hp-dl388g8-02.rhts.eng.pek2.redhat.com 4.18.0-64.el8.x86_64 #1 SMP Wed Jan 23 20:50:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl388g8-02 ovn_ha]# rpm -qa | grep openvswitch
openvswitch2.11-ovn-common-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-30.noarch
openvswitch2.11-ovn-central-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64
openvswitch-selinux-extra-policy-1.0-10.el8fdp.noarch
openvswitch2.11-ovn-host-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64
openvswitch2.11-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Set up a cluster with 3 nodes running ovndb_servers (a rough setup sketch follows step 2).
2. Restart openvswitch on the master node.
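
A minimal sketch of the kind of pacemaker setup behind these steps, assuming pcs 0.10 / pacemaker 2.0 syntax (RHEL 8), the node IPs and VIP 70.0.0.50 from the output below, and the ocf:ovn:ovndb-servers agent shipped with the OVN packages; resource names and parameters such as master_ip are illustrative, not taken from the reporter's exact configuration:

# build the 3-node cluster (assumed node addresses from the pcs output below)
pcs cluster setup my_cluster 70.0.0.2 70.0.0.12 70.0.0.20
pcs cluster start --all

# virtual IP that should follow the OVN DB master
pcs resource create ip-70.0.0.50 ocf:heartbeat:IPaddr2 ip=70.0.0.50 op monitor interval=30s

# promotable ovndb_servers clone; master_ip assumed to match the VIP
pcs resource create ovndb_servers ocf:ovn:ovndb-servers master_ip=70.0.0.50 \
    op monitor interval=10s op monitor role=Master interval=15s
pcs resource promotable ovndb_servers notify=true

# keep the VIP on whichever node holds the master role
pcs constraint colocation add ip-70.0.0.50 with master ovndb_servers-clone
pcs constraint order promote ovndb_servers-clone then start ip-70.0.0.50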

[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:40:50 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     Masters: [ 70.0.0.2 ]
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# systemctl restart openvswitch
[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:41:21 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     Masters: [ 70.0.0.2 ]
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:41:25 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     ovndb_servers	(ocf::ovn:ovndb-servers):	FAILED 70.0.0.2
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Failed Resource Actions:
* ovndb_servers_demote_0 on 70.0.0.2 'not running' (7): call=17, status=complete, exitreason='',
    last-rc-change='Fri Mar  1 03:41:22 2019', queued=0ms, exec=97ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.0.0.2 (version 2.0.1-3.el8-0eb7991564) - partition with quorum
Last updated: Fri Mar  1 03:41:27 2019
Last change: Fri Mar  1 03:33:33 2019 by root via crm_attribute on 70.0.0.2

3 nodes configured
4 resources configured

Online: [ 70.0.0.2 70.0.0.12 70.0.0.20 ]

Full list of resources:

 ip-70.0.0.50	(ocf::heartbeat:IPaddr2):	Started 70.0.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     ovndb_servers	(ocf::ovn:ovndb-servers):	FAILED 70.0.0.2
     Slaves: [ 70.0.0.12 70.0.0.20 ]

Failed Resource Actions:
* ovndb_servers_demote_0 on 70.0.0.2 'not running' (7): call=17, status=complete, exitreason='',
    last-rc-change='Fri Mar  1 03:41:22 2019', queued=0ms, exec=97ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@hp-dl388g8-02 ovn_ha]# 
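
When the demote action fails like this, a few generic commands can help narrow down what the resource agent saw; this is an assumed diagnostic sketch (standard pcs, systemd, and ovn-ctl tooling), not something taken from the original report:

# failure count pacemaker has recorded for the resource
pcs resource failcount show ovndb_servers

# pacemaker/corosync messages around the failed demote
journalctl -u pacemaker -u corosync --since "10 minutes ago"

# check whether the OVN DB servers were left running after the openvswitch restart
# (path assumed for the openvswitch2.11 OVN packages used here)
/usr/share/openvswitch/scripts/ovn-ctl status_ovsdb

# clear the failure and let pacemaker retry the resource
pcs resource cleanup ovndb_servers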



Actual results:
The ovndb_servers resource on the master node stays FAILED and does not come back after openvswitch is restarted.

Expected results:
The ovndb_servers resource on the master node comes back after openvswitch is restarted.

Additional info:

Comment 3 haidong li 2019-11-08 03:03:34 UTC
This issue is blocked by bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1769202

Comment 4 Jianlin Shi 2019-11-09 07:38:06 UTC
Verified on ovn2.12.0-7:

[root@ibm-x3650m5-03 ovn_ha]# ip addr sh eno3                                                         
4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000         
    link/ether 08:94:ef:04:78:5a brd ff:ff:ff:ff:ff:ff                                                
    inet 70.11.0.2/24 scope global eno3
       valid_lft forever preferred_lft forever                                                        
    inet 70.11.0.50/24 scope global secondary eno3
       valid_lft forever preferred_lft forever                                                        
    inet6 2001::a94:efff:fe04:785a/64 scope global dynamic mngtmpaddr
       valid_lft 86359sec preferred_lft 14359sec
[root@ibm-x3650m5-03 ovn_ha]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: 70.11.0.12 (version 2.0.2-3.el8-744a30d655) - partition with quorum                       
Last updated: Sat Nov  9 02:36:03 2019                                                                
Last change: Sat Nov  9 02:31:49 2019 by root via crm_attribute on 70.11.0.2                          

3 nodes configured
4 resources configured

Online: [ 70.11.0.2 70.11.0.12 70.11.0.20 ]                                                           

Full list of resources:

 ip-70.11.0.50  (ocf::heartbeat:IPaddr2):       Started 70.11.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)                                          
     Masters: [ 70.11.0.2 ]
     Slaves: [ 70.11.0.12 70.11.0.20 ]                                                                

Daemon Status:                                                                                        
  corosync: active/enabled                                                                            
  pacemaker: active/enabled                                                                           
  pcsd: active/enabled                                                                                
[root@ibm-x3650m5-03 ovn_ha]# systemctl restart openvswitch

<==== restart openvswitch on master 70.11.0.2

[root@ibm-x3650m5-03 ovn_ha]# pcs status
Cluster name: my_cluster                                                                              
Stack: corosync                                                                                       
Current DC: 70.11.0.12 (version 2.0.2-3.el8-744a30d655) - partition with quorum
Last updated: Sat Nov  9 02:36:23 2019                                                                
Last change: Sat Nov  9 02:31:49 2019 by root via crm_attribute on 70.11.0.2
                                                                                                      
3 nodes configured                                                                                    
4 resources configured                                                                                
                                                                                                      
Online: [ 70.11.0.2 70.11.0.12 70.11.0.20 ]                                                           
                                                                                                      
Full list of resources:                                                                               
                                                                                                      
 ip-70.11.0.50  (ocf::heartbeat:IPaddr2):       Started 70.11.0.2
 Clone Set: ovndb_servers-clone [ovndb_servers] (promotable)
     Masters: [ 70.11.0.2 ]                                                                           
     Slaves: [ 70.11.0.12 70.11.0.20 ]                                                                
                                                                                                      
Daemon Status:                                                                                        
  corosync: active/enabled                                                                            
  pacemaker: active/enabled                                                                           
  pcsd: active/enabled

<=== master is up


[root@ibm-x3650m5-03 ovn_ha]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
ovn2.12-2.12.0-7.el8fdp.x86_64
ovn2.12-central-2.12.0-7.el8fdp.x86_64
ovn2.12-host-2.12.0-7.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-43.noarch
openvswitch2.12-2.12.0-4.el8fdp.x86_64
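
As an extra sanity check beyond pcs status (not part of the original verification), the NB/SB databases can be queried through the cluster VIP to confirm the promoted copy is accepting connections; 6641/6642 are the conventional NB/SB ports and 70.11.0.50 is the VIP from the output above:

# assumed check against the VIP; adjust ports if the agent was configured differently
ovn-nbctl --db=tcp:70.11.0.50:6641 show
ovn-sbctl --db=tcp:70.11.0.50:6642 show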

Comment 6 errata-xmlrpc 2019-12-11 12:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:4209

