Bug 1967142 - [OSP16.2] Migration to ML2/OVN is not working, pacemaker cluster is broken
Summary: [OSP16.2] Migration to ML2/OVN is not working, pacemaker cluster is broken
Keywords:
Status: CLOSED DUPLICATE of bug 1968445
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-02 13:54 UTC by Roman Safronov
Modified: 2022-08-17 15:02 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-08 17:50:53 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker OSP-4350 (last updated 2022-08-17 15:02:03 UTC)

Description Roman Safronov 2021-06-02 13:54:30 UTC
Description of problem:
During execution of the ML2/OVS -> ML2/OVN migration procedure, validation of the migration resources failed.

64 bytes from 10.0.0.254: icmp_seq=2699 ttl=62 time=0.639 ms
64 bytes from 10.0.0.254: icmp_seq=2700 ttl=62 time=1.61 ms
64 bytes from 10.0.0.254: icmp_seq=2701 ttl=62 time=0.610 ms
64 bytes from 10.0.0.254: icmp_seq=2702 ttl=62 time=0.605 ms
From 10.0.0.84 icmp_seq=2712 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2713 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2714 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2715 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2716 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2717 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2718 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2719 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2720 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2721 Destination Host Unreachable
From 10.0.0.84 icmp_seq=2722 Destination Host Unreachable

(overcloud) [stack@undercloud-0 ovn_migration]$ openstack network agent list
Failed to discover available identity versions when contacting http://10.0.0.149:5000. Attempting to parse version from URL.
Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Unable to establish connection to http://10.0.0.149:5000: HTTPConnectionPool(host='10.0.0.149', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f62ff100b38>: Failed to establish a new connection: [Errno 113] No route to host',))

It looks like Keystone is not responding.
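
A quick way to verify whether the public VIP and Keystone endpoint from the error above are reachable at all (addresses taken from this environment, adjust as needed):

[stack@undercloud-0 ~]$ ping -c 3 10.0.0.149                                   # is the VIP reachable at all?
[stack@undercloud-0 ~]$ curl -sv --connect-timeout 5 http://10.0.0.149:5000/   # does anything answer on the Keystone port?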

The Pacemaker cluster is stopped.

[heat-admin@controller-0 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: controller-0 (version 2.0.5-9.el8-ba59be7122) - partition WITHOUT quorum
  * Last updated: Wed Jun  2 13:42:20 2021
  * Last change:  Wed Jun  2 12:11:19 2021 by root via cibadmin on controller-0
  * 15 nodes configured
  * 47 resource instances configured

Node List:
  * Online: [ controller-0 ]
  * OFFLINE: [ controller-1 controller-2 ]

Full List of Resources:
  * ip-192.168.24.20	(ocf::heartbeat:IPaddr2):	 Stopped
  * ip-10.0.0.149	(ocf::heartbeat:IPaddr2):	 Stopped
  * ip-172.17.1.140	(ocf::heartbeat:IPaddr2):	 Stopped
  * ip-172.17.1.72	(ocf::heartbeat:IPaddr2):	 Stopped
  * ip-172.17.3.74	(ocf::heartbeat:IPaddr2):	 Stopped
  * ip-172.17.4.27	(ocf::heartbeat:IPaddr2):	 Stopped
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf::heartbeat:podman):	 Stopped
    * haproxy-bundle-podman-1	(ocf::heartbeat:podman):	 Stopped
    * haproxy-bundle-podman-2	(ocf::heartbeat:podman):	 Stopped
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf::heartbeat:galera):	 Stopped
    * galera-bundle-1	(ocf::heartbeat:galera):	 Stopped
    * galera-bundle-2	(ocf::heartbeat:galera):	 Stopped
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	 Stopped
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	 Stopped
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	 Stopped
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0	(ocf::heartbeat:redis):	 Stopped
    * redis-bundle-1	(ocf::heartbeat:redis):	 Stopped
    * redis-bundle-2	(ocf::heartbeat:redis):	 Stopped
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Stopped
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Stopped
    * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Stopped
    * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Stopped
  * ip-172.17.1.39	(ocf::heartbeat:IPaddr2):	 Stopped

Failed Resource Actions:
  * redis-bundle-0_monitor_30000 on controller-0 'error' (1): call=15, status='Error', exitreason='', last-rc-change='2021-06-02 12:26:37Z', queued=0ms, exec=0ms
  * rabbitmq-bundle-0_monitor_30000 on controller-0 'error' (1): call=10, status='Error', exitreason='', last-rc-change='2021-06-02 12:26:27Z', queued=0ms, exec=0ms
  * galera-bundle-0_monitor_30000 on controller-0 'error' (1): call=5, status='Error', exitreason='', last-rc-change='2021-06-02 12:26:17Z', queued=0ms, exec=0ms
  * ip-172.17.1.72_monitor_10000 on controller-0 'error' (1): call=21, status='complete', exitreason='[findif] failed', last-rc-change='2021-06-02 12:25:58Z', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


Feel free to move the bug to a more relevant component/DFG.


 
Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210525.n.0

How reproducible:
100%


Steps to Reproduce:
1. Deploy an HA environment (3 controllers + 2 computes).
2. Before starting the migration, create some resources (an internal network, a router between the external and internal networks, a server with a FIP connected to the internal network) and start a script that records ping results to the server's FIP (a minimal example of such a script is sketched after step 3).
3. Perform the OVN migration steps from the official documentation: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/networking_with_open_virtual_network/migrating-ml2ovs-to-ovn
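
A minimal sketch of the ping-recording script referred to in step 2 (the floating IP and log file name are example values):

#!/bin/bash
# Continuously ping the test server's floating IP and log timestamped results.
FIP=10.0.0.254                     # example floating IP of the pre-created server
LOG=ping_during_migration.log      # example log file name
ping "$FIP" | while read -r line; do
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') $line"
done >> "$LOG"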


Actual results:
Migration to OVN fails. The migration resources are not reachable and OpenStack services are not responding.

Expected results:
Migration succeeds.

Additional info:

Comment 2 Roman Safronov 2021-06-08 17:50:53 UTC
All VLAN interfaces are down on all nodes:
51: vlan20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether de:2e:87:dd:85:36 brd ff:ff:ff:ff:ff:ff
52: vlan40: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ba:38:3e:55:23:2d brd ff:ff:ff:ff:ff:ff
53: vlan50: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8e:e3:b9:60:4b:7d brd ff:ff:ff:ff:ff:ff
54: vlan30: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 96:60:91:47:7e:c9 brd ff:ff:ff:ff:ff:ff
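
To confirm the state of the VLAN interfaces on each node and, only as a temporary check rather than a fix, to bring one up by hand, something like the following can be used:

[heat-admin@controller-0 ~]$ ip -br link show type vlan   # list all VLAN interfaces and their state
[heat-admin@controller-0 ~]$ sudo ip link set vlan20 up   # bring a single interface up manually (example: vlan20)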

Looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1968445.

*** This bug has been marked as a duplicate of bug 1968445 ***

