Bug 2252717

Summary: Some internal network ports lost connectivity after an incomplete batch OVN migration (activate a batch, then revert the batch)
Product: Red Hat OpenStack
Component: openstack-neutron
Version: 17.1 (Wallaby)
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: z2
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Keywords: Triaged
Reporter: Roman Safronov <rsafrono>
Assignee: Jakub Libosvar <jlibosva>
QA Contact: Eran Kuris <ekuris>
CC: bmv, chrisw, jlibosva, mariel, mburns, prgutier, scohen
Fixed In Version: openstack-neutron-18.6.1-17.1.20231025110806.el9ost
Doc Type: No Doc Update
Type: Bug
Last Closed: 2024-01-16 14:31:46 UTC
Bug Blocks: 2222082

Description Roman Safronov 2023-12-04 09:29:38 UTC
Description of problem:
Some internal network ports lost connectivity after an incomplete batch OVN migration (activate a batch, then revert the batch).
It seems that ports on the same compute node are able to communicate, but connections to ports on the other compute node (which belongs to the same batch) were lost.

The initial OVS environment had DVR disabled and used iptables_firewall. The migration templates were configured for OVN with DVR.

OVN migration scenario was:
1. activate-ovn batch1
2. revert-ovn batch1

[batch1]
compute-0
compute-1

[batch2]
compute-2
compute-3

[batch3]
controller-0
controller-1
controller-2
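
To make clear which nodes the activate/revert of batch1 touches, the batch definition above can be read programmatically. This is only a minimal sketch: it assumes the batches are kept in a plain INI-style text with one host per line, as shown here, and the helper name `hosts_by_batch` is hypothetical, not part of the migration tool.

```python
import configparser

# Batch definition copied from this report.
BATCHES = """\
[batch1]
compute-0
compute-1

[batch2]
compute-2
compute-3

[batch3]
controller-0
controller-1
controller-2
"""


def hosts_by_batch(text):
    """Return {batch name: [host, ...]} for an INI-style batch list.

    Hypothetical helper: host lines have no '=' or ':' so they are parsed
    as value-less keys (allow_no_value=True).
    """
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(text)
    return {section: list(parser[section]) for section in parser.sections()}


batches = hosts_by_batch(BATCHES)
print("batch1 hosts:", batches["batch1"])
```

Only compute-0 and compute-1 are in batch1, so these are the only nodes whose backend was switched to OVN and then reverted in this scenario.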


Summary of connectivity checks from internal network
INFO: ping 192.168.211.60 via 10.0.0.189 = failed
INFO: ping 192.168.212.68 via 10.0.0.189 = failed
INFO: ping 192.168.211.221 via 10.0.0.189 = passed

INFO: ping 192.168.211.144 via 10.0.0.241 = failed
INFO: ping 192.168.211.221 via 10.0.0.241 = failed

INFO: ping 192.168.212.199 via 10.0.0.163 = failed

INFO: ping 192.168.211.144 via 10.0.0.227 = passed
INFO: ping 192.168.211.60 via 10.0.0.227 = failed

INFO: ping 192.168.213.229 via 10.0.0.198 = passed
INFO: ping 192.168.214.22 via 10.0.0.198 = passed
INFO: ping 192.168.213.248 via 10.0.0.198 = passed

INFO: ping 192.168.213.249 via 10.0.0.194 = passed
INFO: ping 192.168.213.248 via 10.0.0.194 = passed

INFO: ping 192.168.214.180 via 10.0.0.170 = passed

INFO: ping 192.168.213.249 via 10.0.0.236 = passed
INFO: ping 192.168.213.229 via 10.0.0.236 = passed

VMs that were used as a workload
+--------------------------------------+------------------------------------------------------------------------------------------------------------+------------------------+
| ID                                   | Networks                                                                                                   | Host                   |
+--------------------------------------+------------------------------------------------------------------------------------------------------------+------------------------+
| aa8be9af-e617-4a15-908d-fc7dadb177ed | public=10.0.0.198, 2620:52:0:13b8:f816:3eff:fea3:37d5                                                      | compute-2.redhat.local |
| 5322c2d9-8260-48e3-87fb-acb2e8532d94 | ovn-migration-net-vnf1-pinger-zone1=192.168.213.229; public=10.0.0.194, 2620:52:0:13b8:f816:3eff:fe4f:b333 | compute-3.redhat.local |
| b694b66c-7d7a-42cc-84d4-18bcac4c82ac | ovn-migration-net-vnf2-pinger-zone1=192.168.214.22; public=10.0.0.170, 2620:52:0:13b8:f816:3eff:fe12:807d  | compute-3.redhat.local |
| 80754630-d87c-419f-bf1c-bd7d3e42a353 | ovn-migration-net-vnf1-pinger-zone1=192.168.213.248; public=10.0.0.236, 2620:52:0:13b8:f816:3eff:fe2f:38ae | compute-2.redhat.local |
| ce53fa48-be51-4a98-bede-7d525b581fdb | public=10.0.0.189, 2620:52:0:13b8:f816:3eff:feca:50c                                                       | compute-0.redhat.local |
| 833faa6d-95e1-4c28-9922-90a3da78e613 | ovn-migration-net-vnf1-pinger-zone0=192.168.211.60; public=10.0.0.241, 2620:52:0:13b8:f816:3eff:fe04:d936  | compute-1.redhat.local |
| 7913c667-cf70-405a-a8da-a2f7798c98ff | ovn-migration-net-vnf2-pinger-zone0=192.168.212.68; public=10.0.0.163, 2620:52:0:13b8:f816:3eff:fe5d:6254  | compute-1.redhat.local |
| f6e2c5cf-9ae4-4075-9f74-9a736c45c59b | ovn-migration-net-vnf1-pinger-zone0=192.168.211.221; public=10.0.0.227, 2620:52:0:13b8:f816:3eff:fe65:fafe | compute-0.redhat.local |
+--------------------------------------+------------------------------------------------------------------------------------------------------------+------------------------+

2 VMs had trunk ports with 2 subports each, e.g. the trunk on VM ce53fa48-be51-4a98-bede-7d525b581fdb had subports with addresses 192.168.211.144 and 192.168.212.199
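
The same-node versus cross-node pattern can be cross-checked against the data above. The following is a rough sketch, not the test tooling that produced the summary: it covers only the batch1 (zone0) checks, takes the IP-to-host mapping from the VM table and the trunk note, and assumes that "ping X via Y" means X was pinged from the VM whose public address is Y. The names `CHECKS`, `HOST_BY_IP` and `classify` are made up for illustration.

```python
from collections import Counter

# (target internal IP, public IP of the source VM, result), from the summary.
CHECKS = [
    ("192.168.211.60", "10.0.0.189", "failed"),
    ("192.168.212.68", "10.0.0.189", "failed"),
    ("192.168.211.221", "10.0.0.189", "passed"),
    ("192.168.211.144", "10.0.0.241", "failed"),
    ("192.168.211.221", "10.0.0.241", "failed"),
    ("192.168.212.199", "10.0.0.163", "failed"),
    ("192.168.211.144", "10.0.0.227", "passed"),
    ("192.168.211.60", "10.0.0.227", "failed"),
]

# Host of every address, from the VM table; the two subport addresses
# belong to the trunk VM ce53fa48 on compute-0.
HOST_BY_IP = {
    "10.0.0.189": "compute-0",
    "10.0.0.227": "compute-0",
    "10.0.0.241": "compute-1",
    "10.0.0.163": "compute-1",
    "192.168.211.221": "compute-0",
    "192.168.211.144": "compute-0",
    "192.168.212.199": "compute-0",
    "192.168.211.60": "compute-1",
    "192.168.212.68": "compute-1",
}


def classify(checks, host_by_ip):
    """Count results per (same-host | cross-host, passed | failed)."""
    summary = Counter()
    for target, source, result in checks:
        same = host_by_ip[target] == host_by_ip[source]
        summary[("same-host" if same else "cross-host", result)] += 1
    return summary


summary = classify(CHECKS, HOST_BY_IP)
for (kind, result), count in sorted(summary.items()):
    print(f"{kind:10} {result:6} {count}")
```

Under these assumptions, every check between compute-0 and compute-1 failed and every check within compute-0 passed, which matches the observation in the description.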


Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20231122.n.1
openstack-neutron-ovn-migration-tool-18.6.1-17.1.20231025110805.el9ost.noarch

How reproducible:
always

Steps to Reproduce:
1. Deploy an HA environment (3 controllers + 4 compute nodes) with the OVS neutron backend.
In my case it was an environment with centralized (no-DVR) routing.
2. Create a workload. In my case it was the following (see also the attached server and network list):
- 2 availability zones, first zone: compute nodes 0 and 1, second: compute nodes 2 and 3
- 8 VMs divided into 2 groups of 4 VMs, each group in a separate availability zone
- All VMs are connected to the external network directly
- 2 separate internal networks in each zone (4 networks total)
- Each zone has a single VM with a trunk port and 2 subports
- The 3 other VMs in each zone have an internal port that is connected to one of the internal networks
- No neutron routers are involved, and no VMs in the workload use floating IP addresses to connect to the external network
3. Follow the official procedure for migrating the network backend from OVS to OVN
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/migrating_to_the_ovn_mechanism_driver/migrating-ovs-to-ovn
in order to configure the OVN migration templates.
I used no-DVR to DVR.
4. Configure batches as specified above
5. Run the following commands
ovn_migration.sh install-ovn
ovn_migration.sh activate-ovn batch1
ovn_migration.sh revert-ovn batch1

Actual results:
Some internal network ports lost connectivity, see the summary above

Expected results:
All internal ports on VMs that run on nodes included in batch1 are able to ping their peers on other VMs (connected to the same network)

Additional info:

Comment 15 errata-xmlrpc 2024-01-16 14:31:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0209