Bug 2210534

Summary: 'Revert OVN migration' procedure fails on checking server network availability
Product: Red Hat OpenStack
Component: openstack-neutron
Version: 17.1 (Wallaby)
Target Release: 17.1
Target Milestone: ga
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: AutomationBlocker, Regression, Triaged
Reporter: Roman Safronov <rsafrono>
Assignee: Yatin Karel <ykarel>
QA Contact: Roman Safronov <rsafrono>
CC: chrisw, dhughes, jlibosva, pgrist, scohen, ykarel
Fixed In Version: openstack-neutron-18.6.1-1.20230518200965.el9ost
Last Closed: 2023-08-16 01:15:24 UTC
Type: Bug

Description Roman Safronov 2023-05-28 08:54:45 UTC
Description of problem:

The 'Revert OVN migration' procedure fails at the overcloud_update stage during execution of the tripleo_nodes_validation tasks (PLAY [Server network validation]). Some nodes do not respond; see the errors from the overcloud_deploy log:

2023-05-27 02:11:08.862553 | 52540048-e52d-8a7e-f8d7-0000000030e9 |      FATAL | Check Default IPv4 Gateway availability | compute-1 | error={"attempts": 10, "changed": false, "cmd": ["ping", "-w", "10", "-c", "5", "10.0.0.1"], "delta": "0:00:01.196542", "end": "2023-05-27 02:11:08.815152", "msg": "non-zero return code", "rc": 1, "start": "2023-05-27 02:11:07.618610", "stderr": "", "stderr_lines": [], "stdout": "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\nFrom 10.0.0.131 icmp_seq=1 Destination Host Unreachable\nFrom 10.0.0.131 icmp_seq=2 Destination Host Unreachable\n\n--- 10.0.0.1 ping statistics ---\n2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1062ms\npipe 2", "stdout_lines": ["PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.", "From 10.0.0.131 icmp_seq=1 Destination Host Unreachable", "From 10.0.0.131 icmp_seq=2 Destination Host Unreachable", "", "--- 10.0.0.1 ping statistics ---", "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1062ms", "pipe 2"]}
2023-05-27 02:11:08.863904 | 52540048-e52d-8a7e-f8d7-0000000030e9 |     TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | compute-1 | 0:11:26.476122 | 618.96s
2023-05-27 02:11:14.020416 | 52540048-e52d-8a7e-f8d7-0000000030e9 |      FATAL | Check Default IPv4 Gateway availability | compute-0 | error={"attempts": 10, "changed": false, "cmd": ["ping", "-w", "10", "-c", "5", "10.0.0.1"], "delta": "0:00:02.908385", "end": "2023-05-27 02:11:13.975070", "msg": "non-zero return code", "rc": 1, "start": "2023-05-27 02:11:11.066685", "stderr": "", "stderr_lines": [], "stdout": "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\nFrom 10.0.0.119 icmp_seq=1 Destination Host Unreachable\nFrom 10.0.0.119 icmp_seq=2 Destination Host Unreachable\nFrom 10.0.0.119 icmp_seq=3 Destination Host Unreachable\n\n--- 10.0.0.1 ping statistics ---\n3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2070ms\npipe 3", "stdout_lines": ["PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.", "From 10.0.0.119 icmp_seq=1 Destination Host Unreachable", "From 10.0.0.119 icmp_seq=2 Destination Host Unreachable", "From 10.0.0.119 icmp_seq=3 Destination Host Unreachable", "", "--- 10.0.0.1 ping statistics ---", "3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2070ms", "pipe 3"]}
2023-05-27 02:11:14.021456 | 52540048-e52d-8a7e-f8d7-0000000030e9 |     TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | compute-0 | 0:11:31.633684 | 624.31s
2023-05-27 02:11:20.627932 | 52540048-e52d-8a7e-f8d7-0000000030e9 |      FATAL | Check Default IPv4 Gateway availability | networker-2 | error={"attempts": 10, "changed": false, "cmd": ["ping", "-w", "10", "-c", "5", "10.0.0.1"], "delta": "0:00:01.102085", "end": "2023-05-27 02:11:20.581607", "msg": "non-zero return code", "rc": 1, "start": "2023-05-27 02:11:19.479522", "stderr": "", "stderr_lines": [], "stdout": "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\nFrom 10.0.0.128 icmp_seq=1 Destination Host Unreachable\nFrom 10.0.0.128 icmp_seq=2 Destination Host Unreachable\n\n--- 10.0.0.1 ping statistics ---\n2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1031ms\npipe 2", "stdout_lines": ["PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.", "From 10.0.0.128 icmp_seq=1 Destination Host Unreachable", "From 10.0.0.128 icmp_seq=2 Destination Host Unreachable", "", "--- 10.0.0.1 ping statistics ---", "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1031ms", "pipe 2"]}
2023-05-27 02:11:20.629618 | 52540048-e52d-8a7e-f8d7-0000000030e9 |     TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | networker-2 | 0:11:38.241837 | 631.33s
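For triage, the hosts that failed the gateway check can be pulled out of the deploy log directly. A minimal sketch, assuming the log is available as a local file (the sample lines below are abridged from the output above, and the log path is hypothetical):

```shell
#!/bin/sh
# List the hosts that failed the "Check Default IPv4 Gateway availability"
# task in an overcloud deploy log. Sample lines abridged from the output
# above; /tmp/overcloud_deploy.log stands in for the real log path.
cat > /tmp/overcloud_deploy.log <<'EOF'
2023-05-27 02:11:08.862553 | 52540048-e52d-8a7e-f8d7-0000000030e9 |      FATAL | Check Default IPv4 Gateway availability | compute-1 | error={"rc": 1}
2023-05-27 02:11:14.020416 | 52540048-e52d-8a7e-f8d7-0000000030e9 |      FATAL | Check Default IPv4 Gateway availability | compute-0 | error={"rc": 1}
EOF
# The host name is the 5th pipe-separated field of each FATAL line.
grep 'FATAL |' /tmp/overcloud_deploy.log \
    | awk -F'|' '{gsub(/ /, "", $5); print $5}'
```

Each printed name is a node whose default gateway was unreachable, which matches the three FATAL entries in the full log above.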


Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230525.n.1
python3-neutron-18.6.1-1.20230518200958.da43b03.el9ost.noarch

How reproducible:
100%, when the tempest and tobiko stages run.
When the downstream (d/s) CI job is configured to skip the tempest/tobiko stages and runs only the 'ovn migration' and 'restore ovs' stages, the issue does not occur.

Steps to Reproduce:

Found by downstream ovs2ovn CI jobs that perform the following scenario:
1. Deploy an OVS environment
2. Run tempest neutron tests and tobiko create-resources
3. Create a backup of the control plane nodes
4. Perform the migration from OVS to OVN according to the official procedure:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/testing_migration_of_the_networking_service_to_the_ml2ovn_mechanism_driver/migrating-ovs-to-ovn
5. Run tempest neutron tests and tobiko check-resources
6. Restore the control plane nodes from the backup
7. Run the /usr/share/ansible/neutron-ovn-migration/playbooks/revert.yml playbook
8. Run the initial overcloud deploy script (the same one used in step 1) to update the overcloud back to OVS
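Steps 7 and 8 above can be sketched as a small driver script. This is a non-authoritative sketch: only the revert.yml path is taken from this report, while the deploy script name is an illustrative placeholder.

```shell
#!/bin/sh
# Sketch of steps 7-8: revert the OVN migration, then redeploy with OVS.
# Only the revert.yml path comes from this report; ./overcloud_deploy.sh
# is a placeholder for whatever deploy script was used in step 1.
cat > /tmp/revert_and_redeploy.sh <<'EOF'
#!/bin/sh
set -eu
# Step 7: run the revert playbook shipped with the migration tool.
ansible-playbook /usr/share/ansible/neutron-ovn-migration/playbooks/revert.yml
# Step 8: re-run the same deploy script used for the initial OVS deployment.
./overcloud_deploy.sh
EOF
chmod +x /tmp/revert_and_redeploy.sh
echo "wrote /tmp/revert_and_redeploy.sh"
```

The script is written to a file rather than executed here, since both commands only make sense on an undercloud node with the migration tool installed.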

Actual results:
The overcloud deploy script fails during server network validation.

Expected results:
The overcloud deploy script passes.

Additional info:

Comment 12 Roman Safronov 2023-06-08 12:51:50 UTC
Verified that the issue does not happen on RHOS-17.1-RHEL-9-20230607.n.2 with openstack-neutron-ovn-migration-tool-18.6.1-1.20230518200966.el9ost.noarch

Comment 20 errata-xmlrpc 2023-08-16 01:15:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

Comment 21 Red Hat Bugzilla 2023-12-15 04:26:18 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days