Description of problem:
The 'Revert OVN migration' procedure fails at the overcloud update stage during execution of the tripleo_nodes_validation tasks (PLAY [Server network validation]). Some nodes are not responding; see the errors from the overcloud_deploy log (a manual reproduction of the failing check is sketched under Additional info):

2023-05-27 02:11:08.862553 | 52540048-e52d-8a7e-f8d7-0000000030e9 | FATAL | Check Default IPv4 Gateway availability | compute-1 | error={"attempts": 10, "changed": false, "cmd": ["ping", "-w", "10", "-c", "5", "10.0.0.1"], "delta": "0:00:01.196542", "end": "2023-05-27 02:11:08.815152", "msg": "non-zero return code", "rc": 1, "start": "2023-05-27 02:11:07.618610", "stderr": "", "stderr_lines": [], "stdout": "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\nFrom 10.0.0.131 icmp_seq=1 Destination Host Unreachable\nFrom 10.0.0.131 icmp_seq=2 Destination Host Unreachable\n\n--- 10.0.0.1 ping statistics ---\n2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1062ms\npipe 2", "stdout_lines": ["PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.", "From 10.0.0.131 icmp_seq=1 Destination Host Unreachable", "From 10.0.0.131 icmp_seq=2 Destination Host Unreachable", "", "--- 10.0.0.1 ping statistics ---", "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1062ms", "pipe 2"]}
2023-05-27 02:11:08.863904 | 52540048-e52d-8a7e-f8d7-0000000030e9 | TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | compute-1 | 0:11:26.476122 | 618.96s
2023-05-27 02:11:14.020416 | 52540048-e52d-8a7e-f8d7-0000000030e9 | FATAL | Check Default IPv4 Gateway availability | compute-0 | error={"attempts": 10, "changed": false, "cmd": ["ping", "-w", "10", "-c", "5", "10.0.0.1"], "delta": "0:00:02.908385", "end": "2023-05-27 02:11:13.975070", "msg": "non-zero return code", "rc": 1, "start": "2023-05-27 02:11:11.066685", "stderr": "", "stderr_lines": [], "stdout": "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\nFrom 10.0.0.119 icmp_seq=1 Destination Host Unreachable\nFrom 10.0.0.119 icmp_seq=2 Destination Host Unreachable\nFrom 10.0.0.119 icmp_seq=3 Destination Host Unreachable\n\n--- 10.0.0.1 ping statistics ---\n3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2070ms\npipe 3", "stdout_lines": ["PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.", "From 10.0.0.119 icmp_seq=1 Destination Host Unreachable", "From 10.0.0.119 icmp_seq=2 Destination Host Unreachable", "From 10.0.0.119 icmp_seq=3 Destination Host Unreachable", "", "--- 10.0.0.1 ping statistics ---", "3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2070ms", "pipe 3"]}
2023-05-27 02:11:14.021456 | 52540048-e52d-8a7e-f8d7-0000000030e9 | TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | compute-0 | 0:11:31.633684 | 624.31s
2023-05-27 02:11:20.627932 | 52540048-e52d-8a7e-f8d7-0000000030e9 | FATAL | Check Default IPv4 Gateway availability | networker-2 | error={"attempts": 10, "changed": false, "cmd": ["ping", "-w", "10", "-c", "5", "10.0.0.1"], "delta": "0:00:01.102085", "end": "2023-05-27 02:11:20.581607", "msg": "non-zero return code", "rc": 1, "start": "2023-05-27 02:11:19.479522", "stderr": "", "stderr_lines": [], "stdout": "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\nFrom 10.0.0.128 icmp_seq=1 Destination Host Unreachable\nFrom 10.0.0.128 icmp_seq=2 Destination Host Unreachable\n\n--- 10.0.0.1 ping statistics ---\n2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1031ms\npipe 2", "stdout_lines": ["PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.", "From 10.0.0.128 icmp_seq=1 Destination Host Unreachable", "From 10.0.0.128 icmp_seq=2 Destination Host Unreachable", "", "--- 10.0.0.1 ping statistics ---", "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1031ms", "pipe 2"]}
2023-05-27 02:11:20.629618 | 52540048-e52d-8a7e-f8d7-0000000030e9 | TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | networker-2 | 0:11:38.241837 | 631.33s

Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230525.n.1
python3-neutron-18.6.1-1.20230518200958.da43b03.el9ost.noarch

How reproducible:
100% when the tempest and tobiko stages are run. If the downstream CI job is configured to skip the tempest/tobiko stages and runs only the 'ovn migration' and 'restore ovs' stages, the issue does not happen.

Steps to Reproduce:
Found by downstream ovs2ovn CI jobs that perform the following scenario:
1. Deploy an OVS environment.
2. Run tempest neutron and tobiko create-resources.
3. Create a backup of the control plane nodes.
4. Perform the migration from OVS to OVN according to the official procedure: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/testing_migration_of_the_networking_service_to_the_ml2ovn_mechanism_driver/migrating-ovs-to-ovn
5. Run tempest neutron and tobiko check-resources.
6. Restore the control plane nodes from the backup.
7. Run the /usr/share/ansible/neutron-ovn-migration/playbooks/revert.yml playbook (an example invocation is sketched under Additional info).
8. Run the initial overcloud deploy script (the same one used in step 1) to update the overcloud back to OVS.

Actual results:
The overcloud deploy script fails on server network validation.

Expected results:
The overcloud deploy script passes.

Additional info:
Verified that the issue does not happen on RHOS-17.1-RHEL-9-20230607.n.2 with openstack-neutron-ovn-migration-tool-18.6.1-1.20230518200966.el9ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.