Description of problem:
When running the ovs2ovn migration with a workload that includes an Octavia load balancer, the load balancer is in ERROR state at the end of the migration. The issue happens (see details below) because a failover of the LB was triggered; the failover tries to create a VM plugged into the lb-mgmt-net, and that VM creation fails. One possible solution (according to Greg, gthiemonge) is to disable the Octavia services (except octavia-api) during the migration.

Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20211027.n.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy an OSP environment with the openvswitch firewall driver.
2. Create a workload that consists of 2 VMs connected to an internal network, with FIPs on an external network. Create an Octavia load balancer and use the VMs as members. Make sure an Octavia health monitor is also running.
3. Run the ovs2ovn migration according to the official documentation.

Actual results:
Load balancer status is ERROR.

Expected results:
Load balancer status is ACTIVE.

Additional info from Greg (gthiemonge):
A failover of the LB was triggered after the migration because the Octavia health-manager service didn't receive any heartbeat packets from the amphora (I don't know what the behavior of the existing VMs should be after the migration, but I guess triggering a failover of the load balancers is acceptable).
The failover creates a VM plugged into the lb-mgmt-net, but the creation failed.

On networker-0, we can see the exception in the health-manager logs:

/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:14.699 15 ERROR octavia.controller.worker.v1.controller_worker Traceback (most recent call last):
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:14.699 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:14.699 15 ERROR octavia.controller.worker.v1.controller_worker result = task.execute(**arguments)
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:14.699 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/octavia/controller/worker/v1/tasks/compute_tasks.py", line 249, in execute
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:14.699 15 ERROR octavia.controller.worker.v1.controller_worker raise exceptions.ComputeBuildException(fault=fault)
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:14.699 15 ERROR octavia.controller.worker.v1.controller_worker octavia.common.exceptions.ComputeBuildException: Failed to build compute instance due to: {'code': 500, 'created': '2021-11-10T13:47:10Z', 'message': 'Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.', 'details': 'Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6619, in _create_domain_and_network\n network_info)\n File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__\n next(self.gen)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 478, in wait_for_instance_event\n actual_event = event.wait()\n File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n result = hub.switch()\n File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n return self.greenlet.switch()\neventlet.timeout.Timeout: 300 seconds\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance\n block_device_info=block_device_info)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3746, in spawn\n cleanup_instance_disks=created_disks)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6642, in _create_domain_and_network\n raise exception.VirtualInterfaceCreateException()\nnova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2168, in _do_build_and_run_instance\n filter_properties, request_spec)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2508, in _build_and_run_instance\n reason=msg)\nnova.exception.BuildAbortException: Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.\n'}
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:14.699 15 ERROR octavia.controller.worker.v1.controller_worker
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker [-] Amphora cc27198b-863d-4a56-aff9-a84986225b39 failover exception: Failed to build compute instance due to: {'code': 500, 'created': '2021-11-10T13:47:10Z', 'message': 'Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.', 'details': 'Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6619, in _create_domain_and_network\n network_info)\n File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__\n next(self.gen)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 478, in wait_for_instance_event\n actual_event = event.wait()\n File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n result = hub.switch()\n File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n return self.greenlet.switch()\neventlet.timeout.Timeout: 300 seconds\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance\n block_device_info=block_device_info)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3746, in spawn\n cleanup_instance_disks=created_disks)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6642, in _create_domain_and_network\n raise exception.VirtualInterfaceCreateException()\nnova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2168, in _do_build_and_run_instance\n filter_properties, request_spec)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2508, in _build_and_run_instance\n reason=msg)\nnova.exception.BuildAbortException: Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.\n'}: octavia.common.exceptions.ComputeBuildException: Failed to build compute instance due to: {'code': 500, 'created': '2021-11-10T13:47:10Z', 'message': 'Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.', 'details': 'Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6619, in _create_domain_and_network\n network_info)\n File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__\n next(self.gen)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 478, in wait_for_instance_event\n actual_event = event.wait()\n File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n result = hub.switch()\n File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n return self.greenlet.switch()\neventlet.timeout.Timeout: 300 seconds\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance\n block_device_info=block_device_info)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3746, in spawn\n cleanup_instance_disks=created_disks)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6642, in _create_domain_and_network\n raise exception.VirtualInterfaceCreateException()\nnova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2168, in _do_build_and_run_instance\n filter_properties, request_spec)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2508, in _build_and_run_instance\n reason=msg)\nnova.exception.BuildAbortException: Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.\n'}
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker Traceback (most recent call last):
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/octavia/controller/worker/v1/controller_worker.py", line 895, in failover_amphora
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker failover_amphora_tf.run()
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker for _state in self.run_iter(timeout=timeout):
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker failure.Failure.reraise_if_any(er_failures)
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/taskflow/types/failure.py", line 339, in reraise_if_any
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker failures[0].reraise()
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/taskflow/types/failure.py", line 346, in reraise
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker six.reraise(*self._exc_info)
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker raise value
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker result = task.execute(**arguments)
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker File "/usr/lib/python3.6/site-packages/octavia/controller/worker/v1/tasks/compute_tasks.py", line 249, in execute
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker raise exceptions.ComputeBuildException(fault=fault)
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker octavia.common.exceptions.ComputeBuildException: Failed to build compute instance due to: {'code': 500, 'created': '2021-11-10T13:47:10Z', 'message': 'Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.', 'details': 'Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6619, in _create_domain_and_network\n network_info)\n File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__\n next(self.gen)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 478, in wait_for_instance_event\n actual_event = event.wait()\n File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait\n result = hub.switch()\n File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch\n return self.greenlet.switch()\neventlet.timeout.Timeout: 300 seconds\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance\n block_device_info=block_device_info)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3746, in spawn\n cleanup_instance_disks=created_disks)\n File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6642, in _create_domain_and_network\n raise exception.VirtualInterfaceCreateException()\nnova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2168, in _do_build_and_run_instance\n filter_properties, request_spec)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2508, in _build_and_run_instance\n reason=msg)\nnova.exception.BuildAbortException: Build of instance 8637dde8-1372-426b-a5e0-a92a8a237ce7 aborted: Failed to allocate the network(s), not rescheduling.\n'}
/var/log/containers/octavia/health-manager.log.5.gz:2021-11-10 13:47:17.006 15 ERROR octavia.controller.worker.v1.controller_worker

Related logs in nova on compute-0 (/var/log/containers/nova/nova-compute.log.11.gz):

2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [req-d4e1a790-1578-4ef3-b300-a7e9e0269216 ffa80baa6b574c8ea4d2636eb02090a7 ba759e30bc8b44c58fc15f2a4cac0394 - default default] [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] Traceback (most recent call last):
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6619, in _create_domain_and_network
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] network_info)
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] next(self.gen)
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 478, in wait_for_instance_event
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] actual_event = event.wait()
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] result = hub.switch()
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] return self.greenlet.switch()
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] eventlet.timeout.Timeout: 300 seconds
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7]
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] During handling of the above exception, another exception occurred:
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7]
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] Traceback (most recent call last):
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2668, in _build_resources
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] yield resources
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] block_device_info=block_device_info)
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3746, in spawn
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] cleanup_instance_disks=created_disks)
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6642, in _create_domain_and_network
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] raise exception.VirtualInterfaceCreateException()
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2021-11-10 13:47:09.029 8 ERROR nova.compute.manager [instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7]
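
The nova-compute traceback is long, so it can help to reduce it to its exception chain when triaging. The following is only a minimal sketch: the helper name `exception_chain` and the regular expression are illustrative assumptions, not part of nova or Octavia, and the sample records are copied from the nova-compute log quoted in this report.

```python
import re

# Tail of a log record that names an exception: "] pkg.mod.ExcName: message".
# Requiring a dotted name skips "File ...", "Traceback ..." and source lines.
EXC_TAIL = re.compile(r"\]\s+((?:[A-Za-z_]\w*\.)+[A-Za-z_]\w*): (.+)$")


def exception_chain(records):
    """Return (exception, message) pairs in order of appearance, without repeats."""
    chain = []
    for record in records:
        match = EXC_TAIL.search(record.rstrip())
        if match and match.groups() not in chain:
            chain.append(match.groups())
    return chain


# Records copied from the nova-compute log in this report.
PREFIX = (
    "2021-11-10 13:47:09.029 8 ERROR nova.compute.manager "
    "[instance: 8637dde8-1372-426b-a5e0-a92a8a237ce7]"
)
SAMPLE = [
    PREFIX + " Traceback (most recent call last):",
    PREFIX + " actual_event = event.wait()",
    PREFIX + " eventlet.timeout.Timeout: 300 seconds",
    PREFIX + " During handling of the above exception, another exception occurred:",
    PREFIX + " raise exception.VirtualInterfaceCreateException()",
    PREFIX + " nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed",
]

# The first entry is the innermost cause, the last one is what nova reported.
for name, message in exception_chain(SAMPLE):
    print(f"{name}: {message}")
```

Read this way, the first failure in the chain is the 300 second timeout raised inside wait_for_instance_event, and the VirtualInterfaceCreateException is raised while handling it.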
Note: please don't confuse this issue with BZ 2005964. BZ 2005964 was found on an environment that used the iptables_hybrid firewall driver and had a Mellanox driver involved.
One more update from Greg:

Roman Safronov <rsafrono> wrote:
Greg, is there a command for disabling the Octavia services? Or should users just stop the Octavia-related containers? Could you please suggest commands that customers should run before the ovs2ovn migration in order to disable the Octavia services properly? I think we need to include such instructions in the official documentation.

"systemctl stop tripleo_octavia_worker tripleo_octavia_housekeeping tripleo_octavia_health_manager tripleo_octavia_driver_agent" on Controllers and/or Networkers should prevent this issue.

Just an additional note: the LB was in ERROR because Octavia detected a connectivity issue during the migration and failed to recover from it, but the LB was fully functional after the end of the migration. And there's no way to remove this error flag (except a failover, which creates a new VM).
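
For documentation purposes, the workaround could be wrapped roughly as below. This is a hedged sketch, not a tested procedure: the unit names are the ones quoted in Greg's command, while the helper names (`systemctl_command`, `run`), the dry-run default, and the matching "start" action after the migration are assumptions added for illustration. octavia-api is deliberately not in the list, following the description above.

```python
import subprocess

# Unit names copied from the systemctl command quoted in this report.
# tripleo_octavia_api is intentionally absent: octavia-api keeps running.
OCTAVIA_UNITS = (
    "tripleo_octavia_worker",
    "tripleo_octavia_housekeeping",
    "tripleo_octavia_health_manager",
    "tripleo_octavia_driver_agent",
)


def systemctl_command(action):
    """Build the systemctl argument list for 'stop' or 'start'."""
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["systemctl", action, *OCTAVIA_UNITS]


def run(action, dry_run=True):
    """Print the command (dry run) or execute it; return the exit code."""
    cmd = systemctl_command(action)
    if dry_run:
        print(" ".join(cmd))
        return 0
    # Needs root on the Controller/Networker node where the units live.
    return subprocess.run(cmd, check=False).returncode


# Before the ovs2ovn migration (dry run only prints the command):
run("stop")
# After the migration has finished (assumed follow-up step):
run("start")
```

Calling `run("stop", dry_run=False)` on each Controller and/or Networker node would execute the command for real. Keeping the dry run as the default makes it easy to review the exact command first.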