Red Hat Bugzilla – Bug 1446825
OSP10 -> OSP11 upgrade: nova live migration fails before/after upgrading compute node
Last modified: 2017-05-18 01:55:11 EDT
Description of problem:
OSP10 -> OSP11 upgrade: nova live migration fails before/after upgrading a compute node, with errors like:

2017-04-29 09:19:40.273 60594 ERROR nova.virt.libvirt.driver [req-6c05cab3-172d-440c-bc38-d5e748ecb972 997449a88f7d4116afeeb4822f34f16c 0d3a76f69f8f444783385464e590414f - - -] [instance: bdfc5839-dc68-4832-8a46-19030e55d90e] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://overcloud-compute-1.localdomain/system: unable to connect to server at 'overcloud-compute-1.localdomain:16509': Connection refused

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-10.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 2 compute nodes
2. Run the major-upgrade-composable-steps.yaml OSP11 upgrade step
3. Before upgrading one of the compute nodes, live migrate the instances running on it with:
   nova host-evacuate-live overcloud-compute-1.localdomain
   Wait for the node to be quiesced, with no instances running on it.
4. Upgrade the compute node:
   upgrade-non-controller.sh --upgrade overcloud-compute-1
5. Reboot the compute node
6. Wait for nova-compute to be up on overcloud-compute-1.localdomain
7. Live migrate instances back to overcloud-compute-1.localdomain:
   nova live-migration st-provinstance-ryetobxqen63-my_instance-tfvwjp427pbs overcloud-compute-1.localdomain

Actual results:
The instance is not migrated to the upgraded node because no service is listening on port 16509 there, and the migration fails with:

2017-04-29 09:19:40.273 60594 ERROR nova.virt.libvirt.driver [req-6c05cab3-172d-440c-bc38-d5e748ecb972 997449a88f7d4116afeeb4822f34f16c 0d3a76f69f8f444783385464e590414f - - -] [instance: bdfc5839-dc68-4832-8a46-19030e55d90e] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://overcloud-compute-1.localdomain/system: unable to connect to server at 'overcloud-compute-1.localdomain:16509': Connection refused
2017-04-29 09:19:40.353 60594 ERROR nova.virt.libvirt.driver [req-6c05cab3-172d-440c-bc38-d5e748ecb972 997449a88f7d4116afeeb4822f34f16c 0d3a76f69f8f444783385464e590414f - - -] [instance: bdfc5839-dc68-4832-8a46-19030e55d90e] Migration operation has aborted

Expected results:
Instance live migration works during the upgrade process, so workloads can be moved to nodes that are not being upgraded, minimizing the risk of failures during the upgrade.

Additional info:
listen_tcp is set to 0 in /etc/libvirt/libvirtd.conf:

[root@overcloud-compute-1 heat-admin]# grep listen_tcp /etc/libvirt/libvirtd.conf
listen_tcp = 0
[root@overcloud-compute-1 heat-admin]# ps axu | grep libvirt
root 1823 0.0 0.2 1440240 21596 ? Ssl 09:12 0:01 /usr/sbin/libvirtd --listen

After setting listen_tcp = 1 in /etc/libvirt/libvirtd.conf and restarting libvirtd, the migration completes fine.

Note: in the scenario described above, the nova control plane services are running on a custom role. When the nova control plane services are running on the monolithic controller, the test fails at step 3: when running nova host-evacuate-live, the instances are not migrated from the host, failing with the same error.
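
The two manual checks above (the listen_tcp value in libvirtd.conf and whether anything accepts connections on port 16509) could be scripted roughly as follows. This is only a sketch and not part of the fix shipped in the advisory; the helper names parse_listen_tcp and tcp_port_open are hypothetical, and the script assumes the key appears in the simple "listen_tcp = N" form shown in the grep output.

```python
import re
import socket

# Port taken from the "Connection refused" error in this report.
LIBVIRT_TCP_PORT = 16509


def parse_listen_tcp(conf_text):
    """Return the integer from the last uncommented listen_tcp line.

    Returns None when the key is not set explicitly (hypothetical helper).
    """
    value = None
    for line in conf_text.splitlines():
        # Lines starting with '#' do not match, so commented defaults are skipped.
        match = re.match(r"^\s*listen_tcp\s*=\s*(\d+)", line)
        if match:
            value = int(match.group(1))
    return value


def tcp_port_open(host, port=LIBVIRT_TCP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

On the affected node, one would pass the contents of /etc/libvirt/libvirtd.conf to parse_listen_tcp and, from another compute node, call tcp_port_open("overcloud-compute-1.localdomain"). In the failing state described here, the first would be expected to return 0 and the second False.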
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245