rhel-osp-director: OSP10 after successful minor update, live migration fails: ERROR nova.virt.libvirt.driver ..Migration operation has aborted

Environment:
openstack-nova-novncproxy-14.0.2-2.el7ost.noarch
puppet-openstacklib-9.4.0-2.el7ost.noarch
openstack-nova-scheduler-14.0.2-2.el7ost.noarch
openstack-nova-conductor-14.0.2-2.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch
openstack-nova-cert-14.0.2-2.el7ost.noarch
openstack-nova-compute-14.0.2-2.el7ost.noarch
openstack-nova-api-14.0.2-2.el7ost.noarch
openstack-nova-common-14.0.2-2.el7ost.noarch
openstack-nova-console-14.0.2-2.el7ost.noarch
puppet-openstack_extras-9.4.0-1.el7ost.noarch
instack-undercloud-5.0.0-3.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-1.3.el7ost.noarch

Steps to reproduce:
1. Deploy 10 with:
openstack overcloud deploy --debug --templates --libvirt-type kvm --ntp-server clock.redhat.com --neutron-network-type vxlan --neutron-tunnel-types vxlan --control-scale 3 --control-flavor controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484 --compute-scale 2 --compute-flavor compute-b634c10a-570f-59ba-bdbf-0c313d745a10 --ceph-storage-scale 2 --ceph-storage-flavor ceph-cf1f074b-dadb-5eb8-9eb0-55828273fab7 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e virt/ceph.yaml -e virt/hostnames.yml -e virt/network/network-environment.yaml --log-file overcloud_deployment_48.log
2. Run an instance.
3. Successfully update to latest 10.
4. Attempt to live-migrate the instance.

Result:
The instance doesn't migrate. No error is shown on the console.
On the node where the instance is running I see the following in nova-compute.log:

f33b5899bafb] Increasing downtime to 46 ms after 0 sec elapsed time
2016-11-08 23:45:50.512 26496 INFO nova.virt.libvirt.driver [req-9046bcb3-a716-4ceb-b118-7e6c38524c2d 8776fa665118445db4b6b911463f0b8a 7cc89df348e14715b4886b7441fa9dec - - -] [instance: 5ea96f1c-7246-4428-b62a-f33b5899bafb] Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
2016-11-08 23:45:50.860 26496 ERROR nova.virt.libvirt.driver [req-9046bcb3-a716-4ceb-b118-7e6c38524c2d 8776fa665118445db4b6b911463f0b8a 7cc89df348e14715b4886b7441fa9dec - - -] [instance: 5ea96f1c-7246-4428-b62a-f33b5899bafb] Live Migration failure: unable to connect to server at 'compute-0.localdomain:49152': No route to host
2016-11-08 23:45:51.015 26496 ERROR nova.virt.libvirt.driver [req-9046bcb3-a716-4ceb-b118-7e6c38524c2d 8776fa665118445db4b6b911463f0b8a 7cc89df348e14715b4886b7441fa9dec - - -] [instance: 5ea96f1c-7246-4428-b62a-f33b5899bafb] Migration operation has aborted
2016-11-08 23:46:09.028 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Auditing locally available compute resources for node compute-1.localdomain
2016-11-08 23:46:09.264 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Total usable vcpus: 4, total allocated vcpus: 1
2016-11-08 23:46:09.265 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Final resource view: name=compute-1.localdomain phys_ram=6143MB used_ram=2560MB phys_disk=71GB used_disk=1GB total_vcpus=4 used_vcpus=1 pci_stats=[]
2016-11-08 23:46:09.296 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Compute_service record updated for compute-1.localdomain:compute-1.localdomain
2016-11-08 23:47:11.031 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Auditing locally available compute resources for node compute-1.localdomain
2016-11-08 23:47:11.252 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Total usable vcpus: 4, total allocated vcpus: 1
2016-11-08 23:47:11.253 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Final resource view: name=compute-1.localdomain phys_ram=6143MB used_ram=2560MB phys_disk=71GB used_disk=1GB total_vcpus=4 used_vcpus=1 pci_stats=[]

Expected result:
The live-migration should work.
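Since the failure never reaches the console, it may help to pull the relevant error out of nova-compute.log automatically. The following is only a rough sketch: the function name `find_migration_failures` and the constant `SAMPLE_LINE` are made up for illustration, and the regular expression is modelled solely on the single "Live Migration failure" line quoted in the log excerpt above, so other message variants would not match.

```python
import re

# Pattern modelled on the one ERROR line in the excerpt above (an assumption:
# other failure messages may be worded differently and will not match).
FAILURE_RE = re.compile(
    r"\[instance: (?P<instance>[0-9a-f-]+)\] Live Migration failure: "
    r"unable to connect to server at '(?P<host>[^':]+):(?P<port>\d+)': "
    r"(?P<reason>.+)$"
)

# The ERROR line from the report, copied verbatim.
SAMPLE_LINE = (
    "2016-11-08 23:45:50.860 26496 ERROR nova.virt.libvirt.driver "
    "[req-9046bcb3-a716-4ceb-b118-7e6c38524c2d 8776fa665118445db4b6b911463f0b8a "
    "7cc89df348e14715b4886b7441fa9dec - - -] "
    "[instance: 5ea96f1c-7246-4428-b62a-f33b5899bafb] Live Migration failure: "
    "unable to connect to server at 'compute-0.localdomain:49152': No route to host"
)


def find_migration_failures(lines):
    """Return (instance, host, port, reason) tuples for matching log lines."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            failures.append(
                (
                    match["instance"],
                    match["host"],
                    int(match["port"]),
                    match["reason"].strip(),
                )
            )
    return failures


if __name__ == "__main__":
    for failure in find_migration_failures([SAMPLE_LINE]):
        print(failure)
```

In practice the list of lines would come from iterating over the open log file on the source compute node rather than from the sample constant.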
In /var/log/messages:
libvirtError: unable to connect to server at 'compute-2.localdomain:49152': No route to host

Running the following on every compute node made it possible to live migrate:
iptables -I INPUT -p tcp --dport 49152:49215 -j ACCEPT
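To check whether the firewall on a destination compute node still blocks the migration port range used in the workaround above, a quick TCP probe can be run from the source node. This is a minimal sketch, not a supported tool: `probe_port` is a hypothetical helper, and the interpretation of the results is an assumption. In particular, the destination is expected to listen on these ports only while a migration is incoming, so outside a migration "refused" would suggest the firewall lets traffic through, while "filtered" (a timeout or an error such as "No route to host") would suggest it is still blocked.

```python
import socket


def probe_port(host, port, timeout=2.0):
    """Classify a TCP port as 'open', 'refused' or 'filtered'.

    'open'     - something accepted the connection
    'refused'  - the host answered but nothing is listening on the port
    'filtered' - timeout, "No route to host", DNS failure or similar
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # socket.timeout and EHOSTUNREACH are both OSError subclasses.
        return "filtered"


if __name__ == "__main__":
    # Host name and port range taken from the report; adjust as needed.
    for port in (49152, 49215):
        print(port, probe_port("compute-0.localdomain", port))
```

Probing only the first and last port of the range keeps the check fast; the whole range 49152 to 49215 can be looped over if a finer view is needed.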
@sasha can we please confirm that it was possible to live migrate before the update? I mean, is the workaround from comment #3 needed because of some environmental issue in your setup, which would also have been necessary before the update? I can't think of anything during the minor update that would cause this behaviour (and you are going from 10 to latest 10, so it is more or less a no-op wrt actual package updates).
Patches needed:
https://review.openstack.org/#/c/394015/
https://review.openstack.org/#/c/394019/
Applying patches from comment #5 made the live migration pass:

[stack@undercloud-0 ~]$ nova show after_deploy|grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.localdomain
[stack@undercloud-0 ~]$ nova live-migration after_deploy
[stack@undercloud-0 ~]$ nova show after_deploy|grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain
This looks like a duplicate of BZ#1390070
Should be tested against the latest puddle (11/11/16).
Verified:

Environment:
openstack-tripleo-heat-templates-5.0.0-1.7.el7ost.noarch
instack-undercloud-5.0.0-4.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch

[stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+---------+------------+-------------+---------------------------------------+
| ID                                   | Name         | Status  | Task State | Power State | Networks                              |
+--------------------------------------+--------------+---------+------------+-------------+---------------------------------------+
| 6bf506af-f9b3-4155-9c76-18a45ef93f1f | after_deploy | SHUTOFF | -          | Shutdown    | tenantvxlan=192.168.32.10, 10.0.0.110 |
| 291812ed-2939-423b-be06-7f870ea1df2d | after_update | ACTIVE  | -          | Running     | tenantvxlan=192.168.32.12, 10.0.0.103 |
+--------------------------------------+--------------+---------+------------+-------------+---------------------------------------+
[stack@undercloud-0 ~]$ nova show after_update|grep -i hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain
[stack@undercloud-0 ~]$ nova live-migration after_update
[stack@undercloud-0 ~]$ nova show after_update|grep -i hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.localdomain
[stack@undercloud-0 ~]$ ping 10.0.0.103
PING 10.0.0.103 (10.0.0.103) 56(84) bytes of data.
64 bytes from 10.0.0.103: icmp_seq=1 ttl=63 time=1.59 ms
^C
--- 10.0.0.103 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.596/1.596/1.596/0.000 ms
*** This bug has been marked as a duplicate of bug 1390070 ***