Bug 1393135

Summary: rhel-osp-director: OSP10 after successful minor update, live migration fails: ERROR nova.virt.libvirt.driver ..Migration operation has aborted
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: openstack-tripleo-heat-templates
Assignee: Angus Thomas <athomas>
Status: CLOSED DUPLICATE
QA Contact: Alexander Chuzhoy <sasha>
Severity: unspecified
Docs Contact:
Priority: high
Version: 10.0 (Newton)
CC: dbecker, dgilbert, jcoufal, jschluet, lbezdick, mandreou, mburns, mcornea, morazi, ohochman, rhel-osp-director-maint, rhos-flags, sasha, sgordon
Target Milestone: rc
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-5.0.0-1.7.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-17 13:36:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Alexander Chuzhoy 2016-11-08 23:52:54 UTC
rhel-osp-director:   OSP10 after successful minor update, live migration fails: ERROR nova.virt.libvirt.driver ..Migration operation has aborted


Environment:
openstack-nova-novncproxy-14.0.2-2.el7ost.noarch
puppet-openstacklib-9.4.0-2.el7ost.noarch
openstack-nova-scheduler-14.0.2-2.el7ost.noarch
openstack-nova-conductor-14.0.2-2.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch
openstack-nova-cert-14.0.2-2.el7ost.noarch
openstack-nova-compute-14.0.2-2.el7ost.noarch
openstack-nova-api-14.0.2-2.el7ost.noarch
openstack-nova-common-14.0.2-2.el7ost.noarch
openstack-nova-console-14.0.2-2.el7ost.noarch
puppet-openstack_extras-9.4.0-1.el7ost.noarch
instack-undercloud-5.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-1.3.el7ost.noarch


Steps to reproduce:
1. Deploy OSP 10 with:
openstack overcloud deploy --debug --templates \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  --neutron-network-type vxlan \
  --neutron-tunnel-types vxlan \
  --control-scale 3 \
  --control-flavor controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484 \
  --compute-scale 2 \
  --compute-flavor compute-b634c10a-570f-59ba-bdbf-0c313d745a10 \
  --ceph-storage-scale 2 \
  --ceph-storage-flavor ceph-cf1f074b-dadb-5eb8-9eb0-55828273fab7 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e virt/ceph.yaml \
  -e virt/hostnames.yml \
  -e virt/network/network-environment.yaml \
  --log-file overcloud_deployment_48.log

2. Run an instance.
3. Run a minor update to the latest OSP 10 build and confirm that it completes successfully.
4. Attempt to live-migrate the instance.

Result:
The instance does not migrate, and no error is shown on the console.

On the node where the instance is running, nova-compute.log shows the following:
f33b5899bafb] Increasing downtime to 46 ms after 0 sec elapsed time
2016-11-08 23:45:50.512 26496 INFO nova.virt.libvirt.driver [req-9046bcb3-a716-4ceb-b118-7e6c38524c2d 8776fa665118445db4b6b911463f0b8a 7cc89df348e14715b4886b7441fa9dec - - -] [instance: 5ea96f1c-7246-4428-b62a-f33b5899bafb] Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
2016-11-08 23:45:50.860 26496 ERROR nova.virt.libvirt.driver [req-9046bcb3-a716-4ceb-b118-7e6c38524c2d 8776fa665118445db4b6b911463f0b8a 7cc89df348e14715b4886b7441fa9dec - - -] [instance: 5ea96f1c-7246-4428-b62a-f33b5899bafb] Live Migration failure: unable to connect to server at 'compute-0.localdomain:49152': No route to host
2016-11-08 23:45:51.015 26496 ERROR nova.virt.libvirt.driver [req-9046bcb3-a716-4ceb-b118-7e6c38524c2d 8776fa665118445db4b6b911463f0b8a 7cc89df348e14715b4886b7441fa9dec - - -] [instance: 5ea96f1c-7246-4428-b62a-f33b5899bafb] Migration operation has aborted
2016-11-08 23:46:09.028 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Auditing locally available compute resources for node compute-1.localdomain
2016-11-08 23:46:09.264 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Total usable vcpus: 4, total allocated vcpus: 1
2016-11-08 23:46:09.265 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Final resource view: name=compute-1.localdomain phys_ram=6143MB used_ram=2560MB phys_disk=71GB used_disk=1GB total_vcpus=4 used_vcpus=1 pci_stats=[]
2016-11-08 23:46:09.296 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Compute_service record updated for compute-1.localdomain:compute-1.localdomain
2016-11-08 23:47:11.031 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Auditing locally available compute resources for node compute-1.localdomain
2016-11-08 23:47:11.252 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Total usable vcpus: 4, total allocated vcpus: 1
2016-11-08 23:47:11.253 26496 INFO nova.compute.resource_tracker [req-5ca8d359-e70c-4e1b-95e3-9a982df1e857 - - - - -] Final resource view: name=compute-1.localdomain phys_ram=6143MB used_ram=2560MB phys_disk=71GB used_disk=1GB total_vcpus=4 used_vcpus=1 pci_stats=[]


Expected result:
The live-migration should work.
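
The endpoint that the source hypervisor could not reach can be pulled out of nova-compute.log with a small helper. This is a minimal sketch, assuming the "Live Migration failure" lines have the format shown in the excerpt above; the function name failed_migration_targets is hypothetical, and the log path is passed in as an argument rather than assumed.

```shell
# Hypothetical helper: print each unique host:port that a failed live
# migration could not connect to, based on nova-compute.log entries.
failed_migration_targets() {
    # $1: path to a nova-compute.log file
    # Keep only the quoted endpoint from matching lines, then de-duplicate.
    sed -n "s/.*Live Migration failure: unable to connect to server at '\([^']*\)'.*/\1/p" "$1" | sort -u
}
```

For example, `failed_migration_targets /var/log/nova/nova-compute.log` (the path is an assumption about where the log lives on the compute node) would list endpoints such as compute-0.localdomain:49152 for the failure logged above.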

Comment 3 Alexander Chuzhoy 2016-11-09 18:53:44 UTC
In /var/log/messages:
libvirtError: unable to connect to server at 'compute-2.localdomain:49152': No route to host

Running the following command on every compute node made live migration possible:
iptables -I INPUT -p tcp --dport 49152:49215 -j ACCEPT
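
The workaround can be applied so that repeated runs do not stack duplicate rules. This is a minimal sketch, assuming root access on each compute node and that the 49152:49215 range from this comment is the migration port range in use; the function name open_migration_ports and the PORT_RANGE variable are hypothetical.

```shell
# Hypothetical helper: open the live-migration port range only when an
# identical rule is not already present. Must be run as root.
PORT_RANGE="49152:49215"   # range taken from the workaround in this comment

open_migration_ports() {
    # "iptables -C" exits 0 when a matching rule already exists.
    if iptables -C INPUT -p tcp --dport "$PORT_RANGE" -j ACCEPT 2>/dev/null; then
        echo "rule already present"
    else
        iptables -I INPUT -p tcp --dport "$PORT_RANGE" -j ACCEPT
        echo "rule inserted"
    fi
}
```

Calling open_migration_ports on every compute node reproduces the workaround. A rule added this way changes only the running rule set, so it should be treated as a temporary measure for testing.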

Comment 4 Marios Andreou 2016-11-10 14:34:52 UTC
@sasha, can we please confirm that it was possible to live migrate before the update? I mean, is the workaround from comment #3 needed because of some environmental issue in your setup, which would also have been necessary before the update? I can't think of anything during the minor update that would cause this behaviour (and you are going from 10 to the latest 10, so it is more or less a no-op with respect to actual package updates).

Comment 6 Alexander Chuzhoy 2016-11-11 03:19:38 UTC
Applying patches from comment #5 made the live migration pass:
[stack@undercloud-0 ~]$ nova show after_deploy|grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-1.localdomain

[stack@undercloud-0 ~]$ nova live-migration after_deploy

[stack@undercloud-0 ~]$ nova show after_deploy|grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.localdomain

Comment 7 Marius Cornea 2016-11-11 06:55:47 UTC
This looks like a duplicate of BZ#1390070

Comment 9 Omri Hochman 2016-11-11 16:14:18 UTC
This should be tested against the latest puddle (11/11/16).

Comment 10 Alexander Chuzhoy 2016-11-16 23:03:29 UTC
Verified:
Environment:
openstack-tripleo-heat-templates-5.0.0-1.7.el7ost.noarch
instack-undercloud-5.0.0-4.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch


[stack@undercloud-0 ~]$ nova list                   
+--------------------------------------+--------------+---------+------------+-------------+---------------------------------------+
| ID                                   | Name         | Status  | Task State | Power State | Networks                              |
+--------------------------------------+--------------+---------+------------+-------------+---------------------------------------+
| 6bf506af-f9b3-4155-9c76-18a45ef93f1f | after_deploy | SHUTOFF | -          | Shutdown    | tenantvxlan=192.168.32.10, 10.0.0.110 |
| 291812ed-2939-423b-be06-7f870ea1df2d | after_update | ACTIVE  | -          | Running     | tenantvxlan=192.168.32.12, 10.0.0.103 |
+--------------------------------------+--------------+---------+------------+-------------+---------------------------------------+
[stack@undercloud-0 ~]$ nova show after_update|grep -i hyper            
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.localdomain    

[stack@undercloud-0 ~]$ nova live-migration after_update


[stack@undercloud-0 ~]$ nova show after_update|grep -i hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-1.localdomain    

[stack@undercloud-0 ~]$ ping 10.0.0.103
PING 10.0.0.103 (10.0.0.103) 56(84) bytes of data.
64 bytes from 10.0.0.103: icmp_seq=1 ttl=63 time=1.59 ms
^C
--- 10.0.0.103 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.596/1.596/1.596/0.000 ms

Comment 13 Stephen Gordon 2016-11-17 13:36:08 UTC

*** This bug has been marked as a duplicate of bug 1390070 ***