Bug 1476235 - [osp12]cannot access to instance after live migration after some time
Summary: [osp12]cannot access to instance after live migration after some time
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eoghan Glynn
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-28 11:38 UTC by Artem Hrechanychenko
Modified: 2020-12-21 19:38 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-18 00:37:45 UTC
Target Upstream Version:
Embargoed:



Description Artem Hrechanychenko 2017-07-28 11:38:01 UTC
Description of problem:

(overcloud) [stack@undercloud-0 ~]$ nova show nisim1 |grep hyp
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-1.redhat.local     

(overcloud) [stack@undercloud-0 ~]$ ssh cirros.0.193
Warning: Permanently added '10.0.0.193' (RSA) to the list of known hosts.
cirros.0.193's password:
$ echo "testing"> file
$ cat file
testing
$ exit
 
(overcloud) [stack@undercloud-0 ~]$ nova live-migration nisim1

nova list
+--------------------------------------+--------+-----------+------------+-------------+---------------------------------------+
| ID                                   | Name   | Status    | Task State | Power State | Networks                              |
+--------------------------------------+--------+-----------+------------+-------------+---------------------------------------+
| 95efb634-5442-41f0-9ba4-52c300ed59dc | nisim1 | MIGRATING | migrating  | Running     | tenantvxlan=192.168.32.13, 10.0.0.193 |


nova list
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks                              |
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| 95efb634-5442-41f0-9ba4-52c300ed59dc | nisim1 | ACTIVE | -          | Running     | tenantvxlan=192.168.32.13, 10.0.0.193 |


nova show nisim1 |grep hyp
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.redhat.local    

ping 10.0.0.193
PING 10.0.0.193 (10.0.0.193) 56(84) bytes of data.
64 bytes from 10.0.0.193: icmp_seq=1 ttl=63 time=1.68 ms


ssh cirros.0.193
Warning: Permanently added '10.0.0.193' (RSA) to the list of known hosts.
cirros.0.193's password:
$ ls
file
$ cat file
testing

After one day, I can no longer ping the instance or open a VNC console to it, and the console opened from the libvirt container shows no output:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks                              |
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| 95efb634-5442-41f0-9ba4-52c300ed59dc | nisim1 | ACTIVE | -          | Running     | tenantvxlan=192.168.32.13, 10.0.0.193 |
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ nova show nisim1
+--------------------------------------+----------------------------------------------------------------------------------+
| Property                             | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                                           |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | compute-0.redhat.local                                                           |
| OS-EXT-SRV-ATTR:hostname             | nisim1                                                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.redhat.local                                                           |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002                                                                |
| OS-EXT-SRV-ATTR:kernel_id            |                                                                                  |
| OS-EXT-SRV-ATTR:launch_index         | 0                                                                                |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                                                  |
| OS-EXT-SRV-ATTR:reservation_id       | r-xycj4lpq                                                                       |
| OS-EXT-SRV-ATTR:root_device_name     | /dev/vda                                                                         |
| OS-EXT-SRV-ATTR:user_data            | -                                                                                |
| OS-EXT-STS:power_state               | 1                                                                                |
| OS-EXT-STS:task_state                | -                                                                                |
| OS-EXT-STS:vm_state                  | active                                                                           |
| OS-SRV-USG:launched_at               | 2017-07-27T14:33:19.000000                                                       |
| OS-SRV-USG:terminated_at             | -                                                                                |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| config_drive                         |                                                                                  |
| created                              | 2017-07-27T14:33:04Z                                                             |
| description                          | -                                                                                |
| flavor                               | m1.tiny (1)                                                                      |
| hostId                               | 64d14ac1f4aeb533393623c6a25fe417d363b1c651f70b62085416e7                         |
| host_status                          | UP                                                                               |
| id                                   | 95efb634-5442-41f0-9ba4-52c300ed59dc                                             |
| image                                | cirros (a889c1eb-bee1-485e-84aa-0fc76e377e25)                                    |
| key_name                             | oskey                                                                            |
| locked                               | False                                                                            |
| metadata                             | {}                                                                               |
| name                                 | nisim1                                                                           |
| os-extended-volumes:volumes_attached | [{"id": "fffb4f3c-91d4-4883-9643-0f193cd50fcd", "delete_on_termination": false}] |
| progress                             | 0                                                                                |
| security_groups                      | default                                                                          |
| status                               | ACTIVE                                                                           |
| tags                                 | []                                                                               |
| tenant_id                            | 709029e9ec5e418ea0261021eb37826a                                                 |
| tenantvxlan network                  | 192.168.32.13, 10.0.0.193                                                        |
| updated                              | 2017-07-27T14:39:01Z                                                             |
| user_id                              | 81a74358f9dd4b268ae41b74fbda96c3                                                 |
+--------------------------------------+----------------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ ping 10.0.0.193
PING 10.0.0.193 (10.0.0.193) 56(84) bytes of data.
^C
--- 10.0.0.193 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms


[heat-admin@compute-0 ~]$ sudo docker exec -it nova_libvirt /bin/bash
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
()[root@compute-0 /]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000002              running


()[root@compute-0 /]# virsh console instance-00000002 
Connected to domain instance-00000002
Escape character is ^]

[heat-admin@compute-0 ~]$ sudo docker exec -it neutron_ovs_agent /bin/bash
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
()[neutron@compute-0 /]$ ping 10.0.0.193
PING 10.0.0.193 (10.0.0.193) 56(84) bytes of data.
^C
--- 10.0.0.193 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified



Version-Release number of selected component (if applicable):
[heat-admin@compute-1 ~]$ sudo docker images
REPOSITORY                                                                                  TAG                 IMAGE ID            CREATED             SIZE
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-libvirt-docker                2017-07-22.1        f0cb2390b630        21 hours ago        1.016 GB
openstack-nova-libvirt-docker                                                               latest              f0cb2390b630        21 hours ago        1.016 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-neutron-server-docker              2017-07-22.1        f5575774bb54        5 days ago          666.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-compute-docker                2017-07-22.1        ef828cfed6e4        5 days ago          1.033 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-neutron-openvswitch-agent-docker   2017-07-22.1        55d1a61f3e7d        5 days ago          629 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-libvirt-docker                <none>              7766e95409a1        5 days ago          939.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-mariadb-docker                     2017-07-22.1        5efd1092a3bd        5 days ago          587.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-iscsid-docker                      2017-07-22.1        ff760a12c723        5 days ago          305 MB

openstack-neutron-common-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-glance-15.0.0-0.20170718113127.20ea7ab.el7ost.noarch
openstack-neutron-ml2-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-ironic-api-8.0.1-0.20170719072039.d9983f1.el7ost.noarch
puppet-openstack_extras-11.2.0-0.20170704143612.8932465.el7ost.noarch
openstack-nova-conductor-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-mistral-engine-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-heat-engine-9.0.0-0.20170719132024.923d018.el7ost.noarch
python-openstackclient-3.11.0-0.20170613232431.c69304e.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch
openstack-tripleo-common-containers-7.3.1-0.20170718114623.1d79e16.el7ost.noarch
openstack-ironic-inspector-5.1.1-0.20170705203602.c38596e.el7ost.noarch
puppet-openstacklib-11.2.0-0.20170714191355.76de885.el7ost.noarch
openstack-nova-common-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-nova-scheduler-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-mistral-api-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-tempest-16.1.1-0.20170719134023.2a0e141.el7ost.noarch
openstack-heat-api-cfn-9.0.0-0.20170719132024.923d018.el7ost.noarch
openstack-swift-container-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
openstack-tripleo-validations-7.1.1-0.20170717141229.ce35d5f.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
python-openstacksdk-0.9.17-0.20170621195806.7946243.el7ost.noarch
openstack-heat-common-9.0.0-0.20170719132024.923d018.el7ost.noarch
openstack-mistral-common-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-nova-placement-api-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-nova-api-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-keystone-12.0.0-0.20170718172821.239bc36.el7ost.noarch
openstack-heat-api-9.0.0-0.20170719132024.923d018.el7ost.noarch
openstack-tripleo-image-elements-7.0.0-0.20170712081605.35068ac.el7ost.noarch
openstack-swift-account-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
python-openstack-mistral-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-nova-compute-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-mistral-executor-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-zaqar-5.0.0-0.20170719124338.13b85cc.el7ost.noarch
openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch
openstack-swift-proxy-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
openstack-tripleo-ui-7.1.1-0.20170718122426.8337319.el7ost.noarch
openstack-tripleo-common-7.3.1-0.20170718114623.1d79e16.el7ost.noarch
openstack-ironic-common-8.0.1-0.20170719072039.d9983f1.el7ost.noarch
openstack-neutron-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-ironic-conductor-8.0.1-0.20170719072039.d9983f1.el7ost.noarch
openstack-swift-object-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.0-0.20170715003644.4092ef5.el7ost.noarch


Steps to Reproduce:
1. Deploy HA with 2 compute nodes: http://etherpad.corp.redhat.com/testing-osp12-containers-ha
Before deployment, patch the local tripleo-heat-templates (THT) that will be used for the deployment with https://review.openstack.org/471956 and https://review.openstack.org/482170

2. boot instance  
3. live migrate instance
4. wait for some time...
5. try to ping instance
Actual results:
Instance not reachable 

Expected results:
instance reachable
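
Steps 3 to 5 above could be checked with a couple of small shell helpers. This is only a rough sketch, not part of the original reproduction: the function names hypervisor_of and packet_loss are made up for illustration, and the parsing assumes the table layout of `nova show` and the summary line of `ping` exactly as pasted in the description.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any OpenStack tool).

# Read `nova show <server>` output on stdin and print the hypervisor hostname.
# Assumes the "| Property | Value |" table layout shown in this report.
hypervisor_of() {
    awk -F'|' '/OS-EXT-SRV-ATTR:hypervisor_hostname/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $3)   # trim padding around the value
        print $3
    }'
}

# Read `ping` output on stdin and print the packet-loss percentage
# taken from the "... N% packet loss ..." summary line.
packet_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

Typical use would be `nova show nisim1 | hypervisor_of` before and after `nova live-migration nisim1` to confirm that the host changed, then `ping -c 2 10.0.0.193 | packet_loss` some time later to see whether the instance still answers.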

Additional info:

Comment 2 Artom Lifshitz 2017-08-03 18:16:52 UTC
Hello,

Thanks for the bug report. I have a couple of questions to get a better idea of what's going on.

Is this consistently reproducible? Does every instance that gets live migrated become unreachable?

Is the 24 hour delay between live migration and becoming unreachable constant? Assuming all instances become unreachable after being migrated, does it happen to all of them after 24 hours, or do some take 1, some 3, some 14, etc?

Do instances that do *not* get live migrated stay reachable all the time? Or, do some, or all of them, also become unreachable after a certain period of time?
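
One possible way to collect the timing data asked about above is a small watcher that probes an instance at a fixed interval and logs a UTC timestamp when it first stops answering. This is a minimal sketch under assumptions: the function name watch_reachability and the PROBE variable are hypothetical, and the default probe assumes the Linux iputils `ping` options `-c` and `-W`.

```shell
#!/bin/sh
# Hypothetical watcher: watch_reachability HOST INTERVAL_SECONDS MAX_CHECKS
# PROBE may be set to override the probe command (default: ping -c 2 -W 2).
# Returns 0 if every check succeeded, 1 at the first failed check.
watch_reachability() {
    host=$1
    interval=$2
    max_checks=$3
    n=0
    while [ "$n" -lt "$max_checks" ]; do
        n=$((n + 1))
        now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
        # PROBE is intentionally unquoted so its options are split into words.
        if ${PROBE:-ping -c 2 -W 2} "$host" >/dev/null 2>&1; then
            echo "$now $host reachable"
        else
            echo "$now $host UNREACHABLE"
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

For example, `watch_reachability 10.0.0.193 600 288` would check every 10 minutes for up to 48 hours. Running one watcher against a migrated instance and one against an instance that was never migrated would show whether the delay is constant and whether migration matters.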

Cheers!

Comment 3 Artom Lifshitz 2017-08-04 13:56:01 UTC
Also, if you reproduce this, it'd be awesome to get sosreports attached to this bugzilla. I'm specifically interested in qemu debug logs (in addition to nova debug logs). Thanks!

Comment 4 Artom Lifshitz 2017-08-18 00:37:45 UTC
Hi Artem,

I'm going to close this bug for now to get it off the Compute DFG's pre-triage list. I understand Kashyap was interested in this, and he might even have some of the qemu logs. He's currently on PTO though, so we can't check with him.

By all means re-open this bugzilla if logs become available, or if Kashyap comes back from PTO and finds something.

Cheers!

Comment 5 Artem Hrechanychenko 2017-10-19 22:50:39 UTC
Agreed!
If I hit this issue again, I'll update this bug.

