Description of problem:

(overcloud) [stack@undercloud-0 ~]$ nova show nisim1 |grep hyp
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local
(overcloud) [stack@undercloud-0 ~]$ ssh cirros@10.0.0.193
Warning: Permanently added '10.0.0.193' (RSA) to the list of known hosts.
cirros@10.0.0.193's password:
$ echo "testing"> file
$ cat file
testing
$ exit
(overcloud) [stack@undercloud-0 ~]$ nova live-migration nisim1
nova list
+--------------------------------------+--------+-----------+------------+-------------+---------------------------------------+
| ID                                   | Name   | Status    | Task State | Power State | Networks                              |
+--------------------------------------+--------+-----------+------------+-------------+---------------------------------------+
| 95efb634-5442-41f0-9ba4-52c300ed59dc | nisim1 | MIGRATING | migrating  | Running     | tenantvxlan=192.168.32.13, 10.0.0.193 |
nova list
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks                              |
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| 95efb634-5442-41f0-9ba4-52c300ed59dc | nisim1 | ACTIVE | -          | Running     | tenantvxlan=192.168.32.13, 10.0.0.193 |
nova show nisim1 |grep hyp
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.redhat.local
ping 10.0.0.193
PING 10.0.0.193 (10.0.0.193) 56(84) bytes of data.
64 bytes from 10.0.0.193: icmp_seq=1 ttl=63 time=1.68 ms
ssh cirros@10.0.0.193
Warning: Permanently added '10.0.0.193' (RSA) to the list of known hosts.
cirros@10.0.0.193's password:
$ ls
file
$ cat file
testing

After one day, I can no longer ping the instance or open a VNC console to it from the libvirt container:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks                              |
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
| 95efb634-5442-41f0-9ba4-52c300ed59dc | nisim1 | ACTIVE | -          | Running     | tenantvxlan=192.168.32.13, 10.0.0.193 |
+--------------------------------------+--------+--------+------------+-------------+---------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ nova show nisim1
+--------------------------------------+----------------------------------------------------------------------------------+
| Property                             | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                                           |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | compute-0.redhat.local                                                           |
| OS-EXT-SRV-ATTR:hostname             | nisim1                                                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.redhat.local                                                           |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002                                                                |
| OS-EXT-SRV-ATTR:kernel_id            |                                                                                  |
| OS-EXT-SRV-ATTR:launch_index         | 0                                                                                |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                                                  |
| OS-EXT-SRV-ATTR:reservation_id       | r-xycj4lpq                                                                       |
| OS-EXT-SRV-ATTR:root_device_name     | /dev/vda                                                                         |
| OS-EXT-SRV-ATTR:user_data            | -                                                                                |
| OS-EXT-STS:power_state               | 1                                                                                |
| OS-EXT-STS:task_state                | -                                                                                |
| OS-EXT-STS:vm_state                  | active                                                                           |
| OS-SRV-USG:launched_at               | 2017-07-27T14:33:19.000000                                                       |
| OS-SRV-USG:terminated_at             | -                                                                                |
| accessIPv4                           |                                                                                  |
| accessIPv6                           |                                                                                  |
| config_drive                         |                                                                                  |
| created                              | 2017-07-27T14:33:04Z                                                             |
| description                          | -                                                                                |
| flavor                               | m1.tiny (1)                                                                      |
| hostId                               | 64d14ac1f4aeb533393623c6a25fe417d363b1c651f70b62085416e7                         |
| host_status                          | UP                                                                               |
| id                                   | 95efb634-5442-41f0-9ba4-52c300ed59dc                                             |
| image                                | cirros (a889c1eb-bee1-485e-84aa-0fc76e377e25)                                    |
| key_name                             | oskey                                                                            |
| locked                               | False                                                                            |
| metadata                             | {}                                                                               |
| name                                 | nisim1                                                                           |
| os-extended-volumes:volumes_attached | [{"id": "fffb4f3c-91d4-4883-9643-0f193cd50fcd", "delete_on_termination": false}] |
| progress                             | 0                                                                                |
| security_groups                      | default                                                                          |
| status                               | ACTIVE                                                                           |
| tags                                 | []                                                                               |
| tenant_id                            | 709029e9ec5e418ea0261021eb37826a                                                 |
| tenantvxlan network                  | 192.168.32.13, 10.0.0.193                                                        |
| updated                              | 2017-07-27T14:39:01Z                                                             |
| user_id                              | 81a74358f9dd4b268ae41b74fbda96c3                                                 |
+--------------------------------------+----------------------------------------------------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ ping 10.0.0.193
PING 10.0.0.193 (10.0.0.193) 56(84) bytes of data.
^C
--- 10.0.0.193 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

[heat-admin@compute-0 ~]$ sudo docker exec -it nova_libvirt /bin/bash
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
()[root@compute-0 /]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000002              running

()[root@compute-0 /]# virsh console instance-00000002
Connected to domain instance-00000002
Escape character is ^]

[heat-admin@compute-0 ~]$ sudo docker exec -it neutron_ovs_agent /bin/bash
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
()[neutron@compute-0 /]$ ping 10.0.0.193
PING 10.0.0.193 (10.0.0.193) 56(84) bytes of data.
^C
--- 10.0.0.193 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified

Version-Release number of selected component (if applicable):

[heat-admin@compute-1 ~]$ sudo docker images
REPOSITORY                                                                                  TAG            IMAGE ID       CREATED        SIZE
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-libvirt-docker                2017-07-22.1   f0cb2390b630   21 hours ago   1.016 GB
openstack-nova-libvirt-docker                                                               latest         f0cb2390b630   21 hours ago   1.016 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-neutron-server-docker              2017-07-22.1   f5575774bb54   5 days ago     666.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-compute-docker                2017-07-22.1   ef828cfed6e4   5 days ago     1.033 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-neutron-openvswitch-agent-docker   2017-07-22.1   55d1a61f3e7d   5 days ago     629 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-libvirt-docker                <none>         7766e95409a1   5 days ago     939.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-mariadb-docker                     2017-07-22.1   5efd1092a3bd   5 days ago     587.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-iscsid-docker                      2017-07-22.1   ff760a12c723   5 days ago     305 MB

openstack-neutron-common-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-glance-15.0.0-0.20170718113127.20ea7ab.el7ost.noarch
openstack-neutron-ml2-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-ironic-api-8.0.1-0.20170719072039.d9983f1.el7ost.noarch
puppet-openstack_extras-11.2.0-0.20170704143612.8932465.el7ost.noarch
openstack-nova-conductor-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-mistral-engine-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-heat-engine-9.0.0-0.20170719132024.923d018.el7ost.noarch
python-openstackclient-3.11.0-0.20170613232431.c69304e.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch
openstack-tripleo-common-containers-7.3.1-0.20170718114623.1d79e16.el7ost.noarch
openstack-ironic-inspector-5.1.1-0.20170705203602.c38596e.el7ost.noarch
puppet-openstacklib-11.2.0-0.20170714191355.76de885.el7ost.noarch
openstack-nova-common-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-nova-scheduler-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-mistral-api-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-tempest-16.1.1-0.20170719134023.2a0e141.el7ost.noarch
openstack-heat-api-cfn-9.0.0-0.20170719132024.923d018.el7ost.noarch
openstack-swift-container-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
openstack-tripleo-validations-7.1.1-0.20170717141229.ce35d5f.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
python-openstacksdk-0.9.17-0.20170621195806.7946243.el7ost.noarch
openstack-heat-common-9.0.0-0.20170719132024.923d018.el7ost.noarch
openstack-mistral-common-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-nova-placement-api-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-nova-api-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-keystone-12.0.0-0.20170718172821.239bc36.el7ost.noarch
openstack-heat-api-9.0.0-0.20170719132024.923d018.el7ost.noarch
openstack-tripleo-image-elements-7.0.0-0.20170712081605.35068ac.el7ost.noarch
openstack-swift-account-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
python-openstack-mistral-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-nova-compute-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-mistral-executor-5.0.0-0.20170718095321.61231ec.el7ost.noarch
openstack-zaqar-5.0.0-0.20170719124338.13b85cc.el7ost.noarch
openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch
openstack-swift-proxy-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
openstack-tripleo-ui-7.1.1-0.20170718122426.8337319.el7ost.noarch
openstack-tripleo-common-7.3.1-0.20170718114623.1d79e16.el7ost.noarch
openstack-ironic-common-8.0.1-0.20170719072039.d9983f1.el7ost.noarch
openstack-neutron-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-ironic-conductor-8.0.1-0.20170719072039.d9983f1.el7ost.noarch
openstack-swift-object-2.14.1-0.20170718054917.3c11f6b.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.0-0.20170715003644.4092ef5.el7ost.noarch

Steps to Reproduce:
1. Deploy HA with 2 compute nodes (http://etherpad.corp.redhat.com/testing-osp12-containers-ha).
   Before deployment, patch the local tripleo-heat-templates (THT) used for the deployment with https://review.openstack.org/471956 and https://review.openstack.org/482170.
2. Boot an instance.
3. Live migrate the instance.
4. Wait for some time...
5. Try to ping the instance.

Actual results:
Instance not reachable.

Expected results:
Instance reachable.

Additional info:
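Steps 4 and 5 could be automated so that the approximate moment the instance stops answering ping is recorded, instead of being noticed by hand the next day. The following is a minimal sketch, assuming a Linux host with iputils ping (for the -c and -W options) and GNU date; the function names probe and watch_reachability are hypothetical helpers, not part of any OpenStack or RHOSP tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not OpenStack tooling): poll an instance IP and report
# the first probe that gets no ICMP echo reply.

# probe IP: succeed if a single echo request gets a reply within 1 second
# (iputils ping: -c = number of requests, -W = reply timeout in seconds).
# Redefine this function to use a different reachability check.
probe() {
    ping -c 1 -W 1 "$1" > /dev/null 2>&1
}

# watch_reachability IP [INTERVAL_SECONDS] [MAX_CHECKS]
# On the first failed probe: print a UTC timestamp and the check count, return 1.
# If MAX_CHECKS probes all succeed: print a summary, return 0.
# MAX_CHECKS=0 (the default) means keep polling until a probe fails.
watch_reachability() {
    local ip="$1"
    local interval="${2:-60}"
    local max_checks="${3:-0}"
    local checks=0
    while true; do
        checks=$((checks + 1))
        if ! probe "$ip"; then
            echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $ip unreachable after $checks check(s)"
            return 1
        fi
        if [ "$max_checks" -gt 0 ] && [ "$checks" -ge "$max_checks" ]; then
            echo "$ip still reachable after $checks check(s)"
            return 0
        fi
        sleep "$interval"
    done
}
```

For example, running watch_reachability 10.0.0.193 300 from the undercloud right after step 3 would probe every five minutes and print when the instance was first found unreachable; comparing that timestamp with the migration time gives a rough figure for the delay.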
Hello, Thanks for the bug report. I have a couple of questions to get a better idea of what's going on. Is this consistently reproducible? Does every instance that gets live migrated become unreachable? Is the 24 hour delay between live migration and becoming unreachable constant? Assuming all instances become unreachable after being migrated, does it happen to all of them after 24 hours, or do some take 1, some 3, some 14, etc? Do instances that do *not* get live migrated stay reachable all the time? Or, do some, or all of them, also become unreachable after a certain period of time? Cheers!
Also, if you reproduce this, it'd be awesome to get sosreports attached to this bugzilla. I'm specifically interested in qemu debug logs (in addition to nova debug logs). Thanks!
Hi Artem, I'm going to close this bug for now to get it off the Compute DFG's pre-triage list. I understand Kashyap was interested in this, and he might even have some of the qemu logs. He's currently on PTO though, so we can't check with him. By all means re-open this bugzilla if logs become available, or if Kashyap comes back from PTO and finds something. Cheers!
Agreed! If I hit this issue again, I'll update this bug.