Created attachment 1370007 [details]
Compute-0 sosreport

Description of problem:
I'm trying to live migrate an instance on an OSPd10 DPDK environment using guide [1].

On the first try, migration failed with the following warning:

2017-12-19 08:17:26.293 20611 WARNING nova.virt.libvirt.driver [req-1c5f59ca-b9ff-4522-90eb-0dfe20f52b89 - - - - -] couldn't obtain the vcpu count from domain id: 5956b817-567f-4eb5-a6c5-bf640d7ae8f1, exception: Requested operation is not valid: cpu affinity is not supported

So I removed the isolcpus parameter from grub and disabled the tuned service. On the second try, migration failed and the instance disappeared from virsh:

2017-12-19 10:03:01.180 2698 ERROR oslo_messaging.rpc.server     raise exception.InstanceNotFound(instance_id=instance_name)
2017-12-19 10:03:01.180 2698 ERROR oslo_messaging.rpc.server InstanceNotFound: Instance instance-00000009 could not be found.

Huge pages have to be assigned to the DPDK instance:

[stack@undercloud-0 ~]$ openstack flavor show m1.test | grep properties
| properties | hw:mem_page_size='large' |

The instance gets into ERROR state:

[stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+------+--------+--------------------+--------------------------------------+
| ID                                   | Name | Status | Networks           | Image Name                           |
+--------------------------------------+------+--------+--------------------+--------------------------------------+
| a6047aed-e1fb-48c0-994f-65130aa19363 | test | ERROR  | mgmt=10.35.141.171 | rhel-guest-image-7.3-36.x86_64.qcow2 |
+--------------------------------------+------+--------+--------------------+--------------------------------------+

Moreover, virsh displays an empty list:

[root@compute-1 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------

[root@compute-0 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------

But the instance files are available at the destination host:

[root@compute-0 ~]# ll /var/lib/nova/instances/a6047aed-e1fb-48c0-994f-65130aa19363/

The instance files are located on the hypervisor as mentioned above. When using --block-migration I'm getting the following error:

$ openstack server migrate a6047aed-e1fb-48c0-994f-65130aa19363 --block-migration --live compute-0.localdomain
compute-1.localdomain is not on shared storage: Live migration can not be used without shared storage except a booted from volume VM which does not have a local disk. (HTTP 400) (Request-ID: req-ff46ea00-5dd0-4096-9839-1172b5b9faa4)

sosreports for both compute nodes are attached. Thanks!

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/director_installation_and_usage/#sect-Migrating_VMs_from_an_Overcloud_Compute_Node

Version-Release number of selected component (if applicable):
OSPd10
openstack-tripleo-heat-templates-5.3.3-1.el7ost.noarch
libvirt-3.2.0-14.el7_4.4.x86_64
openstack-nova-compute-14.0.8-5.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Boot an instance.
2. openstack server migrate the instance to the other host.

Actual results:
Migration fails.

Expected results:
Instance should move to the second compute with CPU pinning and an active tuned profile.

Additional info:
This migration works on an OSP12 DPDK env.
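For reference, the grub/tuned workaround mentioned in the description can be sketched as below. This is a hedged sketch: the CPU list "2-19" is a placeholder, not the value from this environment, and the commands are only printed (dry run) rather than executed.

```shell
# Sketch of the workaround tried above: remove isolcpus from the kernel
# command line and disable the tuned service on the compute node.
# NOTE: "isolcpus=2-19" is a placeholder value, not taken from this report.
# The commands are echoed (dry run); run them directly, then reboot, to apply.
remove_isolcpus='grubby --update-kernel=ALL --remove-args=isolcpus=2-19'
stop_tuned='systemctl stop tuned'
disable_tuned='systemctl disable tuned'
echo "$remove_isolcpus"
echo "$stop_tuned"
echo "$disable_tuned"
```

A reboot is needed after the grubby change for the kernel command line to take effect.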
Created attachment 1370008 [details]
Compute-1 sosreport
I can't find the bugzilla, but I think there is a known issue with the 'openstack server migrate' command when using the '--block-migration' option. Can you try with the command "nova live-migration --block-migrate"? Also, can you provide a sosreport of the source host, since that is where we will have the error message returned by libvirt/QEMU?
Oh, I see compute-1 is the source host, but there is not a lot of information regarding an error related to a DPDK context. Please try with the nova command.
Hi,

[stack@undercloud-0 ~]$ nova live-migration --block-migrate test

Still getting into error state:

| OS-EXT-STS:vm_state | error |

2017-12-25 07:58:33.407 2698 ERROR oslo_messaging.rpc.server InstanceNotFound: Instance instance-0000000b could not be found.
2017-12-25 07:58:33.407 2698 ERROR oslo_messaging.rpc.server
2017-12-25 07:58:45.182 2698 INFO nova.compute.manager [-] [instance: 6e269ae4-bfc7-4dd9-903a-8cd94e480271] VM Stopped (Lifecycle Event)

[root@compute-0 ~]# ll /var/lib/nova/instances/6e269ae4-bfc7-4dd9-903a-8cd94e480271/
total 56644
-rw-------. 1 root root        0 Dec 25 07:58 console.log
-rw-r--r--. 1 root root 57999360 Dec 25 07:58 disk
-rw-r--r--. 1 nova nova       78 Dec 25 07:58 disk.info

Any additional info I could provide? Would you like to take a look at the setup?

Thanks,
It seems that something went wrong during the post-live-migration step, but the logs you have reported do not include DEBUG, so we can't investigate what the root cause was. Can you configure nova.conf with debug enabled and reproduce the case?
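For reference, debug logging is enabled by setting the following in /etc/nova/nova.conf (the standard path on OSP10 compute nodes) and then restarting the openstack-nova-compute service:

```ini
[DEFAULT]
debug = True
```

After reproducing, the nova-compute log should then contain DEBUG-level records around the post-live-migration step.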
Hmm, so the instance is crashing on the destination host:

...
2018-01-10T13:36:07.894624Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/1 (label charserial1)
2018-01-10T13:36:11.366484Z qemu-kvm: Not a migration stream
2018-01-10T13:36:11.366722Z qemu-kvm: load of migration failed: Invalid argument
2018-01-10 13:36:11.596+0000: shutting down, reason=crashed

I'm discussing this with dgilbert and continuing the investigation...
[root@compute-1 ~]# ovs-vsctl show
e333b920-a3df-4a7f-9256-0fb90824e9c8
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port int-br-link
            Interface int-br-link
                type: patch
                options: {peer=phy-br-link}
        Port "vhue13713ea-58"
            tag: 8
            Interface "vhue13713ea-58"
                type: dpdkvhostuser
    Bridge br-link
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port br-link
            Interface br-link
                type: internal
        Port phy-br-link
            Interface phy-br-link
                type: patch
                options: {peer=int-br-link}
    ovs_version: "2.6.1"

[root@compute-1 ~]# ovs-vsctl list interface vhue13713ea-58
_uuid               : 58955a74-51c3-4c7a-b92c-ce4e81aba761
admin_state         : up
bfd                 : {}
bfd_status          : {}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {attached-mac="fa:16:3e:09:79:34", iface-id="e13713ea-58e2-4f2e-9e21-8923accdd0c4", iface-status=active, vm-uuid="1719f566-7903-4cff-8f75-3d2801f78f66"}
ifindex             : 0
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "00:00:00:00:00:00"
mtu                 : 1496
mtu_request         : 1496
name                : "vhue13713ea-58"
ofport              : 9
ofport_request      : []
options             : {}
other_config        : {}
statistics          : {"rx_1024_to_1518_packets"=1, "rx_128_to_255_packets"=26, "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=16, "rx_256_to_511_packets"=4, "rx_512_to_1023_packets"=0, "rx_65_to_127_packets"=363, rx_bytes=38586, rx_dropped=0, rx_errors=0, rx_packets=409, tx_bytes=45541, tx_packets=468}
status              : {}
type                : dpdkvhostuser

[root@compute-1 ~]# cat /var/log/libvirt/qemu/instance-00000008.log
2018-01-10 15:38:19.809+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-12-19-04:58:04, x86-041.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-16.el7_4.13), hostname: compute-1.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000008,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-8-instance-00000008/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client-IBRS,ss=on,hypervisor=on,tsc_adjust=on,pdpe1gb=on,mpx=off,xsavec=off,xgetbv1=off -m 4096 -realtime mlock=off -smp 6,sockets=3,cores=1,threads=2 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/8-instance-00000008,share=yes,size=4294967296,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-5,memdev=ram-node0 -uuid 1719f566-7903-4cff-8f75-3d2801f78f66 -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.8-5.el7ost,serial=e7b3bfa8-30e2-42ce-95c6-58637aa201a5,uuid=1719f566-7903-4cff-8f75-3d2801f78f66,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-8-instance-00000008/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/1719f566-7903-4cff-8f75-3d2801f78f66/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhue13713ea-58 -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:09:79:34,bus=pci.0,addr=0x3 -add-fd set=0,fd=27 -chardev file,id=charserial0,path=/dev/fdset/0,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.100.120.112:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2018-01-10T15:38:19.954121Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/1 (label charserial1)
2018-01-10 16:01:17.447+0000: initiating migration
2018-01-10T16:01:17.453777Z qemu-kvm: Failed to read msg header. Read -1 instead of 12. Original request 6.
2018-01-10T16:01:17.453998Z qemu-kvm: vhost_set_log_base failed: Input/output error (5)
2018-01-10T16:01:17.454060Z qemu-kvm: Failed to set msg fds.
2018-01-10T16:01:17.454076Z qemu-kvm: vhost_set_vring_addr failed: Invalid argument (22)
2018-01-10T16:01:17.454090Z qemu-kvm: Failed to set msg fds.
2018-01-10T16:01:17.454111Z qemu-kvm: vhost_set_vring_addr failed: Invalid argument (22)
2018-01-10T16:01:17.454125Z qemu-kvm: Failed to set msg fds.
2018-01-10T16:01:17.454138Z qemu-kvm: vhost_set_features failed: Invalid argument (22)
2018-01-10 16:01:17.697+0000: shutting down, reason=crashed

Based on these errors, it seems that we are hitting "Issue2" of bug 1450680. I'm marking it as a duplicate even though we have not configured the interface to use 2 queues and do not have traffic in the guest.

*** This bug has been marked as a duplicate of bug 1450680 ***
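As a side note, the failure signature can be pulled out of a qemu instance log quickly. The sketch below is self-contained: the relevant log lines from the excerpt above are inlined, since on a real compute node you would grep /var/log/libvirt/qemu/instance-00000008.log instead.

```shell
# Count the vhost-user control-channel failures during the migration attempt.
# The log excerpt is inlined here (taken from the report above) so this
# sketch runs anywhere; point grep at the real qemu log on a compute node.
log='2018-01-10T16:01:17.453777Z qemu-kvm: Failed to read msg header. Read -1 instead of 12. Original request 6.
2018-01-10T16:01:17.453998Z qemu-kvm: vhost_set_log_base failed: Input/output error (5)
2018-01-10T16:01:17.454076Z qemu-kvm: vhost_set_vring_addr failed: Invalid argument (22)
2018-01-10T16:01:17.454138Z qemu-kvm: vhost_set_features failed: Invalid argument (22)'
vhost_failures=$(printf '%s\n' "$log" | grep -c 'vhost_set')
echo "vhost_set failures: $vhost_failures"
```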
I was unable to reproduce the issue with the latest puddle. I suspect an issue in the OVS/DPDK configuration. Please re-open if necessary.