Description of problem:
Live migration of a VM with the latest oVirt 4.1 release fails.

Version-Release number of selected component (if applicable):
OS Version:        RHEL - 7 - 3.1611.el7.centos
OS Description:    CentOS Linux 7 (Core)
Kernel Version:    3.10.0 - 514.6.1.el7.x86_64
KVM Version:       2.6.0 - 28.el7_3.3.1
LIBVIRT Version:   libvirt-2.0.0-10.el7_3.4
VDSM Version:      vdsm-4.19.4-1.el7.centos
SPICE Version:     0.12.4 - 19.el7
GlusterFS Version: glusterfs-3.8.8-1.el7
CEPH Version:      librbd1-0.94.5-1.el7

How reproducible:
Update oVirt Node to the latest available oVirt 4.1 release and try to live migrate a VM. The migration ends with:
Migration failed (VM: <Hostname>, Source: <source-node>, Destination: <dest-node>).

Steps to Reproduce:
1.
2.
3.

Actual results:
/var/log/vdsm/vdsm.log (Dest-Node)

2017-02-20 13:52:41,095 ERROR (vm/3dc71915) [virt.vm] (vmId='3dc71915-b154-4731-b933-26dc63ebd6bc') The vm start process failed (vm:616)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 571, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 3084, in _completeIncomingMigration
    self._domDependentInit()
  File "/usr/share/vdsm/virt/vm.py", line 1787, in _domDependentInit
    self._vmDependentInit()
  File "/usr/share/vdsm/virt/vm.py", line 1854, in _vmDependentInit
    self._dom_vcpu_setup()
  File "/usr/share/vdsm/virt/vm.py", line 1877, in _dom_vcpu_setup
    self._updateVcpuLimit()
  File "/usr/share/vdsm/virt/vm.py", line 2564, in _updateVcpuLimit
    self._vcpuLimit = vmxml.dom_text(vcpuLimit)
AttributeError: 'module' object has no attribute 'dom_text'
2017-02-20 13:52:41,867 WARN (jsonrpc/4) [virt.vm] (vmId='3dc71915-b154-4731-b933-26dc63ebd6bc') trying to set state to Powering down when already Down (vm:351)
2017-02-20 13:52:42,241 WARN (jsonrpc/4) [root] File: /var/lib/libvirt/qemu/channels/3dc71915-b154-4731-b933-26dc63ebd6bc.com.redhat.rhevm.vdsm already removed (utils:120)
2017-02-20 13:52:42,241 WARN (jsonrpc/4) [root] File: /var/lib/libvirt/qemu/channels/3dc71915-b154-4731-b933-26dc63ebd6bc.org.qemu.guest_agent.0 already removed (utils:120)
2017-02-20 13:52:42,243 WARN (libvirt/events) [virt.vm] (vmId='3dc71915-b154-4731-b933-26dc63ebd6bc') trying to set state to Down when already Down (vm:351)

Expected results:
Live migration of a VM on an oVirt Node running the latest stable release should work.

Additional info:
/var/log/libvirt/qemu/<vm>.log (Source-Node)

2017-02-21 09:32:14.104+0000: initiating migration
2017-02-21 09:32:22.998+0000: shutting down
2017-02-21T09:32:23.144289Z qemu-kvm: terminating on signal 15 from pid 16029

/var/log/libvirt/qemu/<vm>.log (Dest-Node)

2017-02-21 09:32:13.901+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7_3.3.1), hostname: .com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=.com,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-18-.com/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Nehalem -m size=2097152k,slots=16,maxmem=8388608k -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=2048 -uuid 3dc71915-b154-4731-b933-26dc63ebd6bc -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=36343035-3333-5A43-4A30-303730314259,uuid=3dc71915-b154-4731-b933-26dc63ebd6bc' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-18-.com/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-02-21T10:32:13,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/2ea1031e-ebc3-4be9-b97f-bc8436745d45/4d256da4-f5f2-4f77-ab18-8d5264033c23/images/fe9c5408-1edc-4415-8396-f2436cac5f3c/8fe48b5c-96d1-46bf-bb1d-7dfd3f0a3007,format=raw,if=none,id=drive-virtio-disk0,serial=fe9c5408-1edc-4415-8396-f2436cac5f3c,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:56,bus=pci.0,addr=0x3,bootindex=2 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/3dc71915-b154-4731-b933-26dc63ebd6bc.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/3dc71915-b154-4731-b933-26dc63ebd6bc.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice tls-port=5900,addr=172.xx.xx.xx,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -incoming defer -msg timestamp=on
2017-02-21T09:32:14.005364Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2017-02-21T09:32:14.005474Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
red_dispatcher_loadvm_commands:
2017-02-21T09:32:32.814759Z qemu-kvm: terminating on signal 15 from pid 2039
2017-02-21 09:32:33.015+0000: shutting down
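For context on the traceback above: VDSM fails while completing the incoming migration because a call to vmxml.dom_text() hits a module attribute that does not exist. The following is a purely illustrative, self-contained sketch (not vdsm code) of that failure mode; the names vmxml_stub, update_vcpu_limit and vcpu_limit_xml are hypothetical stand-ins, and the assumption that the call only happens when a vCPU limit is present is consistent with the QoS/CPU-profile workaround reported later in this thread, but is not taken from the vdsm source.

import types
import xml.etree.ElementTree as ET

# Stand-in for vdsm's vmxml module after the expected helper went missing:
# it deliberately does NOT define dom_text(), so looking the name up fails.
vmxml_stub = types.ModuleType("vmxml_stub")

def update_vcpu_limit(vcpu_limit_element):
    # Hypothetical equivalent of the failing path: the helper is only looked up
    # when a vCPU limit element is present, which would explain why removing
    # the CPU QoS profile avoids the crash (see the workaround further down).
    if vcpu_limit_element is None:
        return None
    return vmxml_stub.dom_text(vcpu_limit_element)  # raises AttributeError

vcpu_limit_xml = ET.fromstring("<vcpuLimit>50</vcpuLimit>")  # made-up QoS metadata
try:
    update_vcpu_limit(vcpu_limit_xml)
except AttributeError as exc:
    print(exc)  # "... has no attribute 'dom_text'", matching the traceback above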
Can you please attach complete VDSM logs of source and destination?
You seem to have shut down the guest in the middle of the migration. A few warnings are expected in that case.
Created attachment 1256377 [details] vdsm.log dest
Created attachment 1256379 [details] vdsm.log source
(In reply to Michal Skrivanek from comment #2)
> You seem to have shut down the guest in the middle of the migration. A few
> warnings are expected in that case.

Are there any logs from the guest, then? It looks like a normal case of shutting down a guest (either by you, or by the guest OS itself).
Created attachment 1256387 [details]
engine.log

The oVirt engine log is also attached. Search for:
- ovirt-node-source.tld
- ovirt-node-dest.tld
- ovirt-engine.tld
- vm.tld
Created attachment 1256388 [details]
VM guest agent log

The guest agent log is attached.

Installed Packages
Name    : ovirt-guest-agent-common
Arch    : noarch
Version : 1.0.13
Release : 1.20161220085008.git165fff1.el7.centos
The guest agent log doesn't seem to be from the same time interval (the problem happened at 10:05, while the agent log starts at 10:26).

What I meant was rather the guest's system logs, such as /var/log/messages or the systemd journal, as I suspect your guest simply decided to shut down normally; it just so happens that it did so in the middle of the migration, but that seems to be unrelated to the migration itself. Thanks!
Created attachment 1256408 [details] Migration 11:17 AM
Created attachment 1256409 [details] Migration 11:17 AM
Created attachment 1256411 [details] Migration 11:17 AM
Michal: unfortunately there are no other log entries in /var/log/messages, dmesg or journalctl, but there is a new one in vdsm-source-20170222-1117.log.

FYI: I just started another migration at 11:17 AM, so the timestamps in all the log files should now line up. Sorry for causing confusion.
Ok, thanks. In this case I can see that the VM doesn't run immediately after the migration is triggered. This is suspicious; it may be that the qemu process crashes immediately as a consequence of starting the migration. Can you please dig out the corresponding systemd journal on the host, and the qemu log in /var/log/libvirt/qemu/<vmname>.log?
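Not part of vdsm or libvirt, just a small self-contained filter one could use to pull only the relevant time window out of the qemu log before attaching it. The log path and the timestamps are placeholders for the 11:17 migration attempt; note that lines without a leading timestamp (e.g. the long qemu command line) are skipped by this filter.

from pathlib import Path

QEMU_LOG = Path("/var/log/libvirt/qemu/vm.tld.log")  # placeholder VM log path
START, END = "2017-02-22 11:15", "2017-02-22 11:25"  # window around the 11:17 attempt

for line in QEMU_LOG.read_text(errors="replace").splitlines():
    # qemu log lines begin with "YYYY-MM-DD hh:mm:ss" or "YYYY-MM-DDThh:mm:ss";
    # normalize the first 16 characters and keep only lines inside the window.
    stamp = line[:16].replace("T", " ")
    if START <= stamp <= END:
        print(line)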
Created attachment 1256428 [details] journal dest
Created attachment 1256429 [details] journal source
Created attachment 1256430 [details] qemu.log
Michal: is there any news on this issue?
Bug 1426727 is likely the same thing. Do you use QoS or any CPU profiles?

When you see which PID is killing the VM, can you please get the process name, so we can tell whether it is libvirt or something else?
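To answer the PID question ("terminating on signal 15 from pid 16029" and "... from pid 2039" in the qemu logs above), a stand-alone helper along these lines can resolve a PID to its process name and command line via /proc. This is only a sketch: the default PID is simply the value taken from the log and will resolve only while that process is still alive on the source or destination host.

import sys

def process_name(pid):
    """Return (comm, cmdline) for a running PID, or None if it has already exited."""
    try:
        with open("/proc/%d/comm" % pid) as f:
            comm = f.read().strip()
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace").strip()
        return comm, cmdline
    except (IOError, OSError):
        return None  # process already gone; the systemd journal may still name it

if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else 16029  # PID from the log above
    print(process_name(pid))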
Yes, this is exactly the same issue (maybe someone wants to mark this ticket as a duplicate). Migration is working again after renaming the cluster's CPU profile to "Default" and setting "QoS Name" to "[Unlimited]".
Thanks for the confirmation; we now have a fix as well.

*** This bug has been marked as a duplicate of bug 1426727 ***