Bug 1283696

Summary: [vdsm] Can't live migrate a VM from 3.6 cluster to 3.5 cluster in 3.6 engine
Product: [oVirt] vdsm
Reporter: Jiri Belka <jbelka>
Component: General
Assignee: Dan Kenigsberg <danken>
Status: CLOSED WORKSFORME
QA Contact: Aharon Canan <acanan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.16.29
CC: bugs, gklein, jbelka
Target Milestone: ---
Flags: rule-engine: planning_ack?
       rule-engine: devel_ack?
       rule-engine: testing_ack?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-26 12:46:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1285700

Description Jiri Belka 2015-11-19 15:34:16 UTC
Description of problem:

Can't live migrate a VM from 3.6 cluster to 3.5 cluster in 3.6 engine.

2015-11-19 15:03:57.055+0000: shutting down
2015-11-19 15:05:57.266+0000: starting up libvirt version: 1.2.17, package: 13.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-10-08-09:11:06, x86-035.build.eng.bos.redhat.com), qemu version: 2.3.0 (qemu-kvm-rhev-2.3.0-31.el7_2.3)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name w81-64 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu SandyBridge -m size=1048576k,slots=16,maxmem=4294967296k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid 9aab5765-0762-4827-9d8a-7e3a32c141ed -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.2-20151104.0.el7ev,serial=4C4C4544-0034-5310-8052-B3C04F4A354A,uuid=9aab5765-0762-4827-9d8a-7e3a32c141ed -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-w81-64/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2015-11-19T15:07:54,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/mnt/10.34.63.204:_home_iso_shared/0c78b4d6-ba00-4d3e-9f9f-65c7d5899d71/images/11111111-1111-1111-1111-111111111111/RHEV-toolsSetup_3.6_2.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/00000001-0001-0001-0001-0000000003e1/2834fba3-6200-489d-9868-7b8c162749ca/images/723a00be-a773-42d0-a768-8c6b19844cd0/e9e8f579-05e0-49b9-80eb-2f501e4fd879,if=none,id=drive-virtio-disk0,format=qcow2,serial=723a00be-a773-42d0-a768-8c6b19844cd0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:e7:3f:0c,bus=pci.0,addr=0x3 -chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/9aab5765-0762-4827-9d8a-7e3a32c141ed.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/9aab5765-0762-4827-9d8a-7e3a32c141ed.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5902,tls-port=5903,addr=10.34.63.223,x509-dir=/etc/pki/vdsm/libvirt-spice,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,vgamem_mb=16,bus=pci.0,addr=0x2 -incoming tcp:0.0.0.0:49152 -msg timestamp=on
Domain id=5 is tainted: hook-script
2015-11-19T15:05:57.339521Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2015-11-19T15:05:57.339675Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
2015-11-19T15:07:06.995285Z qemu-kvm: error while loading state section id 2(ram)
2015-11-19T15:07:06.995594Z qemu-kvm: load of migration failed: Input/output error
2015-11-19 15:07:07.035+0000: shutting down


...
Nov 19 16:12:18 dell-r210ii-04 journal: vdsm vm.Vm ERROR vmId=`6b2cddc7-bc08-402b-aa92-df10b47116d8`::Stats function failed: <AdvancedStatsFunction _sampleCpuTune at 0x18fb458>#012Traceback (most recent call last):#012  File "/usr/share/vdsm/virt/sampling.py", line 484, in collect#012    statsFunction()#012  File "/usr/share/vdsm/virt/sampling.py", line 359, in __call__#012    retValue = self._function(*args, **kwargs)#012  File "/usr/share/vdsm/virt/vm.py", line 391, in _sampleCpuTune#012    infos['vcpuLimit'] = nodeList[0].childNodes[0].data#012IndexError: list index out of range
...
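
The journal entry above stores the multi-line Python traceback on a single line, with each newline written as `#012` (the octal escape that rsyslog uses for control characters). A minimal sketch for turning such a line back into a readable traceback is shown here. The function name `unescape_syslog` is hypothetical, and the sample is an abbreviated copy of the entry above, not the full log line.

```python
import re


def unescape_syslog(line):
    """Replace rsyslog-style '#ooo' octal escapes with the characters they encode.

    Caveat: a literal '#' followed by three octal digits in the original
    message would be decoded as well, so treat the result as a reading aid.
    """
    return re.sub(r'#([0-7]{3})', lambda m: chr(int(m.group(1), 8)), line)


# Abbreviated copy of the journal entry from this report.
sample = (
    'Stats function failed: <AdvancedStatsFunction _sampleCpuTune at 0x18fb458>'
    '#012Traceback (most recent call last):'
    '#012  File "/usr/share/vdsm/virt/vm.py", line 391, in _sampleCpuTune'
    "#012    infos['vcpuLimit'] = nodeList[0].childNodes[0].data"
    '#012IndexError: list index out of range'
)

decoded = unescape_syslog(sample)
print(decoded)
```

Decoded this way, the last frame shows that the `IndexError` is raised while indexing `nodeList[0].childNodes[0]` in `_sampleCpuTune`.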

(Powering off the VM, changing cluster and the VM starts fine on the host where I could not migrate to.)

Version-Release number of selected component (if applicable):
vdsm-4.16.29-1.el7ev.x86_64

How reproducible:
Only this w81-x64 VM; other VMs were live migrated fine.

Steps to Reproduce:
1.
2.
3.

Actual results:
can't live migrate this specific VM

Expected results:
Should work too, as I could live migrate the other ones.

Additional info:
- orig host: Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2
             (20151112.1.el7ev)
             vdsm-4.17.10.1-0.el7ev.noarch
- dest host: RHEL-7.2-20151030.0
             vdsm-4.16.29-1.el7ev.x86_64

Comment 2 Yaniv Kaul 2015-11-20 08:07:37 UTC
1. Why is this a low severity bug?
2. Are we sure it's not a QEMU bug? 
2015-11-19T15:07:06.995285Z qemu-kvm: error while loading state section id 2(ram)
2015-11-19T15:07:06.995594Z qemu-kvm: load of migration failed: Input/output error

3. Does it happen all the time?

Comment 3 Jiri Belka 2015-11-20 09:01:44 UTC
(In reply to Yaniv Kaul from comment #2)
> 1. Why is this a low severity bug?

There is a bold red warning that live migration between clusters can cause unexpected behaviour.

> 2. Are we sure it's not a QEMU bug? 
> 2015-11-19T15:07:06.995285Z qemu-kvm: error while loading state section id
> 2(ram)
> 2015-11-19T15:07:06.995594Z qemu-kvm: load of migration failed: Input/output
> error

It is the engine that instructs libvirt, which in turn instructs qemu how to construct the final command.
 
> 3. Does it happen all the time?

Only for this specific VM.

Anyway, there's a strange traceback.
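
The traceback in the description ends with `IndexError: list index out of range` on the expression `nodeList[0].childNodes[0].data`. That expression fails in this way when either the node list is empty or the first node has no children. The sketch below illustrates a defensive lookup of that kind with `xml.dom.minidom`. It is not the vdsm code or its fix: the function name `read_vcpu_limit`, the tag name, and the sample XML documents are all assumptions made for illustration.

```python
from xml.dom import minidom, Node


def read_vcpu_limit(metadata_xml, tag='vcpuLimit'):
    """Return the text of the first <tag> element, or None if it is absent or empty.

    The tag name and document shape are hypothetical; only the guarded
    indexing pattern is the point of this sketch.
    """
    doc = minidom.parseString(metadata_xml)
    nodes = doc.getElementsByTagName(tag)
    if not nodes:
        # nodeList[0] would raise IndexError here.
        return None
    first = nodes[0].firstChild
    if first is None or first.nodeType != Node.TEXT_NODE:
        # childNodes[0] would raise IndexError for an empty element.
        return None
    return first.data


# Hypothetical inputs covering the three cases.
print(read_vcpu_limit('<qos><vcpuLimit>50</vcpuLimit></qos>'))
print(read_vcpu_limit('<qos><vcpuLimit/></qos>'))
print(read_vcpu_limit('<qos/>'))
```

Returning `None` for the missing and empty cases lets a caller skip the statistic instead of failing the whole sampling pass. Whether that matches the behaviour vdsm wants is not established by this report.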

Comment 4 Gil Klein 2015-11-26 10:45:42 UTC
Raising severity, because this was well tested for 3.5 and should work just fine for 3.6.

Comment 5 Jiri Belka 2015-11-26 12:46:09 UTC
Can't reproduce anymore with vdsm-4.17.11-0.el7ev.noarch (3.6.0-22) -> vdsm-4.16.30-1.el7ev.x86_64 (3.5.6) while using the engine from 3.6.0-22.