Created attachment 1108584 [details]
rhev-h logs

Description of problem:
Migration fails with the error - libvirtError: internal error: early end of file from monitor: possible problem:
2015-12-22T07:29:04.742599Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2015-12-22T07:29:04.742812Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
2015-12-22T07:29:05.023584Z qemu-kvm: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x20000 in != 0x40000: Invalid argument
2015-12-22T07:29:05.023620Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2015-12-22T07:29:05.023722Z qemu-kvm: load of migration failed: Invalid argument

- Migration fails from the latest RHEL 7.2 vdsm-4.17.13-1.el7ev.noarch to rhev-h 7.2 (20151218.2.el7ev) vdsm-4.17.13-1.el7ev.noarch, and vice versa.
- Note, both servers were upgraded from vdsm-4.16.30/31, and before the upgrade migration between the servers was successful. (Attaching a screenshot with the event log of a successful migration between the servers just before the upgrade.)

Version-Release number of selected component (if applicable):
rhev-m 3.6.1.3-0.1.el6
vdsm-4.17.13-1.el7ev.noarch (both)
libvirt-1.2.17-13.el7_2.2.x86_64 (both)
qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64 (both)

Steps to Reproduce:
1. Try to migrate between RHEL 7.2 with the latest vdsm 3.6.1.3 and rhev-h 7.2 with the latest vdsm 3.6.1.3, in both directions.

Actual results:
Migration fails, with libvirt and NUMA errors in the vdsm logs.

Expected results:
Migration should succeed.
Created attachment 1108585 [details]
rhel 7.2 logs

Note, the migration fails in both directions, so each server acts as both source and destination.
Created attachment 1108586 [details]
screenshots

Screenshots (event log UI) of the successful migration just before the rhev-h upgrade.
Does it happen without upgrade? Is it reproducible? Anything interesting in the VM configuration?
(In reply to Yaniv Kaul from comment #3)
> Does it happen without upgrade? Is it reproducible? Anything interesting in
> the VM configuration?

I saw this issue only as reported and described above (vdsm 3.5 > 3.6.1). I didn't see it on 3.5.6/3.5.7, nor on 3.6.1/3.6.2 without an upgrade involved. Nothing special in my VM configurations.
Also running into this bug, just without the NUMA reference, when trying to migrate a VM from CentOS 6.7 to CentOS 7.2.

Dec 28 13:18:10 onode030231 journal: Domain id=10 name='vm-dtaffin-25796' uuid=86191e76-5765-4e77-b909-5d29150797b9 is tainted: hook-script
Dec 28 13:18:10 onode030231 systemd-machined: New machine qemu-vm-dtaffin-25796.
Dec 28 13:18:10 onode030231 systemd: Started Virtual Machine qemu-vm-dtaffin-25796.
Dec 28 13:18:10 onode030231 systemd: Starting Virtual Machine qemu-vm-dtaffin-25796.
Dec 28 13:18:10 onode030231 kvm: 2 guests now active
Dec 28 13:18:11 onode030231 kernel: int312: port 3(vnet1) entered disabled state
Dec 28 13:18:11 onode030231 kernel: device vnet1 left promiscuous mode
Dec 28 13:18:11 onode030231 kernel: int312: port 3(vnet1) entered disabled state
Dec 28 13:18:11 onode030231 journal: internal error: End of file from monitor
Dec 28 13:18:11 onode030231 journal: internal error: early end of file from monitor: possible problem:#0122015-12-28T12:18:11.038163Z qemu-kvm: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x20000 in != 0x40000: Invalid argument#0122015-12-28T12:18:11.038230Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'#0122015-12-28T12:18:11.038334Z qemu-kvm: load of migration failed: Invalid argument
Dec 28 13:18:11 onode030231 kvm: 1 guest now active
Dec 28 13:18:11 onode030231 systemd-machined: Machine qemu-vm-dtaffin-25796 terminated.

Destination 7.2 host:
qemu-kvm-ev-2.3.0-31.el7_2.3.1.x86_64
vdsm-4.17.13-0.el7.centos.noarch
libvirt-daemon-1.2.17-13.el7.x86_64

Source 6.7 host:
qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64
vdsm-4.16.27-0.el6.x86_64
libvirt-0.10.2-54.el6_7.3.x86_64

engine: ovirt-engine-3.6.1.3-1.el6.noarch
Just as additional information: the same issue occurs when trying to migrate a VM between two CentOS 7.2 hosts, both running identical versions:

qemu-kvm-ev-2.3.0-31.el7_2.3.1.x86_64
vdsm-4.17.13-0.el7.centos.noarch
libvirt-daemon-1.2.17-13.el7.x86_64

In case it matters: SELinux is enforcing on all hosts.
(In reply to Michael Burman from comment #4)
> (In reply to Yaniv Kaul from comment #3)
> > Does it happen without upgrade? Is it reproducible? Anything interesting in
> > the VM configuration?
>
> I saw this issue only as reported and described above.(vdsm 3.5 > 3.6.1)
> Didn't saw it on 3.5.6/3.5.7 and not on 3.6.1/3.6.2 without involving
> upgrade.
> Nothing special on my VMs configurations.

So both hosts were upgraded; what about the engine? If it was upgraded too - is the cluster at level 3.5 or 3.6? Were the VMs still running the whole time? No stop & start or anything?
(In reply to dominique.taffin from comment #6)

Similar question to you. But from your details it seems like you are running that VM at 3.5 cluster level. It migrated from 6.7 to 7.2 and failed (correct?); then a separate VM between two 7.2 hosts (correct?) - when and where did you launch that VM? Any previous migrations?
Hi Michal,

Yes, both hosts were upgraded, as was the engine ^^ to rhev-m 3.6.1.3-0.1.el6.

It was part of a whole upgrade cycle in a very mixed environment; please note it's been a while since this was reported.

First the engine was upgraded (from 3.5.7), then I upgraded my 2 servers to 3.6 vdsm, and I think I also upgraded my cluster level to 3.6 (but I can't really be sure - this setup no longer exists in the reported state and maybe the cluster level was left at 3.5).

VMs were still running (no stop/start), 1 VM on each host.
Hello,

(In reply to Michal Skrivanek from comment #8)
> similar question to you. But from your details it seems like you are running
> that VM in 3.5 cluster level. migrated from 6.7 to 7.2 and it failed
> (correct?); then a separate vm between two 7.2 hosts(correct?) - when and
> where did you launch that vm? any previous migrations?

correct.

background: We have a large infrastructure with several thousand VMs running on 3.5.7, cluster level 3.5. We need to migrate those step by step, without downtime, to oVirt 3.6.x.

Our migration procedure is:
- update the engine to the latest 3.6.x
- move some CentOS 6 hosts of an old cluster (running at 3.5 level) to maintenance, reinstall them using CentOS 7.2 and the 3.6.x oVirt packages.
- put the CentOS 7.2 hosts in a new cluster, migrate some VMs from the old cluster to the new one.
- repeat these steps until all VMs / hosts are in the new cluster.

Using the latest qemu-kvm-ev version we are now able to migrate VMs that were launched on CentOS 7 between CentOS 7 hosts, but we are still not able to migrate between CentOS 6 and CentOS 7 hosts, so we are blocked.

Please let me know what information I can provide in order to assist you.
(In reply to dominique.taffin from comment #10)
> background: We do have a large infrastructure with several thousand VMs
> running on 3.5.7, cluster level 3.5. We do need to migrate those step by
> step without downtime to oVirt 3.6.x.

that's quite a few - did you consider automation via the REST API, or is everything manual only?

> our migration step is:
> - update engine to latest 3.6.x
> - move some CentOS 6 hosts of an old cluster (running in 3.5 level) to
> maintenance, reinstall them using CentOS 7.2 and 3.6.x ovirt packages.
> - put CentOS 7.2 hosts in new cluster, migrate some VMs from old cluster to
> new one.

can you please confirm that the cluster settings are exactly the same between both? They need to match not only in the actual cluster level, but in all other properties as well.

> Using the latest qemu-kvm-ev version we are now able to migrate VMs that
> have been launched on CentOS 7 between CentOS 7 hosts, but are still not

so unlike Michael's case, migration between 7.2 and 7.2 works ok? Is that cross-cluster (3.5 -> 3.5) or within a cluster (3.5)?
(In reply to Michal Skrivanek from comment #11)
> that's quite a few - did you consider automation via REST API or everything
> manual only?

Mainly manual, over several weeks, as we need to move host by host.

> can you please confirm that cluster settings are exactly the same between
> both?
> It needs to match not only the actual cluster level, but all other
> properties as well

I will recheck to verify and come back to you on this. AFAIK everything is identical.

> so unlike Michael's case migration between 7.2 and 7.2 works ok? is that
> cross-cluster(3.5->3.5) or within cluster(3.5)?

Migration within a cluster (3.5 level / 7.2 hosts). Cross-cluster 3.5 / 7.2 not tested.
Verified again, all cluster settings are identical. here a current libvirt log entry for an example VM that fails: 2016-01-14 09:25:24.171+0000: starting up libvirt version: 1.2.17, package: 13.el7 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-20-16:24:10, worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7_2.4.1) LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name vm-dtaffin-25796 -S -machine rhel6.5.0,accel=kvm,usb=off -cpu Westmere -m 1024 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 86191e76-5765-4e77-b909-5d29150797b9 -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=6-7.el6.centos.12.3,serial=32393735-3733-5A43-3332-303235575250,uuid=86191e76-5765-4e77-b909-5d29150797b9 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-vm-dtaffin-25796/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-01-14T09:25:23,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/00000002-0002-0002-0002-00000000037c/2dfc3bc7-ec09-4efa-82fb-0615b1f7c1d0/images/2d526a9a-43c4-4d5b-99bf-3460d2aceb01/d8ddaf83-3fec-439e-931b-a5d89eb1b05d,if=none,id=drive-virtio-disk0,format=raw,serial=2d526a9a-43c4-4d5b-99bf-3460d2aceb01,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=30 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:03:00:10,bus=pci.0,addr=0x3,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/86191e76-5765-4e77-b909-5d29150797b9.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/86191e76-5765-4e77-b909-5d29150797b9.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5902,tls-port=5903,addr=10.76.98.160,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -device qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on Domain id=7 is tainted: hook-script 2016-01-14T09:25:24.603034Z qemu-kvm: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x20000 in != 0x40000: Invalid argument 2016-01-14T09:25:24.603108Z qemu-kvm: error while loading state for instance 0x0 of device 'ram' 2016-01-14T09:25:24.603193Z qemu-kvm: load of migration failed: Invalid argument 2016-01-14 09:25:24.625+0000: shutting down
(In reply to Michael Burman from comment #9)
> Hi Michal,
>
> Yes, both hosts were upgraded, as well the engine ^^ to rhev-m
> 3.6.1.3-0.1.el6.
>
> It was part of a whole upgrade cycle in a very mixed environment, please
> note it's been a while since reported.
>
> First the engine was upgraded(from 3.5.7), then i upgraded my 2 servers to
> 3.6 vdsm, and i think that i upgraded my cluster level to 3.6(but i can't
> really be sure, this setup no longer exists in the reported status and maybe
> the cluster level left on 3.5)
>
> VMs were still running(no stop/start), 1 VM on each host.

I've reviewed the logs and I wonder whether it's the same issue or not. In your case the VMs were started with the new machine type (i.e. in the upgraded cluster level 3.6) and were not kept running across the upgrade. E.g. vm-n2 was shut down as a 3.5 VM and then properly started as a 3.6 VM (not via migration).

Also, your hosts have different time zones set, so it's a bit difficult to correlate the logs.

That said, the last migration of vm-n2 should not have failed.
(In reply to dominique.taffin from comment #13)
> Verified again, all cluster settings are identical.
>
> here a current libvirt log entry for an example VM that fails:
...
> 2016-01-14T09:25:24.603034Z qemu-kvm: Length mismatch:
> 0000:00:03.0/virtio-net-pci.rom: 0x20000 in != 0x40000: Invalid argument
> 2016-01-14T09:25:24.603108Z qemu-kvm: error while loading state for instance
> 0x0 of device 'ram'
> 2016-01-14T09:25:24.603193Z qemu-kvm: load of migration failed: Invalid
> argument
> 2016-01-14 09:25:24.625+0000: shutting down

After comparing with comment #14 it looks similar, but there is a difference in machine type. Michael's VM is 3.6 and yours is 3.5 (which is consistent with what you described).

We need to retest qemu migration support. I suppose it only happens when the VM has a NIC, right? Can you quickly test a VM without any? That would be helpful.

Thanks a lot!
(In reply to Michal Skrivanek from comment #15)
> We need to retest qemu migration support. I suppose it happens when the VM
> has a NIC, right? Can you quickly test a VM without any? That would be
> helpful

All of our VMs have at least 1 NIC; depending on customer request, also 2 NICs per VM. I will deploy an identical VM and remove the NIC.

Please note that we also use PXE as the primary boot target, as all OS deployment is done via PXE. All our KVM NICs are VirtIO.
Migration without a NIC is working.

I noted that the location and filename of the PXE ROM files differ between the hypervisors, but I assume that does not matter, as the newer qemu-kvm-ev should be built with the correct paths.

CentOS 6: /usr/share/gpxe/virtio-net.rom
CentOS 7: /usr/share/qemu-kvm/rhel6-virtio.rom (/usr/share/ipxe/1af41000.rom)

libvirt log for the successful migration (without NIC):

2016-01-14 10:15:37.592+0000: starting up libvirt version: 1.2.17, package: 13.el7 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-20-16:24:10, worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7_2.4.1)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name vm-dtaffin-26037 -S -machine rhel6.5.0,accel=kvm,usb=off -cpu Westmere -m 1024 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid d28b8835-c360-418b-b45d-5842df1765e6 -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=6-7.el6.centos.12.3,serial=32393735-3733-5A43-3332-303235575245,uuid=d28b8835-c360-418b-b45d-5842df1765e6 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-vm-dtaffin-26037/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-01-14T10:15:37,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive
file=/rhev/data-center/00000002-0002-0002-0002-00000000037c/0bb94892-4574-4d7f-a514-478999af10a0/images/a2545f6b-141a-4769-9151-39276e76ba16/235662a0-2845-43f7-b61a-c9e613bca557,if=none,id=drive-virtio-disk0,format=raw,serial=a2545f6b-141a-4769-9151-39276e76ba16,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/d28b8835-c360-418b-b45d-5842df1765e6.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/d28b8835-c360-418b-b45d-5842df1765e6.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5902,tls-port=5903,addr=10.76.98.160,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -device qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on Domain id=8 is tainted: hook-script copying E and F segments from pc.bios to pc.ram copying C and D segments from pc.rom to pc.ram best, Dominique
(In reply to Michael Burman from comment #9)

Meital, we would need a local reproducer ASAP. Thanks.
I think I found it:

The PXE ROMs have different md5sums.

I copied the PXE ROM file from CentOS 6 to the CentOS 7 machine:

CentOS 6 source: /usr/share/gpxe/virtio-net.rom

CentOS 7.2 destinations (yes, no symlink, but 2 copies for testing):
/usr/share/qemu-kvm/rhel6-virtio.rom
/usr/share/ipxe/1af41000.rom

and the migration seems to work. I will have to test it some more with other VMs.
(In reply to dominique.taffin from comment #17)
> migration without NIC is working.

Makes sense, because:

(In reply to Michael Burman from comment #0)
> Migration failed with error - libvirtError: internal error: early end of
> file from monitor: possible problem:
> 2015-12-22T07:29:04.742599Z qemu-kvm: warning: CPU(s) not present in any
> NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> 2015-12-22T07:29:04.742812Z qemu-kvm: warning: All CPU(s) up to maxcpus

This is a warning we need to check, but it should not be critical.

> should be described in NUMA config
> 2015-12-22T07:29:05.023584Z qemu-kvm: Length mismatch:
> 0000:00:03.0/virtio-net-pci.rom: 0x20000 in != 0x40000: Invalid argument
> 2015-12-22T07:29:05.023620Z qemu-kvm: error while loading state for instance
> 0x0 of device 'ram'
> 2015-12-22T07:29:05.023722Z qemu-kvm: load of migration failed: Invalid
> argument

This really looks like a qemu issue in the upgrade path. What Vdsm needs to guarantee is that the configuration of the VMs is consistent and correct.

I will now carefully check that Vdsm/Engine did the right thing and gave consistent configuration. If this is the case, we'll need to move the bug down the stack, to qemu.
(In reply to dominique.taffin from comment #19)

Can you please try with an older gpxe on the el6 host and the original state on the el7 host? E.g. gpxe-0.9.7-6.12.el6 (I suppose the VM would need to be started with this gpxe in place on that el6 host first). Just to check which side is to blame.
(In reply to Michal Skrivanek from comment #21)
> can you please try with older gpxe on the el6 host and the original state on
> el7 host? e.g. gpxe-0.9.7-6.12.el6 (I suppose the VM would need to be
> started with this gpxe in place on that el6 host first)
> Just to check which side is to blame

I will, but it might take until Monday noon before I can report back.
(In reply to Michael Burman from comment #0)
> Created attachment 1108584 [details]
> rhev-h logs
>
> Description of problem:
> Migration failed with error - libvirtError: internal error: early end of
> file from monitor: possible problem:
> 2015-12-22T07:29:04.742599Z qemu-kvm: warning: CPU(s) not present in any
> NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> 2015-12-22T07:29:04.742812Z qemu-kvm: warning: All CPU(s) up to maxcpus
> should be described in NUMA config

In order to debug this ^^^^^^^^^^^^^^^^

could you please share more vdsm logs, showing how this VM (vmId: u'22fa763b-3ea5-473f-8621-3eefeb51c350') was created? Specifically, I'd like to see the logs regarding the VM.create verb.

A Vdsm log snippet which shows both the VM creation and the (failed) migration would be very nice.
(In reply to Francesco Romani from comment #23)
> (In reply to Michael Burman from comment #0)
> > Created attachment 1108584 [details]
> > rhev-h logs
> >
> > Description of problem:
> > Migration failed with error - libvirtError: internal error: early end of
> > file from monitor: possible problem:
> > 2015-12-22T07:29:04.742599Z qemu-kvm: warning: CPU(s) not present in any
> > NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> > 2015-12-22T07:29:04.742812Z qemu-kvm: warning: All CPU(s) up to maxcpus
> > should be described in NUMA config
>
> In order to debug this ^^^^^^^^^^^^^^^^
>
> could you please share more vdsm logs, showing how this VM (vmId':
> u'22fa763b-3ea5-473f-8621-3eefeb51c350) was created? Specifically, I'd like
> to see the logs regarding VM.create verb.
>
> A Vdsm log snippet which shows both VM creation and (failed) migration would
> be very nice.

Sorry, I missed this bit in the migration XML:

<vcpu placement='static' current='1'>16</vcpu>

Looks like the VM was configured to have just one CPU - is this right?
(In reply to Michal Skrivanek from comment #21)
> can you please try with older gpxe on the el6 host and the original state on
> el7 host? e.g. gpxe-0.9.7-6.12.el6 (I suppose the VM would need to be
> started with this gpxe in place on that el6 host first)
> Just to check which side is to blame

OK, reinstalled all vdsm/qemu/... packages on CentOS 6 and CentOS 7 to ensure my copied virtio-net ROM image is gone. Deployed a new VM running with the stock PXE ROM image.

Current setup, 1st host: CentOS 6.7 with:
qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64
gpxe-roms-qemu-0.9.7-6.14.el6.noarch
qemu-kvm-rhev-tools-0.12.1.2-2.479.el6_7.2.x86_64
vdsm-4.16.27-0.el6.x86_64

and:
md5sum /usr/share/gpxe/virtio-net.rom
bab6408c84e62746fdc06fe9baa47919 /usr/share/gpxe/virtio-net.rom

Current setup, 2nd host: CentOS 7.2 with:
qemu-kvm-ev-2.3.0-31.el7_2.4.1.x86_64
qemu-kvm-tools-ev-2.3.0-31.el7_2.4.1.x86_64
qemu-img-ev-2.3.0-31.el7_2.4.1.x86_64
qemu-kvm-common-ev-2.3.0-31.el7_2.4.1.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch

and:
md5sum /usr/share/qemu-kvm/rhel6-virtio.rom
281bb91bcb083a32b5db5059f51ead24 /usr/share/qemu-kvm/rhel6-virtio.rom
md5sum /usr/share/ipxe/1af41000.rom
281bb91bcb083a32b5db5059f51ead24 /usr/share/ipxe/1af41000.rom

Trying to migrate a freshly powered-on VM from CentOS 6 to CentOS 7.
Migration fails as before with: 2016-01-14 14:00:18.078+0000: starting up libvirt version: 1.2.17, package: 13.el7 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-20-16:24:10, worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7_2.4.1) LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name vm-dtaffin-26051 -S -machine rhel6.5.0,accel=kvm,usb=off -cpu Westmere -m 1024 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 35ef0ad2-0bc2-45aa-86b6-2e85c28259df -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=6-7.el6.centos.12.3,serial=32393735-3733-5A43-3332-303235575245,uuid=35ef0ad2-0bc2-45aa-86b6-2e85c28259df -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-vm-dtaffin-26051/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-01-14T14:00:17,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/00000002-0002-0002-0002-00000000037c/0bb94892-4574-4d7f-a514-478999af10a0/images/1c4c1f5e-4eb6-4e99-83e2-5d89b7d8dda9/c8757869-c9dc-4d17-81a3-2f44abd47f15,if=none,id=drive-virtio-disk0,format=raw,serial=1c4c1f5e-4eb6-4e99-83e2-5d89b7d8dda9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:03:00:26,bus=pci.0,addr=0x3,bootindex=1 -chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/35ef0ad2-0bc2-45aa-86b6-2e85c28259df.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/35ef0ad2-0bc2-45aa-86b6-2e85c28259df.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=10.76.98.163,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -device qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on Domain id=2 is tainted: hook-script 2016-01-14T14:00:18.621239Z qemu-kvm: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x10000 in != 0x40000: Invalid argument 2016-01-14T14:00:18.621321Z qemu-kvm: error while loading state for instance 0x0 of device 'ram' 2016-01-14T14:00:18.621415Z qemu-kvm: load of migration failed: Invalid argument 2016-01-14 14:00:18.646+0000: shutting down
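The destination's "Length mismatch" line has the same shape in every failure in this report, so when triaging many hosts the two disagreeing sizes can be pulled out of the libvirt logs mechanically. A small sketch (the sample line is copied from the log above; the regex is my own guess at the message layout, not a format QEMU guarantees):

```python
import re

# A "Length mismatch" line as emitted by qemu-kvm on the destination
# (copied verbatim from the libvirt log above).
line = ("2016-01-14T14:00:18.621239Z qemu-kvm: Length mismatch: "
        "0000:00:03.0/virtio-net-pci.rom: 0x10000 in != 0x40000: Invalid argument")

m = re.search(r"Length mismatch: (\S+): (0x[0-9a-fA-F]+) in != (0x[0-9a-fA-F]+)", line)
device = m.group(1)             # RAM block that disagreed (the NIC's option ROM here)
incoming = int(m.group(2), 16)  # size announced by the migration source
local = int(m.group(3), 16)     # size allocated on the destination

print(device, hex(incoming), hex(local))
```

Here `incoming` is the ROM size the source announced and `local` is what the destination allocated; QEMU aborts the incoming migration because the sizes of matching RAM blocks must be identical.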
Dominique, while the QEMU guys are investigating, a potential workaround might be to hot-unplug the NIC and reconnect it after migration. It may be tedious to do this for many VMs, but it is worth a try and would perhaps be useful for important stuff.
(In reply to dominique.taffin from comment #19)
> I think I found it:
>
> The PXE ROMs have different md5sums.
>
> I copied the PXE ROM file from CentOS 6 to the CentOS 7 machine:
>
> CentOS 6 Source: /usr/share/gpxe/virtio-net.rom
>
> CentOS 7.2 Destinations (yes, no symlink, but 2 copies for testing):
> /usr/share/qemu-kvm/rhel6-virtio.rom
> /usr/share/ipxe/1af41000.rom

Hi Dominique,
I'm confused by that line; those should be different files. Can you show me (from your RHEL7 box):

ls -l /usr/share/qemu-kvm/rhel6-virtio.rom
ls -l /usr/share/ipxe/1af41000.rom
rpm -qf /usr/share/qemu-kvm/rhel6-virtio.rom
rpm -qf /usr/share/ipxe/1af41000.rom

The rhel6-virtio.rom should be a nice tiny 53kb, the 1af41000 should be 256k. If somehow the rhel6-virtio.rom has grown, then that would explain what's going on.

Dave

> and the migration seems to work. I will have to test it some more with other
> VMs.
(In reply to Francesco Romani from comment #23)
> could you please share more vdsm logs, showing how this VM (vmId':
> u'22fa763b-3ea5-473f-8621-3eefeb51c350) was created? Specifically, I'd like
> to see the logs regarding VM.create verb.
>
> A Vdsm log snippet which shows both VM creation and (failed) migration would
> be very nice.

Hi Francesco, I can't share more vdsm logs. It's been too long since this was reported, and that setup and its logs are no longer available.
Hello,

(In reply to Dr. David Alan Gilbert from comment #27)
> Hi Dominique,
> I'm confused by that line, those should be different files. Can you show
> me (from your RHEL7 box):
> ls -l /usr/share/qemu-kvm/rhel6-virtio.rom
> ls -l /usr/share/ipxe/1af41000.rom
> rpm -qf /usr/share/qemu-kvm/rhel6-virtio.rom
> rpm -qf /usr/share/ipxe/1af41000.rom
>
> the rhel6-virtio.rom should be a nice tiny 53kb, the 1af41000 should be
> 256k.
> If somehow the rhel6-virtio.rom has grown, then that would explain what's
> going on.

sure, here is the result:

# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

# ls -l /usr/share/qemu-kvm/rhel6-virtio.rom
lrwxrwxrwx. 1 root root 28 14. Jan 14:49 /usr/share/qemu-kvm/rhel6-virtio.rom -> /usr/share/ipxe/1af41000.rom

# ls -l /usr/share/ipxe/1af41000.rom
-rw-r--r--. 1 root root 262144 20. Nov 07:28 /usr/share/ipxe/1af41000.rom

# rpm -qf /usr/share/qemu-kvm/rhel6-virtio.rom
qemu-kvm-ev-2.3.0-31.el7_2.4.1.x86_64

# rpm -qf /usr/share/ipxe/1af41000.rom
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
(In reply to dominique.taffin from comment #29)
> # cat /etc/redhat-release
> CentOS Linux release 7.2.1511 (Core)
>
> # ls -l /usr/share/qemu-kvm/rhel6-virtio.rom
> lrwxrwxrwx. 1 root root 28 14. Jan 14:49
> /usr/share/qemu-kvm/rhel6-virtio.rom -> /usr/share/ipxe/1af41000.rom
>
> # ls -l /usr/share/ipxe/1af41000.rom
> -rw-r--r--. 1 root root 262144 20. Nov 07:28 /usr/share/ipxe/1af41000.rom
>
> # rpm -qf /usr/share/qemu-kvm/rhel6-virtio.rom
> qemu-kvm-ev-2.3.0-31.el7_2.4.1.x86_64
>
> # rpm -qf /usr/share/ipxe/1af41000.rom
> ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch

Thanks. Well, that at least half explains the problem - /usr/share/qemu-kvm/rhel6-virtio.rom should *NOT* be a link; now we just have to figure out how it ended up that way.

I just downloaded the qemu-kvm-ev from:
http://cbs.centos.org/kojifiles/packages/qemu-kvm-ev/2.3.0/31.el7_2.4.1/x86_64/

and did:
rpm2cpio http://cbs.centos.org/kojifiles/packages/qemu-kvm-ev/2.3.0/31.el7_2.4.1/x86_64/qemu-kvm-ev-2.3.0-31.el7_2.4.1.x86_64.rpm | cpio -t -v

and it shows:
-rwxr-xr-x 1 root root 53248 Dec 18 12:13 ./usr/share/qemu-kvm/rhel6-virtio.rom

so that's OK.

Can you confirm:
1) Exactly how you installed this host,
2) Which repo you got the qemu-kvm-ev from (I think yum info qemu-kvm-ev should show you)

Dave
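The file sizes above also line up with the mismatch values in the logs, assuming (my reading of QEMU's behavior, not verified against this exact build) that QEMU rounds the ROM file size up to the next power of two when sizing the device's ROM BAR: the 262144-byte iPXE ROM behind the broken symlink rounds to 0x40000 - the destination side of every failure - while a small ROM such as the pristine 53248-byte rhel6-virtio.rom rounds to 0x10000, the value the el6 source announced in the latest log. A quick illustration of the rounding:

```python
def pow2ceil(n: int) -> int:
    """Round n up to the next power of two."""
    p = 1
    while p < n:
        p <<= 1
    return p

# 256 KiB iPXE ROM (what the wrongly symlinked rhel6-virtio.rom resolves to)
print(hex(pow2ceil(262144)))  # destination side of the mismatch

# 53248-byte rhel6-virtio.rom from the pristine package
print(hex(pow2ceil(53248)))
```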
(In reply to Dr. David Alan Gilbert from comment #30)
> Can you confirm:
> 1) Exactly how you installed this host,
> 2) Which repo you got the qemu-kvm-ev from (I think yum info qemu-kvm-ev
> should show you)

regarding 1)
CentOS minimal installation over PXE with repo configuration (OS, Updates, EPEL, oVirt repo - all company internal mirrors), manually added to the oVirt engine, which then installs the relevant packages like qemu-kvm-ev.

regarding 2)
all the packages are mirrored from the official oVirt download site (http://resources.ovirt.org/pub/ovirt-3.6/) to a local repository, including the dependent gluster packages from gluster.org.

We do not have any custom-built packages, just "official" ones.

best,
Dominique
(In reply to dominique.taffin from comment #31)
> regarding 1)
> CentOS minimal installation over PXE with repo configuration (OS, Updates,
> EPEL, oVirt repo - all company internal mirrors), manually added to the
> oVirt engine, which then installs the relevant packages like qemu-kvm-ev.

OK

> regarding 2)
> all the packages are mirrored from the official oVirt download site
> (http://resources.ovirt.org/pub/ovirt-3.6/) to a local repository, including
> the dependent gluster packages from gluster.org

OK, thanks; I've checked the packages on there too, and they look fine as well.

> We do not have any custom-built packages, just "official" ones.

Thanks for the info; we'll keep trying to figure out how that's happening.
I tried installing rhev-h (20151218.1.iso) and the file looks right, and I also then upgraded that (20151218.2) and it still looks right. I did the upgrade via booting off the CD.
(In reply to Dr. David Alan Gilbert from comment #33)
> I tried installing rhev-h (20151218.1.iso) and the file looks right, and I
> also then upgraded that (20151218.2) and it still looks right. I did the
> upgrade via booting off the CD.

I also tried an upgrade from rhev-m; it still looks right. What I've not tried is a rhel6 -> rhel7 host upgrade; someone who knows the rhev-m side better than me needs to try and follow that to recreate it.
I tried to reproduce this report and maybe succeeded, but this time it fails with a different error.

- Before the upgrade (3.5) migration was working; after the upgrade to 3.6 (tested on a 3.5 cluster and on a 3.6 cluster after the upgrade), migration is failing. Not sure if it is the same issue, but the reproduction steps are the same as described in the report above.
- Please contact me for setup details; I will leave the setup up for one day. Thanks.

- Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160105.1.el7ev)
  ovirt-node-3.2.3-30.el7.noarch
  vdsm-4.16.32-1.el7ev.x86_64
  libvirt-1.2.17-13.el7_2.2.x86_64
  qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
  kernel - 3.10.0-327.3.1.el7.x86_64
  >>
- RHEV Hypervisor - 7.2 - 20160113.0.el7ev
  ovirt-node-3.6.1-3.0.el7ev.noarch
  vdsm-4.17.17-0.el7ev
  libvirt-1.2.17-13.el7_2.2.x86_64
  qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
  kernel - 3.10.0-327.4.4.el7.x86_64

- Red Hat Enterprise Linux Server release 7.2 (Maipo)
  vdsm-4.16.32-1.el7ev.x86_64
  libvirt-1.2.17-13.el7_2.2.x86_64
  qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
  kernel - 3.10.0-327.el7.x86_64
  >>
- Red Hat Enterprise Linux Server release 7.2 (Maipo)
  kernel - 3.10.0-327.8.1.el7.x86_64
  vdsm-4.17.17-0.el7ev
  libvirt-1.2.17-13.el7_2.2.x86_64
  qemu-kvm-rhev-2.3.0-31.el7_2.6.x86_64

vdsm log error from the source (rhev-h):

Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 211, in _recover
    self._destServer.destroy(self._vm.id)
AttributeError: 'SourceThread' object has no attribute '_destServer'
Thread-499::DEBUG::2016-01-19 08:19:14,160::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 4301361260, "0974fb9c-131f-4ee4-a428-1d8172e489a2": {"status": "Migration Source"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|0974fb9c-131f-4ee4-a428-1d8172e489a2"}
Thread-499::ERROR::2016-01-19 08:19:14,160::migration::310::virt.vm::(run) vmId=`0974fb9c-131f-4ee4-a428-1d8172e489a2`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 278, in run
    self._setupVdsConnection()
  File "/usr/share/vdsm/virt/migration.py", line 143, in _setupVdsConnection
    client = self._createClient(port)
  File "/usr/share/vdsm/virt/migration.py", line 130, in _createClient
    self.remoteHost, int(port), sslctx)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1267, in create_connected_socket
    sock.connect((host, port))
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 181, in connect
    self.socket.connect(addr)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
gaierror: [Errno -2] Name or service not known
(In reply to Michael Burman from comment #35)

Please add the setup details then, or logs. Thanks.
Hi Michal,

I'm trying to upload the logs, but am having issues with Bugzilla for some reason. I will provide the setup details in private.

I think I'm failing to migrate because of BZ 1232338: the rhev-h server's hostname is set to localhost after the upgrade and reboot. I will fix it and see if I can reproduce the original error above.
I can't reproduce this report. After fixing the localhost issue, migration is successful after the upgrade.
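For anyone hitting the same symptom: the gaierror in the traceback above means the source host could not resolve the destination's name, and BZ 1232338 leaves an upgraded rhev-h host identifying itself as localhost. A minimal sketch of a pre-migration sanity check; `check_host_ip` is a hypothetical helper, not part of vdsm or any Red Hat tooling:

```shell
#!/bin/sh
# Hypothetical pre-migration sanity check: a host whose name resolves to a
# loopback address cannot serve as a migration peer, because the other side
# would end up connecting to itself.
check_host_ip() {
    host=$1 ip=$2
    case "$ip" in
        127.*|::1) echo "$host resolves to loopback ($ip): peers cannot reach it" ;;
        *)         echo "$host resolves to $ip: ok" ;;
    esac
}

# Demo with fixed inputs; on a real host you might feed it something like:
#   check_host_ip "$(hostname)" "$(getent hosts "$(hostname)" | awk '{print $1; exit}')"
check_host_ip rhevh01.example.com 127.0.0.1
check_host_ip rhevh01.example.com 10.35.0.5
```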
Dominique: I think the right way to clean up that box is to reinstall your qemu-kvm-ev package; you should then find that /usr/share/qemu/rhel6-virtio.rom is a nice 53k file.

If you do that, then you should be able to migrate from rhel6 hosts into that box; however, if you've got VMs running on the box with the messed-up ROM, you won't be able to migrate them out. You need to shut those guests down, fix the qemu package install, and then restart them.

Dave
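A sketch of how one might verify the ROM file after reinstalling, assuming the 53248-byte size quoted earlier in this thread. `check_rom` is a hypothetical helper, demonstrated here on temporary stand-in files rather than the real paths:

```shell
#!/bin/sh
# Hypothetical check: a healthy ROM is a regular file of the expected size,
# not a symlink to the (larger) ipxe ROM as seen on the broken host.
check_rom() {
    rom=$1 expected=$2
    if [ -L "$rom" ]; then
        echo "BAD: $rom is a symlink -> $(readlink "$rom")"
    elif [ "$(stat -c %s "$rom")" -eq "$expected" ]; then
        echo "OK: $rom is $expected bytes"
    else
        echo "BAD: $rom is $(stat -c %s "$rom") bytes, expected $expected"
    fi
}

# Demo on temp files standing in for /usr/share/qemu-kvm/rhel6-virtio.rom
tmp=$(mktemp -d)
head -c 53248 /dev/zero > "$tmp/rhel6-virtio.rom"   # healthy 53k ROM
ln -s "$tmp/rhel6-virtio.rom" "$tmp/broken.rom"     # the bad symlink case
check_rom "$tmp/rhel6-virtio.rom" 53248             # reports OK
check_rom "$tmp/broken.rom" 53248                   # reports BAD (symlink)
rm -rf "$tmp"
```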
mburman's errors have two different cases:

2015-12-21T15:42:35.446059Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2015-12-21T15:42:35.446247Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config

2015-12-22 07:29:04.675+0000: 18144: info : libvirt version: 1.2.17, package: 13.el7_2.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-11-23-07:46:04, x86-019.build.eng.bos.redhat.com)
2015-12-22 07:29:04.675+0000: 18144: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb75410b1e0
2015-12-22T07:29:04.742599Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2015-12-22T07:29:04.742812Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
2015-12-22T07:29:05.023584Z qemu-kvm: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x20000 in != 0x40000: Invalid argument
2015-12-22T07:29:05.023620Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2015-12-22T07:29:05.023722Z qemu-kvm: load of migration failed: Invalid argument

So the NUMA warning is on both sides and probably needs looking at; but also note he's using rhel7.2 machine types.

Now, for rhel7 machine types we have:

lrwxrwxrwx. 1 root root 20 Dec 18 12:51 pxe-virtio.rom -> ../ipxe/1af41000.rom
-rw-r--r--. 1 root root 262144 May  6  2015 /usr/share/ipxe/1af41000.rom

which comes from:

ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch

I can see that back in 2013 we had ipxe-roms that were smaller (66k), so maybe this is what's happening: an old ipxe-roms?
OK, I think I see the problem: we're shipping the wrong ipxe ROMs in the rhev-h image. That ROM is ~66k, i.e. it rounds up to 128k (which is where we get the 0x20000), but if you install rhel you get the latest rhel ipxe ROM, which is 0x40000.
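The size mismatch in the "Length mismatch: 0x20000 in != 0x40000" error line follows from rounding the ROM image size up to a power of two: a ~66k ROM pads out to 128k (0x20000), while the 256k RHEL ROM is already 0x40000. A rough sketch of that rounding, under the assumption (stated in the comments above) that the ROM region size is the next power of two at or above the file size:

```shell
#!/bin/sh
# Round a byte count up to the next power of two, the way a ~66k ROM
# ends up occupying a 128k (0x20000) region.
pow2_roundup() {
    n=$1 p=1
    while [ "$p" -lt "$n" ]; do p=$((p * 2)); done
    echo "$p"
}

pow2_roundup $((66 * 1024))    # ~66k ROM  -> 131072 (0x20000)
pow2_roundup $((256 * 1024))   # 256k ROM  -> 262144 (0x40000)
```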
Adding lzap and Mike, as they might have a clue.
Closing, since we identified that the problem is caused by running a different iPXE on one of the hypervisors (the source), likely pulled in by Katello or other means, making the VMs incompatible with the target machine.
Dominique, do you have an idea how you have ended up with two different iPXE roms on the two machines?
Hello guys,

we use iPXE to build our bootdisk ISOs, which are used for provisioning in non-PXE or non-DHCP environments. And since we have customers with modern hardware that is not supported by the iPXE shipped with RHEL, we build the latest and greatest iPXE into Satellite 6.

Running a RHEV hypervisor on the same server as Satellite 6 is not supported, so we were not expecting problems; but that was not even the case here. After reading this BZ, we should perhaps reconsider this policy of rebasing and instead ask our platform team to backport new drivers or fixes into the RHEL iPXE branch.

This was not the root cause of this bug, but it sheds some light on this grey area.