Description of problem:
Cannot migrate the HE VM, neither via WebAdmin nor via the CLI (hosted-engine --set-maintenance --mode=local). On the source host I can see:

  if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Cannot access storage file '/var/run/vdsm/storage/7a9a1478-6a7c-465b-b0e5-598205be7faf/76b07256-29ce-4634-9f4a-db35d3bc61c6/0fa79d54-74ac-41bb-9936-c6ecdd947150' (as uid:107, gid:107): No such file or directory

Version-Release number of selected component (if applicable):
# rpm -qa | grep vdsm
vdsm-client-4.20.9-1.el7ev.noarch
vdsm-4.20.9-1.el7ev.x86_64
vdsm-api-4.20.9-1.el7ev.noarch
vdsm-hook-openstacknet-4.20.9-1.el7ev.noarch
vdsm-hook-vhostmd-4.20.9-1.el7ev.noarch
vdsm-jsonrpc-4.20.9-1.el7ev.noarch
vdsm-hook-ethtool-options-4.20.9-1.el7ev.noarch
vdsm-yajsonrpc-4.20.9-1.el7ev.noarch
vdsm-python-4.20.9-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.20.9-1.el7ev.noarch
vdsm-hook-vfio-mdev-4.20.9-1.el7ev.noarch
vdsm-http-4.20.9-1.el7ev.noarch
vdsm-hook-fcoe-4.20.9-1.el7ev.noarch

# rpm -qa | grep hosted
ovirt-hosted-engine-ha-2.2.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.0-2.el7ev.noarch

How reproducible:
Always. We encountered the same issue in our automation environment.

Steps to Reproduce:
1. Deploy an HE environment
2. Try to migrate the HE VM via WebAdmin

Actual results:
The HE VM fails to migrate.

Expected results:
The HE VM migrates successfully.

Additional info:
Created attachment 1365499 [details] logs
Does it happen with the node zero deployment, with the other deployment flow, or with both?
- What kind of storage was used?
- Can you check if the file is there?
- Can you check (using mount) if the storage domain is properly mounted?
Also include vdsm.log from the time when the HE VM was started on that host, as it contains the exact libvirt definition. The qemu log would be nice too.
- NFS
- The file does not exist under the /var/run directory, but it does exist under the mount directory:

# cd /var/run/vdsm/storage/cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/09ad8671-2448-46b0-8f83-57d821d3fff8/
-bash: cd: /var/run/vdsm/storage/cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/09ad8671-2448-46b0-8f83-57d821d3fff8/: No such file or directory

# pwd
/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__he__2__he/cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/images/09ad8671-2448-46b0-8f83-57d821d3fff8
# ll
total 4009548
-rw-rw----. 1 vdsm kvm 53687091200 Dec 12 12:20 61d3bcf5-3415-4539-a835-50eea4f8c2b8
-rw-rw----. 1 vdsm kvm 1048576 Dec 12 11:11 61d3bcf5-3415-4539-a835-50eea4f8c2b8.lease
-rw-r--r--. 1 vdsm kvm 285 Dec 11 13:02 61d3bcf5-3415-4539-a835-50eea4f8c2b8.meta

- I can see the mount of the HE storage domain on the host:

vserver-nas01.qa.lab.tlv.redhat.com:/nas01/ge_he_2_he on /rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__he__2__he type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.35.66.202,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.35.66.202)

QEMU logs:

2017-12-12T08:48:48.132890Z qemu-kvm: terminating on signal 15 from pid 1067 (<unknown process>)
2017-12-12 09:14:57.575+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.4 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-11-06-09:44:21, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-16.el7_4.12), hostname: puma29.scl.lab.tlv.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-67-HostedEngine/master-key.aes -machine rhel6.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Westmere,x2apic=on,hypervisor=on -m 16384 -realtime mlock=off -smp 4,maxcpus=6,sockets=6,cores=1,threads=1 -uuid 24128148-d2b8-4c5b-959c-10f55a43676d -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.4-18.el7,serial=30393036-3539-5A43-3332-313142363550,uuid=24128148-d2b8-4c5b-959c-10f55a43676d' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-67-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-12-12T09:14:56,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/storage/cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/09ad8671-2448-46b0-8f83-57d821d3fff8/61d3bcf5-3415-4539-a835-50eea4f8c2b8,format=raw,if=none,id=drive-virtio-disk0,serial=09ad8671-2448-46b0-8f83-57d821d3fff8,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:27:30,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/24128148-d2b8-4c5b-959c-10f55a43676d.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/24128148-d2b8-4c5b-959c-10f55a43676d.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/24128148-d2b8-4c5b-959c-10f55a43676d.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device VGA,id=video0,vgamem_mb=32,bus=pci.0,addr=0x2 -incoming defer -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2017-12-12 09:14:57.583+0000: 11993: info : libvirt version: 3.2.0, package: 14.el7_4.4 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-11-06-09:44:21, x86-039.build.eng.bos.redhat.com)
2017-12-12 09:14:57.583+0000: 11993: info : hostname: puma29.scl.lab.tlv.redhat.com
2017-12-12 09:14:57.583+0000: 11993: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7ffaec114f80
2017-12-12T09:14:57.814558Z qemu-kvm: -chardev pty,id=charconsole0: char device redirected to /dev/pts/0 (label charconsole0)
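For anyone hitting the same "Cannot access storage file" error, a quick way to confirm whether the run-dir path the VM actually uses exists and where it resolves. This is only a minimal standard-library sketch; the hard-coded path is the one from this report and will differ in other environments:

import os

# Disk path taken from the qemu command line above (environment-specific).
path = ("/var/run/vdsm/storage/"
        "cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/"
        "09ad8671-2448-46b0-8f83-57d821d3fff8/"
        "61d3bcf5-3415-4539-a835-50eea4f8c2b8")

if os.path.exists(path):
    # Expected: a target under /rhev/data-center/mnt/... on the NFS mount.
    print("resolves to: %s" % os.path.realpath(path))
else:
    # Matches libvirt's "No such file or directory" failure during migration.
    print("missing: %s" % path)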
Created attachment 1366545 [details] hosted-engine dumpxml
Created attachment 1366624 [details]
logs with debug mode

Updated logs with DEBUG mode. I can see that the problem re-occurred after we had a short connectivity problem with VDSM. It is also strange that we get many lines for the same JSONRPC message at the same time:

Thread-6::DEBUG::2017-12-12 14:14:03,601::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
Thread-6::DEBUG::2017-12-12 14:14:03,609::util::374::ovirt_hosted_engine_ha.lib.util._EventDistributor::(__log_debug) VDSM jsonrpc connection is not ready
Thread-6::ERROR::2017-12-12 14:14:03,609::util::274::ovirt_hosted_engine_ha.lib.util._EventDistributor::(_thread_func) Couldn't connect to VDSM within 15 seconds
Thread-4::DEBUG::2017-12-12 14:14:07,384::cpu_load_no_engine::76::cpu_load_no_engine.CpuLoadNoEngine::(refresh_ticks) Ticks: total=199397071, busy=1876509
Thread-4::DEBUG::2017-12-12 14:14:07,437::util::374::cpu_load_no_engine.CpuLoadNoEngine::(__log_debug) Creating a new json-rpc connection to VDSM
Thread-4::DEBUG::2017-12-12 14:14:07,569::util::374::cpu_load_no_engine.CpuLoadNoEngine::(__log_debug) Waiting for VDSM to connect
Thread-4::DEBUG::2017-12-12 14:14:08,706::util::374::cpu_load_no_engine.CpuLoadNoEngine::(__log_debug) Waiting for VDSM to connect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,048::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,049::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,051::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,052::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,052::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,052::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,053::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,053::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,055::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,056::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,057::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,058::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,058::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,059::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,060::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,061::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,061::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2017-12-12 14:14:09,062::stomp::617::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
...more
Looks like vdsm (died and) restarted, and it removed all the symlinks during cleanup on start. At the same time the broker, for some reason, failed to restart correctly and restore the symlinks on its startup. This should not happen very often; at least we do not expect vdsm to restart at all, and in my tests the broker always restarted successfully alongside vdsm. Therefore, even though we have a good reproducer, I don't think it is a blocker for 4.2, as it should happen rarely.
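For reference, this is roughly what "restoring the symlinks" amounts to. It is only an illustration built from the paths and UUIDs in this report, not a supported workaround - the supported path is letting ovirt-ha-broker prepare the images again on startup:

import os

# Environment-specific values copied from this report.
mnt = ("/rhev/data-center/mnt/"
       "vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__he__2__he")
sd_uuid = "cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4"
img_uuid = "09ad8671-2448-46b0-8f83-57d821d3fff8"

target = os.path.join(mnt, sd_uuid, "images", img_uuid)
link_dir = os.path.join("/var/run/vdsm/storage", sd_uuid)
link = os.path.join(link_dir, img_uuid)

# Recreate the per-image link only if the real image dir exists on the mount
# and the link is gone (the state described in this comment).
if os.path.isdir(target) and not os.path.lexists(link):
    if not os.path.isdir(link_dir):
        os.makedirs(link_dir)
    os.symlink(target, link)
    print("recreated %s -> %s" % (link, target))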
So, the VM uses an NFS disk with this path:

<source file='/var/run/vdsm/storage/cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/09ad8671-2448-46b0-8f83-57d821d3fff8/61d3bcf5-3415-4539-a835-50eea4f8c2b8'/>

And the path does not exist on the destination when vdsm dies and is restarted. How come VDSM does not prepare the symlink during migration? And neither does the engine, it seems, since the migration also fails when the engine handles it.
(In reply to Martin Sivák from comment #9)
> So, the VM uses an NFS disk with this path:
>
> <source file='/var/run/vdsm/storage/cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/09ad8671-2448-46b0-8f83-57d821d3fff8/61d3bcf5-3415-4539-a835-50eea4f8c2b8'/>
>
> And the path does not exist on the destination when vdsm dies and is
> restarted. How come VDSM does not prepare the symlink during migration? And
> neither does the engine, it seems, since the migration also fails when the
> engine handles it.

It totally should; nothing has changed in this regard. Does this mean that any migration involving NFS disks is broken? Or is it only hosted engine (and if so, why is it special)? Also, which was the last 4.20.z version that worked?
After reviewing the conditions, this seems to be a duplicate of bug 1522901. Please retry with a fixed build.
(In reply to Martin Sivák from comment #9)
> So, the VM uses an NFS disk with this path:
>
> <source file='/var/run/vdsm/storage/cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4/09ad8671-2448-46b0-8f83-57d821d3fff8/61d3bcf5-3415-4539-a835-50eea4f8c2b8'/>
>
> And the path does not exist on the destination when vdsm dies and is
> restarted.

What do you mean, does not exist on the destination?

> How come VDSM does not prepare the symlink during migration? And
> neither does the engine, it seems, since the migration also fails when the
> engine handles it.

Vdsm creates the symlinks in /run/vdsm/storage when disks are prepared. If the symlink was not created, maybe the disk was not prepared.
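To see which images of the hosted-engine storage domain have no link under /var/run/vdsm/storage on a given host (i.e. look unprepared there), a minimal sketch assuming the mount point and storage domain UUID from this report:

import os

# Assumed, environment-specific values (taken from the outputs above).
mnt = ("/rhev/data-center/mnt/"
       "vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__he__2__he")
sd_uuid = "cd6f0e9d-07fb-4751-8eb4-69a4d1497fc4"

# Image UUIDs present in the domain on the NFS mount.
on_storage = set(os.listdir(os.path.join(mnt, sd_uuid, "images")))

# Image UUIDs that have a link in the vdsm run directory on this host.
run_dir = os.path.join("/var/run/vdsm/storage", sd_uuid)
prepared = set(os.listdir(run_dir)) if os.path.isdir(run_dir) else set()

print("images without a run-dir link: %s" % sorted(on_storage - prepared))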
With the new VDSM I get another error on the destination host:

2017-12-12 18:57:41,973+0200 INFO  (vm/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') Enabling drive monitoring (drivemonitor:53)
2017-12-12 18:57:41,974+0200 ERROR (vm/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') The vm start process failed (vm:914)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 843, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2632, in _run
    self._devices = self._make_devices()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2595, in _make_devices
    if dev.type != hwclass.DISK:
AttributeError: type
(In reply to Artyom from comment #19)
> With the new VDSM I get another error on the destination host:
> 2017-12-12 18:57:41,973+0200 INFO  (vm/346b8d80) [virt.vm]
> (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') Enabling drive monitoring
> (drivemonitor:53)
> 2017-12-12 18:57:41,974+0200 ERROR (vm/346b8d80) [virt.vm]
> (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') The vm start process failed
> (vm:914)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 843, in _startUnderlyingVm
>     self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2632, in _run
>     self._devices = self._make_devices()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2595, in _make_devices
>     if dev.type != hwclass.DISK:
> AttributeError: type

And? How does that affect migration? Is that a different BZ?
Please supply the source and destination vdsm logs containing the VM start and the migration.
Created attachment 1367106 [details]
new_vdsm_logs

Source host alma07:
2017-12-13 08:51:08,912+0200 ERROR (migsrc/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') migration destination error: Error creating the requested VM (migration:290)

Destination host master-vds10:
2017-12-13 08:50:32,957+0200 ERROR (vm/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') The vm start process failed (vm:914)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 843, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2632, in _run
    self._devices = self._make_devices()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2595, in _make_devices
    if dev.type != hwclass.DISK:
AttributeError: type
Migration failed because the VM could not be started on the destination host.
(In reply to Artyom from comment #22)
> Created attachment 1367106 [details]
> new_vdsm_logs
>
> Source host alma07:
> 2017-12-13 08:51:08,912+0200 ERROR (migsrc/346b8d80) [virt.vm]
> (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') migration destination error:
> Error creating the requested VM (migration:290)
>
> Destination host master-vds10:
> 2017-12-13 08:50:32,957+0200 ERROR (vm/346b8d80) [virt.vm]
> (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') The vm start process failed
> (vm:914)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 843, in _startUnderlyingVm
>     self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2632, in _run
>     self._devices = self._make_devices()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2595, in _make_devices
>     if dev.type != hwclass.DISK:
> AttributeError: type

Unfortunately this is a new bug, exposed by https://gerrit.ovirt.org/#/c/85200/
Fix available: https://gerrit.ovirt.org/#/c/85370/
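For context, a minimal illustration of that failure mode and of a defensive guard around it. This is not the actual vdsm patch (see the gerrit link above); the class and constant below are stand-ins:

# Stand-in for a device object restored without a 'type' attribute.
class LegacyDevice(object):
    pass

DISK = 'disk'  # stand-in for vdsm's hwclass.DISK constant

devices = [LegacyDevice()]
for dev in devices:
    # dev.type != DISK would raise "AttributeError: type", as in the log.
    if getattr(dev, 'type', None) != DISK:
        print("skipping device without a disk type: %r" % dev)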
*** Bug 1524989 has been marked as a duplicate of this bug. ***
Checked on:
# rpm -qa | grep vdsm
vdsm-hook-vmfex-dev-4.20.9.2-1.el7ev.noarch
vdsm-hook-openstacknet-4.20.9.2-1.el7ev.noarch
vdsm-api-4.20.9.2-1.el7ev.noarch
vdsm-jsonrpc-4.20.9.2-1.el7ev.noarch
vdsm-hook-fcoe-4.20.9.2-1.el7ev.noarch
vdsm-hook-vhostmd-4.20.9.2-1.el7ev.noarch
vdsm-yajsonrpc-4.20.9.2-1.el7ev.noarch
vdsm-4.20.9.2-1.el7ev.x86_64
vdsm-client-4.20.9.2-1.el7ev.noarch
vdsm-python-4.20.9.2-1.el7ev.noarch
vdsm-hook-ethtool-options-4.20.9.2-1.el7ev.noarch
vdsm-http-4.20.9.2-1.el7ev.noarch
vdsm-hook-vfio-mdev-4.20.9.2-1.el7ev.noarch

# rpm -qa | grep hosted
ovirt-hosted-engine-ha-2.2.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch

I still have the issue that I described in the bug description:

2017-12-14 13:01:14,866+0200 ERROR (migsrc/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') Failed to migrate (migration:455)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 510, in _startUnderlyingMigration
    self._perform_with_downtime_thread(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 579, in _perform_with_downtime_thread
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 528, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1679, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Cannot access storage file '/var/run/vdsm/storage/7a9a1478-6a7c-465b-b0e5-598205be7faf/76b07256-29ce-4634-9f4a-db35d3bc61c6/0fa79d54-74ac-41bb-9936-c6ecdd947150' (as uid:107, gid:107): No such file or directory
2017-12-14 13:01:14,905+0200 INFO  (jsonrpc/3) [api.virt] START getMigrationStatus() from=::ffff:10.35.72.159,58386 (api:46)
2017-12-14 13:01:14,905+0200 INFO  (jsonrpc/3) [api.virt] FINISH getMigrationStatus return={'status': {'message': 'Done', 'code': 0}, 'migrationStats': {'status': {'message': 'Fatal error during migration', 'code': 12}, 'progress': 0}} from=::ffff:10.35.72.159,58386 (api:52)

I do not think it is connected to a vdsmd restart:
# date
Thu Dec 14 13:08:29 IST 2017
# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-12-14 09:22:05 IST; 3h 46min ago

ovirt-ha-broker log:
MainThread::INFO::2017-12-14 12:56:07,182::storage_backends::355::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Preparing images
MainThread::INFO::2017-12-14 12:56:07,182::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
MainThread::DEBUG::2017-12-14 12:56:07,182::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,186::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,193::image::104::ovirt_hosted_engine_ha.lib.image.Image::(get_images_list) getImagesList: ['45276ef6-0b41-4cde-a614-6651d0984302', '2b3ea455-941f-4b84-8821-62431afeec20', '76b07256-29ce-4634-9f4a-db35d3bc61c6', 'a405d488-cc8e-490d-a2cf-aa7507306453', '221b120c-f326-4340-9e78-61c9e55bace9', '0c7c0bfc-e32f-42f0-aad0-d4526df918ad']
MainThread::DEBUG::2017-12-14 12:56:07,193::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,201::image::140::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) ['46e91ef3-1ee7-459f-bc1e-d5a79fa9014d']
MainThread::DEBUG::2017-12-14 12:56:07,201::image::156::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Prepare image 00000000-0000-0000-0000-000000000000 7a9a1478-6a7c-465b-b0e5-598205be7faf 45276ef6-0b41-4cde-a614-6651d0984302 46e91ef3-1ee7-459f-bc1e-d5a79fa9014d
MainThread::DEBUG::2017-12-14 12:56:07,201::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,216::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,224::image::140::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) ['72297b7d-ab36-4511-acfe-28e4d4a7e165']
MainThread::DEBUG::2017-12-14 12:56:07,224::image::156::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Prepare image 00000000-0000-0000-0000-000000000000 7a9a1478-6a7c-465b-b0e5-598205be7faf 2b3ea455-941f-4b84-8821-62431afeec20 72297b7d-ab36-4511-acfe-28e4d4a7e165
MainThread::DEBUG::2017-12-14 12:56:07,224::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,240::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,247::image::140::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) ['0fa79d54-74ac-41bb-9936-c6ecdd947150']
MainThread::DEBUG::2017-12-14 12:56:07,247::image::156::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Prepare image 00000000-0000-0000-0000-000000000000 7a9a1478-6a7c-465b-b0e5-598205be7faf 76b07256-29ce-4634-9f4a-db35d3bc61c6 0fa79d54-74ac-41bb-9936-c6ecdd947150
MainThread::DEBUG::2017-12-14 12:56:07,247::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,263::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,270::image::140::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) ['bbb4b206-7e37-46d0-b80a-4d6161c99c94']
MainThread::DEBUG::2017-12-14 12:56:07,270::image::156::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Prepare image 00000000-0000-0000-0000-000000000000 7a9a1478-6a7c-465b-b0e5-598205be7faf a405d488-cc8e-490d-a2cf-aa7507306453 bbb4b206-7e37-46d0-b80a-4d6161c99c94
MainThread::DEBUG::2017-12-14 12:56:07,270::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,286::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,293::image::140::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) ['f864d1d1-f32a-4bea-bd28-c6bcc1ccd51c']
MainThread::DEBUG::2017-12-14 12:56:07,293::image::156::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Prepare image 00000000-0000-0000-0000-000000000000 7a9a1478-6a7c-465b-b0e5-598205be7faf 221b120c-f326-4340-9e78-61c9e55bace9 f864d1d1-f32a-4bea-bd28-c6bcc1ccd51c
MainThread::DEBUG::2017-12-14 12:56:07,294::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,309::stompreactor::479::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2017-12-14 12:56:07,316::image::140::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) ['fe1a3c75-5581-46fe-9b36-5f98bac0e03c']
MainThread::DEBUG::2017-12-14 12:56:07,317::image::156::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Prepare image 00000000-0000-0000-0000-000000000000 7a9a1478-6a7c-465b-b0e5-598205be7faf 0c7c0bfc-e32f-42f0-aad0-d4526df918ad fe1a3c75-5581-46fe-9b36-5f98bac0e03c
Looks like I found a reproducer. The HE VM runs on host_1.

1) Restart ovirt-ha-broker. The host has the correct links under /var/run/vdsm/storage:
# ll
total 24
lrwxrwxrwx. 1 vdsm kvm 175 Dec 7 14:35 0c7c0bfc-e32f-42f0-aad0-d4526df918ad -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/0c7c0bfc-e32f-42f0-aad0-d4526df918ad
lrwxrwxrwx. 1 vdsm kvm 175 Dec 7 14:35 221b120c-f326-4340-9e78-61c9e55bace9 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/221b120c-f326-4340-9e78-61c9e55bace9
lrwxrwxrwx. 1 vdsm kvm 175 Dec 7 14:35 2b3ea455-941f-4b84-8821-62431afeec20 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/2b3ea455-941f-4b84-8821-62431afeec20
lrwxrwxrwx. 1 vdsm kvm 175 Dec 12 15:03 45276ef6-0b41-4cde-a614-6651d0984302 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/45276ef6-0b41-4cde-a614-6651d0984302
lrwxrwxrwx. 1 vdsm kvm 175 Dec 14 13:11 76b07256-29ce-4634-9f4a-db35d3bc61c6 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/76b07256-29ce-4634-9f4a-db35d3bc61c6
lrwxrwxrwx. 1 vdsm kvm 175 Dec 12 15:03 a405d488-cc8e-490d-a2cf-aa7507306453 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/a405d488-cc8e-490d-a2cf-aa7507306453

2) Migrate the HE VM - PASS

3) Recheck /var/run/vdsm/storage:
# ll
total 20
lrwxrwxrwx. 1 vdsm kvm 175 Dec 7 14:35 0c7c0bfc-e32f-42f0-aad0-d4526df918ad -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/0c7c0bfc-e32f-42f0-aad0-d4526df918ad
lrwxrwxrwx. 1 vdsm kvm 175 Dec 7 14:35 221b120c-f326-4340-9e78-61c9e55bace9 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/221b120c-f326-4340-9e78-61c9e55bace9
lrwxrwxrwx. 1 vdsm kvm 175 Dec 7 14:35 2b3ea455-941f-4b84-8821-62431afeec20 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/2b3ea455-941f-4b84-8821-62431afeec20
lrwxrwxrwx. 1 vdsm kvm 175 Dec 12 15:03 45276ef6-0b41-4cde-a614-6651d0984302 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/45276ef6-0b41-4cde-a614-6651d0984302
lrwxrwxrwx. 1 vdsm kvm 175 Dec 12 15:03 a405d488-cc8e-490d-a2cf-aa7507306453 -> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/7a9a1478-6a7c-465b-b0e5-598205be7faf/images/a405d488-cc8e-490d-a2cf-aa7507306453

The link for image 76b07256-29ce-4634-9f4a-db35d3bc61c6 disappears after the HE VM migration.
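To pinpoint exactly when the link for image 76b07256-29ce-4634-9f4a-db35d3bc61c6 goes away, a small watcher sketch that can be run on the source host while the migration is in progress. The storage domain UUID is the one from this environment:

import os
import time

sd_uuid = "7a9a1478-6a7c-465b-b0e5-598205be7faf"  # environment-specific
run_dir = os.path.join("/var/run/vdsm/storage", sd_uuid)

previous = set(os.listdir(run_dir))
while True:
    time.sleep(1)
    current = set(os.listdir(run_dir))
    for gone in sorted(previous - current):
        # Prints the moment an image link is removed from the run directory.
        print("%s link removed: %s" % (time.strftime("%H:%M:%S"), gone))
    previous = current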
Please attach vdsm logs from both the source and destination sides of the failed migration.
So what did you reproduce, actually? In comment #28 you say that the migration passed - so is this a new issue for another bug?
In https://bugzilla.redhat.com/show_bug.cgi?id=1524119#c28 I gave exact reproduction steps for the initial problem with HE VM migration.

Migration passed only once, right after the ovirt-ha-broker restart; if I try to migrate the HE VM again from the same host, I get:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 510, in _startUnderlyingMigration
    self._perform_with_downtime_thread(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 579, in _perform_with_downtime_thread
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 528, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1679, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Cannot access storage file '/var/run/vdsm/storage/7a9a1478-6a7c-465b-b0e5-598205be7faf/76b07256-29ce-4634-9f4a-db35d3bc61c6/0fa79d54-74ac-41bb-9936-c6ecdd947150' (as uid:107, gid:107): No such file or directory

You can find logs with the problem under https://bugzilla.redhat.com/attachment.cgi?id=1366624.

The fix http://gerrit.ovirt.org/85372 does not fix the initial bug (please see https://bugzilla.redhat.com/show_bug.cgi?id=1524119#c0).
(In reply to Artyom from comment #31)
> In https://bugzilla.redhat.com/show_bug.cgi?id=1524119#c28 I gave exact
> reproduction steps for the initial problem with HE VM migration.
>
> Migration passed only once, right after the ovirt-ha-broker restart; if I
> try to migrate the HE VM again from the same host, I get:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
>     self._startUnderlyingMigration(time.time())
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 510, in _startUnderlyingMigration
>     self._perform_with_downtime_thread(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 579, in _perform_with_downtime_thread
>     self._perform_migration(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 528, in _perform_migration
>     self._migration_flags)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1679, in migrateToURI3
>     if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
> libvirtError: Cannot access storage file
> '/var/run/vdsm/storage/7a9a1478-6a7c-465b-b0e5-598205be7faf/76b07256-29ce-4634-9f4a-db35d3bc61c6/0fa79d54-74ac-41bb-9936-c6ecdd947150'
> (as uid:107, gid:107): No such file or directory
>
> You can find logs with the problem under
> https://bugzilla.redhat.com/attachment.cgi?id=1366624.
>
> The fix http://gerrit.ovirt.org/85372 does not fix the initial bug (please
> see https://bugzilla.redhat.com/show_bug.cgi?id=1524119#c0)

Thanks. Unfortunately https://bugzilla.redhat.com/attachment.cgi?id=1366624 contains only one vdsm log, the source side if I'm not mistaken, so I can't really tell why it fails. The best option would be to have an environment on which this reproduces easily; I don't have one handy.
Surprisingly, I managed to get access to the environment on which this reproduces, and it seems the real cause of the error is actually:

2017-12-14 13:23:06,764+0200 ERROR (vm/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') Failed to setup device vnc (vm:2519)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2516, in _setup_devices
    dev_object.setup()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/graphics.py", line 65, in setup
    display_network = self.specParams['displayNetwork']
KeyError: 'displayNetwork'
2017-12-14 13:23:06,765+0200 ERROR (vm/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') The vm start process failed (vm:914)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 843, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2644, in _run
    self._setup_devices()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2516, in _setup_devices
    dev_object.setup()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/graphics.py", line 65, in setup
    display_network = self.specParams['displayNetwork']
KeyError: 'displayNetwork'
2017-12-14 13:23:06,765+0200 INFO  (vm/346b8d80) [virt.vm] (vmId='346b8d80-b4f2-4a81-bf60-a662220c27a4') Changed state to Down: 'displayNetwork' (code=1) (vm:1633)

This is surely one issue - not sure it is the last step needed, and not sure it means migration is always broken. Still, it needs to be fixed, and https://gerrit.ovirt.org/85446 wants to do that.
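For illustration, the failing lookup and a defensive variant of it. This is only a sketch of the idea; the real change is the patch linked above and may differ in detail:

# specParams of a graphics device created on the migration destination,
# missing the 'displayNetwork' key (hypothetical minimal reproduction).
spec_params = {}

# spec_params['displayNetwork'] would raise KeyError, as in the traceback.
display_network = spec_params.get('displayNetwork')
if display_network is None:
    print("no display network in specParams, skipping network lookup")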
With the patch https://gerrit.ovirt.org/85446, migration works well.
Verified on vdsm-4.20.9.3-1.el7ev.x86_64
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
*** Bug 1502768 has been marked as a duplicate of this bug. ***