+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1466332 +++
======================================================================
+++ This bug is an upstream to downstream clone. The original bug is: +++
+++ bug 1461811 +++
======================================================================

Created attachment 1288037 [details]
engine, vdsm and libvirt logs

Description of problem:
After trying to start a VM that was restored from a RAM snapshot, the operation fails.

Version-Release number of selected component (if applicable):

How reproducible:
50%

Steps to Reproduce:
1. Create a VM, add a snapshot
2. Start the VM, start a process on it (cat /dev/zero > /dev/null &) and take a RAM snapshot
3. Power off the VM
4. Restore the snapshot
5. Wait until the operation finishes, then start the VM

Actual results:
The VM fails to start on the host; it then starts on a different host, but the process is not running there (the VM is not in preview status).

Expected results:
The VM starts in preview status and the process is running.

Additional info:

engine.log:
2017-06-15 13:44:01,958+03 INFO [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (default task-13) [vms_syncAction_0ed17069-0150-4e64] FINISH, CreateVmVDSCommand, return: RestoringState, log id: 54e4fb6b
2017-06-15 13:44:01,959+03 INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-13) [vms_syncAction_0ed17069-0150-4e64] Lock freed to object 'EngineLock:{exclusiveLocks='[952f2562-8645-448c-af70-66da4b4e7af5=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-15 13:44:01,964+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [vms_syncAction_0ed17069-0150-4e64] EVENT_ID: USER_STARTED_VM(153), Correlation ID: vms_syncAction_0ed17069-0150-4e64, Job ID: c803e793-36ad-45ac-8aec-2a932a8a90e8, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5138_REST_ISCSI_1511364439 was started by admin@internal-authz (Host: host_mixed_2).
2017-06-15 13:44:07,268+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '952f2562-8645-448c-af70-66da4b4e7af5' was reported as Down on VDS '6fc6bef3-dfb3-4657-ab70-93af08324757'(host_mixed_2)
2017-06-15 13:44:07,269+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{runAsync='true', hostId='6fc6bef3-dfb3-4657-ab70-93af08324757', vmId='952f2562-8645-448c-af70-66da4b4e7af5', force='false', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 6456b912
2017-06-15 13:44:08,256+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] FINISH, DestroyVDSCommand, log id: 6456b912
2017-06-15 13:44:08,256+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '952f2562-8645-448c-af70-66da4b4e7af5'(vm_TestCase5138_REST_ISCSI_1511364439) moved from 'RestoringState' --> 'Down'
2017-06-15 13:44:08,267+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-3) [] EVENT_ID: VM_DOWN_ERROR(119), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5138_REST_ISCSI_1511364439 is down with error. Exit message: Wake up from hibernation failed:operation failed: job: unexpectedly failed.
2017-06-15 13:44:08,268+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] add VM '952f2562-8645-448c-af70-66da4b4e7af5'(vm_TestCase5138_REST_ISCSI_1511364439) to rerun treatment

vdsm.log:
2017-06-15 13:44:06,252+0300 DEBUG (vm/952f2562) [storage.ResourceManager] No one is waiting for resource '00_storage.f79553b2-4f4a-4236-8d5d-1764e2969ef6', Clearing records. (resourceManager:599)
2017-06-15 13:44:06,253+0300 DEBUG (vm/952f2562) [storage.TaskManager.Task] (Task='166bfdc2-3e90-4a1c-b2fd-b08618d50257') ref 0 aborting False (task:997)
2017-06-15 13:44:06,253+0300 ERROR (vm/952f2562) [virt.vm] (vmId='952f2562-8645-448c-af70-66da4b4e7af5') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1993, in _run
    self._connection.restoreFlags(fname, srcDomXML, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4351, in restoreFlags
    if ret == -1: raise libvirtError ('virDomainRestoreFlags() failed', conn=self)
libvirtError: operation failed: job: unexpectedly failed
2017-06-15 13:44:06,265+0300 INFO (vm/952f2562) [virt.vm] (vmId='952f2562-8645-448c-af70-66da4b4e7af5') Changed state to Down: operation failed: job: unexpectedly failed (code=1) (vm:1221)

(Originally by Carlos Mestre Gonzalez) (Originally by rhev-integ)
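(For reference, the failing vdsm call in the traceback above boils down to libvirt-python's restoreFlags(). A minimal standalone sketch of that restore path, assuming the libvirt-python bindings are installed and using hypothetical file paths, surfaces the same libvirtError when libvirt rejects the restore:

import libvirt

SAVED_STATE = '/tmp/vm-ram-snapshot.mem'   # hypothetical path to the saved RAM state
DOMAIN_XML_PATH = '/tmp/vm-domain.xml'     # hypothetical domain XML used for the restore

conn = libvirt.open('qemu:///system')
try:
    with open(DOMAIN_XML_PATH) as f:
        dom_xml = f.read()
    # Same API call vdsm makes in vm.py _run(); on the affected hosts it
    # raises libvirtError: "operation failed: job: unexpectedly failed".
    conn.restoreFlags(SAVED_STATE, dom_xml, 0)
except libvirt.libvirtError as err:
    print('restore failed: %s' % err)
finally:
    conn.close()

Running something like this against the saved-state file on an affected 7.4 host should fail the same way the vdsm start flow does.)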
Unspecified severity. Does this happen on RHEL 7.4? Anything interesting in libvirt logs? (Originally by Yaniv Kaul) (Originally by rhev-integ)
7.4 hosts:
2017-06-15 10:44:05.104+0000: 3352: error : virProcessRunInMountNamespace:1159 : internal error: child reported: unable to set security context 'system_u:object_r:virt_content_t:s0' on '/rhev/data-center/c7ca34b4-786a-4105-8e6a-9e6002ec633c/f79553b2-4f4a-4236-8d5d-1764e2969ef6/images/71e6c6ef-d1c9-41e2-9a54-0752718ebda0/e3389fb0-6ebb-411e-8a71-57cdfa7b2419': No such file or directory
2017-06-15 10:44:05.447+0000: 3352: error : qemuMigrationCheckJobStatus:1425 : operation failed: job: unexpectedly failed
2017-06-15 10:44:05.501+0000: 3345: error : qemuMonitorIORead:588 : Unable to read from monitor: Connection reset by peer

(Originally by Carlos Mestre Gonzalez) (Originally by rhev-integ)
Could you please provide libvirt debug logs? (Originally by Tomas Jelinek) (Originally by rhev-integ)
And the qemu log. (Originally by michal.skrivanek) (Originally by rhev-integ)
And retest with both hosts in SELinux Permissive mode, please. (Originally by michal.skrivanek) (Originally by rhev-integ)
Created attachment 1289537 [details]
vdsm, engine, qemu, libvirt logs

Logs from new run:

engine.log:
2017-06-20 11:54:08,136+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5138_REST_ISCSI_2009450198 is down with error. Exit message: Wake up from hibernation failed:operation failed: job: unexpectedly failed.

vdsm:
2017-06-20 05:31:09,171+0300 ERROR (vm/0003f234) [virt.vm] (vmId='0003f234-603f-4f43-8c8f-89c3b7941bfb') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1993, in _run
    self._connection.restoreFlags(fname, srcDomXML, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4351, in restoreFlags
    if ret == -1: raise libvirtError ('virDomainRestoreFlags() failed', conn=self)
libvirtError: operation failed: job: unexpectedly failed

I'll re-run in permissive mode to check.

(Originally by Carlos Mestre Gonzalez) (Originally by rhev-integ)
Tested in permissive mode on the only host in the cluster, with the same result. (Originally by Carlos Mestre Gonzalez) (Originally by rhev-integ)
I've also seen this in the last 4.2 execution with RHEL 7.4: 4.2.0-0.0.master.20170619142113 (Originally by Carlos Mestre Gonzalez) (Originally by rhev-integ)
I see this bug with a different error.

engine.log:
2017-06-21 23:33:23,625+03 INFO [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (default task-18) [vms_syncAction_cf349996-d9b5-4dd9] FINISH, CreateVmVDSCommand, return: RestoringState, log id: 7ec92f8c
2017-06-21 23:33:23,625+03 INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-18) [vms_syncAction_cf349996-d9b5-4dd9] Lock freed to object 'EngineLock:{exclusiveLocks='[9c7ea8a6-85bc-4475-a249-b6e399255735=VM]', sharedLocks=''}'
2017-06-21 23:33:23,634+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-18) [vms_syncAction_cf349996-d9b5-4dd9] EVENT_ID: USER_STARTED_VM(153), Correlation ID: vms_syncAction_cf349996-d9b5-4dd9, Job ID: a61ab977-183c-4fcb-9289-beb397cee56d, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5134_REST_ISCSI_2123273644 was started by admin@internal-authz (Host: host_mixed_1).
2017-06-21 23:33:25,533+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] VM '9c7ea8a6-85bc-4475-a249-b6e399255735' was reported as Down on VDS '04f1d1a0-9006-47a4-9a0b-10ea0262c17a'(host_mixed_1)
2017-06-21 23:33:25,534+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-14) [] START, DestroyVDSCommand(HostName = host_mixed_1, DestroyVmVDSCommandParameters:{runAsync='true', hostId='04f1d1a0-9006-47a4-9a0b-10ea0262c17a', vmId='9c7ea8a6-85bc-4475-a249-b6e399255735', force='false', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 7e3bdb33
2017-06-21 23:33:25,926+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-14) [] FINISH, DestroyVDSCommand, log id: 7e3bdb33
2017-06-21 23:33:25,926+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] VM '9c7ea8a6-85bc-4475-a249-b6e399255735'(vm_TestCase5134_REST_ISCSI_2123273644) moved from 'RestoringState' --> 'Down'
2017-06-21 23:33:25,941+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-14) [] EVENT_ID: VM_DOWN_ERROR(119), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5134_REST_ISCSI_2123273644 is down with error. Exit message: : Target CPU feature count 3 does not match source 0.
2017-06-21 23:33:25,942+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] add VM '9c7ea8a6-85bc-4475-a249-b6e399255735'(vm_TestCase5134_REST_ISCSI_2123273644) to rerun treatment
2017-06-21 23:33:25,946+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-14) [] Rerun VM '9c7ea8a6-85bc-4475-a249-b6e399255735'. Called from VDS 'host_mixed_1'
2017-06-21 23:33:25,954+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Correlation ID: vms_syncAction_cf349996-d9b5-4dd9, Job ID: a61ab977-183c-4fcb-9289-beb397cee56d, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vm_TestCase5134_REST_ISCSI_2123273644 on Host host_mixed_1.
vdsm.log:
2017-06-21 23:33:25,310+0300 ERROR (vm/9c7ea8a6) [virt.vm] (vmId='9c7ea8a6-85bc-4475-a249-b6e399255735') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1993, in _run
    self._connection.restoreFlags(fname, srcDomXML, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4351, in restoreFlags
    if ret == -1: raise libvirtError ('virDomainRestoreFlags() failed', conn=self)
libvirtError: unsupported configuration: Target CPU feature count 3 does not match source 0
2017-06-21 23:33:25,313+0300 INFO (vm/9c7ea8a6) [virt.vm] (vmId='9c7ea8a6-85bc-4475-a249-b6e399255735') Changed state to Down: unsupported configuration: Target CPU feature count 3 does not match source 0 (code=1) (vm:1221)

(Originally by Raz Tamir) (Originally by rhev-integ)
Created attachment 1290652 [details]
new_logs

(Originally by Raz Tamir) (Originally by rhev-integ)
Too bad we missed that in 4.1.3, but this is indeed a serious issue. The errors mentioned in the previous comments are due to a libvirt regression fixed in bug 1441662, but there is another error afterwards in vdsm which needs to be fixed. (Reducing severity since this is not a released libvirt version yet; the workaround is to use the 7.3 libvirt.) (Originally by michal.skrivanek) (Originally by rhev-integ)
Reproduced it with virsh on libvirt-3.2.0-15.el7a.x86_64:

1. Prepare a running VM with the following XML:
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='xsave'/>
  <feature policy='require' name='osxsave'/>
</cpu>

2. Dump the VM's XML:
# virsh dumpxml BZ1441662 > /tmp/BZ1441662.xml

3. Create a memory-only snapshot:
# virsh snapshot-create-as BZ1441662 s1 --memspec /tmp/BZ1441662-s1.mem --no-metadata --diskspec vda,snapshot=no
Domain snapshot s1 created

4. Restore the memory snapshot with the dumped XML:
# virsh restore /tmp/BZ1441662-s1.mem --xml /tmp/BZ1441662.xml
error: Failed to restore domain from /tmp/BZ1441662-s1.mem
error: unsupported configuration: Target CPU feature count 4 does not match source 0

(Originally by Han Han) (Originally by rhev-integ)
After further research: snapshot and save are both a "save to file", and the migratable XML is what gets saved. On restore, the saved XML and the XML passed via --xml are compared, and the restore succeeds only when the two match. So "--migratable" must be used when dumping the domain XML that will be used for the restore:

# virsh dumpxml V > /tmp/V.xml
# virsh restore /tmp/V-s1.mem --xml /tmp/V.xml
error: Failed to restore domain from /tmp/V-s1.mem
error: unsupported configuration: Target CPU feature count 19 does not match source 17

# virsh dumpxml V --migratable > /tmp/V-mig.xml
# virsh restore /tmp/V-s1.mem --xml /tmp/V-mig.xml
Domain restored from /tmp/V-s1.mem

"# virsh save-image-edit /tmp/V-s1.mem" can be used to check what the saved XML in the snapshot looks like.

So it seems the RHEV product didn't pass the "--migratable" flag when calling the libvirt function. Since http://gerrit.ovirt.org/78526 was posted, this issue may already be solved.

Note: update the pkg version in comment 14 to libvirt-3.2.0-14.el7.x86_64.

(Originally by Yanqiu Zhang) (Originally by rhev-integ)
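(To make the --migratable point concrete at the API level, here is a hedged Python sketch of the fix direction, not the actual vdsm patch from gerrit 78526; the domain name and paths are placeholders from the virsh reproduction above:

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('BZ1441662')   # placeholder domain name

# Equivalent of "virsh dumpxml --migratable": drops host-specific details
# (e.g. the fully expanded CPU feature list) so the XML matches the
# migratable XML that libvirt embedded in the save image.
migratable_xml = dom.XMLDesc(libvirt.VIR_DOMAIN_XML_MIGRATABLE)

# ... later, after the memory snapshot has been taken and the VM is off ...
conn.restoreFlags('/tmp/BZ1441662-s1.mem', migratable_xml, 0)
conn.close()

Restoring with the non-migratable XML is exactly what trips the "Target CPU feature count N does not match source M" comparison shown above.)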
Verified with:
Engine: RHEVM Version: 3.6.11.3-0.1.el6
Host:
  OS Version: RHEL - 7.4 - 11.el7
  Kernel Version: 3.10.0 - 663.el7.x86_64
  KVM Version: 2.9.0 - 14.el7
  LIBVIRT Version: libvirt-3.2.0-4.el7
  VDSM Version: vdsm-4.17.41-1.el7ev
  SPICE Version: 0.12.8 - 2.el7

Steps (setup with one host):
1. Create a VM, add a snapshot
2. Start the VM, start a process on it (cat /dev/zero > /dev/null &) and take a RAM snapshot
3. Power off the VM
4. Restore the snapshot
5. Wait until the operation finishes, then start the VM

Results:
VM is up and running. I can connect to the VM via console and ssh.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1824