Bug 1461811 - RHEL7.4 | Trying to start a VM after restore RAM snapshot fails - 'virDomainRestoreFlags() failed'
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.19.20
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: ovirt-4.1.4
Target Release: 4.19.21
Assigned To: Milan Zamazal
QA Contact: Israel Pinto
Keywords: Automation, Regression, ZStream
Depends On:
Blocks: 1466332
 
Reported: 2017-06-15 07:28 EDT by Carlos Mestre González
Modified: 2018-01-14 06:54 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Under certain circumstances, depending on the software versions in the underlying stack, VM snapshots could be created that could not be restored. This has been fixed; snapshots are now created and restored in a compatible way.
Story Points: ---
Clone Of:
: 1466332
Environment:
Last Closed: 2017-07-28 10:10:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ylavi: ovirt-4.1+
ylavi: blocker+


Attachments
engine, vdsm and libvirt logs (312.02 KB, application/x-gzip)
2017-06-15 07:28 EDT, Carlos Mestre González
vdsm, engine, qemu, libvirt logs. (1.05 MB, application/x-gzip)
2017-06-20 05:21 EDT, Carlos Mestre González
new_logs (5.37 MB, application/x-gzip)
2017-06-22 07:12 EDT, Raz Tamir


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 78526 master MERGED virt: Use MIGRATABLE flag for _srcDomXML 2017-06-23 12:10 EDT
oVirt gerrit 78527 master MERGED virt: Make sure VMs are restored as paused 2017-06-23 12:18 EDT
oVirt gerrit 78650 ovirt-4.1 MERGED virt: Use MIGRATABLE flag for _srcDomXML 2017-06-26 08:01 EDT
oVirt gerrit 78651 ovirt-4.1 MERGED virt: Make sure VMs are restored as paused 2017-06-26 08:01 EDT

Description Carlos Mestre González 2017-06-15 07:28:18 EDT
Created attachment 1288037 [details]
engine, vdsm and libvirt logs

Description of problem:
Trying to start a VM after restoring a RAM snapshot fails with 'virDomainRestoreFlags() failed'.

Version-Release number of selected component (if applicable):


How reproducible:
50%

Steps to Reproduce:
1. Create a VM, add a snapshot
2. Start the VM, start a process on it (cat /dev/zero > /dev/null &), and take a RAM snapshot
3. Power off the VM
4. Restore the snapshot
5. Wait until the operation finishes, then start the VM (see the libvirt-level sketch below)
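
For reference, a minimal, hypothetical libvirt-python sketch of what steps 2-5 exercise underneath the engine/vdsm layers (the domain name, snapshot name and memory-image path are examples only):

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('vm_TestCase5138')  # example VM name

# Step 2: take a RAM snapshot (memory only, disks excluded), roughly what
# `virsh snapshot-create-as ... --memspec ... --diskspec vda,snapshot=no` does.
snap_xml = """
<domainsnapshot>
  <name>s1</name>
  <memory snapshot='external' file='/tmp/vm-s1.mem'/>
  <disks>
    <disk name='vda' snapshot='no'/>
  </disks>
</domainsnapshot>
"""
# Keep the domain XML for the later restore (vdsm caches this as _srcDomXML).
src_xml = dom.XMLDesc(0)
dom.snapshotCreateXML(snap_xml, libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)

# Step 3: power off the VM.
dom.destroy()

# Steps 4-5: restoring the saved memory image is where the failure shows up;
# vdsm passes the cached, non-migratable XML here, which the RHEL 7.4 libvirt
# can reject.
conn.restoreFlags('/tmp/vm-s1.mem', src_xml, 0)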

Actual results:
The VM fails to start on the host; it then starts on a different host, but the process is not running there (the VM is not in preview status)

Expected results:
The VM starts in preview status and the process is still running

Additional info:
2017-06-15 13:44:01,958+03 INFO  [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (default task-13) [vms_syncAction_0ed17069-0150-4e64] FINISH, CreateVmVDSCommand, return: RestoringState, log id: 54e4fb6b
2017-06-15 13:44:01,959+03 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-13) [vms_syncAction_0ed17069-0150-4e64] Lock freed to object 'EngineLock:{exclusiveLocks='[952f2562-8645-448c-af70-66da4b4e7af5=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-15 13:44:01,964+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [vms_syncAction_0ed17069-0150-4e64] EVENT_ID: USER_STARTED_VM(153), Correlation ID: vms_syncAction_0ed17069-0150-4e64, Job ID: c803e793-36ad-45ac-8aec-2a932a8a90e8, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5138_REST_ISCSI_1511364439 was started by admin@internal-authz (Host: host_mixed_2).
2017-06-15 13:44:07,268+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '952f2562-8645-448c-af70-66da4b4e7af5' was reported as Down on VDS '6fc6bef3-dfb3-4657-ab70-93af08324757'(host_mixed_2)
2017-06-15 13:44:07,269+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{runAsync='true', hostId='6fc6bef3-dfb3-4657-ab70-93af08324757', vmId='952f2562-8645-448c-af70-66da4b4e7af5', force='false', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 6456b912
2017-06-15 13:44:08,256+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] FINISH, DestroyVDSCommand, log id: 6456b912
2017-06-15 13:44:08,256+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '952f2562-8645-448c-af70-66da4b4e7af5'(vm_TestCase5138_REST_ISCSI_1511364439) moved from 'RestoringState' --> 'Down'
2017-06-15 13:44:08,267+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-3) [] EVENT_ID: VM_DOWN_ERROR(119), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5138_REST_ISCSI_1511364439 is down with error. Exit message: Wake up from hibernation failed:operation failed: job: unexpectedly failed.
2017-06-15 13:44:08,268+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-3) [] add VM '952f2562-8645-448c-af70-66da4b4e7af5'(vm_TestCase5138_REST_ISCSI_1511364439) to rerun treatment



2017-06-15 13:44:06,252+0300 DEBUG (vm/952f2562) [storage.ResourceManager] No one is waiting for resource '00_storage.f79553b2-4f4a-4236-8d5d-1764e2969ef6', Clearing records. (resourceManager:599)
2017-06-15 13:44:06,253+0300 DEBUG (vm/952f2562) [storage.TaskManager.Task] (Task='166bfdc2-3e90-4a1c-b2fd-b08618d50257') ref 0 aborting False (task:997)
2017-06-15 13:44:06,253+0300 ERROR (vm/952f2562) [virt.vm] (vmId='952f2562-8645-448c-af70-66da4b4e7af5') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1993, in _run
    self._connection.restoreFlags(fname, srcDomXML, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4351, in restoreFlags
    if ret == -1: raise libvirtError ('virDomainRestoreFlags() failed', conn=self)
libvirtError: operation failed: job: unexpectedly failed
2017-06-15 13:44:06,265+0300 INFO  (vm/952f2562) [virt.vm] (vmId='952f2562-8645-448c-af70-66da4b4e7af5') Changed state to Down: operation failed: job: unexpectedly failed (code=1) (vm:1221)
Comment 1 Yaniv Kaul 2017-06-15 08:09:47 EDT
Unspecified severity.
Does this happen on RHEL 7.4?
Anything interesting in libvirt logs?
Comment 2 Carlos Mestre González 2017-06-15 08:30:37 EDT
7.4 hosts:

2017-06-15 10:44:05.104+0000: 3352: error : virProcessRunInMountNamespace:1159 : internal error: child reported: unable to set security context 'system_u:object_r:virt_content_t:s0' on '/rhev/data-center/c7ca34b4-786a-4105-8e6a-9e6002ec633c/f79553b2-4f4a-4236-8d5d-1764e2969ef6/images/71e6c6ef-d1c9-41e2-9a54-0752718ebda0/e3389fb0-6ebb-411e-8a71-57cdfa7b2419': No such file or directory

2017-06-15 10:44:05.447+0000: 3352: error : qemuMigrationCheckJobStatus:1425 : operation failed: job: unexpectedly failed

2017-06-15 10:44:05.501+0000: 3345: error : qemuMonitorIORead:588 : Unable to read from monitor: Connection reset by peer
Comment 3 Tomas Jelinek 2017-06-15 09:01:29 EDT
could you please provide libvirt debug logs?
Comment 4 Michal Skrivanek 2017-06-16 01:20:30 EDT
and qemu log
Comment 5 Michal Skrivanek 2017-06-16 09:15:52 EDT
and retest with both hosts in SELinux Permissive mode, please
Comment 6 Carlos Mestre González 2017-06-20 05:21 EDT
Created attachment 1289537 [details]
vdsm, engine, qemu, libvirt logs.

Logs from new run: 

2017-06-20 11:54:08,136+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5138_REST_ISCSI_2009450198 is down with error. Exit message: Wake up from hibernation failed:operation failed: job: unexpectedly failed.

vdsm:
2017-06-20 05:31:09,171+0300 ERROR (vm/0003f234) [virt.vm] (vmId='0003f234-603f-4f43-8c8f-89c3b7941bfb') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1993, in _run
    self._connection.restoreFlags(fname, srcDomXML, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4351, in restoreFlags
    if ret == -1: raise libvirtError ('virDomainRestoreFlags() failed', conn=self)
libvirtError: operation failed: job: unexpectedly failed


I'll re-run in permissive mode to check
Comment 7 Carlos Mestre González 2017-06-20 07:15:28 EDT
Tested in permissive mode on the only host in the cluster, with the same result.
Comment 9 Carlos Mestre González 2017-06-21 06:31:42 EDT
I've also seen this in the last 4.2 execution with RHEL 7.4:

4.2.0-0.0.master.20170619142113
Comment 11 Raz Tamir 2017-06-22 07:07:54 EDT
I see this bug with a different error:

engine.log:
2017-06-21 23:33:23,625+03 INFO  [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (default task-18) [vms_syncAction_cf349996-d9b5-4dd9] FINISH, CreateVmVDSCommand, return: RestoringState, log id: 7ec92f8c
2017-06-21 23:33:23,625+03 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-18) [vms_syncAction_cf349996-d9b5-4dd9] Lock freed to object 'EngineLock:{exclusiveLocks='[9c7ea8a6-85bc-4475-a249-b6e399255735=VM]', sharedLocks=''}'
2017-06-21 23:33:23,634+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-18) [vms_syncAction_cf349996-d9b5-4dd9] EVENT_ID: USER_STARTED_VM(153), Correlation ID: vms_syncAction_cf349996-d9b5-4dd9, Job ID: a61ab977-183c-4fcb-9289-beb397cee56d, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5134_REST_ISCSI_2123273644 was started by admin@internal-authz (Host: host_mixed_1).
2017-06-21 23:33:25,533+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] VM '9c7ea8a6-85bc-4475-a249-b6e399255735' was reported as Down on VDS '04f1d1a0-9006-47a4-9a0b-10ea0262c17a'(host_mixed_1)
2017-06-21 23:33:25,534+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-14) [] START, DestroyVDSCommand(HostName = host_mixed_1, DestroyVmVDSCommandParameters:{runAsync='true', hostId='04f1d1a0-9006-47a4-9a0b-10ea0262c17a', vmId='9c7ea8a6-85bc-4475-a249-b6e399255735', force='false', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 7e3bdb33
2017-06-21 23:33:25,926+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-14) [] FINISH, DestroyVDSCommand, log id: 7e3bdb33
2017-06-21 23:33:25,926+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] VM '9c7ea8a6-85bc-4475-a249-b6e399255735'(vm_TestCase5134_REST_ISCSI_2123273644) moved from 'RestoringState' --> 'Down'
2017-06-21 23:33:25,941+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-14) [] EVENT_ID: VM_DOWN_ERROR(119), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_TestCase5134_REST_ISCSI_2123273644 is down with error. Exit message: : Target CPU feature count 3 does not match source 0.
2017-06-21 23:33:25,942+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] add VM '9c7ea8a6-85bc-4475-a249-b6e399255735'(vm_TestCase5134_REST_ISCSI_2123273644) to rerun treatment
2017-06-21 23:33:25,946+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-14) [] Rerun VM '9c7ea8a6-85bc-4475-a249-b6e399255735'. Called from VDS 'host_mixed_1'
2017-06-21 23:33:25,954+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Correlation ID: vms_syncAction_cf349996-d9b5-4dd9, Job ID: a61ab977-183c-4fcb-9289-beb397cee56d, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vm_TestCase5134_REST_ISCSI_2123273644 on Host host_mixed_1.

vdsm.log:
2017-06-21 23:33:25,310+0300 ERROR (vm/9c7ea8a6) [virt.vm] (vmId='9c7ea8a6-85bc-4475-a249-b6e399255735') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1993, in _run
    self._connection.restoreFlags(fname, srcDomXML, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4351, in restoreFlags
    if ret == -1: raise libvirtError ('virDomainRestoreFlags() failed', conn=self)
libvirtError: unsupported configuration: Target CPU feature count 3 does not match source 0
2017-06-21 23:33:25,313+0300 INFO  (vm/9c7ea8a6) [virt.vm] (vmId='9c7ea8a6-85bc-4475-a249-b6e399255735') Changed state to Down: unsupported configuration: Target CPU feature count 3 does not match source 0 (code=1) (vm:1221)
Comment 12 Raz Tamir 2017-06-22 07:12 EDT
Created attachment 1290652 [details]
new_logs
Comment 13 Michal Skrivanek 2017-06-22 14:40:12 EDT
Too bad we missed that in 4.1.3, but this is indeed a serious issue.
The errors mentioned in the previous comments are due to a libvirt regression fixed in bug 1441662.
But there is another error afterwards in vdsm which needs to be fixed.

(Reducing severity since this is not a released libvirt version yet; the workaround is to use the 7.3 libvirt.)
Comment 14 Han Han 2017-06-26 02:23:58 EDT
Reproduced it with virsh on libvirt-3.2.0-15.el7a.x86_64:
1. Prepare a running VM with the following XML:
 <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>Penryn</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='smx'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='xsave'/>
    <feature policy='require' name='osxsave'/>
  </cpu>

2. Dump the VM's XML:
# virsh dumpxml BZ1441662 > /tmp/BZ1441662.xml

3. Create a memory-only snapshot:
# virsh snapshot-create-as BZ1441662 s1 --memspec /tmp/BZ1441662-s1.mem --no-metadata --diskspec vda,snapshot=no                               
Domain snapshot s1 created

4. Restore the memory snapshot with the dumped XML:
# virsh restore /tmp/BZ1441662-s1.mem --xml /tmp/BZ1441662.xml
error: Failed to restore domain from /tmp/BZ1441662-s1.mem
error: unsupported configuration: Target CPU feature count 4 does not match source 0
Comment 15 yanqzhan@redhat.com 2017-06-26 04:23:12 EDT
After further research: both snapshot and save go through libvirt's "save to file" path, which stores the migratable XML. On restore, the saved XML is compared with the XML passed via --xml, and the restore succeeds only when the two match. So "--migratable" has to be used when dumping the domain XML that will be used for the restore.

#virsh dumpxml V > /tmp/V.xml

#virsh restore /tmp/V-s1.mem --xml /tmp/V.xml
error: Failed to restore domain from /tmp/V-s1.mem
error: unsupported configuration: Target CPU feature count 19 does not match source 17

#virsh dumpxml V --migratable > /tmp/V-mig.xml

#virsh restore /tmp/V-s1.mem --xml /tmp/V-mig.xml
Domain restored from /tmp/V-s1.mem

"# virsh save-image-edit /tmp/V-s1.mem" can be used to check how the saved xml  in the snapshot looks like.

So it seems the RHEV-M product did not use the "--migratable" flag when calling the libvirt function. Since http://gerrit.ovirt.org/78526 was posted, this issue may be solved.
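
Based on that analysis and on the patch titles listed under External Trackers ("virt: Use MIGRATABLE flag for _srcDomXML", "virt: Make sure VMs are restored as paused"), here is a minimal, hypothetical libvirt-python sketch of the compatible restore path; the domain name and memory-image path are examples, and the exact flags vdsm ends up using are not shown in this bug:

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('V')  # example domain name

# Dump the *migratable* XML so it matches what libvirt stored in the save
# image (the programmatic equivalent of `virsh dumpxml V --migratable`).
mig_xml = dom.XMLDesc(libvirt.VIR_DOMAIN_XML_MIGRATABLE)

# ... the VM is powered off and the snapshot's memory image exists ...

# Restore with the migratable XML. VIR_DOMAIN_SAVE_PAUSED leaves the guest
# paused after the restore, in line with "virt: Make sure VMs are restored
# as paused".
conn.restoreFlags('/tmp/V-s1.mem', mig_xml, libvirt.VIR_DOMAIN_SAVE_PAUSED)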

Note: the package version in comment 14 should be libvirt-3.2.0-14.el7.x86_64.
Comment 17 rhev-integ 2017-07-07 08:24:56 EDT
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'v4.19.21' doesn't contain patch 'https://gerrit.ovirt.org/78651']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.21

For more info please contact: infra@ovirt.org
Comment 18 Eyal Edri 2017-07-10 08:27:11 EDT
This bug is in the v4.19.21 tag and should have moved to ON_QA.
We've identified the problem in the bot that should move it, and it will be fixed in the next build.
Comment 19 Israel Pinto 2017-07-12 03:51:49 EDT
Verify with:
Engine:
Red Hat Virtualization Manager Version: 4.1.4-0.2.el7
Host:
OS Version: RHEL - 7.4 - 11.el7
Kernel Version: 3.10.0 - 663.el7.x86_64
KVM Version: 2.9.0 - 16.el7
LIBVIRT Version: libvirt-3.2.0-4.el7
VDSM Version: vdsm-4.19.21-1.el7ev


Steps:(Setup with one host)
1. Create a VM, add a snapshot
2. Start the VM, start a process on it (cat /dev/zero > /dev/null &), and take a RAM snapshot
3. Power off the VM
4. Restore the snapshot
5. Wait until the operation finishes, then start the VM

Results: the VM is up and running.
I can connect to the VM via console and SSH.
