Bug 1523399 - KeyError: 'vmName' after starting VM that was restored from snapshot with memory
Summary: KeyError: 'vmName' after starting VM that was restored from snapshot with memory
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: 4.20.9.1
Assignee: Milan Zamazal
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-07 21:30 UTC by Raz Tamir
Modified: 2020-07-02 09:10 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:25:11 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
engine and vdsm logs (3.58 MB, application/x-gzip)
2017-12-07 21:30 UTC, Raz Tamir
no flags


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 85208 0 master MERGED virt: Make sure XML configuration is present on resume from snapshot 2020-07-02 06:39:18 UTC
oVirt gerrit 85264 0 ovirt-4.2.0 MERGED virt: Make sure XML configuration is present on resume from snapshot 2020-07-02 06:39:18 UTC

Description Raz Tamir 2017-12-07 21:30:43 UTC
Created attachment 1364482 [details]
engine and vdsm logs

Description of problem:
Starting a VM that was restored from a snapshot with memory fails.

engine.log:

2017-12-07 23:17:15,155+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] VM {pitReinjection=false, memGuaranteedSize=1024, smpThreadsPerCore=1, cpuType=Conroe
, vmId=0139b240-d208-4d7b-8f0b-e8981ce26988, acpiEnable=true, vmType=kvm, smp=1, smpCoresPerSocket=1, emulatedMachine=pc-i440fx-rhel7.3.0, smartcardEnable=false, guestNumaNodes=[{memory=1024, cpus=0, nodeIndex=0}], transparentHugePages=tr
ue, displayNetwork=ovirtmgmt, vmName=vm_TestCase5138_0723125633, hiberVolHandle=650155db-bb01-4234-a974-724658ea8365,93512097-d821-43d7-806d-1b7cb44091b4,48dd6512-a488-4ba1-ae00-db5de01b6c9a,b101fce4-7f77-4e4d-83eb-b1674e8ea2ca,ef009c18-0
c8a-4888-8b0c-d23f61c97c0c,f30b55cb-9377-4634-8e8d-fbe480d07e66, maxVCpus=16, kvmEnable=true, devices=[{address={bus=0x00, domain=0x0000, function=0x0, slot=0x02, type=pci}, type=video, specParams={vgamem=16384, heads=1, vram=8192, ram=65
536}, device=qxl, deviceId=ad4b78b4-ecc0-466b-90ef-b25d1e0c8506}, {type=graphics, specParams={fileTransferEnable=true, spiceSecureChannels=smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir, copyPasteEnable=true}, devic
e=spice, deviceId=6b4563ca-b43b-439e-a797-9552469d5fb5}, {iface=ide, shared=false, path=, address={bus=1, controller=0, unit=0, type=drive, target=0}, readonly=true, index=2, type=disk, specParams={}, device=cdrom, deviceId=9de754c3-c9cd-
4018-a307-425ae9a2bda0}, {discard=false, shared=false, address={bus=0x00, domain=0x0000, function=0x0, slot=0x07, type=pci}, imageID=e0908c0b-69ce-4e43-b4d0-fd4e4bddb3ef, format=cow, index=0, optional=false, type=disk, deviceId=e0908c0b-6
9ce-4e43-b4d0-fd4e4bddb3ef, domainID=650155db-bb01-4234-a974-724658ea8365, propagateErrors=off, iface=virtio, readonly=false, bootOrder=1, poolID=93512097-d821-43d7-806d-1b7cb44091b4, volumeID=8e3d6391-5987-435d-b4f2-5bb1c1892e38, diskTyp
e=file, specParams={}, device=disk}, {filter=vdsm-no-mac-spoofing, nicModel=pv, address={bus=0x00, domain=0x0000, function=0x0, slot=0x03, type=pci}, filterParameters=[], type=interface, specParams={inbound={}, outbound={}}, device=bridge
, linkActive=true, deviceId=29a6d2f8-4a8c-4ce2-a7db-069540ab0ab3, macAddr=00:1a:4a:16:25:9e, network=ovirtmgmt}, {address={bus=0x00, domain=0x0000, function=0x0, slot=0x04, type=pci}, type=sound, specParams={}, device=ich6, deviceId=887c8
033-c85c-4102-bcb3-07f0d494bbaf}, {address={bus=0x00, domain=0x0000, function=0x2, slot=0x01, type=pci}, index=0, model=piix3-uhci, type=controller, specParams={}, device=usb, deviceId=2e7f5d47-f728-4df8-be10-faf51e7c1102}, {address={bus=
0x00, domain=0x0000, function=0x0, slot=0x08, type=pci}, type=balloon, specParams={model=virtio}, device=memballoon, deviceId=92744e14-a7b6-42cd-b0f3-a65e606a430f}, {address={bus=0x00, domain=0x0000, function=0x0, slot=0x05, type=pci}, in
dex=0, model=virtio-scsi, type=controller, specParams={}, device=scsi, deviceId=9deba246-6f03-4e39-8213-6265d6520399}, {address={bus=0x00, domain=0x0000, function=0x0, slot=0x06, type=pci}, type=controller, specParams={}, device=virtio-se
rial, deviceId=efedebcc-37fd-4771-82c2-a40d97e39567}, {model=virtio, type=rng, specParams={source=urandom}, device=virtio, deviceId=0c7b8ac9-e134-4b24-8141-e9365f4414f9}], custom={device_6c492117-858e-4f15-a319-801339270d03device_9ae8e066
-6afd-4575-a525-816f5444a9cbdevice_be9e350c-6556-4c00-9373-9d4528cb0b71=VmDevice:{id='VmDeviceId:{deviceId='be9e350c-6556-4c00-9373-9d4528cb0b71', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988'}', device='ide', type='CONTROLLER', specParams=
'[index=0]', address='{type=pci, slot=0x01, bus=0x00, domain=0x0000, function=0x1}', managed='false', plugged='true', readOnly='false', deviceAlias='ide', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}, d
evice_6c492117-858e-4f15-a319-801339270d03device_9ae8e066-6afd-4575-a525-816f5444a9cbdevice_be9e350c-6556-4c00-9373-9d4528cb0b71device_5c00be2a-429b-4c97-953e-23dad0207fbd=VmDevice:{id='VmDeviceId:{deviceId='5c00be2a-429b-4c97-953e-23dad0
207fbd', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988'}', device='spicevmc', type='CHANNEL', specParams='[]', address='{type=virtio-serial, bus=0, controller=0, port=3}', managed='false', plugged='true', readOnly='false', deviceAlias='chann
el2', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}, device_6c492117-858e-4f15-a319-801339270d03device_9ae8e066-6afd-4575-a525-816f5444a9cb=VmDevice:{id='VmDeviceId:{deviceId='9ae8e066-6afd-4575-a525-816
f5444a9cb', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988'}', device='unix', type='CHANNEL', specParams='[]', address='{type=virtio-serial, bus=0, controller=0, port=2}', managed='false', plugged='true', readOnly='false', deviceAlias='channe
l1', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}, device_6c492117-858e-4f15-a319-801339270d03=VmDevice:{id='VmDeviceId:{deviceId='6c492117-858e-4f15-a319-801339270d03', vmId='0139b240-d208-4d7b-8f0b-e8
981ce26988'}', device='unix', type='CHANNEL', specParams='[]', address='{type=virtio-serial, bus=0, controller=0, port=1}', managed='false', plugged='true', readOnly='false', deviceAlias='channel0', customProperties='[]', snapshotId='null
', logicalName='null', hostDevice='null'}}, timeOffset=0, nice=0, maxMemSize=4096, maxMemSlots=16, bootMenuEnable=false, memSize=1024, agentChannelName=ovirt-guest-agent.0}
2017-12-07 23:17:15,506+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] FINISH, CreateBrokerVDSCommand, log id: 3de9145
2017-12-07 23:17:15,510+02 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] FINISH, CreateVDSCommand, return: RestoringState, log id: 3ade8afd
2017-12-07 23:17:15,510+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] Lock freed to object 'EngineLock:{exclusiveLocks='[0139b240-d208-4d7b-8f0b-e8981ce26988=VM]', sharedLocks=''}'
2017-12-07 23:17:15,517+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] VM '0139b240-d208-4d7b-8f0b-e8981ce26988' was reported as Down on VDS '801e4f72-1885-4fee-8cc2-afc31a6d6068'(host_mixed_
2)
2017-12-07 23:17:15,520+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-7) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='801e4f72-1885-4fee-8cc2-a
fc31a6d6068', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 1ed2dac4
2017-12-07 23:17:15,524+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] EVENT_ID: USER_STARTED_VM(153), VM vm_TestCase5138_0723125633 was started by admi
n@internal-authz (Host: host_mixed_2).
2017-12-07 23:17:15,621+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] Setting new tasks map. The map contains now 2 tasks
2017-12-07 23:17:17,554+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-7) [] FINISH, DestroyVDSCommand, log id: 1ed2dac4
2017-12-07 23:17:17,555+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] VM '0139b240-d208-4d7b-8f0b-e8981ce26988'(vm_TestCase5138_0723125633) moved from 'RestoringState' --> 'Down'
2017-12-07 23:17:17,574+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_DOWN_ERROR(119), VM vm_TestCase5138_0723125633 is down with error. Exit message: Wake up fro
m hibernation failed:'vmName'.
2017-12-07 23:17:17,575+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] add VM '0139b240-d208-4d7b-8f0b-e8981ce26988'(vm_TestCase5138_0723125633) to rerun treatment
2017-12-07 23:17:17,582+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-7) [] Rerun VM '0139b240-d208-4d7b-8f0b-e8981ce26988'. Called from VDS 'host_mixed_2'
2017-12-07 23:17:17,603+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-15422) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Failed to run VM vm_TestCase5138_0723125
633 on Host host_mixed_2.


vdsm.log:

2017-12-07 23:17:15,501+0200 INFO  (jsonrpc/0) [api.virt] FINISH create return={'status': {'message': 'Done', 'code': 0}, 'vmList': {'status': 'Restoring state', 'vmName': 'vm_TestCase5138_0723125633', 'vmId': '0139b240-d208-4d7b-8f0b-e89
81ce26988', 'devices': [], 'guestDiskMapping': {}, 'clientIp': '', 'statusTime': '4314659320', 'arch': 'x86_64'}} from=::ffff:10.35.161.183,58774, flow_id=vms_syncAction_9e520e8a-0d11-4b6f (api:52)
2017-12-07 23:17:15,503+0200 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call VM.create succeeded in 0.34 seconds (__init__:573)
2017-12-07 23:17:15,505+0200 INFO  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') VM wrapper has started (vm:2605)
2017-12-07 23:17:15,507+0200 INFO  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') Enabling drive monitoring (drivemonitor:53)
2017-12-07 23:17:15,508+0200 WARN  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') No balloon device present (vm:715)
2017-12-07 23:17:15,509+0200 ERROR (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') The vm start process failed (vm:911)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 840, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2652, in _run
    hooks.before_vm_start(self._buildDomainXML(), self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2166, in _buildDomainXML
    return self._make_domain_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2172, in _make_domain_xml
    libvirtxml.Domain(self.conf, self.log, self.arch)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/libvirtxml.py", line 73, in __init__
    self.dom.appendChildWithArgs('name', text=self.conf['vmName'])
KeyError: 'vmName'
2017-12-07 23:17:15,509+0200 INFO  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') Changed state to Down: 'vmName' (code=1) (vm:1630)
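
For reference, the failure reduces to the legacy XML builder indexing a configuration dict that no longer carries the per-key fields when the VM is resumed from a new-style snapshot. A simplified illustration (minimal and assumed, not the actual vdsm objects):

# Simplified illustration of the failure mode above, not the actual vdsm code:
# when resuming from a 4.2-style memory snapshot, the legacy conf dict is not
# populated with the per-key fields, so the old libvirtxml path fails on 'vmName'.

def build_name_element(conf):
    # Mirrors the failing line in libvirtxml.Domain.__init__, which assumes
    # 'vmName' is always present in the configuration dict.
    return '<name>%s</name>' % conf['vmName']

legacy_conf = {'vmId': '0139b240-d208-4d7b-8f0b-e8981ce26988', 'devices': []}
build_name_element(legacy_conf)  # raises KeyError: 'vmName', as in the traceback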


Version-Release number of selected component (if applicable):
vdsm-4.20.9-1.el7ev.x86_64
rhvm-4.2.0-0.6.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a live snapshot with memory
2. Restore to that snapshot (preview + commit)
3. Start the VM
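
A rough automation of these steps with the oVirt Python SDK (ovirtsdk4); the engine URL, credentials and VM name are placeholders and the waits are simplified, so treat it as a sketch rather than a verified test script:

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vm_TestCase5138_0723125633')[0]
vm_service = vms_service.vm_service(vm.id)
snapshots_service = vm_service.snapshots_service()

# 1. Create a live snapshot that includes the memory state.
snapshot = snapshots_service.add(
    types.Snapshot(description='memory snapshot', persist_memorystate=True),
)
while snapshots_service.snapshot_service(snapshot.id).get().snapshot_status != \
        types.SnapshotStatus.OK:
    time.sleep(5)

# 2. Restore the snapshot: shut the VM down, preview with memory, then commit.
vm_service.stop()
vm_service.preview_snapshot(snapshot=types.Snapshot(id=snapshot.id),
                            restore_memory=True)
# (wait for the preview to settle before committing)
vm_service.commit_snapshot()

# 3. Start the VM; with this bug present the start fails with KeyError: 'vmName'.
vm_service.start()

connection.close()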

Actual results:
RAM snapshot restore is not working as expected: the restored VM fails to start and goes to Down with KeyError: 'vmName'.

Expected results:
The VM starts and resumes from the memory state saved in the snapshot.

Additional info:
Regression - bug 1516405 was verified

Comment 1 Michal Skrivanek 2017-12-08 10:24:18 UTC
works for me

In which version was the snapshot created?

Comment 2 Michal Skrivanek 2017-12-08 10:48:36 UTC
(In reply to Michal Skrivanek from comment #1)
Ah, the debug logging is not set up correctly; never mind the previous question.

Raz, can you please make sure the debug log level is enabled for api.virt as well? Despite all the storage debug noise, we're missing the crucial logs for many other calls.

Comment 3 Raz Tamir 2017-12-08 11:35:04 UTC
(In reply to Michal Skrivanek from comment #2)
> (In reply to Michal Skrivanek from comment #1)
> Ah, the debug logging is not set up correctly; never mind the previous
> question.

The snapshot was created in the same version as the engine and vdsm
> 
> Raz, can you please make sure the debug log level is enabled for api.virt as
> well? Despite all the storage debug noise, we're missing the crucial logs for
> many other calls.

Sure,

Do you want me to reproduce with DEBUG level for virt?

Comment 4 Michal Skrivanek 2017-12-08 11:42:13 UTC
Seems the assumption in https://gerrit.ovirt.org/#/c/84266/ is wrong. The hiberVolHandle volume doesn't contain vmconf; it just contains the libvirt XML (obtained from libvirt on hibernation, in both 4.1 and 4.2).

Arik, please double-check that the code in executeVdsBrokerCommand properly checks the compatibility version of that snapshot and not the current VM's compatibility version (i.e. if the snapshot was taken in 4.1 and the VM is now in a 4.2 cluster, on preview there should be a warning/note that the VM is going to be restored as a 4.1-compat VM, with the cluster level override set to 4.1). Please make sure that in that case the vmconf is sent.

Once that's correct, the vdsm side can safely assume the snapshot was either:
- created in 4.1, in which case the engine sends vmconf and the code works as is (the _srcDomXML is used only for the libvirt create call), or
- created in 4.2, in which case the engine doesn't send vmconf and vdsm needs to replace (create) "xml" with _srcDomXML, since the current way of telling old from new (if "xml" in params or self.conf) relies on it; see the sketch below.
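
A minimal sketch of that handling, with hypothetical names (an illustration of the idea above, not the actual vdsm patch from the referenced gerrit change):

# Hypothetical sketch of the resume-from-snapshot handling described above.
# Function and parameter names are illustrative; they are not the actual
# vdsm code.

def ensure_xml_on_resume(conf, src_dom_xml, vmconf=None):
    """Make sure a usable configuration is present when resuming from a
    memory snapshot.

    - Snapshot created in 4.1: the engine sends vmconf, so the legacy
      per-key configuration (including 'vmName') is complete and the old
      code path works as is; src_dom_xml is only needed for the libvirt
      create call.
    - Snapshot created in 4.2: the engine sends no vmconf, so the libvirt
      domain XML saved on hibernation must be stored as the 'xml'
      configuration; otherwise vdsm takes the legacy path and fails with
      KeyError: 'vmName'.
    """
    if vmconf:
        # 4.1-style snapshot: complete the legacy configuration.
        conf.update(vmconf)
    elif 'xml' not in conf:
        # 4.2-style snapshot: reuse the libvirt XML saved on hibernation.
        conf['xml'] = src_dom_xml
    return conf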

Comment 5 Michal Skrivanek 2017-12-08 11:46:45 UTC
(In reply to Raz Tamir from comment #3)
> Sure,

Thanks, it would greatly help with these cases.
 
> Do you want me to reproduce with DEBUG level for virt?

Currently there's no need to; I believe I was able to deduce what's happening. Let's wait for a fix and try then! :)

Comment 6 Evgheni Dereveanchin 2017-12-08 15:23:47 UTC
Applied the patch to one of the hosts and went through the reproducer steps to create the snapshot and switch to it. Attempts to run the VM fail on a non-patched host and succeed on the patched one.

Comment 7 Raz Tamir 2017-12-14 08:56:47 UTC
Passed automation run on:
rhvm-4.2.0.2-0.1.el7.noarch
vdsm-4.20.9.2-1.el7ev.x86_64

Verified

Comment 8 Sandro Bonazzola 2017-12-20 11:25:11 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

