Bug 1523399

Summary: KeyError: 'vmName' after starting VM that was restored from snapshot with memory
Product: [oVirt] vdsm
Component: Core
Version: 4.20.9
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: urgent
Keywords: Automation, Regression
Target Milestone: ovirt-4.2.0
Target Release: 4.20.9.1
Flags: rule-engine: ovirt-4.2+, rule-engine: blocker+
Reporter: Raz Tamir <ratamir>
Assignee: Milan Zamazal <mzamazal>
QA Contact: Raz Tamir <ratamir>
CC: ahadas, bugs, ederevea, michal.skrivanek, ratamir
oVirt Team: Virt
Type: Bug
Last Closed: 2017-12-20 11:25:11 UTC
Attachments: engine and vdsm logs

Description Raz Tamir 2017-12-07 21:30:43 UTC
Created attachment 1364482 [details]
engine and vdsm logs

Description of problem:
Starting a VM that was restored from a snapshot with memory fails.

engine.log:

2017-12-07 23:17:15,155+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] VM {pitReinjection=false, memGuaranteedSize=1024, smpThreadsPerCore=1, cpuType=Conroe
, vmId=0139b240-d208-4d7b-8f0b-e8981ce26988, acpiEnable=true, vmType=kvm, smp=1, smpCoresPerSocket=1, emulatedMachine=pc-i440fx-rhel7.3.0, smartcardEnable=false, guestNumaNodes=[{memory=1024, cpus=0, nodeIndex=0}], transparentHugePages=tr
ue, displayNetwork=ovirtmgmt, vmName=vm_TestCase5138_0723125633, hiberVolHandle=650155db-bb01-4234-a974-724658ea8365,93512097-d821-43d7-806d-1b7cb44091b4,48dd6512-a488-4ba1-ae00-db5de01b6c9a,b101fce4-7f77-4e4d-83eb-b1674e8ea2ca,ef009c18-0
c8a-4888-8b0c-d23f61c97c0c,f30b55cb-9377-4634-8e8d-fbe480d07e66, maxVCpus=16, kvmEnable=true, devices=[{address={bus=0x00, domain=0x0000, function=0x0, slot=0x02, type=pci}, type=video, specParams={vgamem=16384, heads=1, vram=8192, ram=65
536}, device=qxl, deviceId=ad4b78b4-ecc0-466b-90ef-b25d1e0c8506}, {type=graphics, specParams={fileTransferEnable=true, spiceSecureChannels=smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir, copyPasteEnable=true}, devic
e=spice, deviceId=6b4563ca-b43b-439e-a797-9552469d5fb5}, {iface=ide, shared=false, path=, address={bus=1, controller=0, unit=0, type=drive, target=0}, readonly=true, index=2, type=disk, specParams={}, device=cdrom, deviceId=9de754c3-c9cd-
4018-a307-425ae9a2bda0}, {discard=false, shared=false, address={bus=0x00, domain=0x0000, function=0x0, slot=0x07, type=pci}, imageID=e0908c0b-69ce-4e43-b4d0-fd4e4bddb3ef, format=cow, index=0, optional=false, type=disk, deviceId=e0908c0b-6
9ce-4e43-b4d0-fd4e4bddb3ef, domainID=650155db-bb01-4234-a974-724658ea8365, propagateErrors=off, iface=virtio, readonly=false, bootOrder=1, poolID=93512097-d821-43d7-806d-1b7cb44091b4, volumeID=8e3d6391-5987-435d-b4f2-5bb1c1892e38, diskTyp
e=file, specParams={}, device=disk}, {filter=vdsm-no-mac-spoofing, nicModel=pv, address={bus=0x00, domain=0x0000, function=0x0, slot=0x03, type=pci}, filterParameters=[], type=interface, specParams={inbound={}, outbound={}}, device=bridge
, linkActive=true, deviceId=29a6d2f8-4a8c-4ce2-a7db-069540ab0ab3, macAddr=00:1a:4a:16:25:9e, network=ovirtmgmt}, {address={bus=0x00, domain=0x0000, function=0x0, slot=0x04, type=pci}, type=sound, specParams={}, device=ich6, deviceId=887c8
033-c85c-4102-bcb3-07f0d494bbaf}, {address={bus=0x00, domain=0x0000, function=0x2, slot=0x01, type=pci}, index=0, model=piix3-uhci, type=controller, specParams={}, device=usb, deviceId=2e7f5d47-f728-4df8-be10-faf51e7c1102}, {address={bus=
0x00, domain=0x0000, function=0x0, slot=0x08, type=pci}, type=balloon, specParams={model=virtio}, device=memballoon, deviceId=92744e14-a7b6-42cd-b0f3-a65e606a430f}, {address={bus=0x00, domain=0x0000, function=0x0, slot=0x05, type=pci}, in
dex=0, model=virtio-scsi, type=controller, specParams={}, device=scsi, deviceId=9deba246-6f03-4e39-8213-6265d6520399}, {address={bus=0x00, domain=0x0000, function=0x0, slot=0x06, type=pci}, type=controller, specParams={}, device=virtio-se
rial, deviceId=efedebcc-37fd-4771-82c2-a40d97e39567}, {model=virtio, type=rng, specParams={source=urandom}, device=virtio, deviceId=0c7b8ac9-e134-4b24-8141-e9365f4414f9}], custom={device_6c492117-858e-4f15-a319-801339270d03device_9ae8e066
-6afd-4575-a525-816f5444a9cbdevice_be9e350c-6556-4c00-9373-9d4528cb0b71=VmDevice:{id='VmDeviceId:{deviceId='be9e350c-6556-4c00-9373-9d4528cb0b71', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988'}', device='ide', type='CONTROLLER', specParams=
'[index=0]', address='{type=pci, slot=0x01, bus=0x00, domain=0x0000, function=0x1}', managed='false', plugged='true', readOnly='false', deviceAlias='ide', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}, d
evice_6c492117-858e-4f15-a319-801339270d03device_9ae8e066-6afd-4575-a525-816f5444a9cbdevice_be9e350c-6556-4c00-9373-9d4528cb0b71device_5c00be2a-429b-4c97-953e-23dad0207fbd=VmDevice:{id='VmDeviceId:{deviceId='5c00be2a-429b-4c97-953e-23dad0
207fbd', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988'}', device='spicevmc', type='CHANNEL', specParams='[]', address='{type=virtio-serial, bus=0, controller=0, port=3}', managed='false', plugged='true', readOnly='false', deviceAlias='chann
el2', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}, device_6c492117-858e-4f15-a319-801339270d03device_9ae8e066-6afd-4575-a525-816f5444a9cb=VmDevice:{id='VmDeviceId:{deviceId='9ae8e066-6afd-4575-a525-816
f5444a9cb', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988'}', device='unix', type='CHANNEL', specParams='[]', address='{type=virtio-serial, bus=0, controller=0, port=2}', managed='false', plugged='true', readOnly='false', deviceAlias='channe
l1', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}, device_6c492117-858e-4f15-a319-801339270d03=VmDevice:{id='VmDeviceId:{deviceId='6c492117-858e-4f15-a319-801339270d03', vmId='0139b240-d208-4d7b-8f0b-e8
981ce26988'}', device='unix', type='CHANNEL', specParams='[]', address='{type=virtio-serial, bus=0, controller=0, port=1}', managed='false', plugged='true', readOnly='false', deviceAlias='channel0', customProperties='[]', snapshotId='null
', logicalName='null', hostDevice='null'}}, timeOffset=0, nice=0, maxMemSize=4096, maxMemSlots=16, bootMenuEnable=false, memSize=1024, agentChannelName=ovirt-guest-agent.0}
2017-12-07 23:17:15,506+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] FINISH, CreateBrokerVDSCommand, log id: 3de9145
2017-12-07 23:17:15,510+02 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] FINISH, CreateVDSCommand, return: RestoringState, log id: 3ade8afd
2017-12-07 23:17:15,510+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] Lock freed to object 'EngineLock:{exclusiveLocks='[0139b240-d208-4d7b-8f0b-e8981ce26988=VM]', sharedLocks=''}'
2017-12-07 23:17:15,517+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] VM '0139b240-d208-4d7b-8f0b-e8981ce26988' was reported as Down on VDS '801e4f72-1885-4fee-8cc2-afc31a6d6068'(host_mixed_2)
2017-12-07 23:17:15,520+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-7) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='801e4f72-1885-4fee-8cc2-afc31a6d6068', vmId='0139b240-d208-4d7b-8f0b-e8981ce26988', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 1ed2dac4
2017-12-07 23:17:15,524+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [vms_syncAction_9e520e8a-0d11-4b6f] EVENT_ID: USER_STARTED_VM(153), VM vm_TestCase5138_0723125633 was started by admin@internal-authz (Host: host_mixed_2).
2017-12-07 23:17:15,621+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] Setting new tasks map. The map contains now 2 tasks
2017-12-07 23:17:17,554+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-7) [] FINISH, DestroyVDSCommand, log id: 1ed2dac4
2017-12-07 23:17:17,555+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] VM '0139b240-d208-4d7b-8f0b-e8981ce26988'(vm_TestCase5138_0723125633) moved from 'RestoringState' --> 'Down'
2017-12-07 23:17:17,574+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_DOWN_ERROR(119), VM vm_TestCase5138_0723125633 is down with error. Exit message: Wake up from hibernation failed:'vmName'.
2017-12-07 23:17:17,575+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] add VM '0139b240-d208-4d7b-8f0b-e8981ce26988'(vm_TestCase5138_0723125633) to rerun treatment
2017-12-07 23:17:17,582+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-7) [] Rerun VM '0139b240-d208-4d7b-8f0b-e8981ce26988'. Called from VDS 'host_mixed_2'
2017-12-07 23:17:17,603+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-15422) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Failed to run VM vm_TestCase5138_0723125633 on Host host_mixed_2.


vdsm.log:

2017-12-07 23:17:15,501+0200 INFO  (jsonrpc/0) [api.virt] FINISH create return={'status': {'message': 'Done', 'code': 0}, 'vmList': {'status': 'Restoring state', 'vmName': 'vm_TestCase5138_0723125633', 'vmId': '0139b240-d208-4d7b-8f0b-e8981ce26988', 'devices': [], 'guestDiskMapping': {}, 'clientIp': '', 'statusTime': '4314659320', 'arch': 'x86_64'}} from=::ffff:10.35.161.183,58774, flow_id=vms_syncAction_9e520e8a-0d11-4b6f (api:52)
2017-12-07 23:17:15,503+0200 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call VM.create succeeded in 0.34 seconds (__init__:573)
2017-12-07 23:17:15,505+0200 INFO  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') VM wrapper has started (vm:2605)
2017-12-07 23:17:15,507+0200 INFO  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') Enabling drive monitoring (drivemonitor:53)
2017-12-07 23:17:15,508+0200 WARN  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') No balloon device present (vm:715)
2017-12-07 23:17:15,509+0200 ERROR (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') The vm start process failed (vm:911)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 840, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2652, in _run
    hooks.before_vm_start(self._buildDomainXML(), self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2166, in _buildDomainXML
    return self._make_domain_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2172, in _make_domain_xml
    libvirtxml.Domain(self.conf, self.log, self.arch)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/libvirtxml.py", line 73, in __init__
    self.dom.appendChildWithArgs('name', text=self.conf['vmName'])
KeyError: 'vmName'
2017-12-07 23:17:15,509+0200 INFO  (vm/0139b240) [virt.vm] (vmId='0139b240-d208-4d7b-8f0b-e8981ce26988') Changed state to Down: 'vmName' (code=1) (vm:1630)
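
The traceback shows libvirtxml.Domain reading self.conf['vmName'] unconditionally; in this restore-with-memory flow the conf dict handed to it no longer carries that key. A minimal, self-contained sketch of just that failure mode (MiniDomain and the dicts below are simplified stand-ins, not the actual vdsm code):

# Simplified stand-in for vdsm.virt.libvirtxml.Domain -- illustration only,
# not the real vdsm code, which builds the full libvirt domain XML.
import xml.etree.ElementTree as ET

class MiniDomain(object):
    def __init__(self, conf):
        self.dom = ET.Element('domain', type='kvm')
        # Rough equivalent of appendChildWithArgs('name', text=self.conf['vmName']):
        # raises KeyError when 'vmName' is missing from the legacy conf dict.
        ET.SubElement(self.dom, 'name').text = conf['vmName']

# Normal start: the create parameters include vmName, so XML building succeeds.
MiniDomain({'vmName': 'vm_TestCase5138_0723125633'})

# Restore from a memory snapshot: the conf dict the Domain builder receives
# ends up without 'vmName', and XML building fails exactly as in the traceback.
try:
    MiniDomain({'hiberVolHandle': '650155db-...'})
except KeyError as exc:
    print('VM start fails with KeyError:', exc)   # KeyError: 'vmName'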


Version-Release number of selected component (if applicable):
vdsm-4.20.9-1.el7ev.x86_64
rhvm-4.2.0-0.6.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a live snapshot with memory
2. Restore to that snapshot (preview + commit)
3. Start the VM
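
For reference, the same steps can be driven through the oVirt Python SDK; a minimal sketch, assuming placeholder engine URL, credentials and VM name (none of them taken from this report), and omitting the waits between state changes:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details -- adjust for the environment under test.
conn = sdk.Connection(url='https://engine.example.com/ovirt-engine/api',
                      username='admin@internal', password='secret', insecure=True)
vms_service = conn.system_service().vms_service()
vm = vms_service.list(search='name=my_vm')[0]
vm_service = vms_service.vm_service(vm.id)

# 1. Live snapshot with memory.
snap = vm_service.snapshots_service().add(
    types.Snapshot(description='with-memory', persist_memorystate=True))

# 2. Restore: with the VM shut down, preview the snapshot including its
#    memory, then commit it (polling for the intermediate states is omitted).
vm_service.preview_snapshot(snapshot=types.Snapshot(id=snap.id),
                            restore_memory=True)
vm_service.commit_snapshot()

# 3. Start the VM -- the step that failed here with KeyError: 'vmName'.
vm_service.start()
conn.close()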

Actual results:
RAM snapshot restore is not working as expected; the VM fails to start.

Expected results:
The VM starts and resumes from the saved memory state.

Additional info:
Regression - bug 1516405 was verified

Comment 1 Michal Skrivanek 2017-12-08 10:24:18 UTC
works for me

In which version was the snapshot created?

Comment 2 Michal Skrivanek 2017-12-08 10:48:36 UTC
(In reply to Michal Skrivanek from comment #1)
ah, the debug logging is not set up correctly, nevermind the previous question.

Raz, can you please make sure the debug log level is enabled for api.virt as well? Despite all the storage debug noise, we're missing the crucial logs for many other calls.

Comment 3 Raz Tamir 2017-12-08 11:35:04 UTC
(In reply to Michal Skrivanek from comment #2)
> (In reply to Michal Skrivanek from comment #1)
> ah, the debug logging is not set up correctly, nevermind the previous
> question.

The snapshot was created in the same version as the engine and vdsm
> 
> Raz, can you please make sure the debug log level is enabled for api.virt as
> well? Despite all the storage debug noise, we're missing the crucial logs for
> many other calls.

Sure,

Do you want me to reproduce with DEBUG level for virt?

Comment 4 Michal Skrivanek 2017-12-08 11:42:13 UTC
It seems the assumption in https://gerrit.ovirt.org/#/c/84266/ is wrong. The hiberVolHandle volume doesn't contain vmconf; it just contains the libvirt XML (obtained from libvirt on hibernation in both 4.1 and 4.2).

Arik, please double-check that the code in executeVdsBrokerCommand properly checks the compatibility version of that snapshot and not the current VM's compatibility version (i.e. snapshot taken in 4.1, VM now in a 4.2 cluster: on preview there should be a warning/note that the VM is going to be restored as a 4.1-compat VM, with cluster level override set to 4.1). Please make sure that in that case the vmconf is sent.

Once that's correct, the Vdsm side can safely assume the snapshot was either:
- created in 4.1: the engine sends vmconf and the code works as is (the _srcDomXML is used only for the libvirt create call), or
- created in 4.2: the engine doesn't send vmconf, so Vdsm needs to replace (create) "xml" with _srcDomXML, since the current way to differentiate old from new (if "xml" in params or self.conf) relies on it.
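
A minimal sketch of the dispatch described above, on the Vdsm side; the function and parameter names (build_create_params, restore_params, src_dom_xml) are illustrative placeholders, not the actual patch:

def build_create_params(restore_params, src_dom_xml):
    """Illustration only: how a memory-snapshot restore could be dispatched.

    restore_params -- the VM.create parameters sent by the engine
    src_dom_xml    -- the libvirt domain XML read from the hibernation volume
    """
    # Using the presence of a legacy key like 'vmName' as the "engine sent
    # vmconf" test is an assumption for this sketch, not the actual patch.
    if 'vmName' in restore_params:
        # Snapshot taken on a 4.1-compat VM: the engine sends the legacy
        # vmconf, the existing code path works as is, and the saved domain
        # XML is only passed on to the libvirt create call.
        return restore_params
    # Snapshot taken on 4.2: no legacy vmconf is sent, so the saved libvirt
    # XML has to become the 'xml' parameter -- the existing old-vs-new check
    # ("xml" in params or self.conf) relies on it being there.
    params = dict(restore_params)
    params['xml'] = src_dom_xml
    return params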

Comment 5 Michal Skrivanek 2017-12-08 11:46:45 UTC
(In reply to Raz Tamir from comment #3)
> Sure,

Thanks, it would greatly help with these cases
 
> Do you want me to reproduce with DEBUG level for virt?

Currently there's no need to; I believe I was able to deduce what's happening. Let's wait for a fix and try it then! :)

Comment 6 Evgheni Dereveanchin 2017-12-08 15:23:47 UTC
Applied the patch to one of the hosts and went through the reproducer steps to create the snapshot and switch to it. Attempts to run the VM fail on a non-patched host and succeed on the patched one.

Comment 7 Raz Tamir 2017-12-14 08:56:47 UTC
Passed automation run on:
rhvm-4.2.0.2-0.1.el7.noarch
vdsm-4.20.9.2-1.el7ev.x86_64

Verified

Comment 8 Sandro Bonazzola 2017-12-20 11:25:11 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.