Bug 1567617

Summary: Failure to resume VM, Error: Wake up from hibernation failed:'type'.
Product: [oVirt] ovirt-engine
Reporter: Israel Pinto <ipinto>
Component: BLL.Virt
Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE
QA Contact: Israel Pinto <ipinto>
Severity: urgent
Priority: high
Docs Contact:
Version: 4.2.2
CC: bugs, michal.skrivanek, pmatyas
Target Milestone: ovirt-4.2.3
Keywords: Automation, AutomationBlocker, Regression
Target Release: 4.2.3.2
Flags: michal.skrivanek: ovirt-4.2?
       ykaul: blocker+
       ipinto: planning_ack?
       rule-engine: devel_ack+
       mavital: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.2.3.2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-10 06:29:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: engine,vdsm logs (Flags: none)

Description Israel Pinto 2018-04-15 14:28:46 UTC
Created attachment 1422194 [details]
engine,vdsm logs

Description of problem:
Failed to resume VM after suspension 

Version-Release number of selected component (if applicable):
Engine version: 4.2.3-0.1.el7
Host: RHEL 7.5 - 8.el7
Kernel Version: 3.10.0-861.el7.x86_64
KVM Version: 2.10.0-21.el7_5.2
LIBVIRT Version: libvirt-3.9.0-14.el7_5.2
VDSM Version: vdsm-4.20.25-1.el7ev


How reproducible:
100 %

Steps to Reproduce:
1. Start VM
2. Suspend VM
3. Resume VM
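
These steps can also be scripted. A minimal sketch using the oVirt Python SDK (ovirtsdk4) follows; the connection details and the VM name suspend_resume_vm are placeholders, and the fixed sleeps stand in for properly polling the VM status between operations:

    import time
    import ovirtsdk4 as sdk

    # Placeholder connection details; adjust for the environment under test.
    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,
    )
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=suspend_resume_vm')[0]
    vm_service = vms_service.vm_service(vm.id)

    vm_service.start()    # 1. Start VM
    time.sleep(60)        # crude wait; a real test polls the VM status
    vm_service.suspend()  # 2. Suspend VM
    time.sleep(60)
    vm_service.start()    # 3. Resume VM (starting a suspended VM resumes it)

    connection.close()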

Actual results:
The VM fails to resume: it moves from 'RestoringState' to 'Down' with the error "Wake up from hibernation failed:'type'".

Additional info:
Engine log:
2018-04-15 17:16:53,276+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-0) [] VM 'd6a8fd3f-6bdf-4030-9e31-ae193543e1c6'(suspend_resume_vm) moved from 'RestoringState' --> 'Down'
2018-04-15 17:16:53,387+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-0) [] EVENT_ID: VM_DOWN_ERROR(119), VM suspend_resume_vm is down with error. Exit message: Wake up from hibernation failed:'type'.

VDSM log:
2018-04-15 17:16:52,477+0300 ERROR (vm/d6a8fd3f) [virt.vm] (vmId='d6a8fd3f-6bdf-4030-9e31-ae193543e1c6') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2811, in _run
    hooks.before_vm_start(self._buildDomainXML(), self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2246, in _buildDomainXML
    self, dom, self._devices[hwclass.DISK])
  File "/usr/lib/python2.7/site-packages/vdsm/virt/domxml_preprocess.py", line 197, in update_disks_xml_from_objs
    dev_elem, disk_obj, vm.log, replace_attribs=True)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/storagexml.py", line 302, in update_disk_element_from_object
    old_drive_format = driver.attrib['type']
KeyError: 'type'
2018-04-15 17:16:52,478+0300 INFO  (vm/d6a8fd3f) [virt.vm] (vmId='d6a8fd3f-6bdf-4030-9e31-ae193543e1c6') Changed state to Down: 'type' (code=1) (vm:1683)
2018-04-15 17:16:52,478+0300 DEBUG (vm/d6a8fd3f) [virt.metadata.Descriptor] values: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 'startTime': 1523801666.38, 'destroy_on_reboot': False, 'resumeBehavior': 'auto_resume', 'launchPaused': 'false', 'memGuaranteedSize': 1024} (metadata:596)
2018-04-15 17:16:52,478+0300 DEBUG (vm/d6a8fd3f) [virt.metadata.Descriptor] values updated: {'guestAgentAPIVersion': 3, 'clusterVersion': '4.2', 'exitMessage': "Wake up from hibernation failed:'type'", 'resumeBehavior': 'auto_resume', 'exitReason': 1, 'memGuaranteedSize': 1024, 'minGuaranteedMemoryMb': 1024, 'startTime': 1523801770.477857, 'destroy_on_reboot': False, 'launchPaused': 'false', 'exitCode': 1} (metadata:601)
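
The failing line in storagexml.py indexes the driver element's attributes directly, so a <driver> element without a type attribute raises exactly this KeyError. A minimal standalone sketch (not the vdsm code itself; the 'raw' fallback is an assumption matching the QEMU default) illustrating the difference between direct indexing and a defaulted lookup:

    import xml.etree.ElementTree as ET

    # <driver> element as libvirt 3.9.0 may return it: the type attribute is
    # omitted because it matches the QEMU default.
    disk_elem = ET.fromstring('<disk><driver error_policy="report"/></disk>')
    driver = disk_elem.find('driver')

    # What the traceback shows: direct indexing raises KeyError: 'type'.
    try:
        old_drive_format = driver.attrib['type']
    except KeyError as exc:
        print('KeyError:', exc)

    # Defaulted lookup that tolerates the omitted attribute.
    old_drive_format = driver.attrib.get('type', 'raw')
    print(old_drive_format)  # -> raw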

Comment 1 Red Hat Bugzilla Rules Engine 2018-04-16 05:08:32 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Yaniv Kaul 2018-04-16 06:33:36 UTC
This is regularly tested in OST.
Please try to understand how it was not caught there.

Comment 3 Francesco Romani 2018-04-16 07:22:30 UTC
(In reply to Israel Pinto from comment #0)
> Created attachment 1422194 [details]
> engine,vdsm logs
> 
> Description of problem:
> Failed to resume VM after suspension 
> 
> Version-Release number of selected component (if applicable):
> Engine version:4.2.3-0.1.el7
> Host:RHEL - 7.5 - 8.el7
> Kernel Version:3.10.0 - 861.el7.x86_64
> KVM Version:2.10.0 - 21.el7_5.2
> LIBVIRT Version:libvirt-3.9.0-14.el7_5.2
> VDSM Version:vdsm-4.20.25-1.el7ev

It looks like libvirt 3.9.0 is returning XML that is valid and legal but omits data Vdsm used to find and still expects. Two examples, both for CD-ROMs:

1. <source file="" startup="optional"> became
   <source startup="optional"> (fixed in Ia4dac75678b58b2f11f3cced2a88a78b17a76488)

2. <driver name="qemu" type="raw" error_policy="report"> became
   <driver error_policy='report'>,
   which is fine from libvirt's point of view because name=qemu and type=raw are QEMU's defaults, AFAIR.

It escaped OST because the OST/CI workers are still on EL 7.4, IIUC. I myself did very limited testing on EL 7.5, which contributed to the bug; I'll update my environment and do more tests ASAP.

The fix for this specific bz is simple, but we need more testing on 7.5, even though a simple OST run would be enough.
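
For illustration only, a sketch of the kind of defensive handling described above; this is not the actual patch, and the helper name and defaults table are assumptions:

    import xml.etree.ElementTree as ET

    # Assumed defaults table and helper name; not the actual vdsm fix.
    _DRIVER_DEFAULTS = {'name': 'qemu', 'type': 'raw'}

    def normalize_driver_element(driver_elem):
        # Fill back in <driver> attributes that libvirt may omit because they
        # match the QEMU defaults, so later code can read them unconditionally.
        for attr, default in _DRIVER_DEFAULTS.items():
            driver_elem.attrib.setdefault(attr, default)
        return driver_elem

    driver = ET.fromstring('<driver error_policy="report"/>')
    normalize_driver_element(driver)
    print(driver.attrib)  # -> {'error_policy': 'report', 'name': 'qemu', 'type': 'raw'}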

Comment 4 Michal Skrivanek 2018-04-17 05:04:34 UTC
*** Bug 1567773 has been marked as a duplicate of this bug. ***

Comment 5 Israel Pinto 2018-04-22 14:53:48 UTC
Verified with:
Engine version: 4.2.3.2-0.1.el7

Steps to Reproduce:
1. Start VM
2. Suspend VM
3. Resume VM

PASS

Comment 6 Sandro Bonazzola 2018-05-10 06:29:49 UTC
This bug is included in the oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.