Bug 1284559

Summary: [cinder] VM failed to start immediately after an operation of 'preview live snapshot'
Product: [oVirt] ovirt-engine
Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage
Assignee: Maor <mlipchuk>
Status: CLOSED NOTABUG
QA Contact: Aharon Canan <acanan>
Severity: medium
Priority: unspecified
Version: 3.6.0.2
CC: ahadas, amureini, bugs, derez, eshenitz, gklein, tnisan
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Flags: tnisan: ovirt-3.6.z?, eshenitz: planning_ack?, eshenitz: devel_ack?, eshenitz: testing_ack?
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2015-12-01 16:31:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Storage
Cloudforms Team: ---

Attachments:
vdsm + engine + cinder log (flags: none)
updated vdsm and engine logs (flags: none)

Description Eyal Shenitzky 2015-11-23 15:12:25 UTC
Created attachment 1097702 [details]
vdsm + engine + cinder log

Description of problem:

After previewing a live snapshot with memory enabled on a VM with a Cinder disk and OS, the VM failed to start.
The engine completes the preview successfully and displays the snapshot as "ok", but the VM fails to start when started immediately after the preview operation finishes.
When trying to start the VM after a short period of time (approximately 2 minutes), the VM starts successfully.
When taking a live snapshot and previewing it without the "Restore memory" option, the VM starts without any failure.

Version-Release number of selected component (if applicable):
3.6.0.3-0.1.el6

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a Cinder disk
2. Run the VM
3. Take a live snapshot
4. Stop the VM
5. Preview the snapshot with 'Restore memory' checked
6. Start the VM immediately after the preview operation finishes

Actual results:
The VM fails to start after previewing the live snapshot.
The VM starts successfully after a short period of time (about 2 minutes).
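
Since the start succeeds after a short wait, one stopgap is to retry it until it succeeds. The following is only a minimal sketch: `start_vm` is a hypothetical callable standing in for whatever actually starts the VM (it is not an oVirt API), and the 180-second timeout and 10-second interval are assumed values loosely based on the 2-minute delay observed here.

```python
import time


def start_with_retry(start_vm, timeout=180.0, interval=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call start_vm() until it succeeds or the timeout expires.

    start_vm is a hypothetical callable: it returns True when the VM
    started, and returns False or raises RuntimeError when it did not.
    Returns the number of attempts made, or raises TimeoutError.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            if start_vm():
                return attempts
        except RuntimeError:
            pass  # treat a failed start like a False result and retry
        if clock() >= deadline:
            raise TimeoutError("VM did not start within %.0f s" % timeout)
        sleep(interval)
```

This only hides the symptom; it does not address why the engine reports the preview as finished before the VM is actually startable.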

Expected results:
The VM should run immediately after the preview operation finishes successfully.
The engine should report the preview as 'ok' only after the whole operation has actually finished.

Additional info:
vdsm + engine + cinder log attached

Comment 1 Red Hat Bugzilla Rules Engine 2015-11-23 17:00:09 UTC
This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset.
Please set the correct milestone or add the z-stream flag.

Comment 2 Maor 2015-11-23 18:34:18 UTC
It looks like the thaw operation failed to execute:

VDSM log:
Thread-1929::WARNING::2015-11-23 14:59:11,232::vm::2948::virt.vm::(thaw) vmId=`97c47e43-9f97-46cd-a74b-a22acad59f2c`::Unable to thaw guest filesystems: Guest agent is not responding: Guest agent not available for now
Thread-1929::DEBUG::2015-11-23 14:59:11,232::bindingxmlrpc::1264::vds::(wrapper) return vmThaw with {'status': {'message': 'Guest agent is not responding: Guest agent not available for now', 'code': 19}}

Engine log:
12:59:08,852 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ThawVDSCommand] (org.ovirt.thread.pool-7-thread-44) [8e39098] Command 'ThawVDSCommand(HostName = adder, VdsAndVmIDVDSParametersBase:{runAsync='true', hostId='91e57057-818b-4679-b4aa-6d83b8c67232', vmId='97c47e43-9f97-46cd-a74b-a22acad59f2c'})' execution failed: VDSGenericException: VDSErrorException: Failed to ThawVDS, error = Guest agent is not responding: Guest agent not available for now, code = 19

You should see an audit log saying:
FAILED_TO_THAW_VM=Failed to thaw guest filesystems on VM ${VmName}. The filesystems might be unresponsive until the VM is restarted.

If the thaw operation succeeds, does the issue still reproduce?
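
To check whether a thaw warning precedes each failed start, the VDSM log can be scanned with a short script. This is a rough sketch: the field layout (thread, level, timestamp, module, line number, logger, then function and message, separated by `::`) is inferred only from the two VDSM lines quoted above, and the function names `parse_vdsm_line` and `failed_thaws` are made up for illustration.

```python
import re
from datetime import datetime


def parse_vdsm_line(line):
    """Split one VDSM log line into its fields; return None if it does not fit.

    Assumed layout (inferred from the quoted lines, not from VDSM docs):
    thread::level::timestamp::module::lineno::logger::(function) message
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, stamp, module, lineno, logger, rest = parts
    match = re.match(r"\((\w+)\)\s*(.*)", rest, re.DOTALL)
    if match is None or not lineno.isdigit():
        return None
    try:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return {
        "thread": thread,
        "level": level,
        "time": when,
        "module": module,
        "lineno": int(lineno),
        "logger": logger,
        "function": match.group(1),
        "message": match.group(2),
    }


def failed_thaws(lines):
    """Return the parsed WARNING records logged from the thaw function."""
    records = (parse_vdsm_line(line) for line in lines)
    return [r for r in records
            if r and r["function"] == "thaw" and r["level"] == "WARNING"]
```

For example, `failed_thaws(open("vdsm.log"))` would list the thaw warnings with their timestamps, which can then be compared against the time of each failed VM start (the `vdsm.log` path is a placeholder).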

Comment 3 Daniel Erez 2015-11-24 09:33:28 UTC
Hi Eyal,

A couple more questions:
* Does the issue reproduce without taking a memory snapshot as well?
* Can you please check whether the issue reproduces consistently and attach the full logs (the current engine log lacks the second VM run)?
* To determine whether it's related to the thaw operation, can you please check whether the issue also reproduces when the thaw succeeds?
* Which OS did you use?
* Was the guest agent installed on the OS?

Thanks!

Comment 4 Allon Mureinik 2015-11-26 13:42:36 UTC
This scenario self-corrects after two minutes; not urgent.

Comment 5 Eyal Shenitzky 2015-12-01 12:39:48 UTC
Created attachment 1100898 [details]
updated vdsm and engine logs

Comment 6 Eyal Shenitzky 2015-12-01 12:47:45 UTC
Hi Daniel,

* I attached updated logs.

* When taking a snapshot without memory, the bug does not reproduce.

* The VM doesn't have an OS.

* The VM doesn't have a guest agent.

The bug related to the thaw operation is:
https://bugzilla.redhat.com/show_bug.cgi?id=1287066

Thanks,

Comment 7 Allon Mureinik 2015-12-01 16:31:19 UTC
(In reply to eyal shenitzky from comment #6)
> Hi Daniel,
> 
> * I attached updated logs.
> 
> * When taking a snapshot without memory, the bug does not reproduce.
> 
> * The VM doesn't have an OS.
> 
> * The VM doesn't have a guest agent.
> 
> The bug related to the thaw operation is:
> https://bugzilla.redhat.com/show_bug.cgi?id=1287066
> 
> Thanks,
For a VM with no guest agent and no OS, a live snapshot is meaningless.
Closing.