Created attachment 1690165 [details]
Logs

Description of problem:
VM gets stuck after previewing memory snapshot with the following error:

VDSM:
2020-05-20 13:31:44,172+0300 ERROR (vm/f7c88d3e) [virt.vm] (vmId='f7c88d3e-60bb-4c08-be9f-1b41cb63ea41') Failed to set time: internal error: unable to execute QEMU agent command 'guest-set-time': The command guest-set-time has been disabled for this instance (vm:1621)

Attaching engine, vdsm and qemu logs.

Version-Release number of selected component (if applicable):
vdsm-4.30.46-1.el7ev.x86_64
qemu-img-rhev-2.12.0-44.el7_8.2.x86_64
ovirt-engine-4.3.10.3-0.1.master.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create VM from template (latest-rhel-guest-image-8.2-infra)
2. Create snapshot
3. Run VM
4. Create memory snapshot
5. Power off VM
6. Preview memory snapshot
7. Run VM

Actual results:
VM gets stuck after powering up.

Expected results:
VM should not get stuck.

Additional info:
Relevant logs are attached.
Please get the exact qemu-guest-agent version from the guest, and the exact arguments it's running with (ps ax output or something).
(In reply to Michal Skrivanek from comment #1)
> please get the exact qemu-guest-agent version from the guest, and the exact
> arguments it's running with (ps ax output or something)

Root cause, as explained by Liran R., is this patch [1].
To fix: change LiveSnapshotPerformFreezeInEngine to false in engine-config.
The same test passed and the VM comes up without issues (tested on both the 8.2 and 7.6 templates).

If other info is still necessary, please re-add the NEEDINFO.

[1] https://gerrit.ovirt.org/#/c/108673
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
The timing of the freeze and thaw when they are performed by the engine during a snapshot with memory looks problematic.
In VDSM we thaw right after the libvirt command finishes, and then execute more actions on the drivers. When the freeze and thaw are done only in the engine, the FS is still frozen at this time. The QEMU error on 'guest-set-time' seems to relate exactly to this.

Possible workarounds:
1. Change the config value of LiveSnapshotPerformFreezeInEngine to false.
2. Shut down the VM and start it again.
3. Preview the snapshot, start the VM (it will be frozen), shut it down, commit the snapshot and start the VM.
4. Run:
   # vdsm-client VM thaw vmID=<uuid>
   on the host where the VM is running and the FS is frozen.
5. Create the snapshot without memory.

For now we should switch the default of LiveSnapshotPerformFreezeInEngine back to false.

Follow-up bug: BZ 1838493
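For anyone scripting workarounds 1 and 4, here is a minimal Python sketch that only builds the two command lines quoted in this thread; it does not execute them. The helper names thaw_command and disable_engine_freeze_command are hypothetical and are not part of vdsm or ovirt-engine. The VM UUID is validated before it is placed in the vdsm-client argument.

```python
import uuid
from typing import List


def thaw_command(vm_id: str) -> List[str]:
    """Build the argv for workaround 4 (run on the host where the VM runs).

    Hypothetical helper: it only assembles the command quoted in this bug.
    Raises ValueError if vm_id is not a well-formed UUID.
    """
    canonical = str(uuid.UUID(vm_id))
    return ["vdsm-client", "VM", "thaw", f"vmID={canonical}"]


def disable_engine_freeze_command() -> List[str]:
    """Build the argv for workaround 1 (run on the engine machine)."""
    return ["engine-config", "-s", "LiveSnapshotPerformFreezeInEngine=false"]


if __name__ == "__main__":
    # The VM ID below is the one from the VDSM error in the description.
    print(" ".join(thaw_command("f7c88d3e-60bb-4c08-be9f-1b41cb63ea41")))
    print(" ".join(disable_engine_freeze_command()))
```

The lists can be passed to subprocess.run on the relevant machine; that step is left out here on purpose, since the two commands run on different hosts.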
this is just to fix unintended discrepancy between 4.4 and 4.3
Verified.
The VM runs successfully after previewing a memory snapshot.

Verified with the following versions:
ovirt-engine-4.3.10.4-0.1.el7.noarch
qemu-img-rhev-2.12.0-44.el7_8.2.x86_64
vdsm-4.30.46-1.el7ev.x86_64
Hi Liran, please review this release notes text. I need it approved as soon as possible for the Erratum going into the 4.3.10 release:

Previously, creating a live snapshot with memory while LiveSnapshotPerformFreezeInEngine was set to True resulted in a virtual machine file system that was frozen when previewing or committing the snapshot with memory restore. In this release, the virtual machine runs successfully after creating a preview snapshot from a memory snapshot.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2401
On upgrade, the value of LiveSnapshotPerformFreezeInEngine doesn't change.

If the customer has a non-HE environment and updates the system, the value remains as it was, 'false', and that is OK because in 4.3.10.4 the value was changed back to 'false'.
If the customer installs from scratch, it is still OK because the value is 'false'.
But if the customer installs an HE environment from scratch, the installation works differently: the appliance version is 4.3.10.3, and getting to 4.3.10.4 requires an upgrade. Here is the problem: the value remains 'true', as it was defined in 4.3.10.3.

Moving back to 'Assigned'.

Tested on an HE env: ovirt-engine-4.3.10.4-0.1.el7.noarch after upgrade from 4.3.10.3.
There is nothing to do about it now.
In 4.3.10.3 we had a problem caused by setting LiveSnapshotPerformFreezeInEngine=true. Users who get this version should change that value with: engine-config -s LiveSnapshotPerformFreezeInEngine=false.
The LiveSnapshotPerformFreezeInEngine value is persistent through upgrades. 4.3.10.3 had a respin; unfortunately, as I understand from you, the appliance was not respun.

We have BZ 1838493, which is tagged for 4.3.11. In that bug we solved the real problem. This means LiveSnapshotPerformFreezeInEngine can be set to either 'true' or 'false' and you won't have any problem.
Therefore, it doesn't make sense to change this value to 'false' on each upgrade (making the customer manually set this value after every upgrade), nor as a one-time change.

Bottom line:
Upgrading from 4.3.10.3 to 4.3.11 should solve any issue, even if LiveSnapshotPerformFreezeInEngine is 'true'.
If the user has 4.3.10.3 and doesn't want to upgrade, or upgrades to a version < 4.3.11, then they need to set LiveSnapshotPerformFreezeInEngine to 'false'.

I'm closing the bug.
Marina, FYI.
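The bottom line above can be written down as a small check. This is a sketch only, assuming the engine version is given as a dotted string such as "4.3.10.3" (an optional "-release" suffix is ignored) and that the current value of LiveSnapshotPerformFreezeInEngine is already known. The function names are hypothetical and are not part of ovirt-engine.

```python
from typing import Tuple

# First version expected to carry the real fix (BZ 1838493), per this thread.
FIXED_IN = (4, 3, 11)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.3.10.4' or '4.3.10.4-0.1.el7' into (4, 3, 10, 4)."""
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def needs_manual_false(engine_version: str, freeze_in_engine: bool) -> bool:
    """Return True when the admin should run
    engine-config -s LiveSnapshotPerformFreezeInEngine=false.

    Assumption taken from the comment above: any engine older than 4.3.11
    is affected when the value is 'true', because the value persists
    through upgrades.
    """
    return freeze_in_engine and parse_version(engine_version) < FIXED_IN


if __name__ == "__main__":
    # HE appliance installed as 4.3.10.3, then upgraded to 4.3.10.4.
    print(needs_manual_false("4.3.10.4", True))
    # After upgrading to 4.3.11 either value should be fine.
    print(needs_manual_false("4.3.11", True))
```

Tuple comparison is used so that 4.3.10.4 sorts before 4.3.11, which a plain string comparison would not guarantee in general.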