Bug 1557448
Summary: | remoteDispatchDomainFSFreeze hangs when taking a snapshot | | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Javier Coscia <jcoscia> |
Component: | ovirt-engine | Assignee: | Benny Zlotnik <bzlotnik> |
Status: | CLOSED ERRATA | QA Contact: | Kevin Alon Goldblatt <kgoldbla> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.1.9 | CC: | bzlotnik, daniel-oliveira, ebenahar, jcoscia, lsurette, mkenneth, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi |
Target Milestone: | ovirt-4.2.0 | Flags: | lsvaty: testing_plan_complete- |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-05-15 17:48:31 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1506697 | | |
Bug Blocks: | | | |
Description
Javier Coscia
2018-03-16 15:16:33 UTC
This has little to do with our guest agent. In VDSM we use the libvirt call virDomainFSFreeze(), which in turn issues the guest-fsfreeze-freeze command of the QEMU Guest Agent. From the VDSM side there is not much we can do. We cannot even time out the freeze request, and that would not help much anyway. I'm not even sure there is much libvirt can do about it. Even if libvirt times out on the request, there is no way of knowing what state the VM and its disks are in (notice that the later thaw request failed). It seems libvirt itself does not know what state the VM is in (moreover, my guess is that the domain remains internally locked), and that is why the VM is reported as unresponsive in Engine (again, this has nothing to do with the guest agent).

I'm moving this to storage for review of the storage flow.

Javier, can you please:
1) Describe how to reproduce this.
2) Check what state the QEMU GA service on the guest is in *before* the LSM request. From the event logs it looks like the service manager times out waiting for QEMU GA to start, but the service finishes starting 50 seconds later.
3) Share the libvirt logs for the VM. We may need to open another bug on libvirt/qemu.

Frankly, there's no point in the virDomainFSFreeze call in this flow. This isn't a snapshot we're ever going to use as such, and we don't care about its consistency. The call was removed by bug 1506697, which should make this bz a moot point.

As a workaround, the user shared that he was able to perform the LSM operation by stopping ovirt-guest-agent & qemu-ga inside the guest; this way the vDisk was moved between storage domains. He also recorded a video of the operation. Let me know if this is relevant so the user can upload it to our FTP server.
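
For readers unfamiliar with the flow discussed above, here is a minimal sketch of the freeze, snapshot, thaw sequence. It is not the VDSM code. The function name `snapshot_with_freeze`, the `take_snapshot` callback, and the returned status dict are hypothetical; the only assumption about libvirt is that the domain object exposes `fsFreeze()` and `fsThaw()`, as a libvirt-python `virDomain` does. The `freeze=False` path only illustrates the idea of skipping the freeze when snapshot consistency does not matter.

```python
from typing import Callable


class FreezeError(Exception):
    """Raised when the guest file systems could not be frozen."""


def snapshot_with_freeze(domain, take_snapshot: Callable[[], None],
                         freeze: bool = True) -> dict:
    """Optionally freeze guest file systems, take a snapshot, then thaw.

    `domain` is any object with fsFreeze() and fsThaw() methods
    (hypothetical stand-in for a libvirt-python virDomain).
    """
    status = {"frozen": False, "snapshot": False, "thawed": False}

    if freeze:
        try:
            # Blocks until the guest agent answers; if the agent never
            # answers, the caller is stuck here, as described in this bug.
            domain.fsFreeze()
            status["frozen"] = True
        except Exception as exc:
            raise FreezeError(str(exc)) from exc

    try:
        take_snapshot()
        status["snapshot"] = True
    finally:
        # Only thaw what we froze. A failed thaw leaves the guest in an
        # unknown state, so it is reported rather than hidden.
        if status["frozen"]:
            try:
                domain.fsThaw()
                status["thawed"] = True
            except Exception:
                status["thawed"] = False

    return status
```

Calling it with `freeze=False` never touches the guest agent, so a hung or slow qemu-ga inside the guest cannot block the snapshot.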
(In reply to Javier Coscia from comment #11)
> As a workaround, user shared that he was able to perform the LSM operation
> by stopping ovirt-guest-agent & qemu-ga inside the guest, this way the vDisk
> was moved between storage domains, he also recorded a video of the
> operation, let me know if this is relevant so user can upload into our FTP
> server.

Yes please

Verified with the following code:
----------------------------------------
ovirt-engine-4.2.2.5-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

Verified with the following scenario:
----------------------------------------
1. Start the VM and connect to the console
2. Run LSM
>>>> The file system of the VM did not freeze and all running operations continued

Moving to VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488