Summary: [RFE] Time sync in VM after resuming from PAUSE state

| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Andrea Perotti <aperotti> |
|---|---|---|---|
| Component: | vdsm | Assignee: | Steven Rosenberg <srosenbe> |
| Status: | CLOSED ERRATA | QA Contact: | Lukas Svaty <lsvaty> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.2.0 | CC: | aperotti, emarcus, gscott, lleistne, lsurette, mavital, michal.skrivanek, mkalinin, mperina, mtessun, omachace, rdlugyhe, rmcswain, srevivo, srosenbe, ycui |
| Target Milestone: | ovirt-4.3.0 | Keywords: | FutureFeature, ZStream |
| Target Release: | 4.3.0 | Flags: | vyerys: testing_plan_complete+ |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | v4.30.3 | Doc Type: | Enhancement |
| Story Points: | --- | Clone Of: | |
| : | 1620573 (view as bug list) | Environment: | |
| Last Closed: | 2019-05-08 12:35:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | Bug Blocks: | 1417161, 1520566, 1620573 |

Doc Text:

> Making large snapshots and other abnormal events can pause virtual machines, impacting their system time and other functions, such as timestamps. The current release provides guest time synchronization: after a snapshot is created and the virtual machine is un-paused, VDSM and the guest agent synchronize the system time of the virtual machine with that of the host. The time_sync_snapshot_enable option enables synchronization for snapshots. The time_sync_cont_enable option enables synchronization for abnormal events that may pause virtual machines. By default, both options are disabled for backward compatibility.
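The Doc Text above names the two VDSM options but not where they are set. As a hedged sketch only — assuming they are exposed, like other such tunables, through the `[vars]` section of `/etc/vdsm/vdsm.conf` (verify the section and key names against your vdsm version) — enabling both might look like:

```ini
# /etc/vdsm/vdsm.conf -- illustrative sketch; section and key placement
# assumed from the option names in the Doc Text, not confirmed here.
[vars]
# Sync guest time with the host after a snapshot un-pauses the VM.
time_sync_snapshot_enable = true
# Sync guest time after resuming from abnormal pause events.
time_sync_cont_enable = true
```

A vdsmd restart would normally be needed for a vdsm.conf change to take effect.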
Description
Andrea Perotti
2017-11-08 10:23:54 UTC
Possible, but I am not sure whether it needs to be configurable or not. Can you add more details about the actual scenario? Even when a VM pauses due to drive extension it shouldn't really take much time, no more than a few seconds, which is better handled by NTP inside the guest rather than by abrupt clock changes done externally.

The scenario we are talking about is a very time-sensitive, in-memory app, like a JBoss Data Grid installation, running on a VM with a huge amount of RAM. Dealing with a pause of that VM *can* be worked out with NTP, but it requires a constantly aggressive configuration of the tool, while having the same behaviour for paused VMs as for suspended VMs can be more practical for some users. Eventually this could be made configurable, e.g. setting after how many seconds in the PAUSE state RHV should enforce the clock change.

(In reply to Andrea Perotti from comment #4)
> The scenario we are talking about is a very time-sensitive, in-memory
> app, like a JBoss Data Grid installation, running on a VM with a huge
> amount of RAM.
>
> Dealing with a pause of that VM *can* be worked out with NTP, but it
> requires a constantly aggressive configuration of the tool, while having
> the same behaviour for paused VMs as for suspended VMs can be more
> practical for some users.

If it is a time-sensitive app, wouldn't it be better to avoid ENOSPC pause states in the first place? Either bigger allocation chunks, a lower watermark so the drive starts extending sooner, a bigger initial size for the thin-provisioned disk, etc.

> Eventually this could be made configurable, e.g. setting after how many
> seconds in the PAUSE state RHV should enforce the clock change.

Creating a config option to do a time sync after resume from pause is feasible. I would still leave it off by default, though. Implementing a configurable interval is more complicated and will delay this RFE, but if that's required it's doable too.
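For context on the "aggressive" in-guest NTP tuning mentioned above: with chrony, the usual approach is to allow the clock to be stepped on any update when the offset is large, rather than only during the first few updates after startup. A sketch with illustrative values only, not a recommendation from this bug report:

```
# /etc/chrony.conf inside the guest -- illustrative sketch.
server pool.ntp.org iburst
# Step the clock whenever the measured offset exceeds 1 second, with no
# limit on how many updates may step (-1); the shipped default only
# allows stepping during the first 3 updates (makestep 1.0 3).
makestep 1 -1
```

Frequent stepping like this is exactly the kind of disruptive, always-on tuning the reporter wanted to avoid, which is the motivation for doing the sync host-side on resume instead.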
Still, before starting on this I believe we should check whether we are really solving the right thing; making the VMs not pause in the first place might make more sense.

(In reply to Michal Skrivanek from comment #5)
> If it is a time-sensitive app, wouldn't it be better to avoid ENOSPC
> pause states in the first place? Either bigger allocation chunks, a
> lower watermark so the drive starts extending sooner, a bigger initial
> size for the thin-provisioned disk, etc.

The customer is triggering this event when taking a full snapshot of the VM including memory, but transient storage connectivity issues can also lead to a pause.

> Creating a config option to do a time sync after resume from pause is
> feasible. I would still leave it off by default, though.

I think just having it would be good enough for my customer, and having it now is more important than having it perfectly configurable.

Overall I believe we should address the reason for the pausing in the first place. If that happens for snapshots, we should probably also check whether we can get rid of the pausing that does happen.

Perhaps the after_vm_cont hook can be used? Hopefully it does not suffer from the same problem as the after_vm_pause hook in bug 1543103. Other than that, the solution could look similar to OpenStack's https://review.openstack.org/#/c/316116/

I'll add another large application to this one too. In this case, storage goes offline, the VM pauses, storage comes back, the VM resumes, but now its clock is way off, and that sets off a whole bad cascade of events. Nothing we can do about the storage going offline, and pausing the VM is the correct action when that happens. If we can inject the correct time into that VM when it resumes, we can make lots of people happy.

- Greg

Verified upstream:
ovirt-engine-4.3.0-0.0.master.20180903111244.git94dce75.el7.noarch
vdsm-4.30.0-554.git4594d97.el7.x86_64

QE verification bot: the bug was verified upstream.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1077
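As a footnote on mechanism: the sync discussed in the thread (and in the linked OpenStack change) amounts to issuing the qemu-guest-agent `guest-set-time` command after the VM resumes; that command's documented `time` argument is nanoseconds since the epoch, and omitting it makes the agent read the guest's hardware clock instead. A minimal sketch of building that command — the helper function is hypothetical, not VDSM's actual code:

```python
import json
import time


def build_guest_set_time_cmd(host_time_ns=None):
    """Build a qemu-guest-agent 'guest-set-time' command (JSON string).

    With no argument the agent resyncs from the guest's hardware clock;
    passing the host's time in nanoseconds since the epoch sets the
    guest clock to that value.
    """
    cmd = {"execute": "guest-set-time"}
    if host_time_ns is not None:
        cmd["arguments"] = {"time": host_time_ns}
    return json.dumps(cmd)


# Example: a command that would sync the guest to the current host time.
now_ns = int(time.time() * 1e9)
print(build_guest_set_time_cmd(now_ns))
```

In a real deployment this JSON would be delivered over the guest agent channel (e.g. via libvirt), not printed; the sketch only shows the wire-level command shape.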