The EL7 QEMU migration auto-convergence feature uses cgroups to slow down the source guest when it is dirtying memory pages too fast. This conflicts with the CPU QoS set by oVirt (not only the default one), and the two features fight each other. One proposal is to make MOM aware that a migration is in progress and not enforce CPU QoS during that time, allowing auto-convergence to kick in. This should be race-free without going through the engine, i.e. MOM would be aware of VDSM's "Migration Source" state and skip the policy checks for the time being.
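The proposed guard could look roughly like this. A minimal sketch only: the `Vm` class, the state string, and `should_enforce_cpu_qos` are all hypothetical names for illustration, not actual MOM or VDSM API.

```python
# Hypothetical sketch of the proposed MOM-side guard; none of these
# names exist in MOM/VDSM as written.

MIGRATION_SOURCE = "Migration Source"


class Vm:
    def __init__(self, name, state):
        self.name = name
        self.state = state  # VDSM-reported state string


def should_enforce_cpu_qos(vm):
    """Skip CPU QoS enforcement while the VM is a migration source,
    so QEMU auto-convergence can throttle the guest undisturbed."""
    return vm.state != MIGRATION_SOURCE
```

MOM's policy loop would consult such a check before applying CPU limits, instead of unconditionally re-applying them every cycle.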
(In reply to Michal Skrivanek from comment #0)
> EL7 QEMU feature of migration autoconvergence uses cgroups to slow down the
> source guest when it is dirtying memory pages too fast. This conflicts with
> the (not only) default CPU QoS set by oVirt and these two features fight
> against each other.
>
> One proposal was to make MOM aware that migration is going on and do not
> enforce CPU QoS during that time, allowing autoconvergence to kick in.

Michal, auto-convergence changes cgroups to throttle CPU consumption (several times). We would expect its limit to be lower than the QoS limitation, so it should not be an issue. Even if MoM updates the cgroups during migration, auto-convergence repeatedly re-applies its own values, which means it should be fine. Do you have any samples showing that MoM settings are actually interrupting migration?
Yes, preliminary tests by srao indicated it is getting in the way. Martin confirmed the code is resetting the values every 10 s to the configured value (i.e. increasing the limit back up when QEMU slows down the guest), rendering the feature nonfunctional; the VM still cannot converge. This matches the observation. It may get tricky when some limit is applied by oVirt; we need to verify on the QEMU side whether it takes into account someone else having set an arbitrary limit in advance.
There is one thing you are all confused about: MOM does not set cgroups. MOM sets the CPU limit by calling libvirt and passing the settings (measurement period, maximum used time) to it. Only libvirt knows how that is converted to cgroups.
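For reference, the (measurement period, maximum used time) pair libvirt receives implies an effective CPU cap; this is how libvirt's `vcpu_period`/`vcpu_quota` scheduler parameters (backed by the CFS cgroup knobs `cpu.cfs_period_us`/`cpu.cfs_quota_us`) relate. A small arithmetic sketch, with the helper name being my own:

```python
def cpu_cap_percent(period_us, quota_us):
    """Effective per-vCPU CPU cap implied by a (period, quota) pair,
    as used by libvirt's vcpu_period/vcpu_quota scheduler parameters
    (cgroup cpu.cfs_period_us / cpu.cfs_quota_us underneath)."""
    if quota_us <= 0:
        # libvirt treats a non-positive quota as "no limit"
        return 100.0
    return 100.0 * quota_us / period_us

# e.g. period=100000 us, quota=40000 us -> a 40% cap per vCPU
```

So when two actors (a management daemon and auto-convergence) disagree about the quota, whichever wrote last wins, which is exactly the fight described above.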
(In reply to Martin Sivák from comment #3)
> There is one thing you all are confused about. MOM does not set cgroups. MOM
> sets the cpu limit by calling libvirt and passing the settings (measurement
> period, maximum used time) there.
>
> Only libvirt knows about how that is converted to cgroups.

Martin, can we stop calling libvirt to set CPU tuning for migrating VMs? This should resolve the issue.
> Martin, > can we stop calling libvirt to set CPU tuning for migrating VMs? Yes we can stop issuing the CPU QoS updates when the VM is migrating (if we have a way of detecting that from MOM). But that is just a workaround for a bigger and hidden issue. We are not supposed to know that the limitation is implemented using cgroups… and so we should ask libvirt for a solution from their side as well. Workarounding this on our side will cause situations where the user updates the CPU limit to be more strict (60% -> 40% for example) and we won't reflect it during migration (which can take while). It will also let the VM do whatever it wishes if no auto-converge is happening (I suppose qemu will release the limitation if it is not needed).
Martin, can you please provide steps to verify the bug? Thanks.
Shira: Well, I am not the reporter, so no, I do not have a reproducer.

Michal: QEMU is not using cgroups according to the developers, and neither is libvirt apart from what we tell it to do. So who is changing the values? Might it be the kernel, via some indirect path, when QEMU requests the throttling?
Since the QoS limiting was only a proof of concept and, AFAIK, is not actively used/deployed, we can keep this for a later release. Pending discussion about which layer should actually control QoS during migration.
Martin, is this still relevant?
Hmm, I do not know. Michal?
That original proposal is not going to be implemented as of now, so we don't really need this. The auto-convergence algorithm in QEMU has been improved instead, as has our own downtime magic in 4.0.
Closing based on comment 11. If still relevant, please reopen with all relevant information.