Description of problem: When a cluster admin wants to apply a new MachineConfig onto the cluster nodes, typically the MachineConfig Daemon would evict all pods from the node before it can continue with the node reboot, which is a part of the process. However, if there is at least one running VM which is not Live-Migratable on the node in question, the drain process is being blocked. This is also blocking a cluster upgrade, since part of the upgrade is the machine-config cluster operator, which is eventually cascading-reboot all of the nodes in the cluster when completing the upgrade. MCD Log example: I0724 17:05:23.773732 2383453 update.go:99] error when evicting pod "virt-launcher-rhel82-rt-w9lt5" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget. Version-Release number of selected component (if applicable): Occurred on official CNV 2.3.0 with OCP 4.4.11 and 4.5.3 How reproducible: 100% Steps to Reproduce: 1. Create and run a VM with non-RWX PVC, or with bridge interface binding (which can't be live-migrated). 2. Upgrade the OCP cluster (both z-stream and minor version [4.4.x-->4.5.x]) 3. Watch that some of the nodes are stuck in "Scheduling Disabled", enter the MCD logs (machine-config-daemon container) and see above error. Actual results: Cluster upgrade process / applying new machine config on the cluster is stuck due to running non-live-migratable virtual machines. Expected results: * Cluster upgrade is complete successfully, at least the machine-config cluster operator part. * MachineConfigs can be applied successfully, and worker MCP reports eventually "Updated=True". Additional info:
One important note here (at least for me) is that a PDB is created only if the VM spec have "evictionStrategy: LiveMigrate" defined. (https://github.com/kubevirt/kubevirt/blob/b5df6b571e413cbf0d3aacdf68595c3286e949f3/pkg/virt-controller/watch/drain/disruptionbudget/disruptionbudget.go#L341) I was thinking that a PDB is being created when virt-handler decides that VM migration method is live migration (https://github.com/kubevirt/kubevirt/blob/08be23f95744a08cd9295c2f647027fb4b2abfe2/pkg/virt-handler/vm.go#L728)
Just a note: This is much about user expectation. By default we say, if a user is creating a VM, that the VM will be running. In order to meet this promise, the evictioStartegy is set to live migrate - this iwll ensure that the VM will keep running (we keep our promise). Imagine a different startegy, then a user might be surprised why at some day (admin is doing an update) his VM is getting shut down, this would be a big loss (for solitaire highscores) for the user - and a surprise. Thus: We can not change the default, but we should see that we a) make this situation visible b) alert the user and admin c) allow the user to change the situation.
Following the conversation with Fabian, here are the possible steps for resolution: 1. We need to keep the "evictionStrategy: LiveMigrate" as the default in the VM templates (bug will not go away) 2. Should send events and alerts in case of non migratable VM that has "evictionStrategy: LiveMigrate" set. 3. In the UI make it simple to change the eviction strategy 4. Prominently display UI alerts about eviction strategy and migratability of a VM 5. implement the new eviction startegy called "LiveMigrateOrShutdown" so users can opt out of live migrate 6. For the non template case, introduce the "Shutdown" eviction strategy which is the default nowadays.
I would like to suggest that in case of "Shutdown" eviction strategy, the VM will immediately be launched on another node (if such node available according to nodePlacement / tolerations), to reduce downtime for the VM.
Oren, can you send an email to kubevirt-dev. I think both startegies (shutdown and restart) have their merits.
Oren - mainly to discuss the design on upstream. Thanks Igor. Scope of this bug should be 2. Should send events and alerts in case of non migratable VM that has "evictionStrategy: LiveMigrate" set. 5. implement the new eviction startegy called "LiveMigrateOrShutdown" so users can opt out of live migrate to be discussed is 6. For the non template case, introduce the "Shutdown" eviction strategy which is the default nowadays. here we need to decide on the term, I agree with Oren, that this should probably be rather be named "reschedule" The two following items would be addressed in other bugs: 4. Prominently display UI alerts about eviction strategy and migratability of a VM --> bug 1879059 3. In the UI make it simple to change the eviction strategy --> bug 1879056
(In reply to Fabian Deutsch from comment #11) > Oren - mainly to discuss the design on upstream. > > Thanks Igor. > Scope of this bug should be > > 2. Should send events and alerts in case of non migratable VM that has > "evictionStrategy: LiveMigrate" set. By alert you mean prometheus? A warning event like for mismatching PDBs should be good enough. > 5. implement the new eviction startegy called "LiveMigrateOrShutdown" so > users can opt out of live migrate They could opt out right now if it would be exposed in the UI. Having the LiveMigrateOrShutdown strategy may be an option to like on e.g. google VMs, where it is tried to migrate your VMs to minimize your disruption, but it may be rebooted any time. We could think about it, it would however have to be very clear that it is much more just for convenience (basically no guarantee) than a LiveMigrate strategy. > > to be discussed is > 6. For the non template case, introduce the "Shutdown" eviction strategy > which is the default nowadays. > here we need to decide on the term, I agree with Oren, that this should > probably be rather be named "reschedule" The evictionStrategy is set on the VMI, it has no notion of reschedule. It will be restarted if the VM has a runStrategy of Always for instance. > > The two following items would be addressed in other bugs: > 4. Prominently display UI alerts about eviction strategy and migratability > of a VM --> bug 1879059 > 3. In the UI make it simple to change the eviction strategy --> bug 1879056
>> 2. Should send events and alerts in case of non migratable VM that has >> "evictionStrategy: LiveMigrate" set. > > By alert you mean prometheus? A warning event like for mismatching PDBs should be good enough. I'm actually pretty keen on alerts, because this will help us to understand how many customer clusters are blocked by issues like this. Alerts are also making it easier for admin to identify the VMs affected by this issue. > They could opt out right now if it would be exposed in the UI. Agreed, that's why we created --> bug 1879056 > The evictionStrategy is set on the VMI, it has no notion of reschedule. It will be restarted if the VM has a runStrategy of Always for instance. Argh, I always mix this up. Then I still wonder what the right value for evictionStrategy would be: I suppose "shutdown"?
(In reply to Fabian Deutsch from comment #13) > >> 2. Should send events and alerts in case of non migratable VM that has > >> "evictionStrategy: LiveMigrate" set. > > > > By alert you mean prometheus? A warning event like for mismatching PDBs should be good enough. > > I'm actually pretty keen on alerts, because this will help us to understand > how many customer clusters are blocked by issues like this. > Alerts are also making it easier for admin to identify the VMs affected by > this issue. Let me rephrase. This may make sense for CNV due to the decisions done in the templates and the UI, but in general I think it is preferable if admins just monitor the events, but for kubevirt/kubevirt it does not necessarily make sense to expose such a metric. I am not saying that it is not useful, but it is in general not more useful than a metric to any other k8s event warning which may hint serious issues. Does OpenShift have a built-in solution to transform warnings to alerts? > > > They could opt out right now if it would be exposed in the UI. > > Agreed, that's why we created --> bug 1879056 > > > The evictionStrategy is set on the VMI, it has no notion of reschedule. It will be restarted if the VM has a runStrategy of Always for instance. > > Argh, I always mix this up. > Then I still wonder what the right value for evictionStrategy would be: I > suppose "shutdown"?
> I am not saying that it is not useful, but it is in general not more useful than a metric to any other k8s event warning which may hint serious issues. I agree that an alert is particularly interesting to CNV due to the fact that alerts are treated with the relevant attention in the UI and elsewhere. And along these lines I also agree that this is not necessarily the case for every cluster. If a prom alert is less relvant for kubevirt, would you still consider it helpful to have a metric to at least get a cluster wide overview ove the issue? You don't get this overview easily with events. > Does OpenShift have a built-in solution to transform warnings to alerts? That's a good question!
To verify: create a VM that uses bridge mode networking, and also specify evictionStrategy LiveMigrate. Observe prometheus alert.
Creatd a VM that uses bridge mode networking and which also specifies "evictionStrategy: LiveMigrate". We do observe prometheus alerts as seen from the UI. "Eviction policy for vm2-rhel84-bridge-nw (on node NodeN node05.redhat.com ) is set to Live Migration but the VM is not migratable" Will be moving this bug to VERIFIED state. Will be attaching the vm.yaml file and Alert screen-shots shortly.
Created attachment 1788729 [details] VM with bridge network and eviction-strategy as LiveMigrate
Created attachment 1788730 [details] Alerting Rule for VM not migratable
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920