Bug 1860671 - Node draining is blocked by virt-launcher eviction error due to PDB in a MachineConfig update
Summary: Node draining is blocked by virt-launcher eviction error due to PDB in a Mach...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.3.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.8.0
Assignee: sgott
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks: 1879059
TreeView+ depends on / blocked
 
Reported: 2020-07-26 13:09 UTC by Oren Cohen
Modified: 2021-07-27 14:21 UTC (History)
7 users (show)

Fixed In Version: hco-bundle-registry-container-v4.8.0-347 virt-operator-container-v4.8.0-58
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 14:20:49 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
VM with bridge network and eviction-strategy as LiveMigrate (1.47 KB, text/plain)
2021-06-02 16:05 UTC, Kedar Bidarkar
no flags Details
Alerting Rule for VM not migratable (122.59 KB, image/png)
2021-06-02 16:10 UTC, Kedar Bidarkar
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 5392 0 None closed Generate an Event, and a Prometheus based Alert for non evictable VMs 2021-04-28 13:20:22 UTC
Red Hat Product Errata RHSA-2021:2920 0 None None None 2021-07-27 14:21:52 UTC

Description Oren Cohen 2020-07-26 13:09:37 UTC
Description of problem:
When a cluster admin wants to apply a new MachineConfig onto the cluster nodes, typically the MachineConfig Daemon would evict all pods from the node before it can continue with the node reboot, which is a part of the process.
However, if there is at least one running VM which is not Live-Migratable on the node in question, the drain process is being blocked.

This is also blocking a cluster upgrade, since part of the upgrade is the machine-config cluster operator, which is eventually cascading-reboot all of the nodes in the cluster when completing the upgrade.

MCD Log example:
I0724 17:05:23.773732 2383453 update.go:99] error when evicting pod "virt-launcher-rhel82-rt-w9lt5" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

Version-Release number of selected component (if applicable):
Occurred on official CNV 2.3.0 with OCP 4.4.11 and 4.5.3

How reproducible:
100%

Steps to Reproduce:
1. Create and run a VM with non-RWX PVC, or with bridge interface binding (which can't be live-migrated).
2. Upgrade the OCP cluster (both z-stream and minor version [4.4.x-->4.5.x])
3. Watch that some of the nodes are stuck in "Scheduling Disabled", enter the MCD logs (machine-config-daemon container) and see above error.

Actual results:
Cluster upgrade process / applying new machine config on the cluster is stuck due to running non-live-migratable virtual machines.

Expected results:
* Cluster upgrade is complete successfully, at least the machine-config cluster operator part.
* MachineConfigs can be applied successfully, and worker MCP reports eventually "Updated=True".

Additional info:

Comment 6 Igor Bezukh 2020-09-10 08:04:43 UTC
One important note here (at least for me) is that a PDB is created only if the VM spec have "evictionStrategy: LiveMigrate" defined. (https://github.com/kubevirt/kubevirt/blob/b5df6b571e413cbf0d3aacdf68595c3286e949f3/pkg/virt-controller/watch/drain/disruptionbudget/disruptionbudget.go#L341)

I was thinking that a PDB is being created when virt-handler decides that VM migration method is live migration
(https://github.com/kubevirt/kubevirt/blob/08be23f95744a08cd9295c2f647027fb4b2abfe2/pkg/virt-handler/vm.go#L728)

Comment 7 Fabian Deutsch 2020-09-15 08:02:22 UTC
Just a note: This is much about user expectation.
By default we say, if a user is creating a VM, that the VM will be running. In order to meet this promise, the evictioStartegy is set to live migrate - this iwll ensure that the VM will keep running (we keep our promise).
Imagine a different startegy, then a user might be surprised why at some day (admin is doing an update) his VM is getting shut down, this would be a big loss (for solitaire highscores) for the user - and a surprise.

Thus: We can not change the default, but we should see that we a) make this situation visible b) alert the user and admin c) allow the user to change the situation.

Comment 8 Igor Bezukh 2020-09-15 08:17:52 UTC
Following the conversation with Fabian, here are the possible steps for resolution:

1. We need to keep the "evictionStrategy: LiveMigrate" as the default in the VM templates (bug will not go away)
2. Should send events and alerts in case of non migratable VM that has "evictionStrategy: LiveMigrate" set.
3. In the UI make it simple to change the eviction strategy
4. Prominently display UI alerts about eviction strategy and migratability of a VM
5. implement the new eviction startegy called "LiveMigrateOrShutdown" so users can opt out of live migrate
6. For the non template case, introduce the "Shutdown" eviction strategy which is the default nowadays.

Comment 9 Oren Cohen 2020-09-15 08:24:32 UTC
I would like to suggest that in case of "Shutdown" eviction strategy, the VM will immediately be launched on another node (if such node available according to nodePlacement / tolerations), to reduce downtime for the VM.

Comment 10 Fabian Deutsch 2020-09-15 08:35:58 UTC
Oren, can you send an email to kubevirt-dev. I think both startegies (shutdown and restart) have their merits.

Comment 11 Fabian Deutsch 2020-09-15 11:02:22 UTC
Oren - mainly to discuss the design on upstream.

Thanks Igor.
Scope of this bug should be

2. Should send events and alerts in case of non migratable VM that has "evictionStrategy: LiveMigrate" set.
5. implement the new eviction startegy called "LiveMigrateOrShutdown" so users can opt out of live migrate

to be discussed is
6. For the non template case, introduce the "Shutdown" eviction strategy which is the default nowadays.
here we need to decide on the term, I agree with Oren, that this should probably be rather be named "reschedule"

The two following items would be addressed in other bugs:
4. Prominently display UI alerts about eviction strategy and migratability of a VM --> bug 1879059
3. In the UI make it simple to change the eviction strategy --> bug 1879056

Comment 12 Roman Mohr 2020-09-15 15:00:59 UTC
(In reply to Fabian Deutsch from comment #11)
> Oren - mainly to discuss the design on upstream.
> 
> Thanks Igor.
> Scope of this bug should be
> 
> 2. Should send events and alerts in case of non migratable VM that has
> "evictionStrategy: LiveMigrate" set.

By alert you mean prometheus? A warning event like for mismatching PDBs should be good enough.

> 5. implement the new eviction startegy called "LiveMigrateOrShutdown" so
> users can opt out of live migrate

They could opt out right now if it would be exposed in the UI.
Having the LiveMigrateOrShutdown strategy may be an option to like on e.g. google VMs, where it is tried to migrate your VMs to minimize your disruption, but it may be rebooted any time. We could think about it, it would however have to be very clear that it is much more just for convenience (basically no guarantee) than a LiveMigrate strategy.

> 
> to be discussed is
> 6. For the non template case, introduce the "Shutdown" eviction strategy
> which is the default nowadays.
> here we need to decide on the term, I agree with Oren, that this should
> probably be rather be named "reschedule"

The evictionStrategy is set on the VMI, it has no notion of reschedule. It will be restarted if the VM has a runStrategy of Always for instance.

> 
> The two following items would be addressed in other bugs:
> 4. Prominently display UI alerts about eviction strategy and migratability
> of a VM --> bug 1879059
> 3. In the UI make it simple to change the eviction strategy --> bug 1879056

Comment 13 Fabian Deutsch 2020-09-15 15:06:06 UTC
>> 2. Should send events and alerts in case of non migratable VM that has
>> "evictionStrategy: LiveMigrate" set.
>
> By alert you mean prometheus? A warning event like for mismatching PDBs should be good enough.

I'm actually pretty keen on alerts, because this will help us to understand how many customer clusters are blocked by issues like this.
Alerts are also making it easier for admin to identify the VMs affected by this issue.

> They could opt out right now if it would be exposed in the UI.

Agreed, that's why we created --> bug 1879056

> The evictionStrategy is set on the VMI, it has no notion of reschedule. It will be restarted if the VM has a runStrategy of Always for instance.

Argh, I always mix this up.
Then I still wonder what the right value for evictionStrategy would be: I suppose "shutdown"?

Comment 14 Roman Mohr 2020-09-15 15:25:50 UTC
(In reply to Fabian Deutsch from comment #13)
> >> 2. Should send events and alerts in case of non migratable VM that has
> >> "evictionStrategy: LiveMigrate" set.
> >
> > By alert you mean prometheus? A warning event like for mismatching PDBs should be good enough.
> 
> I'm actually pretty keen on alerts, because this will help us to understand
> how many customer clusters are blocked by issues like this.
> Alerts are also making it easier for admin to identify the VMs affected by
> this issue.

Let me rephrase. This may make sense for CNV due to the decisions done in the templates and the UI, but in general I think it is preferable if admins just monitor the events, but for kubevirt/kubevirt it does not necessarily make sense to expose such a metric. I am not saying that it is not useful, but it is in general not more useful than a metric to any other k8s event warning which may hint serious issues. Does OpenShift have a built-in solution to transform warnings to alerts? 

 
> 
> > They could opt out right now if it would be exposed in the UI.
> 
> Agreed, that's why we created --> bug 1879056
> 
> > The evictionStrategy is set on the VMI, it has no notion of reschedule. It will be restarted if the VM has a runStrategy of Always for instance.
> 
> Argh, I always mix this up.
> Then I still wonder what the right value for evictionStrategy would be: I
> suppose "shutdown"?

Comment 15 Fabian Deutsch 2020-09-16 07:53:41 UTC
> I am not saying that it is not useful, but it is in general not more useful than a metric to any other k8s event warning which may hint serious issues.

I agree that an alert is particularly interesting to CNV due to the fact that alerts are treated with the relevant attention in the UI and elsewhere.
And along these lines I also agree that this is not necessarily the case for every cluster.

If a prom alert is less relvant for kubevirt, would you still consider it helpful to have a metric to at least get a cluster wide overview ove the issue?
You don't get this overview easily with events.

> Does OpenShift have a built-in solution to transform warnings to alerts?

That's a good question!

Comment 18 sgott 2021-05-19 12:01:05 UTC
To verify: create a VM that uses bridge mode networking, and also specify evictionStrategy LiveMigrate. Observe prometheus alert.

Comment 19 Kedar Bidarkar 2021-06-02 15:58:17 UTC
Creatd a VM that uses bridge mode networking and which also specifies "evictionStrategy: LiveMigrate".

We do observe prometheus alerts as seen from the UI.

"Eviction policy for vm2-rhel84-bridge-nw (on node NodeN
node05.redhat.com
) is set to Live Migration but the VM is not migratable"

Will be moving this bug to VERIFIED state.

Will be attaching the vm.yaml file and Alert screen-shots shortly.

Comment 20 Kedar Bidarkar 2021-06-02 16:05:46 UTC
Created attachment 1788729 [details]
VM with bridge network and eviction-strategy as LiveMigrate

Comment 21 Kedar Bidarkar 2021-06-02 16:10:29 UTC
Created attachment 1788730 [details]
Alerting Rule for VM not migratable

Comment 25 errata-xmlrpc 2021-07-27 14:20:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920


Note You need to log in before you can comment on or make changes to this bug.