Description of problem:
As part of CNV chaos testing we explored how node disruptions such as power loss, reboots, and suspends (the node being a VM) affect the workload. We observed consistent behaviour: the workload is rescheduled only once the node comes back. Regardless of whether runStrategy is set to Always or the VMs are migratable, the platform reacts only when the node has recovered.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Schedule a workload (VM) to run on a specific node
2. Reboot the node
3. Observe whether the VM is rescheduled while the node is down

Actual results:
VMs are rescheduled only when the node is back

Expected results:
The cluster workload is rebalanced as soon as a node failure or reboot is detected.

Additional info:
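For reference, a minimal sketch of the kind of VirtualMachine used in the reproduction, assuming a KubeVirt/CNV VM with runStrategy: Always and a live-migratable eviction strategy (names and the container disk image are hypothetical placeholders; the actual test manifests may differ). Any node pinning used in step 1 must still leave other schedulable nodes available, otherwise rescheduling elsewhere is impossible by definition:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: chaos-test-vm                  # hypothetical name
spec:
  runStrategy: Always                  # desired state: keep a VMI running at all times
  template:
    metadata:
      labels:
        kubevirt.io/vm: chaos-test-vm
    spec:
      evictionStrategy: LiveMigrate    # marks the VMI as migratable
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo   # demo image, stands in for the real workload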
The controller manager has a timeout (usually 5 minutes) to wait for a node to come back on its own. If the timeout expires, the pods are rescheduled. Did you wait more than 5 minutes to see if the workloads migrated? Note: DaemonSets and ReplicaSets will not migrate due to how they work.
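For reference, the ~5-minute window typically comes from the default NoExecute tolerations that the admission controller injects into every pod; the pod is only evicted from a not-ready/unreachable node once tolerationSeconds expires. A representative fragment, assuming the upstream defaults (the tolerations actually present on the virt-launcher pod may differ):

# Default tolerations injected by the DefaultTolerationSeconds admission plugin.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300      # wait 5 minutes before evicting from a NotReady node
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300      # wait 5 minutes before evicting from an unreachable node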
For context, KubeVirt will only reschedule the VM once the VMI's pod has completely terminated and is in a finalized state. If the VMI pods on the restarted node do not transition to a finalized state, the VM controller won't proceed with rescheduling the VM workload somewhere else.
Are you suggesting this is a CNV-specific issue? In this scenario we can't assume we will see the VMI pod completely terminated.
> Are you suggesting this is a CNV-specific issue? In this scenario we can't assume we will see the VMI pod completely terminated.

This would only be a CNV-specific issue if the pod reaches a finalized state and the VM controller does not attempt to reschedule the workload. I believe it's likely that CNV is behaving correctly based on the state of the pod it observes. It's up to OCP to determine that the pod has terminated due to node failure and mark it as finalized.

The only way to know for sure is to capture the VM/VMI pod's yaml during the time period in which you'd expect the reschedule to occur. From that we can understand how the pod's status is reported and then infer what the correct action for CNV is based on that status.
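Purely as an illustration of which fields matter (not captured from the reported cluster), something like the following is what we'd need to see from the virt-launcher pod while the node is down; the VM controller only treats the pod as finalized once it reaches a terminal phase:

# Hypothetical virt-launcher pod fragment on an unreachable node.
metadata:
  name: virt-launcher-chaos-test-vm-abcde   # hypothetical pod name
  deletionTimestamp: "2022-01-01T00:00:00Z" # placeholder; set once eviction/deletion is requested
  finalizers: []                            # any remaining finalizers would also block finalization
status:
  phase: Running                            # must become Succeeded or Failed before the VM is rescheduled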
I tested node shutdown with both a workload pod and a VM and observed different behaviour for each: after ~5 minutes the pod is rescheduled, whereas the VM stays in Running. Based on those findings I am changing the product to CNV, since this appears to be a CNV-specific issue.
> I tested node shutdown with both a workload pod and a VM and observed different behaviour for each: after ~5 minutes the pod is rescheduled, whereas the VM stays in Running. Based on those findings I am changing the product to CNV, since this appears to be a CNV-specific issue.

Try the same experiment with a StatefulSet of size 1; that's what we're modeled after. I believe it handles node failure differently from DaemonSets and Deployments [1].

1. https://github.com/kubernetes/kubernetes/issues/54368#issuecomment-339537281
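For the suggested comparison, a minimal sketch of such a StatefulSet (all names and the image are hypothetical placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: node-failure-test            # hypothetical name
spec:
  serviceName: node-failure-test
  replicas: 1                        # single replica, analogous to a single VM
  selector:
    matchLabels:
      app: node-failure-test
  template:
    metadata:
      labels:
        app: node-failure-test
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # any long-running image will do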
Removing the target release from this BZ to ensure we re-triage it.
Re-reading Comment #7, I think this is the ancient confusion of "Running" vs "RunStrategy". Running is a request for a state, not a status field, which is why we renamed it. With that, I am closing this as NOTABUG. Please feel free to re-open if you feel this is in error.
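To spell the distinction out, a sketch of the two mutually exclusive ways the desired state is expressed on a VirtualMachine; neither reports the observed state (that lives in status, e.g. status.printableStatus and the VMI object):

# Legacy boolean form: a request that the VM be running, not a report that it is.
spec:
  running: true
---
# Replacement form: mutually exclusive with spec.running.
spec:
  runStrategy: Always   # other values include RerunOnFailure, Manual, Halted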