Description of problem:
As part of CNV chaos testing we explored how node disruptions such as power loss, reboots, and suspends (the node being a VM) affect the workload. We observed consistent behaviour: the workload is rescheduled only once the node comes back. Regardless of whether runStrategy is set to Always or the VMs are migratable, the platform reacts only when the node has recovered.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Schedule a workload (VM) to run on a specific node
2. Reboot the node
3. Observe whether the VM is rescheduled while the node is down

Actual results:
VMs are rescheduled only when the node is back

Expected results:
The cluster workload is rebalanced as soon as a node failure or reboot is detected.

Additional info:
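For reference, a minimal sketch of the kind of VirtualMachine used in the reproduction, assuming a KubeVirt/CNV VM with runStrategy: Always and a live-migratable eviction strategy (names and the container disk image are hypothetical placeholders; the actual test manifests may differ). Any node pinning used in step 1 must still leave other schedulable nodes available, otherwise rescheduling elsewhere is impossible by definition:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: chaos-test-vm                  # hypothetical name
spec:
  runStrategy: Always                  # desired state: keep a VMI running at all times
  template:
    metadata:
      labels:
        kubevirt.io/vm: chaos-test-vm
    spec:
      evictionStrategy: LiveMigrate    # marks the VMI as migratable
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo   # demo image, stands in for the real workload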
The controller manager has a timeout (usually 5 minutes) to wait for a node to come back on its own. If the timeout expires, the pods are rescheduled. Did you wait more than 5 minutes to see if the workloads migrated? Note: DaemonSets and ReplicaSets will not migrate due to how they work.
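For reference, the ~5-minute window typically comes from the default NoExecute tolerations that the admission controller injects into every pod; the pod is only evicted from a not-ready/unreachable node once tolerationSeconds expires. A representative fragment, assuming the upstream defaults (the tolerations actually present on the virt-launcher pod may differ):

# Default tolerations injected by the DefaultTolerationSeconds admission plugin.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300      # wait 5 minutes before evicting from a NotReady node
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300      # wait 5 minutes before evicting from an unreachable node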
For context, KubeVirt will only reschedule the VM once the VMI's pod has completely terminated and is in a finalized state. If the VMI pods on the restarted node do not transition to a finalized state, the VM controller won't proceed with rescheduling the VM workload somewhere else.
Are you suggesting this is a CNV-specific issue? In this scenario we can't assume we will see the VMI pod completely terminated.
> Are you suggesting this is a CNV-specific issue? In this scenario we can't assume we will see the VMI pod completely terminated.

This would only be a CNV-specific issue if the pod reaches a finalized state and the VM controller does not attempt to reschedule the workload. I believe it's likely that CNV is behaving correctly based on the state of the pod it observes. It's up to OCP to determine that the pod has terminated due to node failure and mark it as finalized.

The only way to know for sure is to capture the VM/VMI pod's yaml during the time period in which you'd expect the reschedule to occur. From that we can understand how the pod's status is reported and then infer what the correct action for CNV is based on that status.
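Purely as an illustration of which fields matter (not captured from the reported cluster), something like the following is what we'd need to see from the virt-launcher pod while the node is down; the VM controller only treats the pod as finalized once it reaches a terminal phase:

# Hypothetical virt-launcher pod fragment on an unreachable node.
metadata:
  name: virt-launcher-chaos-test-vm-abcde   # hypothetical pod name
  deletionTimestamp: "2022-01-01T00:00:00Z" # placeholder; set once eviction/deletion is requested
  finalizers: []                            # any remaining finalizers would also block finalization
status:
  phase: Running                            # must become Succeeded or Failed before the VM is rescheduled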
I tested node shutdown with both a workload pod and a VM and observed different behaviour for each: after ~5 minutes the pod is rescheduled, whereas the VM stays in Running. Based on those findings I am changing the product to CNV, since this appears to be a CNV-specific issue.
> I tested node shutdown with both a workload pod and a VM and observed different behaviour for each: after ~5 minutes the pod is rescheduled, whereas the VM stays in Running. Based on those findings I am changing the product to CNV, since this appears to be a CNV-specific issue.

Try the same experiment with a StatefulSet of size 1; that's what we're modeled after. I believe it handles node failure differently from DaemonSets and Deployments [1].

1. https://github.com/kubernetes/kubernetes/issues/54368#issuecomment-339537281
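For the suggested comparison, a minimal sketch of such a StatefulSet (all names and the image are hypothetical placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: node-failure-test            # hypothetical name
spec:
  serviceName: node-failure-test
  replicas: 1                        # single replica, analogous to a single VM
  selector:
    matchLabels:
      app: node-failure-test
  template:
    metadata:
      labels:
        app: node-failure-test
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # any long-running image will do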
Removing the target release from this BZ to ensure we re-triage it.
Re-reading Comment #7, I think this is the ancient confusion of "Running" vs "RunStrategy". Running is a request for a state, not a status field, which is why we renamed it. With that, I am closing this as NOTABUG. Please feel free to re-open if you feel this is in error.
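To spell the distinction out, a sketch of the two mutually exclusive ways the desired state is expressed on a VirtualMachine; neither reports the observed state (that lives in status, e.g. status.printableStatus and the VMI object):

# Legacy boolean form: a request that the VM be running, not a report that it is.
spec:
  running: true
---
# Replacement form: mutually exclusive with spec.running.
spec:
  runStrategy: Always   # other values include RerunOnFailure, Manual, Halted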