Bug 2232320

Summary: Virt-template-validator pods are getting scheduled on the same node
Product: Container Native Virtualization (CNV)
Reporter: Akriti Gupta <akrgupta>
Component: SSP
Assignee: sgott
Status: NEW
QA Contact: Kedar Bidarkar <kbidarka>
Severity: high
Docs Contact:
Priority: high
Version: 4.14.0
CC: dholler, gkapoor
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Akriti Gupta 2023-08-16 09:53:53 UTC
Description of problem:
virt-template-validator pods are getting scheduled on the same node

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. oc describe pod <virt-template-validator pod1>
2. oc describe pod <virt-template-validator pod2>
3. Compare the Node field in the two outputs (see the sketch below for a quicker check).
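
As a quicker check (a minimal sketch; the openshift-cnv namespace and the pod-name filter are assumptions that may differ per cluster), a wide pod listing shows the node assignment directly:

  # List the validator pods together with the node each one runs on
  oc get pods -n openshift-cnv -o wide | grep virt-template-validator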

Actual results:
Both virt-template-validator pods are scheduled on the same node.

Expected results:
The two pods are scheduled on different nodes.

Additional info:
A similar bug was fixed for 4.11: https://bugzilla.redhat.com/show_bug.cgi?id=2056467

Comment 1 Geetika Kapoor 2023-08-16 12:00:14 UTC
Hi, do you see the same behavior when multiple nodes are up?
Also, if you kill a virt-template-validator pod, check whether it gets restarted on another node; likewise, if a node goes down, make sure the pod comes back up on another node. TIA.
This behavior is expected when only a single node is available (e.g. in an upgrade situation).
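
One rough way to run that check (a sketch; the namespace and pod name are assumptions to substitute with the real values):

  # Delete one validator pod and watch where the replacement gets scheduled
  oc delete pod <virt-template-validator pod1> -n openshift-cnv
  oc get pods -n openshift-cnv -o wide -w | grep virt-template-validator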

Comment 2 Kedar Bidarkar 2023-08-17 13:00:42 UTC
(In reply to Geetika Kapoor from comment #1)
> Hi, do you see the same behavior when multiple nodes are up?
> Also, if you kill a virt-template-validator pod, check whether it gets
> restarted on another node; likewise, if a node goes down, make sure the pod
> comes back up on another node. TIA.
> This behavior is expected when only a single node is available (e.g. in an
> upgrade situation).

Yes, we are seeing this issue when multiple nodes are up.
This is on a fresh-install setup, not while running tests post-upgrade.

Will update here if we see this again.

Comment 3 Dominik Holler 2023-08-17 14:59:34 UTC
Please note that preferredDuringSchedulingIgnoredDuringExecution is used.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity says

> preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.

This means that if only a single node is available at the time the virt-template-validator pods are scheduled, they will both be scheduled to the same node. This is expected in small clusters during an upgrade and in single-node clusters.
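
For reference, the stanza in question can be inspected directly on the Deployment (a sketch; the Deployment name and the openshift-cnv namespace are assumptions):

  # Print the pod anti-affinity of the validator Deployment; a
  # preferredDuringSchedulingIgnoredDuringExecution entry means a soft rule only
  oc get deployment virt-template-validator -n openshift-cnv \
    -o jsonpath='{.spec.template.spec.affinity.podAntiAffinity}'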

If multiple nodes are available, the two virt-template-validator pods are spread across two nodes, but only on a best-effort basis.
This means that if one pod is killed, it should be restarted on another node; if the second pod were restarted on the same node as the first pod, that would be a bug.