Bug 2232320 - Virt-template-validator pods are getting scheduled on the same node
Summary: Virt-template-validator pods are getting scheduled on the same node
Keywords:
Status: NEW
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: SSP
Version: 4.14.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-16 09:53 UTC by Akriti Gupta
Modified: 2023-08-17 14:59 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2224552 0 medium NEW Virt-template-validator pods are getting scheduled on same node on Upgraded cluster 2023-08-16 10:23:23 UTC
Red Hat Issue Tracker CNV-32104 0 None None None 2023-08-16 11:01:26 UTC

Description Akriti Gupta 2023-08-16 09:53:53 UTC
Description of problem:
virt-template-validator pods are getting scheduled on the same node

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. oc describe pod <virt-template-validator pod1>
2. oc describe pod <virt-template-validator pod2>
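Alternatively, the node assignment of both pods can be checked at a glance (the openshift-cnv namespace is assumed here):

# the NODE column shows where each validator pod landed
oc get pods -n openshift-cnv -o wide | grep virt-template-validator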

Actual results:
Both pods are scheduled on the same node.

Expected results:
The pods are scheduled on different nodes.

Additional info:
A similar bug was fixed for 4.11: https://bugzilla.redhat.com/show_bug.cgi?id=2056467

Comment 1 Geetika Kapoor 2023-08-16 12:00:14 UTC
Hi, do you see the same behavior when multiple nodes are up?
Also, if you kill the virt-template-validator pod, check whether it gets restarted on another node, and if a node goes down, make sure the pod comes up on another node. TIA.
This behavior is expected when only a single node is available (e.g. in an upgrade situation).
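A quick way to exercise this check (the pod name is a placeholder and openshift-cnv is assumed to be the CNV namespace):

# delete one validator pod, then see which node the replacement lands on
oc delete pod <virt-template-validator pod1> -n openshift-cnv
oc get pods -n openshift-cnv -o wide | grep virt-template-validator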

Comment 2 Kedar Bidarkar 2023-08-17 13:00:42 UTC
(In reply to Geetika Kapoor from comment #1)
> Hi, do you see the same behavior when multiple nodes are up?
> Also, if you kill the virt-template-validator pod, check whether it gets
> restarted on another node, and if a node goes down, make sure the pod comes
> up on another node. TIA.
> This behavior is expected when only a single node is available (e.g. in an
> upgrade situation).

Yes, we are seeing this issue when multiple nodes are up.
This is seen on a fresh-install setup, not when running tests post upgrade.

Will update here if we see this again.

Comment 3 Dominik Holler 2023-08-17 14:59:34 UTC
Please note that preferredDuringSchedulingIgnoredDuringExecution is used.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity says

> preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.

This means that if only a single host is available at the point in time when the virt-template-validator pods are scheduled, they will both be scheduled to the same host. This is expected to happen in small clusters during an upgrade or in single-node clusters.

If multiple nodes are available, the two virt-template-validator pods are spread across two nodes, but only if possible.
This means that if one pod is killed, it should be restarted on another node. If the second pod were restarted on the same node as the first pod, that would be a bug.
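The effective scheduling preference can be inspected directly on the deployment; a minimal check, assuming the deployment is named virt-template-validator and lives in the openshift-cnv namespace:

# dump the affinity stanza of the pod template
oc get deployment virt-template-validator -n openshift-cnv -o jsonpath='{.spec.template.spec.affinity}'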

