1906199 – [CNV-2.5] CNV Tries to Install on Windows Workers

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	SSP
Sub Component:
Version:	2.5.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	2.6.0
Assignee:	Kevin Wiesmueller
QA Contact:	guy chen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-09 21:38 UTC by Benjamin Schmaus
Modified:	2021-10-01 15:58 UTC
CC List:	9 users
Fixed In Version:	virt-operator-container-v2.6.0-99
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-03-10 11:21:27 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Benjamin Schmaus 2020-12-09 21:38:51 UTC

Description of problem: The customer has an OCP 4.6.6 environment with a combination of Linux and Windows worker nodes. When CNV is installed, the operator attempts to run the pods that provide the CNV services on the Windows worker nodes.


Version-Release number of selected component (if applicable):
2.5.1

How reproducible:
100%

Steps to Reproduce:
1. Install OCP 4.6.6 with both Linux and Windows workers
2. Attempt to install CNV 2.5.1
3.

Actual results:
The operator ends up in a perpetual installing state because the pods never come up on the Windows workers.

Expected results:
Operator should skip trying to install necessary pods on Windows workers.  I know node selectors could be used here but the customer should not have to do anything additional out of the box.

Additional info:

Comment 2 Peter Lauterbach 2020-12-10 20:02:46 UTC

This will become a much bigger problem when Windows Containers are available at the end of the month.

Comment 3 Inbar Rose 2020-12-17 09:22:04 UTC

@oyahud 

Should this be automatically done by node labeller at run-time?

Comment 4 Omer Yahud 2020-12-17 10:28:16 UTC

What a coincidence, we've started working on a fix for this last week unrelated to this bug.
We have an open PR for it upstream: https://github.com/kubevirt/kubevirt/pull/4669

Since we are already working on this, I will take it on our team

Comment 5 Dan Kenigsberg 2020-12-21 09:24:43 UTC

(In reply to Omer Yahud from comment #4)
> What a coincidence, we've started working on a fix for this last week
> unrelated to this bug.
> We have an open PR for it upstream:
> https://github.com/kubevirt/kubevirt/pull/4669

What about the non-kubevirt/kubevirt components? Would they run well on nodes with kubernetes.io/os=windows?

Can we use the new placement API to avoid running there (if needed)?

Comment 6 Kevin Wiesmueller 2020-12-21 13:18:35 UTC

The Kubernetes Documentation states:
> This can be problematic since a Windows container can only run on Windows and a Linux container can only run on Linux. The best practice is to use a nodeSelector.
https://kubernetes.io/docs/setup/production-environment/windows/user-guide-windows-containers/

So it seems that none of the components, which are Linux containers, would run on Windows nodes.
The PR mentioned above will pin all the components deployed by the kubevirt-operator to Linux nodes.
I don't know about other components, though.
The placement API (if it is what I think) should be supported, so if HCO sets placement labels they will be propagated.
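
To illustrate the nodeSelector approach quoted above, here is a minimal Python sketch that pins a pod template to Linux nodes. It works on plain dictionaries shaped like a Kubernetes manifest. The helper name `pin_to_linux` and the sample DaemonSet are hypothetical, and this is not the code from the upstream PR; only the `kubernetes.io/os` label comes from this thread.

```python
import copy

# Node label discussed in this bug; its value is "linux" or "windows".
OS_LABEL = "kubernetes.io/os"


def pin_to_linux(workload: dict) -> dict:
    """Return a copy of a Deployment/DaemonSet-style manifest whose pod
    template carries nodeSelector kubernetes.io/os=linux.

    Hypothetical helper for illustration; existing selectors are kept.
    """
    pinned = copy.deepcopy(workload)  # leave the caller's manifest untouched
    pod_spec = (
        pinned.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec.setdefault("nodeSelector", {})[OS_LABEL] = "linux"
    return pinned


# Hypothetical manifest, not taken from any CNV component.
example = {
    "kind": "DaemonSet",
    "metadata": {"name": "example-handler"},
    "spec": {
        "template": {
            "spec": {
                "nodeSelector": {"example.com/role": "virt"},
                "containers": [{"name": "handler", "image": "example/handler"}],
            }
        }
    },
}

pinned = pin_to_linux(example)
print(pinned["spec"]["template"]["spec"]["nodeSelector"])
```

With such a selector in place, the scheduler only considers nodes labeled `kubernetes.io/os=linux` for these pods, so a workload like this would not be placed on Windows workers.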

Comment 7 aschuett 2020-12-21 15:43:42 UTC

There is also a Jira ticket tracking this issue https://issues.redhat.com/browse/CNV-9017

Comment 8 Kevin Wiesmueller 2021-01-04 13:45:39 UTC

@bschmaus the upstream PR is merged.
If the customer needs a short-term workaround, they could try what I mentioned in this comment: https://github.com/kubevirt/kubevirt/issues/3134#issuecomment-746411058

Comment 9 guy chen 2021-01-19 10:04:28 UTC

I've verified that the new OS label was added to the workers:

oc describe node virt01-d2rcr-worker-0-7qwgk | grep "kubernetes.io/os"
                    beta.kubernetes.io/os=linux
                    kubernetes.io/os=linux

oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.6.0   OpenShift Virtualization   2.6.0     kubevirt-hyperconverged-operator.v2.5.2   Succeeded
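
For a mixed cluster, the same label check can be applied across all nodes at once. The following is a rough Python sketch that groups node names by their `kubernetes.io/os` label. It assumes node objects shaped like Kubernetes Node resources; the function name `nodes_by_os` and the Windows node in the sample data are hypothetical.

```python
from collections import defaultdict

OS_LABEL = "kubernetes.io/os"


def nodes_by_os(nodes: list) -> dict:
    """Group node names by the value of their kubernetes.io/os label.

    Nodes without the label are collected under "unknown".
    """
    groups = defaultdict(list)
    for node in nodes:
        meta = node.get("metadata", {})
        os_name = meta.get("labels", {}).get(OS_LABEL, "unknown")
        groups[os_name].append(meta.get("name", "<unnamed>"))
    return dict(groups)


# Sample data: the first node name is from the output above,
# the Windows node is made up for illustration.
sample_nodes = [
    {
        "metadata": {
            "name": "virt01-d2rcr-worker-0-7qwgk",
            "labels": {"kubernetes.io/os": "linux"},
        }
    },
    {
        "metadata": {
            "name": "example-windows-worker-0",
            "labels": {"kubernetes.io/os": "windows"},
        }
    },
]

groups = nodes_by_os(sample_nodes)
print(groups)
```

Only the nodes listed under `linux` would be candidates for components that carry a `kubernetes.io/os=linux` node selector.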

Comment 12 errata-xmlrpc 2021-03-10 11:21:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799
