Description of problem:
Customer has a 4.6.6 OCP environment with a combination of Linux and Windows worker nodes. When CNV is installed the operator attempts to run pods to provide the services for CNV on the Windows worker nodes.

Version-Release number of selected component (if applicable):
2.5.1

How reproducible:
100%

Steps to Reproduce:
1. Install OCP 4.6.6 with both Linux and Windows workers
2. Attempt to install CNV 2.5.1

Actual results:
Operator ends up in a perpetual installing state since the pods never come up on the Windows workers

Expected results:
Operator should skip trying to install necessary pods on Windows workers. I know node selectors could be used here but the customer should not have to do anything additional out of the box.

Additional info:
This will become a much bigger problem when Windows Containers are available at the end of the month.
@oyahud Should this be automatically done by node labeller at run-time?
What a coincidence: we started working on a fix for this last week, unrelated to this bug. We have an open PR for it upstream: https://github.com/kubevirt/kubevirt/pull/4669 Since we are already working on this, our team will take it.
(In reply to Omer Yahud from comment #4)
> What a coincidence, we've started working on a fix for this last week
> unrelated to this bug.
> We have an open PR for it upstream:
> https://github.com/kubevirt/kubevirt/pull/4669

What about the non-kubevirt/kubevirt components? Would they run well on nodes with kubernetes.io/os=windows? Can we use the new placement API to avoid running there (if needed)?
The Kubernetes documentation states:

> This can be problematic since a Windows container can only run on Windows and a Linux container can only run on Linux. The best practice is to use a nodeSelector.

https://kubernetes.io/docs/setup/production-environment/windows/user-guide-windows-containers/

So it seems that none of the components, which are Linux containers, would run on Windows nodes. The PR mentioned above will pin all the components deployed by the kubevirt-operator to Linux nodes. I don't know about the other components, though. The placement API (if it is what I think it is) should be supported, so if HCO sets placement labels they will be propagated.
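To make the nodeSelector idea concrete, here is a minimal sketch in Python. It is not the code from the upstream PR; the helper names `pin_to_linux` and `schedulable`, the sample pod spec, and the image name are hypothetical, and the matching logic is a simplified stand-in for what the scheduler does with nodeSelector (exact label equality only).

```python
import copy

# Well-known node label that the kubelet sets on every node.
LINUX_SELECTOR = {"kubernetes.io/os": "linux"}


def pin_to_linux(pod_spec: dict) -> dict:
    """Return a copy of pod_spec whose nodeSelector requires a Linux node.

    Hypothetical helper for illustration; existing selector entries are kept.
    """
    pinned = copy.deepcopy(pod_spec)
    selector = dict(pinned.get("nodeSelector") or {})
    selector.update(LINUX_SELECTOR)
    pinned["nodeSelector"] = selector
    return pinned


def schedulable(pod_spec: dict, node_labels: dict) -> bool:
    """Simplified nodeSelector check: every selector pair must match a node label."""
    selector = pod_spec.get("nodeSelector") or {}
    return all(node_labels.get(key) == value for key, value in selector.items())


# Hypothetical pod spec fragment without any placement constraint.
spec = {"containers": [{"name": "virt-handler", "image": "example/virt-handler"}]}
pinned = pin_to_linux(spec)

linux_node = {"kubernetes.io/os": "linux"}
windows_node = {"kubernetes.io/os": "windows"}

print(schedulable(spec, windows_node))    # unpinned spec may land on a Windows node
print(schedulable(pinned, linux_node))    # pinned spec still fits a Linux node
print(schedulable(pinned, windows_node))  # pinned spec is kept off a Windows node
```

Without the selector, nothing stops the pod from being placed on a Windows worker, which matches the behaviour reported in this bug.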
There is also a Jira ticket tracking this issue https://issues.redhat.com/browse/CNV-9017
@bschmaus the upstream PR is merged. If the customer needs a short-term workaround, they could try what I mentioned in this comment: https://github.com/kubevirt/kubevirt/issues/3134#issuecomment-746411058
I've verified that the new OS label was added to the workers:

oc describe node virt01-d2rcr-worker-0-7qwgk | grep "kubernetes.io/os"
    beta.kubernetes.io/os=linux
    kubernetes.io/os=linux

oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.6.0   OpenShift Virtualization   2.6.0     kubevirt-hyperconverged-operator.v2.5.2   Succeeded
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0799