Bug 1915312

Summary: Prevent scheduling of Linux openshift-network-diagnostics pods on Windows nodes
Product: OpenShift Container Platform Reporter: gaoshang <sgao>
Component: Networking Assignee: Ricardo Carrillo Cruz <ricarril>
Networking sub component: ovn-kubernetes QA Contact: gaoshang <sgao>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: ricarril
Version: 4.7   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-02-24 15:52:05 UTC Type: Bug

Description gaoshang 2021-01-12 12:29:50 UTC
Description of problem:
After installing an OCP cluster with a Windows node, the network-check-target DaemonSet in openshift-network-diagnostics schedules its Linux pods onto the Windows node, where they fail to start.

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-10-070949   True        False         9h      Error while reconciling 4.7.0-0.nightly-2021-01-10-070949: the cluster operator network is degraded

How reproducible:

Steps to Reproduce:
1. Install an OCP cluster and bootstrap a Windows node.
2. Check the pods in project openshift-network-diagnostics: network-check-target pods are scheduled on the Windows node and fail with ImagePullBackOff.

# oc get pod -n openshift-network-diagnostics
NAME                                   READY   STATUS             RESTARTS   AGE
network-check-source-f694fbf9d-lbhs9   1/1     Running            0          9h
network-check-target-6jst9             1/1     Running            0          9h
network-check-target-btmvt             0/1     ImagePullBackOff   0          161m
network-check-target-jgkdt             1/1     Running            0          9h
network-check-target-kfpf4             1/1     Running            0          9h
network-check-target-n2spd             1/1     Running            0          9h
network-check-target-qjp5r             1/1     Running            0          9h
network-check-target-rb4kx             1/1     Running            0          9h
network-check-target-zwm7p             0/1     ImagePullBackOff   0          166
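
With many target pods, the failing ones can be picked out of the listing mechanically. The following is a minimal sketch and not part of the original report: `not_running_pods` is a hypothetical helper name, and it assumes the default `oc get pod` table layout with STATUS in the third column. The sample rows are copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not an oc subcommand): read `oc get pod` table output
# on stdin and print the NAME of every pod whose STATUS is not "Running".
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Sample rows taken from the listing in this report (shortened).
sample_output='NAME                                   READY   STATUS             RESTARTS   AGE
network-check-source-f694fbf9d-lbhs9   1/1     Running            0          9h
network-check-target-6jst9             1/1     Running            0          9h
network-check-target-btmvt             0/1     ImagePullBackOff   0          161m
network-check-target-zwm7p             0/1     ImagePullBackOff   0          166'

failing=$(printf '%s\n' "$sample_output" | not_running_pods)
printf '%s\n' "$failing"
```

Against a live cluster the same filter could be fed directly, for example `oc get pod -n openshift-network-diagnostics | not_running_pods`. Note that this simple check would also list pods in other non-Running states, such as Completed.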

# oc describe pod/network-check-target-btmvt -n openshift-network-diagnostics
Name:         network-check-target-btmvt
Namespace:    openshift-network-diagnostics
Priority:     0
Node:         ip-10-0-142-195.us-east-2.compute.internal/
Start Time:   Tue, 12 Jan 2021 03:58:48 -0500
Labels:       app=network-check-target
Annotations:  openshift.io/scc: restricted
Status:       Pending
Controlled By:  DaemonSet/network-check-target
    Container ID:   
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a755920f6c5713d96fdd82d6a6270affdfbd4618e21ff20fff1d1444e0e3873c
    Image ID:       
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
      cpu:        10m
      memory:     150Mi
    Readiness:    http-get http://:8080/ delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dwb5z (ro)
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dwb5z
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     op=Exists
  Type     Reason                  Age                         From                                                 Message
  ----     ------                  ----                        ----                                                 -------
  Normal   Scheduled               152m                                                                             Successfully assigned openshift-network-diagnostics/network-check-target-btmvt to ip-10-0-142-195.us-east-2.compute.internal
  Warning  FailedCreatePodSandBox  150m (x13 over 152m)        kubelet, ip-10-0-142-195.us-east-2.compute.internal  Failed to create pod sandbox: open c:\k\etc\resolv.conf: The system cannot find the file specified.
  Normal   SandboxChanged          149m (x3 over 149m)         kubelet, ip-10-0-142-195.us-east-2.compute.internal  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling                 148m (x3 over 149m)         kubelet, ip-10-0-142-195.us-east-2.compute.internal  Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a755920f6c5713d96fdd82d6a6270affdfbd4618e21ff20fff1d1444e0e3873c"
  Warning  Failed                  148m (x3 over 149m)         kubelet, ip-10-0-142-195.us-east-2.compute.internal  Failed to pull image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a755920f6c5713d96fdd82d6a6270affdfbd4618e21ff20fff1d1444e0e3873c": rpc error: code = Unknown desc = Error response from daemon: unauthorized: access to the requested resource is not authorized
  Warning  Failed                  <invalid> (x659 over 149m)  kubelet, ip-10-0-142-195.us-east-2.compute.internal  Error: ImagePullBackOff

Actual results:
Linux network-check-target pods are scheduled on the Windows node.

Expected results:
Linux pods should not be scheduled on Windows nodes.
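
One way to check this expectation is to compare pod placement against the set of Windows nodes. The sketch below is illustrative only: `pods_on_nodes` is a hypothetical helper, and the `-o wide` rows are placeholder sample data (the pod names and the node name ip-10-0-142-195.us-east-2.compute.internal come from this report; the second node name and the pod IP are made up). It assumes NODE is the seventh column of `oc get pod -o wide` and that Windows nodes carry the well-known `kubernetes.io/os=windows` label.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pod -o wide` output on stdin and print the
# NAME of every pod whose NODE (assumed to be column 7) appears in the
# space-separated node list passed as $1.
pods_on_nodes() {
  awk -v nodes="$1" '
    BEGIN { n = split(nodes, a, " "); for (i = 1; i <= n; i++) wanted[a[i]] = 1 }
    NR > 1 && ($7 in wanted) { print $1 }
  '
}

# Placeholder sample data; the Linux node name and the pod IP are invented.
wide_output='NAME                         READY   STATUS             RESTARTS   AGE    IP           NODE                                         NOMINATED NODE   READINESS GATES
network-check-target-6jst9   1/1     Running            0          9h     10.128.2.5   linux-worker-0.example.internal              <none>           <none>
network-check-target-btmvt   0/1     ImagePullBackOff   0          161m   <none>       ip-10-0-142-195.us-east-2.compute.internal   <none>           <none>'

windows_nodes='ip-10-0-142-195.us-east-2.compute.internal'
misplaced=$(printf '%s\n' "$wide_output" | pods_on_nodes "$windows_nodes")
printf '%s\n' "$misplaced"

# On a live cluster the inputs could come from oc instead (not run here):
#   windows_nodes=$(oc get nodes -l kubernetes.io/os=windows -o name | sed 's|^node/||' | tr '\n' ' ')
#   oc get pod -n openshift-network-diagnostics -o wide | pods_on_nodes "$windows_nodes"
```

An empty result means no pod from the namespace landed on a Windows node, which is the expected outcome.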

Additional info:

Comment 3 Anurag saxena 2021-01-15 18:38:11 UTC
@sgao Can you verify this bug? Thanks

Comment 4 gaoshang 2021-01-18 03:54:42 UTC
(In reply to Anurag saxena from comment #3)
> @sgao Can you verify this bug? Thanks

This bug has been verified on OCP 4.7.0-0.nightly-2021-01-17-211555 and passed, thanks.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-17-211555   True        False         81m     Cluster version is 4.7.0-0.nightly-2021-01-17-211555

Steps to Reproduce:
1. Install an OCP cluster and bootstrap a Windows node.
2. Check the pods in project openshift-network-diagnostics: network-check-target pods are no longer scheduled on the Windows node.

# oc get pod -n openshift-network-diagnostics
NAME                                  READY   STATUS    RESTARTS   AGE
network-check-source-8b577f64-g8tkr   1/1     Running   0          82m
network-check-target-2vz2r            1/1     Running   0          102m
network-check-target-46jh6            1/1     Running   0          102m
network-check-target-6cjrm            1/1     Running   0          109m
network-check-target-mh7gr            1/1     Running   0          109m
network-check-target-vj2g2            1/1     Running   0          100m
network-check-target-zzwbn            1/1     Running   0          109m

Comment 7 errata-xmlrpc 2021-02-24 15:52:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.