Bug 1920769

Summary: [Upgrade] OCP upgrade from 4.6.13 to 4.7.0-fc.4 for "network-check-target" failed when "defaultNodeSelector" is set
Product: OpenShift Container Platform
Component: Networking
Networking sub component: openshift-sdn
Version: 4.7
Target Milestone: ---
Target Release: 4.7.0
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: huirwang
Assignee: Jacob Tanenbaum <jtanenba>
QA Contact: huirwang
CC: jtanenba, piqin
Last Closed: 2021-02-24 15:56:44 UTC
Type: Bug

Description huirwang 2021-01-27 02:08:17 UTC
Description of problem:
Upgrading from 4.6.13 to 4.7.0-fc.4 on a cluster with "defaultNodeSelector" set leaves some "network-check-target" pods in Pending status, and the network cluster operator becomes degraded.

Version-Release number of selected component (if applicable):

4.6.13 to 4.7.0-fc.4

How reproducible:

Always

Steps to Reproduce:
1. Set up an OpenShift Container Platform cluster at version 4.6.13.

2. Apply the label 'node-workload=app' to the worker nodes, then set the same value as "defaultNodeSelector" in the cluster scheduler configuration.

 oc get schedulers.config.openshift.io cluster -ojson | jq .spec

  "defaultNodeSelector": "node-workload=app",
 
3. Upgrade to 4.7.0-fc.4.

Actual Results:
The upgrade was blocked because some pods in openshift-network-diagnostics were stuck in Pending status.

oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-fc.4   True        False         46m     Error while reconciling 4.7.0-fc.4: the cluster operator network is degraded
 
 
oc get -n openshift-network-diagnostics pod
NAME                                   READY   STATUS    RESTARTS   AGE
network-check-source-c587cc78f-29t6k   1/1     Running   0          14h
network-check-target-2zspv             1/1     Running   0          14h
network-check-target-5v52l             1/1     Running   0          14h
network-check-target-gcv2s             0/1     Pending   0          14h
network-check-target-jqqkd             0/1     Pending   0          14h
network-check-target-n42wt             0/1     Pending   0          14h
network-check-target-pxl29             1/1     Running   0          14h

 oc describe pod network-check-target-5v52l -n openshift-network-diagnostics
Name:         network-check-target-5v52l
Namespace:    openshift-network-diagnostics
Priority:     0
Node:         ip-10-0-212-65.ap-northeast-1.compute.internal/10.0.212.65
Start Time:   Tue, 26 Jan 2021 18:51:27 +0800
Labels:       app=network-check-target
              controller-revision-hash=8f9b6469
              kubernetes.io/os=linux
              pod-template-generation=1
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.44"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.44"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: restricted
Status:       Running
IP:           10.129.2.44
IPs:
  IP:           10.129.2.44
Controlled By:  DaemonSet/network-check-target
Containers:
  network-check-target-container:
    Container ID:   cri-o://b46a73c896b9aff667fdda2d71f2589621cec80edc1126785f1900a3947def7d
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52181a2fba40eb3ddf1d3ee953633be686898ec34be819f50a15688420895a93
    Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52181a2fba40eb3ddf1d3ee953633be686898ec34be819f50a15688420895a93
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 26 Jan 2021 18:51:29 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     150Mi
    Readiness:    http-get http://:8080/ delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dvrf9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-dvrf9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dvrf9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-workload=app
Tolerations:     
Events:          <none>

 oc describe pod network-check-target-gcv2s -n openshift-network-diagnostics
Name:           network-check-target-gcv2s
Namespace:      openshift-network-diagnostics
Priority:       0
Node:           <none>
Labels:         app=network-check-target
                controller-revision-hash=8f9b6469
                kubernetes.io/os=linux
                pod-template-generation=1
Annotations:    openshift.io/scc: restricted
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  DaemonSet/network-check-target
Containers:
  network-check-target-container:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52181a2fba40eb3ddf1d3ee953633be686898ec34be819f50a15688420895a93
    Port:       8080/TCP
    Host Port:  0/TCP
    Requests:
      cpu:        10m
      memory:     150Mi
    Readiness:    http-get http://:8080/ delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dvrf9 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-dvrf9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dvrf9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-workload=app
Tolerations:     
Events:
  Type     Reason            Age        From  Message
  ----     ------            ----       ----  -------
  Warning  FailedScheduling  <unknown>        0/6 nodes are available: 6 node(s) didn't match Pod's node affinity.
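
Note on the output above: both pods carry "node-workload=app" under Node-Selectors, yet only the one placed on a labeled worker is Running. The sketch below is a simplified model of nodeSelector matching, not the scheduler's actual code, and it may help show why a DaemonSet pod aimed at an unlabeled node cannot be scheduled. The node names and label sets are hypothetical. Only the selector string is taken from this report.

```python
# Simplified model of nodeSelector matching; not the real scheduler code.


def parse_selector(selector: str) -> dict:
    """Parse 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}; '' gives {}."""
    result = {}
    for term in selector.split(","):
        term = term.strip()
        if term:
            key, _, value = term.partition("=")
            result[key] = value
    return result


def node_matches(node_labels: dict, selector: dict) -> bool:
    """A node matches only if it carries every key=value pair in the selector."""
    return all(node_labels.get(k) == v for k, v in selector.items())


# Selector copied from the "Node-Selectors" field in the describe output.
pod_selector = parse_selector("beta.kubernetes.io/os=linux,node-workload=app")

# Hypothetical nodes: the names and exact label sets are assumptions.
nodes = {
    "worker-0": {"beta.kubernetes.io/os": "linux", "node-workload": "app"},
    "master-0": {"beta.kubernetes.io/os": "linux"},
}

schedulable = {name: node_matches(labels, pod_selector) for name, labels in nodes.items()}
print(schedulable)  # {'worker-0': True, 'master-0': False}
```

In this model the unlabeled node never satisfies the selector, which is consistent with the FailedScheduling event reported for the Pending pods.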

Expected Results:
The upgrade completes successfully.

Comment 1 Qin Ping 2021-01-27 02:15:01 UTC
With the following workaround, the pods are scheduled and the network cluster operator (co) upgrades successfully.

$ oc annotate ns openshift-network-diagnostics openshift.io/node-selector=
$ oc get ns openshift-network-diagnostics -ojson|jq .metadata.annotations
{
  "openshift.io/node-selector": "",
  "openshift.io/sa.scc.mcs": "s0:c25,c15",
  "openshift.io/sa.scc.supplemental-groups": "1000630000/10000",
  "openshift.io/sa.scc.uid-range": "1000630000/10000"
}

$ oc get -n openshift-network-diagnostics pod
NAME                                   READY   STATUS    RESTARTS   AGE
network-check-source-c587cc78f-29t6k   1/1     Running   0          15h
network-check-target-2zspv             1/1     Running   0          15h
network-check-target-5v52l             1/1     Running   0          15h
network-check-target-92q4n             0/1     Running   0          34s
network-check-target-jswwq             0/1     Running   0          35s
network-check-target-pxl29             1/1     Running   0          15h
network-check-target-tj4fh             0/1     Running   0          35s

$ oc get co network
NAME      VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
network   4.7.0-fc.4   True        False         False      3m35s
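
A rough sketch of why the empty annotation helps, assuming the documented OpenShift behavior that a namespace's "openshift.io/node-selector" annotation, when present (even if empty), takes precedence over the cluster-wide "defaultNodeSelector". The function name is hypothetical and this is a model of the precedence rule only, not the admission plugin's code.

```python
NODE_SELECTOR_ANNOTATION = "openshift.io/node-selector"


def effective_namespace_selector(annotations: dict, cluster_default: str) -> str:
    """Return the selector that would be merged into pods in the namespace.

    Assumption: an annotation that is present wins over the cluster default,
    even when its value is the empty string.
    """
    if NODE_SELECTOR_ANNOTATION in annotations:
        return annotations[NODE_SELECTOR_ANNOTATION]
    return cluster_default


# Value of spec.defaultNodeSelector from the reproduction steps.
cluster_default = "node-workload=app"

# Before the workaround: the namespace has no node-selector annotation.
before = effective_namespace_selector(
    {"openshift.io/sa.scc.mcs": "s0:c25,c15"}, cluster_default
)

# After "oc annotate ns ... openshift.io/node-selector=": empty annotation.
after = effective_namespace_selector(
    {"openshift.io/sa.scc.mcs": "s0:c25,c15", NODE_SELECTOR_ANNOTATION: ""},
    cluster_default,
)

print(repr(before))  # 'node-workload=app'
print(repr(after))   # ''
```

With an empty effective selector, newly created network-check-target pods no longer get "node-workload=app" added, which matches the three replacement pods starting up in the listing above.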

Comment 9 errata-xmlrpc 2021-02-24 15:56:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633