1920769 – [Upgrade] OCP upgrade from 4.6.13 to 4.7.0-fc.4 for "network-check-target" failed when "defaultNodeSelector" is set

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Jacob Tanenbaum
QA Contact:	huirwang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-01-27 02:08 UTC by huirwang
Modified:	2021-02-24 15:57 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-24 15:56:44 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 965	None	closed	Bug 1920769: override the default node-selector for network-connectivity check	2021-02-07 01:19:25 UTC
Github	openshift cluster-network-operator pull 972	None	closed	Bug 1920769: Fix the spacing for the node-selector override annotation	2021-02-09 11:01:54 UTC
Red Hat Product Errata	RHSA-2020:5633	None	None	None	2021-02-24 15:56:59 UTC

Description huirwang 2021-01-27 02:08:17 UTC

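Description of problem:
When a cluster-wide defaultNodeSelector is set, the upgrade from 4.6.13 to 4.7.0-fc.4 stalls because some network-check-target pods in the openshift-network-diagnostics namespace cannot be scheduled, which leaves the network cluster operator degraded.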

Version-Release number of selected component (if applicable):

4.6.13 to 4.7.0-fc.4

How reproducible:

Always

Steps to Reproduce:
1. Set up an OpenShift Container Platform cluster with version 4.6.13.

2. Apply the label 'node-workload=app' to the worker nodes, then set it as the cluster-wide default node selector in the scheduler configuration and confirm the setting:

oc get schedulers.config.openshift.io cluster -ojson | jq .spec

  "defaultNodeSelector": "node-workload=app",

3. Upgrade the cluster to 4.7.0-fc.4.

Actual Results:
The upgrade was blocked because some pods in openshift-network-diagnostics remained in Pending status.

oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-fc.4   True        False         46m     Error while reconciling 4.7.0-fc.4: the cluster operator network is degraded

oc get -n openshift-network-diagnostics pod
NAME                                   READY   STATUS    RESTARTS   AGE
network-check-source-c587cc78f-29t6k   1/1     Running   0          14h
network-check-target-2zspv             1/1     Running   0          14h
network-check-target-5v52l             1/1     Running   0          14h
network-check-target-gcv2s             0/1     Pending   0          14h
network-check-target-jqqkd             0/1     Pending   0          14h
network-check-target-n42wt             0/1     Pending   0          14h
network-check-target-pxl29             1/1     Running   0          14h

oc describe pod network-check-target-5v52l -n openshift-network-diagnostics
Name:         network-check-target-5v52l
Namespace:    openshift-network-diagnostics
Priority:     0
Node:         ip-10-0-212-65.ap-northeast-1.compute.internal/10.0.212.65
Start Time:   Tue, 26 Jan 2021 18:51:27 +0800
Labels:       app=network-check-target
              controller-revision-hash=8f9b6469
              kubernetes.io/os=linux
              pod-template-generation=1
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.44"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.44"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: restricted
Status:       Running
IP:           10.129.2.44
IPs:
  IP:           10.129.2.44
Controlled By:  DaemonSet/network-check-target
Containers:
  network-check-target-container:
    Container ID:   cri-o://b46a73c896b9aff667fdda2d71f2589621cec80edc1126785f1900a3947def7d
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52181a2fba40eb3ddf1d3ee953633be686898ec34be819f50a15688420895a93
    Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52181a2fba40eb3ddf1d3ee953633be686898ec34be819f50a15688420895a93
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 26 Jan 2021 18:51:29 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     150Mi
    Readiness:    http-get http://:8080/ delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dvrf9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-dvrf9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dvrf9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-workload=app
Tolerations:     
Events:          <none>

oc describe pod network-check-target-gcv2s -n openshift-network-diagnostics
Name:           network-check-target-gcv2s
Namespace:      openshift-network-diagnostics
Priority:       0
Node:           <none>
Labels:         app=network-check-target
                controller-revision-hash=8f9b6469
                kubernetes.io/os=linux
                pod-template-generation=1
Annotations:    openshift.io/scc: restricted
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  DaemonSet/network-check-target
Containers:
  network-check-target-container:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52181a2fba40eb3ddf1d3ee953633be686898ec34be819f50a15688420895a93
    Port:       8080/TCP
    Host Port:  0/TCP
    Requests:
      cpu:        10m
      memory:     150Mi
    Readiness:    http-get http://:8080/ delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dvrf9 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-dvrf9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dvrf9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-workload=app
Tolerations:     
Events:
  Type     Reason            Age        From  Message
  ----     ------            ----       ----  -------
  Warning  FailedScheduling  <unknown>        0/6 nodes are available: 6 node(s) didn't match Pod's node affinity.

Expected Results:
The upgrade completes successfully.
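The Node-Selectors fields above show that the pods carry node-workload=app in addition to beta.kubernetes.io/os=linux. The following is a minimal Python sketch of how that produces exactly this failure, assuming (not confirmed by this report) that the DaemonSet template itself only selects on beta.kubernetes.io/os=linux and that the cluster defaultNodeSelector is merged into each pod's nodeSelector at admission whenever the namespace has no openshift.io/node-selector annotation. Under that assumption the DaemonSet still creates one pod per node, and the pods assigned to nodes without the label can never be scheduled. The node names and helper functions are hypothetical; this is a simplified model, not the admission plugin's or the scheduler's actual code.

```python
# Simplified model of why some network-check-target pods stay Pending.
# Assumptions (not taken from the OpenShift source code):
#   * a namespace without an "openshift.io/node-selector" annotation inherits
#     the cluster defaultNodeSelector, which is merged into the pod nodeSelector;
#   * a namespace annotation, even an empty one, replaces the cluster default;
#   * selector conflicts, which OpenShift would reject, are ignored here.

NS_SELECTOR_KEY = "openshift.io/node-selector"


def parse_selector(text: str) -> dict[str, str]:
    """Parse a comma-separated "key=value" selector string into a dict."""
    pairs: dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            pairs[key] = value
    return pairs


def effective_selector(pod_selector: dict[str, str],
                       ns_annotations: dict[str, str],
                       cluster_default: str) -> dict[str, str]:
    """Return the nodeSelector a pod ends up with after admission."""
    if NS_SELECTOR_KEY in ns_annotations:
        extra = parse_selector(ns_annotations[NS_SELECTOR_KEY])
    else:
        extra = parse_selector(cluster_default)
    return {**pod_selector, **extra}


def matches(selector: dict[str, str], node_labels: dict[str, str]) -> bool:
    """A node satisfies a nodeSelector only if it carries every key=value pair."""
    return all(node_labels.get(k) == v for k, v in selector.items())


def pending_daemonset_pods(nodes, pod_selector, ns_annotations, cluster_default):
    """One DaemonSet pod per node; return the nodes whose pod cannot schedule."""
    selector = effective_selector(pod_selector, ns_annotations, cluster_default)
    return [name for name, labels in nodes.items() if not matches(selector, labels)]


# Hypothetical six-node cluster: three nodes carry node-workload=app, three do not,
# mirroring the three Running and three Pending pods in the report.
OS_LABEL = {"beta.kubernetes.io/os": "linux"}
nodes = {f"unlabeled-{i}": dict(OS_LABEL) for i in range(3)}
nodes.update({f"worker-{i}": {**OS_LABEL, "node-workload": "app"} for i in range(3)})

pod_selector = {"beta.kubernetes.io/os": "linux"}  # assumed DaemonSet template selector
default = "node-workload=app"                      # cluster defaultNodeSelector

print(pending_daemonset_pods(nodes, pod_selector, {}, default))
# ['unlabeled-0', 'unlabeled-1', 'unlabeled-2']
print(pending_daemonset_pods(nodes, pod_selector, {NS_SELECTOR_KEY: ""}, default))
# []
```

The second call models a namespace with an explicitly empty node selector: nothing is merged in, so every node matches and no pod is left Pending.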

Comment 1 Qin Ping 2021-01-27 02:15:01 UTC

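With the following workaround, which sets an empty openshift.io/node-selector annotation on the openshift-network-diagnostics namespace, the pods are scheduled successfully and the network cluster operator is upgraded successfully.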

$ oc annotate ns openshift-network-diagnostics openshift.io/node-selector=
$ oc get ns openshift-network-diagnostics -ojson|jq .metadata.annotations
{
  "openshift.io/node-selector": "",
  "openshift.io/sa.scc.mcs": "s0:c25,c15",
  "openshift.io/sa.scc.supplemental-groups": "1000630000/10000",
  "openshift.io/sa.scc.uid-range": "1000630000/10000"
}

$ oc get -n openshift-network-diagnostics pod
NAME                                   READY   STATUS    RESTARTS   AGE
network-check-source-c587cc78f-29t6k   1/1     Running   0          15h
network-check-target-2zspv             1/1     Running   0          15h
network-check-target-5v52l             1/1     Running   0          15h
network-check-target-92q4n             0/1     Running   0          34s
network-check-target-jswwq             0/1     Running   0          35s
network-check-target-pxl29             1/1     Running   0          15h
network-check-target-tj4fh             0/1     Running   0          35s

$ oc get co network
NAME      VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
network   4.7.0-fc.4   True        False         False      3m35s
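To confirm the workaround from a script rather than by eye, the namespace annotations printed in this comment can be checked for an openshift.io/node-selector key that is present and empty. The minimal Python sketch below embeds that JSON output verbatim; in practice you would feed in the output of the same oc/jq command instead. The helper name is hypothetical. The check deliberately distinguishes a missing key (the namespace still inherits the cluster defaultNodeSelector) from an empty value (the namespace opts out of it).

```python
import json

# Annotations as printed by:
#   oc get ns openshift-network-diagnostics -ojson | jq .metadata.annotations
ANNOTATIONS_JSON = """
{
  "openshift.io/node-selector": "",
  "openshift.io/sa.scc.mcs": "s0:c25,c15",
  "openshift.io/sa.scc.supplemental-groups": "1000630000/10000",
  "openshift.io/sa.scc.uid-range": "1000630000/10000"
}
"""


def has_empty_node_selector(annotations: dict) -> bool:
    """True only when the namespace explicitly sets an empty node selector.

    A missing key means the namespace still inherits the cluster-wide
    defaultNodeSelector, so it must not be treated like an empty value.
    """
    return annotations.get("openshift.io/node-selector") == ""


annotations = json.loads(ANNOTATIONS_JSON)
print(has_empty_node_selector(annotations))  # True
print(has_empty_node_selector({}))           # False: annotation not set
```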

Comment 9 errata-xmlrpc 2021-02-24 15:56:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
