Bug 1924383 - Degraded network operator during upgrade to 4.7.z
Summary: Degraded network operator during upgrade to 4.7.z
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jacob Tanenbaum
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-03 03:30 UTC by Simon
Modified: 2021-02-24 15:58 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:58:08 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 973 | 0 | None | closed | Bug 1924383: update the resource requests made by pods in openshift-network-diagnostics namespace | 2021-02-16 21:14:26 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:58:57 UTC

Description Simon 2021-02-03 03:30:20 UTC
Description of problem:
After a chain of upgrades (4.2.36 -> 4.3.40 -> 4.4.33 -> 4.5.29 -> 4.6.13 -> 4.7.z), the network operator is in the DEGRADED state.

Version-Release number of selected component (if applicable):
Cloud GCP (IPI), Clusterversion: 4.7.0-0.nightly-2021-02-02-164630

oc get machines -n openshift-machine-api
NAME                     PHASE     TYPE            REGION        ZONE            AGE
skorda-9xgxx-m-0         Running   n1-standard-4   us-central1   us-central1-a   12h
skorda-9xgxx-m-1         Running   n1-standard-4   us-central1   us-central1-b   12h
skorda-9xgxx-m-2         Running   n1-standard-4   us-central1   us-central1-c   12h
skorda-9xgxx-w-a-tlg5f   Running   n1-standard-4   us-central1   us-central1-a   12h
skorda-9xgxx-w-b-45s5n   Running   n1-standard-4   us-central1   us-central1-b   12h

How reproducible:
Reproduced in 1 of 1 attempts. I plan to run more tests of this kind.

Steps to Reproduce:
1. Install OCP on GCP cloud:
install-config:

apiVersion: v1
controlPlane:
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 2
metadata:
  name: your-cluster-name
platform:
  gcp:
    region: us-central1
    projectID: openshift-qe
pullSecret: <your_pull_secret>
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
baseDomain: qe.gcp.devcluster.openshift.com
sshKey: 'your ssh-rsa key'

2. Load cluster with example projects:

for i in $(seq 1 20)
do
  oc new-project test$i
  oc label ns test$i purpose=test
  oc new-app nodejs-mongodb-example
done

3. Upgrade the cluster:

oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"}}' --type=merge
oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release@sha256:9ff90174a170379e90a9ead6e0d8cf6f439004191f80762764a5ca3dbaab01dc --allow-explicit-upgrade --force
# Update OK; next: 4.3.40 -> 4.4.33
oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release:4.4.33-x86_64 --allow-explicit-upgrade --force
# Update OK; next: 4.4.33 -> 4.5.29
oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release:4.5.29-x86_64 --allow-explicit-upgrade --force
# Update OK; next: 4.5.29 -> 4.6.13
oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release:4.6.13-x86_64 --allow-explicit-upgrade --force
# Update OK; next: 4.6.13 -> 4.7.0-0.nightly-2021-02-02-164630
oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.7.0-0.nightly-2021-02-02-164630 --allow-explicit-upgrade --force
# Update FAILED

Actual results:

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.13    True        True          3h41m   Unable to apply 4.7.0-0.nightly-2021-02-02-164630: the cluster operator network is degraded

oc get clusteroperators.config.openshift.io 
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h48m
baremetal                                  4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h15m
cloud-credential                           4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
cluster-autoscaler                         4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
config-operator                            4.7.0-0.nightly-2021-02-02-164630   True        False         False      6h2m
console                                    4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h9m
csi-snapshot-controller                    4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h10m
dns                                        4.6.13                              True        False         False      4h16m
etcd                                       4.7.0-0.nightly-2021-02-02-164630   True        False         False      7h14m
image-registry                             4.7.0-0.nightly-2021-02-02-164630   True        False         False      4h3m
ingress                                    4.7.0-0.nightly-2021-02-02-164630   True        False         False      4h32m
insights                                   4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
kube-apiserver                             4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
kube-controller-manager                    4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
kube-scheduler                             4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
kube-storage-version-migrator              4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h9m
machine-api                                4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
machine-approver                           4.7.0-0.nightly-2021-02-02-164630   True        False         False      5h57m
machine-config                             4.6.13                              True        False         False      3h48m
marketplace                                4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h9m
monitoring                                 4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h56m
network                                    4.7.0-0.nightly-2021-02-02-164630   False       True          True       3h7m
node-tuning                                4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h10m
openshift-apiserver                        4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h24m
openshift-controller-manager               4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h8m
openshift-samples                          4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h10m
operator-lifecycle-manager                 4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h9m
service-ca                                 4.7.0-0.nightly-2021-02-02-164630   True        False         False      12h
storage                                    4.7.0-0.nightly-2021-02-02-164630   True        False         False      3h9m
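
With this many operators, the one blocking the upgrade is easy to miss. The following is a minimal sketch of a filter over the table above, assuming the column order shown (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE). The function name `unhealthy_operators` is hypothetical and not part of `oc`.

```shell
# Print the name of every cluster operator that is unavailable or degraded.
# Reads `oc get clusteroperators` table output on stdin; the column
# positions (AVAILABLE = 3, DEGRADED = 5) are assumed from the table above.
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}

# Usage against a live cluster:
#   oc get clusteroperators | unhealthy_operators
```

Applied to the table above, this would print only `network`. The operators still at 4.6.13 (dns, machine-config) are not flagged, because the filter looks at health columns and not at the version.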

oc describe clusteroperator network
Name:         network
Namespace:
Labels:       <none>
Annotations:  network.operator.openshift.io/last-seen-state:
                {"DaemonsetStates":[{"Namespace":"openshift-network-diagnostics","Name":"network-check-target","LastSeenStatus":{"currentNumberScheduled":...
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-02-02T14:05:01Z
  Generation:          1
  Resource Version:    303090
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/network
  UID:                 a407f9d8-655f-11eb-8c9f-42010a000005
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-02-03T00:08:40Z
    Message:               DaemonSet "openshift-network-diagnostics/network-check-target" rollout is not making progress - last change 2021-02-02T23:56:57Z
    Reason:                RolloutHung
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-02-02T14:05:01Z
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2021-02-02T23:53:40Z
    Message:               DaemonSet "openshift-network-diagnostics/network-check-target" is not available (awaiting 2 nodes)
    Reason:                Deploying
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-02-02T23:53:40Z
    Message:               The network is starting up
    Reason:                Startup
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-02-02T23:53:40Z
    Status:                False
    Type:                  ManagementStateDegraded
  Extension:               <nil>
  Related Objects:
...
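
The Degraded message can also be pulled out directly instead of reading the full describe output. This is a minimal sketch, assuming `oc` is logged in to the cluster; the function name `operator_condition_message` is hypothetical.

```shell
# Print the message of one status condition of a cluster operator.
# $1 = operator name, $2 = condition type (defaults to Degraded).
operator_condition_message() {
  operator=$1
  condition=${2:-Degraded}
  oc get clusteroperator "$operator" \
    -o jsonpath="{.status.conditions[?(@.type==\"$condition\")].message}"
}

# Usage against a live cluster:
#   operator_condition_message network
#   operator_condition_message network Progressing
```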


Expected results:
The network operator should be fully AVAILABLE.

Comment 5 zhaozhanqi 2021-02-07 03:35:43 UTC
Checked build 4.7.0-0.nightly-2021-02-06-084550; network-check-target now requests less memory:

            timeoutSeconds: 10
          resources:
            requests:
              cpu: 10m
              memory: 15Mi


Moving this bug to VERIFIED.
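
The requests shown above can be read back from a live cluster. This is a minimal sketch, assuming `oc` is logged in; the function name `daemonset_requests` is hypothetical, and its defaults are the daemonset and namespace named in this bug.

```shell
# Print name, CPU request, and memory request (tab-separated) for every
# container in a daemonset's pod template.
# $1 = daemonset name, $2 = namespace; both default to the ones in this bug.
daemonset_requests() {
  name=${1:-network-check-target}
  namespace=${2:-openshift-network-diagnostics}
  oc get daemonset "$name" -n "$namespace" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources.requests.cpu}{"\t"}{.resources.requests.memory}{"\n"}{end}'
}

# Usage against a live cluster:
#   daemonset_requests
```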

Comment 8 errata-xmlrpc 2021-02-24 15:58:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
