Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1972984

Summary: ovnkube-node pod is in CrashLoopBackOff state when upgrading from 4.7 to 4.8
Product: OpenShift Container Platform
Component: Networking
Networking sub component: ovn-kubernetes
Version: 4.8
Target Release: 4.8.z
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Yang Yang <yanyang>
Assignee: Christoph Stäbler <cstabler>
QA Contact: Anurag Saxena <anusaxen>
CC: anbhat, jialiu, mifiedle, vpickard
Status: CLOSED ERRATA
Severity: high
Priority: medium
Type: Bug
Doc Type: No Doc Update
Last Closed: 2022-01-11 22:31:15 UTC

Description Yang Yang 2021-06-17 02:50:57 UTC
Description of problem:
ovnkube-node pod is in CrashLoopBackOff state when upgrading from 4.7 to 4.8.
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-lxq7v is in CrashLoopBackOff State
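For triage, the failing container(s) inside the crashlooping pod can be pulled out of the pod JSON (`oc get pod ovnkube-node-lxq7v -n openshift-ovn-kubernetes -o json`). A minimal sketch; the sample JSON below is an abbreviated, illustrative stand-in for live cluster output, and the helper name is mine:

```python
import json

# Abbreviated, illustrative sample of `oc get pod ... -o json` output; the
# real object carries many more fields. Container names match the
# ovnkube-node DaemonSet pod spec.
pod_json = """
{
  "status": {
    "containerStatuses": [
      {"name": "ovn-controller",
       "state": {"waiting": {"reason": "CrashLoopBackOff"}},
       "restartCount": 12},
      {"name": "kube-rbac-proxy",
       "state": {"waiting": {"reason": "CrashLoopBackOff"}},
       "restartCount": 12},
      {"name": "ovnkube-node",
       "state": {"running": {"startedAt": "2021-06-15T01:54:44Z"}},
       "restartCount": 0}
    ]
  }
}
"""

def crashlooping_containers(pod):
    """Return names of containers whose waiting reason is CrashLoopBackOff."""
    names = []
    for cs in pod["status"].get("containerStatuses", []):
        waiting = cs["state"].get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            names.append(cs["name"])
    return names

pod = json.loads(pod_json)
print(crashlooping_containers(pod))  # -> ['ovn-controller', 'kube-rbac-proxy']
```

From there, `oc logs -p <pod> -c <container>` on each reported container shows why the previous attempt died.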

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-06-14-145150 

How reproducible:
1/1

Steps to Reproduce:
1. Install a 4.7 OVN-enabled OCP cluster with RHEL nodes
2. Upgrade from 4.7.0-0.nightly-2021-06-12-151209 to 4.8.0-0.nightly-2021-06-14-145150

Actual results:
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
06-15 10:11:51.088  authentication                             4.8.0-0.nightly-2021-06-14-145150   True        False         False      2m3s
06-15 10:11:51.088  baremetal                                  4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h40m
06-15 10:11:51.088  cloud-credential                           4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h38m
06-15 10:11:51.088  cluster-autoscaler                         4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h40m
06-15 10:11:51.088  config-operator                            4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h40m
06-15 10:11:51.088  console                                    4.8.0-0.nightly-2021-06-14-145150   True        False         False      51s
06-15 10:11:51.088  csi-snapshot-controller                    4.8.0-0.nightly-2021-06-14-145150   True        False         False      123m
06-15 10:11:51.088  dns                                        4.8.0-0.nightly-2021-06-14-145150   True        True          False      34m
06-15 10:11:51.088  etcd                                       4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h38m
06-15 10:11:51.088  image-registry                             4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h32m
06-15 10:11:51.088  ingress                                    4.8.0-0.nightly-2021-06-14-145150   True        False         False      74m
06-15 10:11:51.088  insights                                   4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h34m
06-15 10:11:51.088  kube-apiserver                             4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h37m
06-15 10:11:51.088  kube-controller-manager                    4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h38m
06-15 10:11:51.088  kube-scheduler                             4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h37m
06-15 10:11:51.088  kube-storage-version-migrator              4.8.0-0.nightly-2021-06-14-145150   True        False         False      34m
06-15 10:11:51.088  machine-api                                4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h34m
06-15 10:11:51.088  machine-approver                           4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h39m
06-15 10:11:51.088  machine-config                             4.8.0-0.nightly-2021-06-14-145150   False       False         True       5m54s
06-15 10:11:51.088  marketplace                                4.8.0-0.nightly-2021-06-14-145150   True        False         False      133m
06-15 10:11:51.088  monitoring                                 4.8.0-0.nightly-2021-06-14-145150   False       True          True       7m44s
06-15 10:11:51.088  network                                    4.8.0-0.nightly-2021-06-14-145150   True        True          True       3h40m
06-15 10:11:51.088  node-tuning                                4.8.0-0.nightly-2021-06-14-145150   True        False         False      75m
06-15 10:11:51.088  openshift-apiserver                        4.8.0-0.nightly-2021-06-14-145150   True        False         False      24m
06-15 10:11:51.088  openshift-controller-manager               4.8.0-0.nightly-2021-06-14-145150   True        False         False      74m
06-15 10:11:51.088  openshift-samples                          4.8.0-0.nightly-2021-06-14-145150   True        False         False      75m
06-15 10:11:51.089  operator-lifecycle-manager                 4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h40m
06-15 10:11:51.089  operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h39m
06-15 10:11:51.089  operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-06-14-145150   True        False         False      45m
06-15 10:11:51.089  service-ca                                 4.8.0-0.nightly-2021-06-14-145150   True        False         False      3h40m
06-15 10:11:51.089  storage                                    4.8.0-0.nightly-2021-06-14-145150   True        True          False      28m

06-15 10:13:02.732  Name:         network
06-15 10:13:02.732  Namespace:    
06-15 10:13:02.732  Labels:       <none>
06-15 10:13:02.732  Annotations:  include.release.openshift.io/ibm-cloud-managed: true
06-15 10:13:02.732                include.release.openshift.io/self-managed-high-availability: true
06-15 10:13:02.732                include.release.openshift.io/single-node-developer: true
06-15 10:13:02.732                network.operator.openshift.io/last-seen-state:
06-15 10:13:02.732                  {"DaemonsetStates":[{"Namespace":"openshift-ovn-kubernetes","Name":"ovnkube-node","LastSeenStatus":{"currentNumberScheduled":8,"numberMiss...
06-15 10:13:02.732  API Version:  config.openshift.io/v1
06-15 10:13:02.732  Kind:         ClusterOperator
06-15 10:13:02.732  Metadata:
06-15 10:13:02.732    Creation Timestamp:  2021-06-14T22:19:38Z
06-15 10:13:02.732    Generation:          1
06-15 10:13:02.732    Managed Fields:
06-15 10:13:02.732      API Version:  config.openshift.io/v1
06-15 10:13:02.732      Fields Type:  FieldsV1
06-15 10:13:02.732      fieldsV1:
06-15 10:13:02.732        f:metadata:
06-15 10:13:02.732          f:annotations:
06-15 10:13:02.732            .:
06-15 10:13:02.732            f:include.release.openshift.io/ibm-cloud-managed:
06-15 10:13:02.732            f:include.release.openshift.io/self-managed-high-availability:
06-15 10:13:02.732            f:include.release.openshift.io/single-node-developer:
06-15 10:13:02.732        f:spec:
06-15 10:13:02.732        f:status:
06-15 10:13:02.732          .:
06-15 10:13:02.732          f:extension:
06-15 10:13:02.732      Manager:      cluster-version-operator
06-15 10:13:02.732      Operation:    Update
06-15 10:13:02.732      Time:         2021-06-14T22:19:38Z
06-15 10:13:02.732      API Version:  config.openshift.io/v1
06-15 10:13:02.732      Fields Type:  FieldsV1
06-15 10:13:02.732      fieldsV1:
06-15 10:13:02.732        f:metadata:
06-15 10:13:02.732          f:annotations:
06-15 10:13:02.732            f:network.operator.openshift.io/last-seen-state:
06-15 10:13:02.732        f:status:
06-15 10:13:02.732          f:conditions:
06-15 10:13:02.732          f:relatedObjects:
06-15 10:13:02.732          f:versions:
06-15 10:13:02.732      Manager:         cluster-network-operator
06-15 10:13:02.732      Operation:       Update
06-15 10:13:02.732      Time:            2021-06-14T22:31:19Z
06-15 10:13:02.732    Resource Version:  154657
06-15 10:13:02.732    UID:               99b64d5c-0a0f-4856-8d0f-9335f1ba8d3d
06-15 10:13:02.732  Spec:
06-15 10:13:02.732  Status:
06-15 10:13:02.732    Conditions:
06-15 10:13:02.732      Last Transition Time:  2021-06-15T01:56:18Z
06-15 10:13:02.732      Message:               DaemonSet "openshift-multus/multus" rollout is not making progress - last change 2021-06-15T01:57:23Z
06-15 10:13:02.732  DaemonSet "openshift-multus/multus-additional-cni-plugins" rollout is not making progress - last change 2021-06-15T01:57:23Z
06-15 10:13:02.732  DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-lxq7v is in CrashLoopBackOff State
06-15 10:13:02.732  DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2021-06-15T01:54:44Z
06-15 10:13:02.732      Reason:                RolloutHung
06-15 10:13:02.732      Status:                True
06-15 10:13:02.732      Type:                  Degraded
06-15 10:13:02.732      Last Transition Time:  2021-06-14T22:27:49Z
06-15 10:13:02.732      Status:                False
06-15 10:13:02.732      Type:                  ManagementStateDegraded
06-15 10:13:02.732      Last Transition Time:  2021-06-14T22:27:49Z
06-15 10:13:02.732      Status:                True
06-15 10:13:02.732      Type:                  Upgradeable
06-15 10:13:02.732      Last Transition Time:  2021-06-15T01:54:43Z
06-15 10:13:02.732      Message:               DaemonSet "openshift-multus/multus" is not available (awaiting 1 nodes)
06-15 10:13:02.732  DaemonSet "openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes)
06-15 10:13:02.732  DaemonSet "openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
06-15 10:13:02.732  DaemonSet "openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
06-15 10:13:02.732  DaemonSet "openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
06-15 10:13:02.732      Reason:                Deploying
06-15 10:13:02.732      Status:                True
06-15 10:13:02.732      Type:                  Progressing
06-15 10:13:02.732      Last Transition Time:  2021-06-14T22:31:19Z
06-15 10:13:02.732      Status:                True
06-15 10:13:02.732      Type:                  Available
06-15 10:13:02.732    Extension:               <nil>
06-15 10:13:02.732    Related Objects:
06-15 10:13:02.732      Group:      
06-15 10:13:02.732      Name:       applied-cluster
06-15 10:13:02.732      Namespace:  openshift-network-operator
06-15 10:13:02.732      Resource:   configmaps
06-15 10:13:02.732      Group:      apiextensions.k8s.io
06-15 10:13:02.732      Name:       network-attachment-definitions.k8s.cni.cncf.io
06-15 10:13:02.732      Resource:   customresourcedefinitions
06-15 10:13:02.732      Group:      apiextensions.k8s.io
06-15 10:13:02.732      Name:       ippools.whereabouts.cni.cncf.io
06-15 10:13:02.732      Resource:   customresourcedefinitions
06-15 10:13:02.732      Group:      apiextensions.k8s.io
06-15 10:13:02.732      Name:       overlappingrangeipreservations.whereabouts.cni.cncf.io
06-15 10:13:02.732      Resource:   customresourcedefinitions
06-15 10:13:02.732      Group:      
06-15 10:13:02.732      Name:       openshift-multus
06-15 10:13:02.732      Resource:   namespaces
06-15 10:13:02.732      Group:      rbac.authorization.k8s.io
06-15 10:13:02.732      Name:       multus
06-15 10:13:02.732      Resource:   clusterroles
06-15 10:13:02.732      Group:      
06-15 10:13:02.732      Name:       multus
06-15 10:13:02.732      Namespace:  openshift-multus
06-15 10:13:02.732      Resource:   serviceaccounts
06-15 10:13:02.732      Group:      rbac.authorization.k8s.io
06-15 10:13:02.732      Name:       multus
06-15 10:13:02.732      Resource:   clusterrolebindings
06-15 10:13:02.732      Group:      rbac.authorization.k8s.io
06-15 10:13:02.732      Name:       multus-whereabouts
06-15 10:13:02.732      Resource:   clusterrolebindings
06-15 10:13:02.732      Group:      rbac.authorization.k8s.io
06-15 10:13:02.732      Name:       whereabouts-cni
06-15 10:13:02.732      Resource:   clusterroles
06-15 10:13:02.732      Group:      
06-15 10:13:02.732      Name:       cni-binary-copy-script
06-15 10:13:02.732      Namespace:  openshift-multus
06-15 10:13:02.732      Resource:   configmaps
06-15 10:13:02.732      Group:      apps
06-15 10:13:02.732      Name:       multus
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   daemonsets
06-15 10:13:02.733      Group:      apps
06-15 10:13:02.733      Name:       multus-additional-cni-plugins
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   daemonsets
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       metrics-daemon-sa
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   serviceaccounts
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       metrics-daemon-role
06-15 10:13:02.733      Resource:   clusterroles
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       metrics-daemon-sa-rolebinding
06-15 10:13:02.733      Resource:   clusterrolebindings
06-15 10:13:02.733      Group:      apps
06-15 10:13:02.733      Name:       network-metrics-daemon
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   daemonsets
06-15 10:13:02.733      Group:      monitoring.coreos.com
06-15 10:13:02.733      Name:       monitor-network
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   servicemonitors
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       network-metrics-service
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   services
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       prometheus-k8s
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   roles
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       prometheus-k8s
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   rolebindings
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       multus-admission-controller
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   services
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       multus-admission-controller-webhook
06-15 10:13:02.733      Resource:   clusterroles
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       multus-admission-controller-webhook
06-15 10:13:02.733      Resource:   clusterrolebindings
06-15 10:13:02.733      Group:      admissionregistration.k8s.io
06-15 10:13:02.733      Name:       multus.openshift.io
06-15 10:13:02.733      Resource:   validatingwebhookconfigurations
06-15 10:13:02.733      Group:      apps
06-15 10:13:02.733      Name:       multus-admission-controller
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   daemonsets
06-15 10:13:02.733      Group:      monitoring.coreos.com
06-15 10:13:02.733      Name:       monitor-multus-admission-controller
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   servicemonitors
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       prometheus-k8s
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   roles
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       prometheus-k8s
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   rolebindings
06-15 10:13:02.733      Group:      monitoring.coreos.com
06-15 10:13:02.733      Name:       prometheus-k8s-rules
06-15 10:13:02.733      Namespace:  openshift-multus
06-15 10:13:02.733      Resource:   prometheusrules
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       openshift-ovn-kubernetes
06-15 10:13:02.733      Resource:   namespaces
06-15 10:13:02.733      Group:      apiextensions.k8s.io
06-15 10:13:02.733      Name:       egressfirewalls.k8s.ovn.org
06-15 10:13:02.733      Resource:   customresourcedefinitions
06-15 10:13:02.733      Group:      apiextensions.k8s.io
06-15 10:13:02.733      Name:       egressips.k8s.ovn.org
06-15 10:13:02.733      Resource:   customresourcedefinitions
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       ovn-kubernetes-node
06-15 10:13:02.733      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.733      Resource:   serviceaccounts
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       openshift-ovn-kubernetes-node
06-15 10:13:02.733      Resource:   clusterroles
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       openshift-ovn-kubernetes-node
06-15 10:13:02.733      Resource:   clusterrolebindings
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       ovn-kubernetes-controller
06-15 10:13:02.733      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.733      Resource:   serviceaccounts
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       openshift-ovn-kubernetes-controller
06-15 10:13:02.733      Resource:   clusterroles
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       openshift-ovn-kubernetes-controller
06-15 10:13:02.733      Resource:   clusterrolebindings
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       openshift-ovn-kubernetes-sbdb
06-15 10:13:02.733      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.733      Resource:   roles
06-15 10:13:02.733      Group:      rbac.authorization.k8s.io
06-15 10:13:02.733      Name:       openshift-ovn-kubernetes-sbdb
06-15 10:13:02.733      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.733      Resource:   rolebindings
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       ovnkube-config
06-15 10:13:02.733      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.733      Resource:   configmaps
06-15 10:13:02.733      Group:      
06-15 10:13:02.733      Name:       ovnkube-db
06-15 10:13:02.733      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   services
06-15 10:13:02.734      Group:      network.operator.openshift.io
06-15 10:13:02.734      Name:       ovn
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   operatorpkis
06-15 10:13:02.734      Group:      network.operator.openshift.io
06-15 10:13:02.734      Name:       signer
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   operatorpkis
06-15 10:13:02.734      Group:      flowcontrol.apiserver.k8s.io
06-15 10:13:02.734      Name:       openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   flowschemas
06-15 10:13:02.734      Group:      monitoring.coreos.com
06-15 10:13:02.734      Name:       master-rules
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   prometheusrules
06-15 10:13:02.734      Group:      monitoring.coreos.com
06-15 10:13:02.734      Name:       networking-rules
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   prometheusrules
06-15 10:13:02.734      Group:      monitoring.coreos.com
06-15 10:13:02.734      Name:       monitor-ovn-master-metrics
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   servicemonitors
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       ovn-kubernetes-master
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   services
06-15 10:13:02.734      Group:      monitoring.coreos.com
06-15 10:13:02.734      Name:       monitor-ovn-node
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   servicemonitors
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       ovn-kubernetes-node
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   services
06-15 10:13:02.734      Group:      rbac.authorization.k8s.io
06-15 10:13:02.734      Name:       prometheus-k8s
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   roles
06-15 10:13:02.734      Group:      rbac.authorization.k8s.io
06-15 10:13:02.734      Name:       prometheus-k8s
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   rolebindings
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       openshift-host-network
06-15 10:13:02.734      Resource:   namespaces
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       host-network-namespace-quotas
06-15 10:13:02.734      Namespace:  openshift-host-network
06-15 10:13:02.734      Resource:   resourcequotas
06-15 10:13:02.734      Group:      policy
06-15 10:13:02.734      Name:       ovn-raft-quorum-guard
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   poddisruptionbudgets
06-15 10:13:02.734      Group:      apps
06-15 10:13:02.734      Name:       ovnkube-master
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   daemonsets
06-15 10:13:02.734      Group:      apps
06-15 10:13:02.734      Name:       ovnkube-node
06-15 10:13:02.734      Namespace:  openshift-ovn-kubernetes
06-15 10:13:02.734      Resource:   daemonsets
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       openshift-network-diagnostics
06-15 10:13:02.734      Resource:   namespaces
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       network-diagnostics
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   serviceaccounts
06-15 10:13:02.734      Group:      rbac.authorization.k8s.io
06-15 10:13:02.734      Name:       network-diagnostics
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   roles
06-15 10:13:02.734      Group:      rbac.authorization.k8s.io
06-15 10:13:02.734      Name:       network-diagnostics
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   rolebindings
06-15 10:13:02.734      Group:      rbac.authorization.k8s.io
06-15 10:13:02.734      Name:       network-diagnostics
06-15 10:13:02.734      Resource:   clusterroles
06-15 10:13:02.734      Group:      rbac.authorization.k8s.io
06-15 10:13:02.734      Name:       network-diagnostics
06-15 10:13:02.734      Resource:   clusterrolebindings
06-15 10:13:02.734      Group:      rbac.authorization.k8s.io
06-15 10:13:02.734      Name:       network-diagnostics
06-15 10:13:02.734      Namespace:  kube-system
06-15 10:13:02.734      Resource:   rolebindings
06-15 10:13:02.734      Group:      apps
06-15 10:13:02.734      Name:       network-check-source
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   deployments
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       network-check-source
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   services
06-15 10:13:02.734      Group:      monitoring.coreos.com
06-15 10:13:02.734      Name:       network-check-source
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   servicemonitors
06-15 10:13:02.734      Group:      apps
06-15 10:13:02.734      Name:       network-check-target
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   daemonsets
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       network-check-target
06-15 10:13:02.734      Namespace:  openshift-network-diagnostics
06-15 10:13:02.734      Resource:   services
06-15 10:13:02.734      Group:      
06-15 10:13:02.734      Name:       openshift-network-operator
06-15 10:13:02.734      Resource:   namespaces
06-15 10:13:02.734      Group:      operator.openshift.io
06-15 10:13:02.734      Name:       cluster
06-15 10:13:02.734      Resource:   networks
06-15 10:13:02.734      Group:      networking.k8s.io
06-15 10:13:02.734      Name:       
06-15 10:13:02.734      Resource:   NetworkPolicy
06-15 10:13:02.734      Group:      k8s.ovn.org
06-15 10:13:02.734      Name:       
06-15 10:13:02.734      Resource:   EgressFirewall
06-15 10:13:02.734      Group:      k8s.ovn.org
06-15 10:13:02.734      Name:       
06-15 10:13:02.734      Resource:   EgressIP
06-15 10:13:02.734    Versions:
06-15 10:13:02.734      Name:     operator
06-15 10:13:02.734      Version:  4.8.0-0.nightly-2021-06-14-145150
06-15 10:13:02.734  Events:       <none>
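The degraded operators visible in the `oc get co` table above can also be extracted programmatically from `oc get clusteroperators -o json`. A minimal sketch; the sample JSON is an abbreviated, illustrative subset of real output (only three operators, only the fields the filter needs), and the helper name is mine:

```python
import json

# Abbreviated, illustrative `oc get clusteroperators -o json` sample.
# Values mirror the table above: network and machine-config are Degraded.
co_json = """
{
  "items": [
    {"metadata": {"name": "network"},
     "status": {"conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Progressing", "status": "True"},
        {"type": "Degraded", "status": "True", "reason": "RolloutHung"}]}},
    {"metadata": {"name": "machine-config"},
     "status": {"conditions": [
        {"type": "Available", "status": "False"},
        {"type": "Degraded", "status": "True"}]}},
    {"metadata": {"name": "dns"},
     "status": {"conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Degraded", "status": "False"}]}}
  ]
}
"""

def degraded_operators(co_list):
    """Names of ClusterOperators whose Degraded condition is True."""
    bad = []
    for co in co_list["items"]:
        for cond in co["status"]["conditions"]:
            if cond["type"] == "Degraded" and cond["status"] == "True":
                bad.append(co["metadata"]["name"])
    return sorted(bad)

print(degraded_operators(json.loads(co_json)))  # -> ['machine-config', 'network']
```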

Expected results:
Upgrade is successful.

Additional info:

Comment 1 Aniket Bhat 2021-06-18 21:28:17 UTC
How reproducible is this issue? Does it happen every single time?
It seems like the kube-rbac-proxy and ovn-controller containers are the ones failing. kube-rbac-proxy fails with an EADDRINUSE error, and the ovn-controller container likewise complains that another process is already running.
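EADDRINUSE is the classic "address already in use" failure: if a stale process still holds the port, any new listener on the same address fails the same way, which is consistent with both containers complaining that another instance is already running. A minimal illustration of the error itself (generic Python sockets, not the actual kube-rbac-proxy code):

```python
import errno
import socket

# Bind a listener, then try to bind a second socket to the same address.
# The second bind fails with EADDRINUSE, the same error kube-rbac-proxy
# reports when a previous instance still owns its port.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
first.listen(1)
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))  # same port, still held by `first`
    err = None
except OSError as e:
    err = e.errno
finally:
    second.close()
    first.close()

print(err == errno.EADDRINUSE)  # True on Linux/macOS
```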

Comment 2 Yang Yang 2021-06-21 02:01:08 UTC
(In reply to Aniket Bhat from comment #1)
> How reproducible is this issue? Does it happen every single time?

It was found in QE CI. So far the upgrade has been attempted once and the failure occurred that one time. The cluster has FIPS, OVN, etcd encryption, a proxy, and Security Token Service (STS) enabled.

Comment 4 Aniket Bhat 2021-06-21 15:25:28 UTC
@yanyang can you attach a sosreport if you still have the cluster? If not, can you provide a cluster in this state or try to get a sosreport in addition to must-gather?

Comment 5 Yang Yang 2021-06-22 15:09:16 UTC
Attempted the upgrade 3 times (4.7.17-x86_64 --> 4.8.0-0.nightly-2021-06-17-222823, 4.7.17-x86_64 --> 4.8.0-0.nightly-2021-06-21-175537, and 4.7.0-0.nightly-2021-06-20-093308 --> 4.8.0-0.nightly-2021-06-21-175537); all of them ran into the machine-config bug https://bugzilla.redhat.com/show_bug.cgi?id=1974403, so this issue cannot be reproduced at the moment.

06-22 22:08:32.266  NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
06-22 22:08:32.266  version   4.7.17    True        True          3h1m    Unable to apply 4.8.0-0.nightly-2021-06-17-222823: the cluster operator machine-config has not yet successfully rolled out
06-22 22:08:32.266  
06-22 22:08:32.266  
06-22 22:08:32.266  [Debug] upgrade_ret_1=1,upgrade_ret_2=1
06-22 22:08:32.266  
06-22 22:09:40.193  **************Post Action after upgrade fail****************
06-22 22:09:40.193  
06-22 22:09:40.193  Post action: # oc get node
06-22 22:09:40.193  NAME                                        STATUS                     ROLES    AGE     VERSION                INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
06-22 22:09:40.193  ip-10-0-49-133.us-east-2.compute.internal   Ready                      worker   4h19m   v1.20.0+2817867        10.0.49.133   <none>        Red Hat Enterprise Linux CoreOS 47.83.202106120438-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-3.rhaos4.7.git845b2fa.el8
06-22 22:09:40.193  ip-10-0-50-47.us-east-2.compute.internal    Ready                      worker   3h10m   v1.20.0+87cc9a4        10.0.50.47    <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.31.1.el7.x86_64    cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
06-22 22:09:40.193  ip-10-0-52-8.us-east-2.compute.internal     Ready                      worker   4h17m   v1.20.0+2817867        10.0.52.8     <none>        Red Hat Enterprise Linux CoreOS 47.83.202106120438-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-3.rhaos4.7.git845b2fa.el8
06-22 22:09:40.193  ip-10-0-55-185.us-east-2.compute.internal   Ready                      master   4h28m   v1.21.0-rc.0+120883f   10.0.55.185   <none>        Red Hat Enterprise Linux CoreOS 48.84.202106161818-0 (Ootpa)   4.18.0-305.3.1.el8_4.x86_64    cri-o://1.21.1-9.rhaos4.8.gitdfcd2b6.el8
06-22 22:09:40.193  ip-10-0-60-171.us-east-2.compute.internal   Ready                      master   4h28m   v1.21.0-rc.0+120883f   10.0.60.171   <none>        Red Hat Enterprise Linux CoreOS 48.84.202106161818-0 (Ootpa)   4.18.0-305.3.1.el8_4.x86_64    cri-o://1.21.1-9.rhaos4.8.gitdfcd2b6.el8
06-22 22:09:40.193  ip-10-0-62-175.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   3h10m   v1.20.0+87cc9a4        10.0.62.175   <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.31.1.el7.x86_64    cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
06-22 22:09:40.193  ip-10-0-64-236.us-east-2.compute.internal   Ready                      worker   4h19m   v1.20.0+2817867        10.0.64.236   <none>        Red Hat Enterprise Linux CoreOS 47.83.202106120438-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-3.rhaos4.7.git845b2fa.el8
06-22 22:09:40.193  ip-10-0-77-104.us-east-2.compute.internal   Ready                      master   4h28m   v1.21.0-rc.0+120883f   10.0.77.104   <none>        Red Hat Enterprise Linux CoreOS 48.84.202106161818-0 (Ootpa)   4.18.0-305.3.1.el8_4.x86_64    cri-o://1.21.1-9.rhaos4.8.gitdfcd2b6.el8
06-22 22:09:40.193  
06-22 22:09:40.193  
06-22 22:09:40.193  Post action: # oc get co
06-22 22:09:40.193  NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
06-22 22:09:40.193  authentication                             4.8.0-0.nightly-2021-06-17-222823   True        False         False      96m
06-22 22:09:40.193  baremetal                                  4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h25m
06-22 22:09:40.193  cloud-credential                           4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h24m
06-22 22:09:40.193  cluster-autoscaler                         4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h24m
06-22 22:09:40.193  config-operator                            4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h25m
06-22 22:09:40.193  console                                    4.8.0-0.nightly-2021-06-17-222823   True        False         False      107m
06-22 22:09:40.193  csi-snapshot-controller                    4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h25m
06-22 22:09:40.193  dns                                        4.8.0-0.nightly-2021-06-17-222823   True        False         False      130m
06-22 22:09:40.193  etcd                                       4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h24m
06-22 22:09:40.193  image-registry                             4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h18m
06-22 22:09:40.193  ingress                                    4.8.0-0.nightly-2021-06-17-222823   True        False         False      149m
06-22 22:09:40.193  insights                                   4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h19m
06-22 22:09:40.193  kube-apiserver                             4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h22m
06-22 22:09:40.193  kube-controller-manager                    4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h23m
06-22 22:09:40.193  kube-scheduler                             4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h23m
06-22 22:09:40.193  kube-storage-version-migrator              4.8.0-0.nightly-2021-06-17-222823   True        False         False      99m
06-22 22:09:40.193  machine-api                                4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h21m
06-22 22:09:40.193  machine-approver                           4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h25m
06-22 22:09:40.193  machine-config                             4.7.17                              False       True          True       86m
06-22 22:09:40.193  marketplace                                4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h24m
06-22 22:09:40.193  monitoring                                 4.8.0-0.nightly-2021-06-17-222823   True        False         False      146m
06-22 22:09:40.193  network                                    4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h25m
06-22 22:09:40.193  node-tuning                                4.8.0-0.nightly-2021-06-17-222823   True        False         False      149m
06-22 22:09:40.193  openshift-apiserver                        4.8.0-0.nightly-2021-06-17-222823   True        False         False      99m
06-22 22:09:40.193  openshift-controller-manager               4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h19m
06-22 22:09:40.193  openshift-samples                          4.8.0-0.nightly-2021-06-17-222823   True        False         False      149m
06-22 22:09:40.193  operator-lifecycle-manager                 4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h24m
06-22 22:09:40.194  operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h25m
06-22 22:09:40.194  operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h20m
06-22 22:09:40.194  service-ca                                 4.8.0-0.nightly-2021-06-17-222823   True        False         False      4h25m
06-22 22:09:40.194  storage                                    4.8.0-0.nightly-2021-06-17-222823   True        False         False      103m
06-22 22:09:40.194  
06-22 22:09:40.194  
06-22 22:09:40.194  print detail msg for node(SchedulingDisabled) if exist:
06-22 22:09:40.194  
06-22 22:10:38.882  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal node details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
06-22 22:10:38.882  
06-22 22:10:38.882  
06-22 22:10:38.882  Name:               ip-10-0-62-175.us-east-2.compute.internal
06-22 22:10:38.882  Roles:              worker
06-22 22:10:38.883  Labels:             beta.kubernetes.io/arch=amd64
06-22 22:10:38.883                      beta.kubernetes.io/instance-type=m4.xlarge
06-22 22:10:38.883                      beta.kubernetes.io/os=linux
06-22 22:10:38.883                      failure-domain.beta.kubernetes.io/region=us-east-2
06-22 22:10:38.883                      failure-domain.beta.kubernetes.io/zone=us-east-2a
06-22 22:10:38.883                      kubernetes.io/arch=amd64
06-22 22:10:38.883                      kubernetes.io/hostname=ip-10-0-62-175.us-east-2.compute.internal
06-22 22:10:38.883                      kubernetes.io/os=linux
06-22 22:10:38.883                      node-role.kubernetes.io/worker=
06-22 22:10:38.883                      node.kubernetes.io/instance-type=m4.xlarge
06-22 22:10:38.883                      node.openshift.io/os_id=rhel
06-22 22:10:38.883                      topology.ebs.csi.aws.com/zone=us-east-2a
06-22 22:10:38.883                      topology.kubernetes.io/region=us-east-2
06-22 22:10:38.883                      topology.kubernetes.io/zone=us-east-2a
06-22 22:10:38.883  Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-04badbe8d733fdfce"}
06-22 22:10:38.883                      k8s.ovn.org/host-addresses: ["10.0.62.175"]
06-22 22:10:38.883                      k8s.ovn.org/l3-gateway-config:
06-22 22:10:38.883                        {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-62-175.us-east-2.compute.internal","mac-address":"02:c5:ec:be:f0:18","ip-address...
06-22 22:10:38.883                      k8s.ovn.org/node-chassis-id: 9c691032-fe74-4a83-ad80-e260c313fb8a
06-22 22:10:38.883                      k8s.ovn.org/node-local-nat-ip: {"default":["169.254.13.4"]}
06-22 22:10:38.883                      k8s.ovn.org/node-mgmt-port-mac-address: 8e:60:50:39:0b:b7
06-22 22:10:38.883                      k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.62.175/20"}
06-22 22:10:38.883                      k8s.ovn.org/node-subnets: {"default":"10.131.2.0/23"}
06-22 22:10:38.883                      machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
06-22 22:10:38.883                      machineconfiguration.openshift.io/currentConfig: rendered-worker-1499553031752408b71539829f65a631
06-22 22:10:38.883                      machineconfiguration.openshift.io/desiredConfig: rendered-worker-3a172bab26901488eeba23826f54cc85
06-22 22:10:38.883                      machineconfiguration.openshift.io/reason: failed to drain node : ip-10-0-62-175.us-east-2.compute.internal after 1 hour
06-22 22:10:38.883                      machineconfiguration.openshift.io/ssh: accessed
06-22 22:10:38.883                      machineconfiguration.openshift.io/state: Degraded
06-22 22:10:38.883                      volumes.kubernetes.io/controller-managed-attach-detach: true
06-22 22:10:38.883  CreationTimestamp:  Tue, 22 Jun 2021 10:59:08 +0000
06-22 22:10:38.883  Taints:             node.kubernetes.io/unschedulable:NoSchedule
06-22 22:10:38.883  Unschedulable:      true
06-22 22:10:38.883  Lease:
06-22 22:10:38.883    HolderIdentity:  ip-10-0-62-175.us-east-2.compute.internal
06-22 22:10:38.883    AcquireTime:     <unset>
06-22 22:10:38.883    RenewTime:       Tue, 22 Jun 2021 14:10:35 +0000
06-22 22:10:38.883  Conditions:
06-22 22:10:38.883    Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
06-22 22:10:38.883    ----             ------  -----------------                 ------------------                ------                       -------
06-22 22:10:38.883    MemoryPressure   False   Tue, 22 Jun 2021 14:08:05 +0000   Tue, 22 Jun 2021 10:59:08 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
06-22 22:10:38.883    DiskPressure     False   Tue, 22 Jun 2021 14:08:05 +0000   Tue, 22 Jun 2021 10:59:08 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
06-22 22:10:38.883    PIDPressure      False   Tue, 22 Jun 2021 14:08:05 +0000   Tue, 22 Jun 2021 10:59:08 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
06-22 22:10:38.883    Ready            True    Tue, 22 Jun 2021 14:08:05 +0000   Tue, 22 Jun 2021 11:00:08 +0000   KubeletReady                 kubelet is posting ready status
06-22 22:10:38.883  Addresses:
06-22 22:10:38.883    InternalIP:   10.0.62.175
06-22 22:10:38.883    Hostname:     ip-10-0-62-175.us-east-2.compute.internal
06-22 22:10:38.883    InternalDNS:  ip-10-0-62-175.us-east-2.compute.internal
06-22 22:10:38.883  Capacity:
06-22 22:10:38.883    attachable-volumes-aws-ebs:  39
06-22 22:10:38.883    cpu:                         4
06-22 22:10:38.883    ephemeral-storage:           31444972Ki
06-22 22:10:38.883    hugepages-1Gi:               0
06-22 22:10:38.883    hugepages-2Mi:               0
06-22 22:10:38.883    memory:                      16264952Ki
06-22 22:10:38.883    pods:                        250
06-22 22:10:38.883  Allocatable:
06-22 22:10:38.883    attachable-volumes-aws-ebs:  39
06-22 22:10:38.883    cpu:                         3500m
06-22 22:10:38.883    ephemeral-storage:           27905944324
06-22 22:10:38.883    hugepages-1Gi:               0
06-22 22:10:38.883    hugepages-2Mi:               0
06-22 22:10:38.883    memory:                      15113976Ki
06-22 22:10:38.883    pods:                        250
06-22 22:10:38.883  System Info:
06-22 22:10:38.883    Machine ID:                             b5e6bffe480549a09900c3187a3df558
06-22 22:10:38.883    System UUID:                            EC29B09E-046D-7887-D773-94D82CE99A19
06-22 22:10:38.883    Boot ID:                                e68a29c2-ec93-4559-99a3-b4507b5f9461
06-22 22:10:38.883    Kernel Version:                         3.10.0-1160.31.1.el7.x86_64
06-22 22:10:38.883    OS Image:                               Red Hat Enterprise Linux Server 7.9 (Maipo)
06-22 22:10:38.883    Operating System:                       linux
06-22 22:10:38.883    Architecture:                           amd64
06-22 22:10:38.883    Container Runtime Version:              cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
06-22 22:10:38.883    Kubelet Version:                        v1.20.0+87cc9a4
06-22 22:10:38.883    Kube-Proxy Version:                     v1.20.0+87cc9a4
06-22 22:10:38.883  ProviderID:                               aws:///us-east-2a/i-04badbe8d733fdfce
06-22 22:10:38.883  Non-terminated Pods:                      (24 in total)
06-22 22:10:38.883    Namespace                               Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
06-22 22:10:38.883    ---------                               ----                                      ------------  ----------  ---------------  -------------  ---
06-22 22:10:38.883    openshift-cluster-csi-drivers           aws-ebs-csi-driver-node-w5xtl             30m (0%)      0 (0%)      150Mi (1%)       0 (0%)         148m
06-22 22:10:38.883    openshift-cluster-node-tuning-operator  tuned-8cwdz                               10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         148m
06-22 22:10:38.883    openshift-dns                           dns-default-cvm7c                         60m (1%)      0 (0%)      110Mi (0%)       0 (0%)         130m
06-22 22:10:38.883    openshift-dns                           node-resolver-98v6m                       5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         132m
06-22 22:10:38.883    openshift-image-registry                image-registry-64fff5ff7c-n2bsv           100m (2%)     0 (0%)      256Mi (1%)       0 (0%)         150m
06-22 22:10:38.883    openshift-image-registry                node-ca-6xmx6                             10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         149m
06-22 22:10:38.883    openshift-ingress-canary                ingress-canary-qfwpp                      10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         148m
06-22 22:10:38.883    openshift-machine-config-operator       machine-config-daemon-nlch5               40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         123m
06-22 22:10:38.883    openshift-marketplace                   certified-operators-dr6jr                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         150m
06-22 22:10:38.883    openshift-marketplace                   community-operators-mm7q5                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         150m
06-22 22:10:38.883    openshift-marketplace                   redhat-marketplace-5hqcl                  10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         150m
06-22 22:10:38.883    openshift-marketplace                   redhat-operators-q5smd                    10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         150m
06-22 22:10:38.883    openshift-monitoring                    alertmanager-main-1                       8m (0%)       0 (0%)      105Mi (0%)       0 (0%)         148m
06-22 22:10:38.883    openshift-monitoring                    node-exporter-8h9tr                       9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         150m
06-22 22:10:38.883    openshift-monitoring                    openshift-state-metrics-69db764f-wpwnc    3m (0%)       0 (0%)      72Mi (0%)        0 (0%)         150m
06-22 22:10:38.883    openshift-monitoring                    prometheus-adapter-57c4fb7876-qlphr       1m (0%)       0 (0%)      40Mi (0%)        0 (0%)         148m
06-22 22:10:38.883    openshift-monitoring                    prometheus-k8s-0                          76m (2%)      0 (0%)      1119Mi (7%)      0 (0%)         147m
06-22 22:10:38.883    openshift-monitoring                    telemeter-client-6d7879cc66-glp8k         3m (0%)       0 (0%)      70Mi (0%)        0 (0%)         150m
06-22 22:10:38.884    openshift-monitoring                    thanos-querier-7855c8db6-7zhdr            14m (0%)      0 (0%)      77Mi (0%)        0 (0%)         150m
06-22 22:10:38.884    openshift-multus                        multus-additional-cni-plugins-g4q2l       10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         144m
06-22 22:10:38.884    openshift-multus                        multus-z9jhw                              10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         140m
06-22 22:10:38.884    openshift-multus                        network-metrics-daemon-xnqmj              20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         143m
06-22 22:10:38.884    openshift-network-diagnostics           network-check-target-xphvl                10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         139m
06-22 22:10:38.884    openshift-ovn-kubernetes                ovnkube-node-b6crx                        40m (1%)      0 (0%)      640Mi (4%)       0 (0%)         143m
06-22 22:10:38.884  Allocated resources:
06-22 22:10:38.884    (Total limits may be over 100 percent, i.e., overcommitted.)
06-22 22:10:38.884    Resource                    Requests      Limits
06-22 22:10:38.884    --------                    --------      ------
06-22 22:10:38.884    cpu                         509m (14%)    0 (0%)
06-22 22:10:38.884    memory                      3297Mi (22%)  0 (0%)
06-22 22:10:38.884    ephemeral-storage           0 (0%)        0 (0%)
06-22 22:10:38.884    hugepages-1Gi               0 (0%)        0 (0%)
06-22 22:10:38.884    hugepages-2Mi               0 (0%)        0 (0%)
06-22 22:10:38.884    attachable-volumes-aws-ebs  0             0
06-22 22:10:38.884  Events:
06-22 22:10:38.884    Type    Reason              Age   From     Message
06-22 22:10:38.884    ----    ------              ----  ----     -------
06-22 22:10:38.884    Normal  NodeNotSchedulable  119m  kubelet  Node ip-10-0-62-175.us-east-2.compute.internal status is now: NodeNotSchedulable
06-22 22:10:38.884  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
06-22 22:10:38.884  
06-22 22:10:38.884  
06-22 22:10:38.884  print detail msg for co(AVAILABLE != True or PROGRESSING!=False or DEGRADED!=False or version != target_version) if exist:
06-22 22:10:38.884  
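The abnormal-operator filter the script's message above describes can be sketched as follows. This is a hypothetical illustration (not the actual diagnostic script); the sample operators mirror the `oc get co` output in this report.

```python
# Hypothetical sketch of the diagnostic script's filter: keep any
# ClusterOperator whose AVAILABLE/PROGRESSING/DEGRADED state or version
# deviates from the healthy post-upgrade state.
target_version = "4.8.0-0.nightly-2021-06-17-222823"

# Sample data modeled on the `oc get co` listing above.
operators = [
    {"name": "network", "version": target_version,
     "available": "True", "progressing": "False", "degraded": "False"},
    {"name": "machine-config", "version": "4.7.17",
     "available": "False", "progressing": "True", "degraded": "True"},
]

abnormal = [
    op["name"] for op in operators
    if op["available"] != "True"
    or op["progressing"] != "False"
    or op["degraded"] != "False"
    or op["version"] != target_version
]
print(abnormal)
```

With the sample data above, only machine-config is flagged, matching the report.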
06-22 22:10:38.884  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal co details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
06-22 22:10:38.884  
06-22 22:10:38.884  
06-22 22:10:38.884  ~~~~~~~~
06-22 22:10:38.884   machine-config 
06-22 22:10:38.884  ~~~~~~~~
06-22 22:10:38.884  
06-22 22:10:38.884  #### Quick diagnosis: The first abnormal cluster operator is often the culprit! ####
06-22 22:10:38.884  => Below status and logs for the different conditions of all abnormal cos are sorted by 'lastTransitionTime':
06-22 22:10:38.884  2021-06-22T12:05:44Z [machine-config] -->>Progressing True<<-- - Working towards 4.8.0-0.nightly-2021-06-17-222823
06-22 22:10:38.884  2021-06-22T12:10:35Z [machine-config] Upgradeable False DegradedPool One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
06-22 22:10:38.884  2021-06-22T12:42:46Z [machine-config] -->>Degraded True<<-- RequiredPoolsFailed Unable to apply 4.8.0-0.nightly-2021-06-17-222823: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool worker is not ready, retrying. Status: (pool degraded: true total: 5, ready 0, updated: 0, unavailable: 1)
06-22 22:10:38.884  2021-06-22T12:42:47Z [machine-config] -->>Available False<<-- - Cluster -->>not available<<-- for 4.8.0-0.nightly-2021-06-17-222823
06-22 22:10:38.884  
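The sorting step behind the "sorted by 'lastTransitionTime'" output above can be sketched as follows. This is an illustrative Python sketch, not the actual upgrade-check script; the sample conditions mirror the machine-config operator's status in this report.

```python
# Sort a ClusterOperator's status.conditions by lastTransitionTime.
# RFC 3339 UTC timestamps sort correctly as plain strings, so no
# datetime parsing is needed.
conditions = [
    {"lastTransitionTime": "2021-06-22T12:42:46Z", "type": "Degraded",
     "status": "True", "reason": "RequiredPoolsFailed"},
    {"lastTransitionTime": "2021-06-22T12:05:44Z", "type": "Progressing",
     "status": "True", "reason": ""},
    {"lastTransitionTime": "2021-06-22T12:42:47Z", "type": "Available",
     "status": "False", "reason": ""},
    {"lastTransitionTime": "2021-06-22T12:10:35Z", "type": "Upgradeable",
     "status": "False", "reason": "DegradedPool"},
]

ordered = sorted(conditions, key=lambda c: c["lastTransitionTime"])
for c in ordered:
    print(c["lastTransitionTime"], c["type"], c["status"], c["reason"] or "-")
```

The oldest transition (Progressing) prints first and the newest (Available) last, as in the diagnosis output above.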
06-22 22:10:38.884  --------------------------
06-22 22:10:38.884  
06-22 22:10:39.483  Name:         machine-config
06-22 22:10:39.483  Namespace:    
06-22 22:10:39.483  Labels:       <none>
06-22 22:10:39.483  Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
06-22 22:10:39.483                include.release.openshift.io/self-managed-high-availability: true
06-22 22:10:39.483                include.release.openshift.io/single-node-developer: true
06-22 22:10:39.483  API Version:  config.openshift.io/v1
06-22 22:10:39.483  Kind:         ClusterOperator
06-22 22:10:39.483  Metadata:
06-22 22:10:39.483    Creation Timestamp:  2021-06-22T09:33:15Z
06-22 22:10:39.483    Generation:          1
06-22 22:10:39.483    Managed Fields:
06-22 22:10:39.483      API Version:  config.openshift.io/v1
06-22 22:10:39.483      Fields Type:  FieldsV1
06-22 22:10:39.483      fieldsV1:
06-22 22:10:39.483        f:metadata:
06-22 22:10:39.483          f:annotations:
06-22 22:10:39.483            .:
06-22 22:10:39.483            f:exclude.release.openshift.io/internal-openshift-hosted:
06-22 22:10:39.483            f:include.release.openshift.io/self-managed-high-availability:
06-22 22:10:39.483            f:include.release.openshift.io/single-node-developer:
06-22 22:10:39.483        f:spec:
06-22 22:10:39.483        f:status:
06-22 22:10:39.483          .:
06-22 22:10:39.483          f:relatedObjects:
06-22 22:10:39.483          f:versions:
06-22 22:10:39.483      Manager:      cluster-version-operator
06-22 22:10:39.483      Operation:    Update
06-22 22:10:39.483      Time:         2021-06-22T09:33:15Z
06-22 22:10:39.483      API Version:  config.openshift.io/v1
06-22 22:10:39.483      Fields Type:  FieldsV1
06-22 22:10:39.483      fieldsV1:
06-22 22:10:39.483        f:status:
06-22 22:10:39.483          f:conditions:
06-22 22:10:39.483          f:extension:
06-22 22:10:39.483            .:
06-22 22:10:39.483            f:master:
06-22 22:10:39.483            f:worker:
06-22 22:10:39.483          f:relatedObjects:
06-22 22:10:39.483          f:versions:
06-22 22:10:39.483      Manager:         machine-config-operator
06-22 22:10:39.483      Operation:       Update
06-22 22:10:39.483      Time:            2021-06-22T14:10:38Z
06-22 22:10:39.483    Resource Version:  157625
06-22 22:10:39.483    UID:               41b108a8-b2f7-4814-8447-f5572a6cc799
06-22 22:10:39.483  Spec:
06-22 22:10:39.483  Status:
06-22 22:10:39.483    Conditions:
06-22 22:10:39.483      Last Transition Time:  2021-06-22T12:05:44Z
06-22 22:10:39.483      Message:               Working towards 4.8.0-0.nightly-2021-06-17-222823
06-22 22:10:39.483      Status:                True
06-22 22:10:39.483      Type:                  Progressing
06-22 22:10:39.483      Last Transition Time:  2021-06-22T12:10:35Z
06-22 22:10:39.483      Message:               One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
06-22 22:10:39.483      Reason:                DegradedPool
06-22 22:10:39.483      Status:                False
06-22 22:10:39.483      Type:                  Upgradeable
06-22 22:10:39.483      Last Transition Time:  2021-06-22T12:42:46Z
06-22 22:10:39.483      Message:               Unable to apply 4.8.0-0.nightly-2021-06-17-222823: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool worker is not ready, retrying. Status: (pool degraded: true total: 5, ready 0, updated: 0, unavailable: 1)
06-22 22:10:39.483      Reason:                RequiredPoolsFailed
06-22 22:10:39.483      Status:                True
06-22 22:10:39.483      Type:                  Degraded
06-22 22:10:39.483      Last Transition Time:  2021-06-22T12:42:47Z
06-22 22:10:39.483      Message:               Cluster not available for 4.8.0-0.nightly-2021-06-17-222823
06-22 22:10:39.483      Status:                False
06-22 22:10:39.483      Type:                  Available
06-22 22:10:39.483    Extension:
06-22 22:10:39.483      Master:  all 3 nodes are at latest configuration rendered-master-116ac1e56e07257ad969fb10e7b48f48
06-22 22:10:39.483      Worker:  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ip-10-0-62-175.us-east-2.compute.internal is reporting: \"error setting node's state to Working: unable to update node \\\"&Node{ObjectMeta:{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Spec:NodeSpec{PodCIDR:,DoNotUseExternalID:,ProviderID:,Unschedulable:false,Taints:[]Taint{},ConfigSource:nil,PodCIDRs:[],},Status:NodeStatus{Capacity:ResourceList{},Allocatable:ResourceList{},Phase:,Conditions:[]NodeCondition{},Addresses:[]NodeAddress{},DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:0,},},NodeInfo:NodeSystemInfo{MachineID:,SystemUUID:,BootID:,KernelVersion:,OSImage:,ContainerRuntimeVersion:,KubeletVersion:,KubeProxyVersion:,OperatingSystem:,Architecture:,},Images:[]ContainerImage{},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:nil,},}\\\": Patch \\\"https://172.30.0.1:443/api/v1/nodes/ip-10-0-62-175.us-east-2.compute.internal\\\": http2: client connection lost\""
06-22 22:10:39.483    Related Objects:
06-22 22:10:39.483      Group:     
06-22 22:10:39.483      Name:      openshift-machine-config-operator
06-22 22:10:39.483      Resource:  namespaces
06-22 22:10:39.483      Group:     machineconfiguration.openshift.io
06-22 22:10:39.483      Name:      
06-22 22:10:39.483      Resource:  machineconfigpools
06-22 22:10:39.483      Group:     machineconfiguration.openshift.io
06-22 22:10:39.483      Name:      
06-22 22:10:39.483      Resource:  controllerconfigs
06-22 22:10:39.483      Group:     machineconfiguration.openshift.io
06-22 22:10:39.483      Name:      
06-22 22:10:39.483      Resource:  kubeletconfigs
06-22 22:10:39.483      Group:     machineconfiguration.openshift.io
06-22 22:10:39.483      Name:      
06-22 22:10:39.483      Resource:  containerruntimeconfigs
06-22 22:10:39.483      Group:     machineconfiguration.openshift.io
06-22 22:10:39.483      Name:      
06-22 22:10:39.483      Resource:  machineconfigs
06-22 22:10:39.483      Group:     
06-22 22:10:39.483      Name:      
06-22 22:10:39.483      Resource:  nodes
06-22 22:10:39.483      Group:     
06-22 22:10:39.483      Name:      openshift-kni-infra
06-22 22:10:39.483      Resource:  namespaces
06-22 22:10:39.483      Group:     
06-22 22:10:39.483      Name:      openshift-openstack-infra
06-22 22:10:39.483      Resource:  namespaces
06-22 22:10:39.483      Group:     
06-22 22:10:39.483      Name:      openshift-ovirt-infra
06-22 22:10:39.483      Resource:  namespaces
06-22 22:10:39.483      Group:     
06-22 22:10:39.483      Name:      openshift-vsphere-infra
06-22 22:10:39.483      Resource:  namespaces
06-22 22:10:39.483    Versions:
06-22 22:10:39.483      Name:     operator
06-22 22:10:39.483      Version:  4.7.17
06-22 22:10:39.483  Events:       <none>
06-22 22:10:39.484  

Comment 6 Aniket Bhat 2021-06-22 15:11:52 UTC
Since this isn't readily reproducible, marking it blocker- and moving it to 4.8.z.

Comment 8 Yang Yang 2021-06-23 03:00:05 UTC
(In reply to Yang Yang from comment #5)
> Attempted 3 times upgrade 4.7.17-x86_64-->
> 4.8.0-0.nightly-2021-06-17-222823, 4.7.17-x86_64-->
> 4.8.0-0.nightly-2021-06-21-175537, 4.7.0-0.nightly-2021-06-20-093308-->
> 4.8.0-0.nightly-2021-06-21-175537, all of them ran into a machine-config bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1974403. It cannot be reproduced
> at the moment.

Correcting the referenced bug: the upgrades actually ran into https://bugzilla.redhat.com/show_bug.cgi?id=1974962

Comment 9 Yang Yang 2021-06-30 09:24:36 UTC
I'm experiencing it again during an upgrade from 4.7.0-0.nightly-2021-06-26-014854 --> 4.8.0-0.nightly-2021-06-29-033219.

# oc describe network
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2021-06-30T02:56:31Z
  Generation:          2
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:clusterNetwork:
        f:externalIP:
          .:
          f:policy:
        f:networkType:
        f:serviceNetwork:
      f:status:
    Manager:      cluster-bootstrap
    Operation:    Update
    Time:         2021-06-30T02:56:31Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:clusterNetwork:
        f:clusterNetworkMTU:
        f:networkType:
        f:serviceNetwork:
    Manager:         cluster-network-operator
    Operation:       Update
    Time:            2021-06-30T03:05:28Z
  Resource Version:  3323
  UID:               99d151e9-0445-47b6-b02b-996f513a9f38
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  OVNKubernetes
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:               10.128.0.0/14
    Host Prefix:        23
  Cluster Network MTU:  8901
  Network Type:         OVNKubernetes
  Service Network:
    172.30.0.0/16
Events:  <none>
[root@preserve-yangyangmerrn-1 tmp]# oc get co | grep network
network                                    4.8.0-0.nightly-2021-06-29-033219   True        True          True       6h2m
[root@preserve-yangyangmerrn-1 tmp]# oc describe co network
Name:         network
Namespace:    
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
              network.operator.openshift.io/last-seen-state:
                {"DaemonsetStates":[{"Namespace":"openshift-multus","Name":"multus-additional-cni-plugins","LastSeenStatus":{"currentNumberScheduled":8,"n...
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-06-30T02:56:33Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:include.release.openshift.io/ibm-cloud-managed:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
      f:spec:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2021-06-30T02:56:33Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:network.operator.openshift.io/last-seen-state:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:         cluster-network-operator
    Operation:       Update
    Time:            2021-06-30T03:08:36Z
  Resource Version:  185868
  UID:               1165e5f2-e1c0-41df-a244-1be58326bf4d
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-06-30T03:04:25Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2021-06-30T06:49:21Z
    Message:               DaemonSet "openshift-multus/multus" rollout is not making progress - last change 2021-06-30T06:38:24Z
DaemonSet "openshift-multus/multus-additional-cni-plugins" rollout is not making progress - last change 2021-06-30T06:39:01Z
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2021-06-30T06:39:04Z
    Reason:                RolloutHung
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-06-30T03:04:25Z
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2021-06-30T06:32:52Z
    Message:               DaemonSet "openshift-multus/multus" is not available (awaiting 1 nodes)
DaemonSet "openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes)
DaemonSet "openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
DaemonSet "openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
    Reason:                Deploying
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-06-30T03:08:36Z
    Status:                True
    Type:                  Available
  Extension:               <nil>
  Related Objects:
    Group:      
    Name:       applied-cluster
    Namespace:  openshift-network-operator
    Resource:   configmaps
    Group:      apiextensions.k8s.io
    Name:       network-attachment-definitions.k8s.cni.cncf.io
    Resource:   customresourcedefinitions
    Group:      apiextensions.k8s.io
    Name:       ippools.whereabouts.cni.cncf.io
    Resource:   customresourcedefinitions
    Group:      apiextensions.k8s.io
    Name:       overlappingrangeipreservations.whereabouts.cni.cncf.io
    Resource:   customresourcedefinitions
    Group:      
    Name:       openshift-multus
    Resource:   namespaces
    Group:      rbac.authorization.k8s.io
    Name:       multus
    Resource:   clusterroles
    Group:      
    Name:       multus
    Namespace:  openshift-multus
    Resource:   serviceaccounts
    Group:      rbac.authorization.k8s.io
    Name:       multus
    Resource:   clusterrolebindings
    Group:      rbac.authorization.k8s.io
    Name:       multus-whereabouts
    Resource:   clusterrolebindings
    Group:      rbac.authorization.k8s.io
    Name:       whereabouts-cni
    Resource:   clusterroles
    Group:      
    Name:       cni-binary-copy-script
    Namespace:  openshift-multus
    Resource:   configmaps
    Group:      apps
    Name:       multus
    Namespace:  openshift-multus
    Resource:   daemonsets
    Group:      apps
    Name:       multus-additional-cni-plugins
    Namespace:  openshift-multus
    Resource:   daemonsets
    Group:      
    Name:       metrics-daemon-sa
    Namespace:  openshift-multus
    Resource:   serviceaccounts
    Group:      rbac.authorization.k8s.io
    Name:       metrics-daemon-role
    Resource:   clusterroles
    Group:      rbac.authorization.k8s.io
    Name:       metrics-daemon-sa-rolebinding
    Resource:   clusterrolebindings
    Group:      apps
    Name:       network-metrics-daemon
    Namespace:  openshift-multus
    Resource:   daemonsets
    Group:      monitoring.coreos.com
    Name:       monitor-network
    Namespace:  openshift-multus
    Resource:   servicemonitors
    Group:      
    Name:       network-metrics-service
    Namespace:  openshift-multus
    Resource:   services
    Group:      rbac.authorization.k8s.io
    Name:       prometheus-k8s
    Namespace:  openshift-multus
    Resource:   roles
    Group:      rbac.authorization.k8s.io
    Name:       prometheus-k8s
    Namespace:  openshift-multus
    Resource:   rolebindings
    Group:      
    Name:       multus-admission-controller
    Namespace:  openshift-multus
    Resource:   services
    Group:      rbac.authorization.k8s.io
    Name:       multus-admission-controller-webhook
    Resource:   clusterroles
    Group:      rbac.authorization.k8s.io
    Name:       multus-admission-controller-webhook
    Resource:   clusterrolebindings
    Group:      admissionregistration.k8s.io
    Name:       multus.openshift.io
    Resource:   validatingwebhookconfigurations
    Group:      apps
    Name:       multus-admission-controller
    Namespace:  openshift-multus
    Resource:   daemonsets
    Group:      monitoring.coreos.com
    Name:       monitor-multus-admission-controller
    Namespace:  openshift-multus
    Resource:   servicemonitors
    Group:      rbac.authorization.k8s.io
    Name:       prometheus-k8s
    Namespace:  openshift-multus
    Resource:   roles
    Group:      rbac.authorization.k8s.io
    Name:       prometheus-k8s
    Namespace:  openshift-multus
    Resource:   rolebindings
    Group:      monitoring.coreos.com
    Name:       prometheus-k8s-rules
    Namespace:  openshift-multus
    Resource:   prometheusrules
    Group:      
    Name:       openshift-ovn-kubernetes
    Resource:   namespaces
    Group:      apiextensions.k8s.io
    Name:       egressfirewalls.k8s.ovn.org
    Resource:   customresourcedefinitions
    Group:      apiextensions.k8s.io
    Name:       egressips.k8s.ovn.org
    Resource:   customresourcedefinitions
    Group:      
    Name:       ovn-kubernetes-node
    Namespace:  openshift-ovn-kubernetes
    Resource:   serviceaccounts
    Group:      rbac.authorization.k8s.io
    Name:       openshift-ovn-kubernetes-node
    Resource:   clusterroles
    Group:      rbac.authorization.k8s.io
    Name:       openshift-ovn-kubernetes-node
    Resource:   clusterrolebindings
    Group:      
    Name:       ovn-kubernetes-controller
    Namespace:  openshift-ovn-kubernetes
    Resource:   serviceaccounts
    Group:      rbac.authorization.k8s.io
    Name:       openshift-ovn-kubernetes-controller
    Resource:   clusterroles
    Group:      rbac.authorization.k8s.io
    Name:       openshift-ovn-kubernetes-controller
    Resource:   clusterrolebindings
    Group:      rbac.authorization.k8s.io
    Name:       openshift-ovn-kubernetes-sbdb
    Namespace:  openshift-ovn-kubernetes
    Resource:   roles
    Group:      rbac.authorization.k8s.io
    Name:       openshift-ovn-kubernetes-sbdb
    Namespace:  openshift-ovn-kubernetes
    Resource:   rolebindings
    Group:      
    Name:       ovnkube-config
    Namespace:  openshift-ovn-kubernetes
    Resource:   configmaps
    Group:      
    Name:       ovnkube-db
    Namespace:  openshift-ovn-kubernetes
    Resource:   services
    Group:      network.operator.openshift.io
    Name:       ovn
    Namespace:  openshift-ovn-kubernetes
    Resource:   operatorpkis
    Group:      network.operator.openshift.io
    Name:       signer
    Namespace:  openshift-ovn-kubernetes
    Resource:   operatorpkis
    Group:      flowcontrol.apiserver.k8s.io
    Name:       openshift-ovn-kubernetes
    Resource:   flowschemas
    Group:      monitoring.coreos.com
    Name:       master-rules
    Namespace:  openshift-ovn-kubernetes
    Resource:   prometheusrules
    Group:      monitoring.coreos.com
    Name:       networking-rules
    Namespace:  openshift-ovn-kubernetes
    Resource:   prometheusrules
    Group:      monitoring.coreos.com
    Name:       monitor-ovn-master-metrics
    Namespace:  openshift-ovn-kubernetes
    Resource:   servicemonitors
    Group:      
    Name:       ovn-kubernetes-master
    Namespace:  openshift-ovn-kubernetes
    Resource:   services
    Group:      monitoring.coreos.com
    Name:       monitor-ovn-node
    Namespace:  openshift-ovn-kubernetes
    Resource:   servicemonitors
    Group:      
    Name:       ovn-kubernetes-node
    Namespace:  openshift-ovn-kubernetes
    Resource:   services
    Group:      rbac.authorization.k8s.io
    Name:       prometheus-k8s
    Namespace:  openshift-ovn-kubernetes
    Resource:   roles
    Group:      rbac.authorization.k8s.io
    Name:       prometheus-k8s
    Namespace:  openshift-ovn-kubernetes
    Resource:   rolebindings
    Group:      
    Name:       openshift-host-network
    Resource:   namespaces
    Group:      
    Name:       host-network-namespace-quotas
    Namespace:  openshift-host-network
    Resource:   resourcequotas
    Group:      policy
    Name:       ovn-raft-quorum-guard
    Namespace:  openshift-ovn-kubernetes
    Resource:   poddisruptionbudgets
    Group:      apps
    Name:       ovnkube-master
    Namespace:  openshift-ovn-kubernetes
    Resource:   daemonsets
    Group:      apps
    Name:       ovnkube-node
    Namespace:  openshift-ovn-kubernetes
    Resource:   daemonsets
    Group:      
    Name:       openshift-network-diagnostics
    Resource:   namespaces
    Group:      
    Name:       network-diagnostics
    Namespace:  openshift-network-diagnostics
    Resource:   serviceaccounts
    Group:      rbac.authorization.k8s.io
    Name:       network-diagnostics
    Namespace:  openshift-network-diagnostics
    Resource:   roles
    Group:      rbac.authorization.k8s.io
    Name:       network-diagnostics
    Namespace:  openshift-network-diagnostics
    Resource:   rolebindings
    Group:      rbac.authorization.k8s.io
    Name:       network-diagnostics
    Resource:   clusterroles
    Group:      rbac.authorization.k8s.io
    Name:       network-diagnostics
    Resource:   clusterrolebindings
    Group:      rbac.authorization.k8s.io
    Name:       network-diagnostics
    Namespace:  kube-system
    Resource:   rolebindings
    Group:      apps
    Name:       network-check-source
    Namespace:  openshift-network-diagnostics
    Resource:   deployments
    Group:      
    Name:       network-check-source
    Namespace:  openshift-network-diagnostics
    Resource:   services
    Group:      monitoring.coreos.com
    Name:       network-check-source
    Namespace:  openshift-network-diagnostics
    Resource:   servicemonitors
    Group:      apps
    Name:       network-check-target
    Namespace:  openshift-network-diagnostics
    Resource:   daemonsets
    Group:      
    Name:       network-check-target
    Namespace:  openshift-network-diagnostics
    Resource:   services
    Group:      
    Name:       openshift-network-operator
    Resource:   namespaces
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   networks
    Group:      networking.k8s.io
    Name:       
    Resource:   NetworkPolicy
    Group:      k8s.ovn.org
    Name:       
    Resource:   EgressFirewall
    Group:      k8s.ovn.org
    Name:       
    Resource:   EgressIP
  Versions:
    Name:     operator
    Version:  4.8.0-0.nightly-2021-06-29-033219
Events:       <none>

# oc get node
NAME                                        STATUS                        ROLES    AGE     VERSION
ip-10-0-48-218.us-east-2.compute.internal   Ready                         worker   6h4m    v1.21.0-rc.0+766a5fe
ip-10-0-49-65.us-east-2.compute.internal    Ready                         worker   4h53m   v1.20.0+87cc9a4
ip-10-0-50-212.us-east-2.compute.internal   Ready                         master   6h17m   v1.21.0-rc.0+766a5fe
ip-10-0-54-58.us-east-2.compute.internal    Ready                         worker   6h4m    v1.21.0-rc.0+766a5fe
ip-10-0-59-156.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker   4h54m   v1.20.0+87cc9a4
ip-10-0-71-142.us-east-2.compute.internal   Ready                         master   6h17m   v1.21.0-rc.0+766a5fe
ip-10-0-72-138.us-east-2.compute.internal   Ready                         master   6h18m   v1.21.0-rc.0+766a5fe
ip-10-0-73-231.us-east-2.compute.internal   Ready                         worker   6h4m    v1.21.0-rc.0+766a5fe

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-2021-06-29-033219   True        False         False      164m
baremetal                                  4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h14m
cloud-credential                           4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h14m
cluster-autoscaler                         4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h13m
config-operator                            4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h14m
console                                    4.8.0-0.nightly-2021-06-29-033219   True        False         False      165m
csi-snapshot-controller                    4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h9m
dns                                        4.8.0-0.nightly-2021-06-29-033219   True        True          False      3h16m
etcd                                       4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h13m
image-registry                             4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h3m
ingress                                    4.8.0-0.nightly-2021-06-29-033219   True        False         False      3h39m
insights                                   4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h8m
kube-apiserver                             4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h12m
kube-controller-manager                    4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h11m
kube-scheduler                             4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h12m
kube-storage-version-migrator              4.8.0-0.nightly-2021-06-29-033219   True        False         False      167m
machine-api                                4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h10m
machine-approver                           4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h13m
machine-config                             4.7.0-0.nightly-2021-06-26-014854   False       True          True       176m
marketplace                                4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h13m
monitoring                                 4.8.0-0.nightly-2021-06-29-033219   False       True          True       161m
network                                    4.8.0-0.nightly-2021-06-29-033219   True        True          True       6h14m
node-tuning                                4.8.0-0.nightly-2021-06-29-033219   True        False         False      3h38m
openshift-apiserver                        4.8.0-0.nightly-2021-06-29-033219   True        False         False      164m
openshift-controller-manager               4.8.0-0.nightly-2021-06-29-033219   True        False         False      3h39m
openshift-samples                          4.8.0-0.nightly-2021-06-29-033219   True        False         False      3h39m
operator-lifecycle-manager                 4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h14m
operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h14m
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-06-29-033219   True        False         False      3h3m
service-ca                                 4.8.0-0.nightly-2021-06-29-033219   True        False         False      6h14m
storage                                    4.8.0-0.nightly-2021-06-29-033219   True        True          False      175m

The cluster has a RHEL node in NotReady state, and several operators are unhealthy.

To investigate, you can access the cluster using the attached kubeconfig file. The cluster will be preserved for only 24 hours.
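For anyone picking this up, a minimal triage sequence against the preserved cluster might look like the following. This is a sketch, not output from this bug: the kubeconfig path is a placeholder for wherever the attached file was saved, and the pod name is the one reported in the original description.

```shell
# Placeholder path for the kubeconfig attached to this bug.
export KUBECONFIG=/tmp/attached-kubeconfig

# Locate the crashing ovnkube-node pod and the node it is scheduled on.
oc get pods -n openshift-ovn-kubernetes -o wide | grep ovnkube-node

# Pull logs from the previous (crashed) container instance; ovnkube-node
# is the main container in the ovnkube-node DaemonSet pod.
oc logs -n openshift-ovn-kubernetes ovnkube-node-lxq7v \
    -c ovnkube-node --previous

# Check events, restart counts, and the CrashLoopBackOff reason.
oc describe pod -n openshift-ovn-kubernetes ovnkube-node-lxq7v
```

These commands require access to the live cluster, so they are only useful while the 24-hour preservation window is open.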

Comment 15 Yang Yang 2021-10-15 06:46:16 UTC
An upgrade test from 4.7.34-x86_64 to 4.8.15-x86_64 passed on profile 80_IPI on AWS RHCOS & RHEL7.9 & FIPS on & OVN & Etcd Encryption & http_proxy & STS, so I think we can close this. Thanks.

10-14 13:52:46.739  **************Post Action after upgrade succ****************
10-14 13:52:46.739  
10-14 13:52:46.739  Post action: #oc get node: NAME                                        STATUS   ROLES    AGE     VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
10-14 13:52:46.739  ip-10-0-51-156.us-east-2.compute.internal   Ready    worker   3h21m   v1.21.1+a620f50   10.0.51.156   <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.45.1.el7.x86_64    cri-o://1.21.3-6.rhaos4.8.gite34bf50.el7
10-14 13:52:46.739  ip-10-0-52-130.us-east-2.compute.internal   Ready    worker   3h21m   v1.21.1+a620f50   10.0.52.130   <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.45.1.el7.x86_64    cri-o://1.21.3-6.rhaos4.8.gite34bf50.el7
10-14 13:52:46.739  ip-10-0-53-57.us-east-2.compute.internal    Ready    worker   4h27m   v1.21.1+a620f50   10.0.53.57    <none>        Red Hat Enterprise Linux CoreOS 48.84.202110121501-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-6.rhaos4.8.gite34bf50.el8
10-14 13:52:46.739  ip-10-0-54-209.us-east-2.compute.internal   Ready    master   4h39m   v1.21.1+a620f50   10.0.54.209   <none>        Red Hat Enterprise Linux CoreOS 48.84.202110121501-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-6.rhaos4.8.gite34bf50.el8
10-14 13:52:46.739  ip-10-0-58-61.us-east-2.compute.internal    Ready    worker   4h31m   v1.21.1+a620f50   10.0.58.61    <none>        Red Hat Enterprise Linux CoreOS 48.84.202110121501-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-6.rhaos4.8.gite34bf50.el8
10-14 13:52:46.740  ip-10-0-64-95.us-east-2.compute.internal    Ready    master   4h39m   v1.21.1+a620f50   10.0.64.95    <none>        Red Hat Enterprise Linux CoreOS 48.84.202110121501-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-6.rhaos4.8.gite34bf50.el8
10-14 13:52:46.740  ip-10-0-69-152.us-east-2.compute.internal   Ready    worker   4h31m   v1.21.1+a620f50   10.0.69.152   <none>        Red Hat Enterprise Linux CoreOS 48.84.202110121501-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-6.rhaos4.8.gite34bf50.el8
10-14 13:52:46.740  ip-10-0-71-209.us-east-2.compute.internal   Ready    worker   139m    v1.21.1+a620f50   10.0.71.209   <none>        Red Hat Enterprise Linux CoreOS 48.84.202110121501-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-6.rhaos4.8.gite34bf50.el8
10-14 13:52:46.740  ip-10-0-79-132.us-east-2.compute.internal   Ready    master   4h39m   v1.21.1+a620f50   10.0.79.132   <none>        Red Hat Enterprise Linux CoreOS 48.84.202110121501-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-6.rhaos4.8.gite34bf50.el8
10-14 13:52:46.740  
10-14 13:52:46.740  
10-14 13:52:46.740  Post action: #oc get co:NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
10-14 13:52:46.740  authentication                             4.8.15    True        False         False      2m
10-14 13:52:46.740  baremetal                                  4.8.15    True        False         False      4h35m
10-14 13:52:46.740  cloud-credential                           4.8.15    True        False         False      4h34m
10-14 13:52:46.740  cluster-autoscaler                         4.8.15    True        False         False      4h34m
10-14 13:52:46.740  config-operator                            4.8.15    True        False         False      4h36m
10-14 13:52:46.740  console                                    4.8.15    True        False         False      18m
10-14 13:52:46.740  csi-snapshot-controller                    4.8.15    True        False         False      167m
10-14 13:52:46.740  dns                                        4.8.15    True        False         False      54m
10-14 13:52:46.740  etcd                                       4.8.15    True        False         False      4h34m
10-14 13:52:46.740  image-registry                             4.8.15    True        False         False      4h29m
10-14 13:52:46.740  ingress                                    4.8.15    True        False         False      70m
10-14 13:52:46.740  insights                                   4.8.15    True        False         False      4h29m
10-14 13:52:46.740  kube-apiserver                             4.8.15    True        False         False      4h32m
10-14 13:52:46.740  kube-controller-manager                    4.8.15    True        False         False      4h33m
10-14 13:52:46.740  kube-scheduler                             4.8.15    True        False         False      4h33m
10-14 13:52:46.740  kube-storage-version-migrator              4.8.15    True        False         False      20m
10-14 13:52:46.740  machine-api                                4.8.15    True        False         False      4h30m
10-14 13:52:46.740  machine-approver                           4.8.15    True        False         False      4h35m
10-14 13:52:46.740  machine-config                             4.8.15    True        False         False      16m
10-14 13:52:46.740  marketplace                                4.8.15    True        False         False      171m
10-14 13:52:46.740  monitoring                                 4.8.15    True        False         False      67m
10-14 13:52:46.740  network                                    4.8.15    True        False         False      4h36m
10-14 13:52:46.740  node-tuning                                4.8.15    True        False         False      70m
10-14 13:52:46.740  openshift-apiserver                        4.8.15    True        False         False      20m
10-14 13:52:46.740  openshift-controller-manager               4.8.15    True        False         False      4h29m
10-14 13:52:46.740  openshift-samples                          4.8.15    True        False         False      70m
10-14 13:52:46.740  operator-lifecycle-manager                 4.8.15    True        False         False      4h35m
10-14 13:52:46.740  operator-lifecycle-manager-catalog         4.8.15    True        False         False      4h35m
10-14 13:52:46.740  operator-lifecycle-manager-packageserver   4.8.15    True        False         False      39m
10-14 13:52:46.740  service-ca                                 4.8.15    True        False         False      4h36m
10-14 13:52:46.740  storage                                    4.8.15    True        False         False      17m

Comment 18 errata-xmlrpc 2022-01-11 22:31:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.26 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0021