We need to make sure CNO doesn't erroneously show up as Degraded during the install. See PR: https://github.com/openshift/cluster-network-operator/pull/911
Verified this bug on 4.7.0-0.nightly-2021-01-06-012750.
Check the openshift-network-operator pod logs:
# oc logs network-operator-55496d8847-9thwc -n openshift-network-operator | grep "Deployment \"openshift-network-diagnostics/network-check-source\""
Waiting for Deployment "openshift-network-diagnostics/network-check-source" to be created
Waiting for Deployment "openshift-network-diagnostics/network-check-source" to be created
Deployment "openshift-network-diagnostics/network-check-source" is not yet scheduled on any nodes
Deployment "openshift-network-diagnostics/network-check-source" is not yet scheduled on any nodes
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
Deployment "openshift-network-diagnostics/network-check-source" is not available (awaiting 1 nodes)
Deployment "openshift-network-diagnostics/network-check-source" is not available (awaiting 1 nodes)
Deployment "openshift-network-diagnostics/network-check-source" is not available (awaiting 1 nodes)
Hi @danw, we may have hit CNO Degraded being 'True' when upgrading from 4.6 to 4.7, because DaemonSet "openshift-network-diagnostics/network-check-target" is not available: the worker has insufficient resources to schedule the pod. See:
[2021-01-08T02:44:41.529Z] Name: network
[2021-01-08T02:44:41.529Z] Namespace:
[2021-01-08T02:44:41.529Z] Labels: <none>
[2021-01-08T02:44:41.529Z] Annotations: network.operator.openshift.io/last-seen-state:
[2021-01-08T02:44:41.529Z] {"DaemonsetStates":[{"Namespace":"openshift-network-diagnostics","Name":"network-check-target","LastSeenStatus":{"currentNumberScheduled":...
[2021-01-08T02:44:41.529Z] API Version: config.openshift.io/v1
[2021-01-08T02:44:41.529Z] Kind: ClusterOperator ...
[2021-01-08T02:44:41.529Z] Spec:
[2021-01-08T02:44:41.529Z] Status:
[2021-01-08T02:44:41.529Z] Conditions:
[2021-01-08T02:44:41.529Z] Last Transition Time: 2021-01-08T00:28:10Z
[2021-01-08T02:44:41.529Z] Message: DaemonSet "openshift-network-diagnostics/network-check-target" rollout is not making progress - last change 2021-01-08T00:14:26Z
[2021-01-08T02:44:41.529Z] Reason: RolloutHung
[2021-01-08T02:44:41.529Z] Status: True
[2021-01-08T02:44:41.529Z] Type: Degraded
[2021-01-08T02:44:41.529Z] Last Transition Time: 2021-01-07T22:38:25Z
[2021-01-08T02:44:41.529Z] Status: True
[2021-01-08T02:44:41.529Z] Type: Upgradeable
[2021-01-08T02:44:41.529Z] Last Transition Time: 2021-01-08T00:13:10Z
[2021-01-08T02:44:41.529Z] Message: DaemonSet "openshift-network-diagnostics/network-check-target" is not available (awaiting 2 nodes)
[2021-01-08T02:44:41.529Z] Reason: Deploying
[2021-01-08T02:44:41.529Z] Status: True
[2021-01-08T02:44:41.529Z] Type: Progressing
[2021-01-08T02:44:41.529Z] Last Transition Time: 2021-01-08T00:13:10Z
[2021-01-08T02:44:41.529Z] Message: The network is starting up
[2021-01-08T02:44:41.529Z] Reason: Startup
[2021-01-08T02:44:41.529Z] Status: False
[2021-01-08T02:44:41.529Z] Type: Available
Check the status of the pod:
lastTransitionTime: "2021-01-22T19:58:18Z"
message: '0/6 nodes are available: 1 Insufficient memory, 5 node(s) didn''t match Pod''s node affinity.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
This is the network operator status:
network 4.7.0-0.nightly-2021-01-22-134922 False True True 148m
I think the network operator's Degraded condition should not be 'True' even if the openshift-network-diagnostics/network-check-target pod is not running, because this affects the upgrade flow. So I am reopening this issue.
The "don't mark CNO Degraded because of non-critical DaemonSets" hack only operates at cluster install time, because we know the cluster isn't fully functional at that point (no worker nodes, no Service CA Operator) and so some pods won't be able to be started. But during an *update*, the cluster is expected to remain fully operational at all times, so if network-check-target is not rolling out, that's actually a problem and *should* be reported. (And "0/6 nodes are available: 1 Insufficient memory" makes it sound like there's something wrong with this cluster.) Re-closing this bz, because CNO error reporting is working as expected. If you have must-gather from that cluster, or if you can reproduce this problem later, then please open a new bug about network-check-target not deploying successfully.
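The distinction drawn in the comment above can be sketched as a small decision function. This is only an illustration of the reasoning, not the actual CNO code (which is written in Go); the names `RolloutState`, `non_critical`, and `should_report_degraded` are hypothetical and do not come from the operator.

```python
from dataclasses import dataclass


@dataclass
class RolloutState:
    """Hypothetical summary of one DaemonSet or Deployment rollout."""
    name: str
    hung: bool          # rollout has stopped making progress
    non_critical: bool  # e.g. the network-diagnostics workloads


def should_report_degraded(state: RolloutState, installing: bool) -> bool:
    """Decide whether a rollout should mark the operator Degraded.

    Assumption (from the comment above): hung non-critical rollouts are
    ignored only at install time, when the cluster is known to be
    incomplete. During an update every hung rollout is reported.
    """
    if not state.hung:
        return False
    if installing and state.non_critical:
        return False
    return True


target = RolloutState(
    name="openshift-network-diagnostics/network-check-target",
    hung=True,
    non_critical=True,
)

print(should_report_degraded(target, installing=True))   # False
print(should_report_degraded(target, installing=False))  # True
```

Under these assumptions the upgrade case reported in this bug falls into the second call: the rollout is hung outside of install time, so Degraded is reported.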
OK, thanks for the reply @Dan. I thought the network-diagnostics pods were not very important and should not block the CNO status. Anyway, the root cause is that a node has insufficient memory, so the pod cannot be scheduled.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633