Bug 1937888
| Summary: | reconciling node-exporter DaemonSet failed when upgrading from 4.1.41 to 4.2.36 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Paige Rubendall <prubenda> |
| Component: | Monitoring | Assignee: | Sergiusz Urbaniak <surbania> |
| Status: | CLOSED EOL | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.0 | CC: | alegrand, anpicker, dgrisonn, erooth, kakkoyun, kewang, lcosic, pkrupa, prubenda, surbania |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-12 13:50:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Paige Rubendall
2021-03-11 17:18:38 UTC
We still hit this problem when upgrading from 4.7 to a 4.8 nightly, so I have to reopen the bug.

Upgrade command:
./oc adm upgrade --to-image=vmc.mirror-registry.qe.devcluster.openshift.com:5000/openshift-release-dev/ocp-release:4.8.0-0.nightly-2021-04-25-110331 --force=true --allow-explicit-upgrade=true

$ oc get node
NAME             STATUS                       ROLES   AGE    VERSION          INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                                                      KERNEL-VERSION                CONTAINER-RUNTIME
compute-0        Ready                        worker  4h26m  v1.20.0+7d0a2b2  172.31.246.44  172.31.246.44  Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa)  4.18.0-240.22.1.el8_3.x86_64  cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
compute-1        NotReady,SchedulingDisabled  worker  4h26m  v1.20.0+7d0a2b2  172.31.246.61  172.31.246.61  Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa)  4.18.0-240.22.1.el8_3.x86_64  cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
control-plane-0  Ready                        master  4h39m  v1.20.0+7d0a2b2  172.31.246.28  172.31.246.28  Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa)  4.18.0-240.22.1.el8_3.x86_64  cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
control-plane-1  NotReady,SchedulingDisabled  master  4h39m  v1.20.0+7d0a2b2  172.31.246.52  172.31.246.52  Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa)  4.18.0-240.22.1.el8_3.x86_64  cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
control-plane-2  Ready                        master  4h39m  v1.20.0+7d0a2b2  172.31.246.41  172.31.246.41  Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa)  4.18.0-240.22.1.el8_3.x86_64  cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8

$ oc get co
NAME                                      VERSION                            AVAILABLE  PROGRESSING  DEGRADED  SINCE
authentication                            4.8.0-0.nightly-2021-04-25-110331  True       False        True      84m
baremetal                                 4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
cloud-credential                          4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h39m
cluster-autoscaler                        4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
config-operator                           4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
console                                   4.8.0-0.nightly-2021-04-25-110331  True       False        False     153m
csi-snapshot-controller                   4.8.0-0.nightly-2021-04-25-110331  True       False        False     149m
dns                                       4.8.0-0.nightly-2021-04-25-110331  True       False        True      4h33m
etcd                                      4.8.0-0.nightly-2021-04-25-110331  True       False        True      4h33m
image-registry                            4.8.0-0.nightly-2021-04-25-110331  True       False        False     88m
ingress                                   4.8.0-0.nightly-2021-04-25-110331  True       False        True      4h25m
insights                                  4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h31m
kube-apiserver                            4.8.0-0.nightly-2021-04-25-110331  True       False        True      4h31m
kube-controller-manager                   4.8.0-0.nightly-2021-04-25-110331  True       False        True      4h32m
kube-scheduler                            4.8.0-0.nightly-2021-04-25-110331  True       False        True      4h33m
kube-storage-version-migrator             4.8.0-0.nightly-2021-04-25-110331  True       False        False     88m
machine-api                               4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h25m
machine-approver                          4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
machine-config                            4.7.0-0.nightly-2021-04-25-102429  False      True         True      132m
marketplace                               4.8.0-0.nightly-2021-04-25-110331  True       False        False     3h8m
monitoring                                4.8.0-0.nightly-2021-04-25-110331  False      True         True      83m
network                                   4.8.0-0.nightly-2021-04-25-110331  True       True         True      4h38m
node-tuning                               4.8.0-0.nightly-2021-04-25-110331  True       False        False     153m
openshift-apiserver                       4.8.0-0.nightly-2021-04-25-110331  True       False        True      84m
openshift-controller-manager              4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h33m
openshift-samples                         4.8.0-0.nightly-2021-04-25-110331  True       False        False     153m
operator-lifecycle-manager                4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
operator-lifecycle-manager-catalog        4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
operator-lifecycle-manager-packageserver  4.8.0-0.nightly-2021-04-25-110331  True       False        False     153m
service-ca                                4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
storage                                   4.8.0-0.nightly-2021-04-25-110331  True       False        False     3h10m

$ oc describe co/machine-config
Name:         machine-config
Namespace:
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-04-25T16:42:19Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:exclude.release.openshift.io/internal-openshift-hosted:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
      f:spec:
      f:status:
        .:
        f:versions:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2021-04-25T16:42:19Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:extension:
          .:
          f:master:
          f:worker:
        f:relatedObjects:
        f:versions:
    Manager:         machine-config-operator
    Operation:       Update
    Time:            2021-04-25T20:05:22Z
  Resource Version:  141689
  UID:               cdf4f3fa-b3ca-4abe-bdb2-63dff6efbe9a
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-04-25T19:01:50Z
    Message:               Working towards 4.8.0-0.nightly-2021-04-25-110331
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-04-25T20:05:22Z
    Message:               Unable to apply 4.8.0-0.nightly-2021-04-25-110331: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-daemon is not ready. status: (desired: 5, updated: 5, ready: 3, unavailable: 2)
    Reason:                MachineConfigDaemonFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-04-25T19:11:51Z
    Message:               Cluster not available for 4.8.0-0.nightly-2021-04-25-110331
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-04-25T19:55:23Z
    Message:               One or more machine config pools are updating, please see `oc get mcp` for further details
    Reason:                PoolUpdating
    Status:                False
    Type:                  Upgradeable
  Extension:
    Master:  0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-3769e7e060d5890610360c5d5513eaa8
    Worker:  0 (ready 0) out of 2 nodes are updating to latest configuration rendered-worker-5570462504cf4f902167d5d3ce228ac4
  Related Objects:
    Group:
    Name:      openshift-machine-config-operator
    Resource:  namespaces
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  controllerconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  kubeletconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  containerruntimeconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  machineconfigs
    Group:
    Name:
    Resource:  nodes
    Group:
    Name:      openshift-kni-infra
    Resource:  namespaces
    Group:
    Name:      openshift-openstack-infra
    Resource:  namespaces
    Group:
    Name:      openshift-ovirt-infra
    Resource:  namespaces
    Group:
    Name:      openshift-vsphere-infra
    Resource:  namespaces
  Versions:
    Name:     operator
    Version:  4.7.0-0.nightly-2021-04-25-102429
Events:       <none>

I did a quick search of the must-gather logs:

$ grep -nr 'E0425 20:05'
namespaces/openshift-machine-config-operator/pods/machine-config-operator-54b676975d-msxnw/machine-config-operator/machine-config-operator/logs/current.log:627:2021-04-25T20:05:22.414938630Z E0425 20:05:22.414822 1 sync.go:639] Error syncing Required MachineConfigPools: "pool master has not progressed to latest configuration: controller version mismatch for rendered-master-6c5851d411826109697ed6d4b1f404b6 expected 0c69300057bac1ea65d544ab0e22b378690b2488 has ac79a2ffc6002f086b3fa80003b278b635d2055a: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-3769e7e060d5890610360c5d5513eaa8, retrying"
namespaces/openshift-monitoring/pods/cluster-monitoring-operator-8767f9b5d-xfdcg/cluster-monitoring-operator/cluster-monitoring-operator/logs/current.log:84:2021-04-25T20:05:19.044608892Z E0425 20:05:19.044560 1 operator.go:399] Syncing "openshift-monitoring/cluster-monitoring-config" failed
namespaces/openshift-monitoring/pods/cluster-monitoring-operator-8767f9b5d-xfdcg/cluster-monitoring-operator/cluster-monitoring-operator/logs/current.log:85:2021-04-25T20:05:19.044608892Z E0425 20:05:19.044589 1 operator.go:400] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of openshift-monitoring/node-exporter: got 2 unavailable nodes
...

The third error log entry, at 20:05:19.044589, shows the root cause of the monitoring cluster operator being DEGRADED.

(In reply to Ke Wang from comment #3)
> We still hit this problem when upgrading from 4.7 to a 4.8 nightly, so I have to
> reopen the bug.

The network operator is also degraded, which affects monitoring.
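
For triage, the `oc get co` table in this report can be filtered down to the operators that are unavailable or degraded. The following is only a minimal sketch, not part of the original report: the helper names `parse_cluster_operators` and `unhealthy_operators` are hypothetical, and it assumes the six-column layout shown here (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE) with no spaces inside any value. The sample rows are copied from the output in this comment.

```python
# Hypothetical helpers for filtering saved `oc get co` text; this is an
# illustration, not an OpenShift API.

def parse_cluster_operators(text):
    """Parse whitespace-separated `oc get co` output into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        # Assumption: one token per column; skip rows that do not match.
        if len(fields) != len(header):
            continue
        rows.append(dict(zip(header, fields)))
    return rows


def unhealthy_operators(text):
    """Return names of operators that are not Available or are Degraded."""
    return [
        row["NAME"]
        for row in parse_cluster_operators(text)
        if row["AVAILABLE"] != "True" or row["DEGRADED"] == "True"
    ]


# A few rows taken from the `oc get co` output in this report.
SAMPLE = """
NAME            VERSION                            AVAILABLE  PROGRESSING  DEGRADED  SINCE
authentication  4.8.0-0.nightly-2021-04-25-110331  True       False        True      84m
baremetal       4.8.0-0.nightly-2021-04-25-110331  True       False        False     4h38m
machine-config  4.7.0-0.nightly-2021-04-25-102429  False      True         True      132m
monitoring      4.8.0-0.nightly-2021-04-25-110331  False      True         True      83m
network         4.8.0-0.nightly-2021-04-25-110331  True       True         True      4h38m
"""

if __name__ == "__main__":
    for name in unhealthy_operators(SAMPLE):
        print(name)
```

Run against the full table, this kind of filter would list machine-config and network alongside monitoring, which matches the observation that monitoring is not the only operator affected by the two NotReady nodes.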