Bug 1937888
| Summary: | reconciling node-exporter DaemonSet failed when upgrading from 4.1.41 to 4.2.36 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Paige Rubendall <prubenda> |
| Component: | Monitoring | Assignee: | Sergiusz Urbaniak <surbania> |
| Status: | CLOSED EOL | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.2.0 | CC: | alegrand, anpicker, dgrisonn, erooth, kakkoyun, kewang, lcosic, pkrupa, prubenda, surbania |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-12 13:50:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Paige Rubendall
2021-03-11 17:18:38 UTC
We still hit this problem when upgrading from 4.7 to a 4.8 nightly, so I have to reopen the bug.
Upgrade command: ./oc adm upgrade --to-image=vmc.mirror-registry.qe.devcluster.openshift.com:5000/openshift-release-dev/ocp-release:4.8.0-0.nightly-2021-04-25-110331 --force=true --allow-explicit-upgrade=true
$ oc get node
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
compute-0 Ready worker 4h26m v1.20.0+7d0a2b2 172.31.246.44 172.31.246.44 Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa) 4.18.0-240.22.1.el8_3.x86_64 cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
compute-1 NotReady,SchedulingDisabled worker 4h26m v1.20.0+7d0a2b2 172.31.246.61 172.31.246.61 Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa) 4.18.0-240.22.1.el8_3.x86_64 cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
control-plane-0 Ready master 4h39m v1.20.0+7d0a2b2 172.31.246.28 172.31.246.28 Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa) 4.18.0-240.22.1.el8_3.x86_64 cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
control-plane-1 NotReady,SchedulingDisabled master 4h39m v1.20.0+7d0a2b2 172.31.246.52 172.31.246.52 Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa) 4.18.0-240.22.1.el8_3.x86_64 cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
control-plane-2 Ready master 4h39m v1.20.0+7d0a2b2 172.31.246.41 172.31.246.41 Red Hat Enterprise Linux CoreOS 47.83.202104250838-0 (Ootpa) 4.18.0-240.22.1.el8_3.x86_64 cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.8.0-0.nightly-2021-04-25-110331 True False True 84m
baremetal 4.8.0-0.nightly-2021-04-25-110331 True False False 4h38m
cloud-credential 4.8.0-0.nightly-2021-04-25-110331 True False False 4h39m
cluster-autoscaler 4.8.0-0.nightly-2021-04-25-110331 True False False 4h38m
config-operator 4.8.0-0.nightly-2021-04-25-110331 True False False 4h38m
console 4.8.0-0.nightly-2021-04-25-110331 True False False 153m
csi-snapshot-controller 4.8.0-0.nightly-2021-04-25-110331 True False False 149m
dns 4.8.0-0.nightly-2021-04-25-110331 True False True 4h33m
etcd 4.8.0-0.nightly-2021-04-25-110331 True False True 4h33m
image-registry 4.8.0-0.nightly-2021-04-25-110331 True False False 88m
ingress 4.8.0-0.nightly-2021-04-25-110331 True False True 4h25m
insights 4.8.0-0.nightly-2021-04-25-110331 True False False 4h31m
kube-apiserver 4.8.0-0.nightly-2021-04-25-110331 True False True 4h31m
kube-controller-manager 4.8.0-0.nightly-2021-04-25-110331 True False True 4h32m
kube-scheduler 4.8.0-0.nightly-2021-04-25-110331 True False True 4h33m
kube-storage-version-migrator 4.8.0-0.nightly-2021-04-25-110331 True False False 88m
machine-api 4.8.0-0.nightly-2021-04-25-110331 True False False 4h25m
machine-approver 4.8.0-0.nightly-2021-04-25-110331 True False False 4h38m
machine-config 4.7.0-0.nightly-2021-04-25-102429 False True True 132m
marketplace 4.8.0-0.nightly-2021-04-25-110331 True False False 3h8m
monitoring 4.8.0-0.nightly-2021-04-25-110331 False True True 83m
network 4.8.0-0.nightly-2021-04-25-110331 True True True 4h38m
node-tuning 4.8.0-0.nightly-2021-04-25-110331 True False False 153m
openshift-apiserver 4.8.0-0.nightly-2021-04-25-110331 True False True 84m
openshift-controller-manager 4.8.0-0.nightly-2021-04-25-110331 True False False 4h33m
openshift-samples 4.8.0-0.nightly-2021-04-25-110331 True False False 153m
operator-lifecycle-manager 4.8.0-0.nightly-2021-04-25-110331 True False False 4h38m
operator-lifecycle-manager-catalog 4.8.0-0.nightly-2021-04-25-110331 True False False 4h38m
operator-lifecycle-manager-packageserver 4.8.0-0.nightly-2021-04-25-110331 True False False 153m
service-ca 4.8.0-0.nightly-2021-04-25-110331 True False False 4h38m
storage 4.8.0-0.nightly-2021-04-25-110331 True False False 3h10m
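To pick the unhealthy operators out of a listing like the one above, a small filter can help. The following is a minimal sketch and not part of the original report: it assumes the plain-text column order shown here (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE), and the function name `unhealthy_operators` is hypothetical.

```python
def unhealthy_operators(listing: str) -> list:
    """Return names of operators that are not Available or are Degraded.

    Assumes whitespace-separated columns in the order:
    NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
    """
    unhealthy = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and any line without the six expected columns.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, _version, available, _progressing, degraded = fields[:5]
        if available != "True" or degraded == "True":
            unhealthy.append(name)
    return unhealthy


# A few rows copied from the listing above.
sample = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
dns 4.8.0-0.nightly-2021-04-25-110331 True False True 4h33m
machine-config 4.7.0-0.nightly-2021-04-25-102429 False True True 132m
marketplace 4.8.0-0.nightly-2021-04-25-110331 True False False 3h8m
"""
print(unhealthy_operators(sample))
```

Applied to the full listing, this would flag machine-config and monitoring as unavailable, along with the operators that report DEGRADED as True.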
$ oc describe co/machine-config
Name: machine-config
Namespace:
Labels: <none>
Annotations: exclude.release.openshift.io/internal-openshift-hosted: true
include.release.openshift.io/self-managed-high-availability: true
include.release.openshift.io/single-node-developer: true
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
Creation Timestamp: 2021-04-25T16:42:19Z
Generation: 1
Managed Fields:
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:exclude.release.openshift.io/internal-openshift-hosted:
f:include.release.openshift.io/self-managed-high-availability:
f:include.release.openshift.io/single-node-developer:
f:spec:
f:status:
.:
f:versions:
Manager: cluster-version-operator
Operation: Update
Time: 2021-04-25T16:42:19Z
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
f:extension:
.:
f:master:
f:worker:
f:relatedObjects:
f:versions:
Manager: machine-config-operator
Operation: Update
Time: 2021-04-25T20:05:22Z
Resource Version: 141689
UID: cdf4f3fa-b3ca-4abe-bdb2-63dff6efbe9a
Spec:
Status:
Conditions:
Last Transition Time: 2021-04-25T19:01:50Z
Message: Working towards 4.8.0-0.nightly-2021-04-25-110331
Status: True
Type: Progressing
Last Transition Time: 2021-04-25T20:05:22Z
Message: Unable to apply 4.8.0-0.nightly-2021-04-25-110331: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-daemon is not ready. status: (desired: 5, updated: 5, ready: 3, unavailable: 2)
Reason: MachineConfigDaemonFailed
Status: True
Type: Degraded
Last Transition Time: 2021-04-25T19:11:51Z
Message: Cluster not available for 4.8.0-0.nightly-2021-04-25-110331
Status: False
Type: Available
Last Transition Time: 2021-04-25T19:55:23Z
Message: One or more machine config pools are updating, please see `oc get mcp` for further details
Reason: PoolUpdating
Status: False
Type: Upgradeable
Extension:
Master: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-3769e7e060d5890610360c5d5513eaa8
Worker: 0 (ready 0) out of 2 nodes are updating to latest configuration rendered-worker-5570462504cf4f902167d5d3ce228ac4
Related Objects:
Group:
Name: openshift-machine-config-operator
Resource: namespaces
Group: machineconfiguration.openshift.io
Name:
Resource: machineconfigpools
Group: machineconfiguration.openshift.io
Name:
Resource: controllerconfigs
Group: machineconfiguration.openshift.io
Name:
Resource: kubeletconfigs
Group: machineconfiguration.openshift.io
Name:
Resource: containerruntimeconfigs
Group: machineconfiguration.openshift.io
Name:
Resource: machineconfigs
Group:
Name:
Resource: nodes
Group:
Name: openshift-kni-infra
Resource: namespaces
Group:
Name: openshift-openstack-infra
Resource: namespaces
Group:
Name: openshift-ovirt-infra
Resource: namespaces
Group:
Name: openshift-vsphere-infra
Resource: namespaces
Versions:
Name: operator
Version: 4.7.0-0.nightly-2021-04-25-102429
Events: <none>
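The Degraded condition above embeds the DaemonSet rollout counters in its message. As a rough sketch (not from the original report), the counters can be extracted with a regular expression. The pattern assumes the exact `status: (desired: ..., updated: ..., ready: ..., unavailable: ...)` wording seen in this message, and `parse_rollout_status` is a hypothetical helper name.

```python
import re

# Matches the counter block as it appears in the Degraded message above.
STATUS_RE = re.compile(
    r"status: \(desired: (\d+), updated: (\d+), ready: (\d+), unavailable: (\d+)\)"
)


def parse_rollout_status(message: str):
    """Return the rollout counters as a dict, or None if the block is absent."""
    match = STATUS_RE.search(message)
    if match is None:
        return None
    desired, updated, ready, unavailable = (int(g) for g in match.groups())
    return {
        "desired": desired,
        "updated": updated,
        "ready": ready,
        "unavailable": unavailable,
    }


# Message copied from the Degraded condition above.
message = (
    "Unable to apply 4.8.0-0.nightly-2021-04-25-110331: timed out waiting for "
    "the condition during waitForDaemonsetRollout: Daemonset "
    "machine-config-daemon is not ready. "
    "status: (desired: 5, updated: 5, ready: 3, unavailable: 2)"
)
print(parse_rollout_status(message))
```

The two unavailable pods reported here appear consistent with the two NotReady nodes (compute-1 and control-plane-1) in the `oc get node` output.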
I did a quick search in the must-gather logs:
$ grep -nr 'E0425 20:05'
namespaces/openshift-machine-config-operator/pods/machine-config-operator-54b676975d-msxnw/machine-config-operator/machine-config-operator/logs/current.log:627:2021-04-25T20:05:22.414938630Z E0425 20:05:22.414822 1 sync.go:639] Error syncing Required MachineConfigPools: "pool master has not progressed to latest configuration: controller version mismatch for rendered-master-6c5851d411826109697ed6d4b1f404b6 expected 0c69300057bac1ea65d544ab0e22b378690b2488 has ac79a2ffc6002f086b3fa80003b278b635d2055a: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-3769e7e060d5890610360c5d5513eaa8, retrying"
namespaces/openshift-monitoring/pods/cluster-monitoring-operator-8767f9b5d-xfdcg/cluster-monitoring-operator/cluster-monitoring-operator/logs/current.log:84:2021-04-25T20:05:19.044608892Z E0425 20:05:19.044560 1 operator.go:399] Syncing "openshift-monitoring/cluster-monitoring-config" failed
namespaces/openshift-monitoring/pods/cluster-monitoring-operator-8767f9b5d-xfdcg/cluster-monitoring-operator/cluster-monitoring-operator/logs/current.log:85:2021-04-25T20:05:19.044608892Z E0425 20:05:19.044589 1 operator.go:400] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of openshift-monitoring/node-exporter: got 2 unavailable nodes
...
The third error log, at 20:05:19.044589, shows the root cause of the monitoring cluster operator being DEGRADED.

(In reply to Ke Wang from comment #3)
> We still hit this problem when upgrading from 4.7 to a 4.8 nightly, so I have to
> reopen the bug.

The network operator is also degraded, so it affects monitoring.