Bug 1995199
Summary: | After unexpected power outage some nodes could not start anything via crio | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | oarribas <oarribas> |
Component: | Node | Assignee: | Peter Hunt <pehunt> |
Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | high | CC: | aos-bugs, atomlin, cldavey, oarribas, obockows, openshift-bugs-escalate, pehunt |
Version: | 4.6 | Flags: | pehunt: needinfo- |
Target Milestone: | --- | ||
Target Release: | 4.7.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-09-08 13:17:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1942536 | ||
Bug Blocks: |
Description
oarribas
2021-08-18 15:52:36 UTC
Is there any chance the user can upgrade to 4.8? In it, we have an enhancement that allows CRI-O to remove the container storage dir if it wasn't cleanly shut down, allowing for restarts to be safer and more contact free.

Fixed in 4.7 with the attached PR merging.

Checked on a baremetal cluster with 4.7.0-0.nightly-2021-09-01-110541. Forced a shutdown a few times.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-09-01-110541   True        False         4h23m   Cluster version is 4.7.0-0.nightly-2021-09-01-110541

$ oc get nodes
NAME                                                 STATUS     ROLES    AGE     VERSION
master-00.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h44m   v1.20.0+9689d22
master-01.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h46m   v1.20.0+9689d22
master-02.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h45m   v1.20.0+9689d22
worker-00.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h35m   v1.20.0+9689d22
worker-01.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h36m   v1.20.0+9689d22
worker-02.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h35m   v1.20.0+9689d22

$ oc debug node/worker-00.sunilc020947.qe.devcluster.openshift.com
Starting pod/worker-00sunilc020947qedevclusteropenshiftcom-debug ...
...
sh-4.4# chroot /host
sh-4.4# echo b > /proc/sysrq-trigger

Removing debug pod ...
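After the forced reboot, the affected node is expected to drop to NotReady and then return to Ready. A minimal sketch of a helper that lists nodes not in the Ready state from `oc get nodes` style output; the function name `not_ready_nodes` is hypothetical and is not part of `oc`:

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes whose STATUS column is
# not exactly "Ready". Reads `oc get nodes` style output on stdin.
# Note: a cordoned node ("Ready,SchedulingDisabled") is also reported.
not_ready_nodes() {
    # NR > 1 skips the header row; column 1 is NAME, column 2 is STATUS.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

Assuming a logged-in `oc` session, it could be used as `oc get nodes | not_ready_nodes`; empty output would mean every node reports Ready.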
$ oc get nodes
NAME                                                 STATUS     ROLES    AGE     VERSION
master-00.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h48m   v1.20.0+9689d22
master-01.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h49m   v1.20.0+9689d22
master-02.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h49m   v1.20.0+9689d22
worker-00.sunilc020947.qe.devcluster.openshift.com   NotReady   worker   4h39m   v1.20.0+9689d22
worker-01.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h40m   v1.20.0+9689d22
worker-02.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h39m   v1.20.0+9689d22

$ oc get nodes
NAME                                                 STATUS     ROLES    AGE     VERSION
master-00.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h59m   v1.20.0+9689d22
master-01.sunilc020947.qe.devcluster.openshift.com   Ready      master   5h      v1.20.0+9689d22
master-02.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h59m   v1.20.0+9689d22
worker-00.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h49m   v1.20.0+9689d22
worker-01.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h50m   v1.20.0+9689d22
worker-02.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h49m   v1.20.0+9689d22

$ oc debug node/worker-00.sunilc020947.qe.devcluster.openshift.com
Starting pod/worker-00sunilc020947qedevclusteropenshiftcom-debug ...
...
sh-4.4# systemctl status crio
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
   Active: active (running) since Thu 2021-09-02 16:39:06 UTC; 7min ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 2869 (crio)
    Tasks: 49
   Memory: 2.7G
      CPU: 1min 51.829s
   CGroup: /system.slice/crio.service
           └─2869 /usr/bin/crio --enable-metrics=true --metrics-port=9537
...
sh-4.4# crictl pods
POD ID          CREATED          STATE      NAME                                                  NAMESPACE                                ATTEMPT   RUNTIME
9f68c39a30f87   28 seconds ago   Ready      worker-00sunilc020947qedevclusteropenshiftcom-debug   default                                  0         (default)
4a6a66db421b1   4 minutes ago    NotReady   community-operators-gn2c9                             openshift-marketplace                    0         (default)
370998fa67eaa   4 minutes ago    NotReady   community-operators-72lzd                             openshift-marketplace                    0         (default)
85183fcab710a   7 minutes ago    Ready      tuned-8dzdl                                           openshift-cluster-node-tuning-operator   0         (default)
b945ec336c835   7 minutes ago    Ready      node-ca-5rvwv                                         openshift-image-registry                 0         (default)
d9890b5439648   7 minutes ago    Ready      network-check-target-lk8fn                            openshift-network-diagnostics            0         (default)
847a64a273bcb   7 minutes ago    Ready      network-metrics-daemon-gc6qg                          openshift-multus                         0         (default)
7c04ad0538605   7 minutes ago    Ready      redhat-marketplace-v7vrv                              openshift-marketplace                    0         (default)
1a37f052e8fad   7 minutes ago    Ready      machine-config-daemon-vwwqc                           openshift-machine-config-operator        0         (default)
7092983a9cf33   7 minutes ago    Ready      community-operators-gmcc9                             openshift-marketplace                    0         (default)
2e2e850d0d469   7 minutes ago    Ready      multus-jnq2l                                          openshift-multus                         0         (default)
2efa4c2748458   7 minutes ago    Ready      node-exporter-657sw                                   openshift-monitoring                     0         (default)
81929f275b8b1   7 minutes ago    Ready      sdn-hr2qd                                             openshift-sdn                            0         (default)
1c6c6f9c5a6ac   7 minutes ago    Ready      dns-default-46gqg                                     openshift-dns                            0         (default)
4cc58a720a731   7 minutes ago    Ready      ingress-canary-k7l7t                                  openshift-ingress-canary                 0         (default)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.29 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3303
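The enhancement mentioned earlier in this report (CRI-O removing the container storage dir when it was not cleanly shut down) can be pictured as a marker-file pattern. The sketch below is a generic illustration of that pattern only, under stated assumptions: the function names, the marker path, and the storage path are all hypothetical, and this is not CRI-O's actual code.

```shell
#!/bin/sh
# Hypothetical sketch of a clean-shutdown marker. Names and paths are
# illustrative assumptions, not CRI-O's implementation.

# Called on a graceful stop: record that shutdown completed cleanly.
mark_clean_shutdown() {
    marker=$1
    : > "$marker"
}

# Called at startup. If the marker is missing, the previous run did not
# stop cleanly, so the storage directory is treated as suspect and wiped.
# The marker is then removed so that a later crash is detected as well.
prepare_storage_on_start() {
    marker=$1
    storage_dir=$2
    # Refuse to run without both arguments, so rm -rf never sees an empty path.
    [ -n "$marker" ] && [ -n "$storage_dir" ] || return 1
    if [ ! -e "$marker" ]; then
        echo "unclean shutdown detected, wiping $storage_dir" >&2
        rm -rf -- "$storage_dir"
    fi
    mkdir -p -- "$storage_dir"
    rm -f -- "$marker"
}
```

With this design, a power loss leaves no marker behind, so the next start wipes storage and workloads are recreated from scratch, which would be consistent with the node returning to Ready in the verification above.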