Bug 1995199
| Summary: | After unexpected power outage some nodes could not start anything via crio | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | oarribas <oarribas> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | high | CC: | aos-bugs, atomlin, cldavey, oarribas, obockows, openshift-bugs-escalate, pehunt |
| Version: | 4.6 | Flags: | pehunt: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | 4.7.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-08 13:17:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1942536 | | |
| Bug Blocks: | | | |
Description
oarribas
2021-08-18 15:52:36 UTC
Is there any chance the user can upgrade to 4.8? In 4.8, we have an enhancement that allows CRI-O to remove the container storage directory if it was not cleanly shut down. This makes restarts safer and less likely to need manual intervention.

Fixed in 4.7 with the attached PR merging.

Checked on a bare-metal cluster with 4.7.0-0.nightly-2021-09-01-110541. Forced a shutdown a few times.
```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-09-01-110541   True        False         4h23m   Cluster version is 4.7.0-0.nightly-2021-09-01-110541
$ oc get nodes
NAME                                                 STATUS   ROLES    AGE     VERSION
master-00.sunilc020947.qe.devcluster.openshift.com   Ready    master   4h44m   v1.20.0+9689d22
master-01.sunilc020947.qe.devcluster.openshift.com   Ready    master   4h46m   v1.20.0+9689d22
master-02.sunilc020947.qe.devcluster.openshift.com   Ready    master   4h45m   v1.20.0+9689d22
worker-00.sunilc020947.qe.devcluster.openshift.com   Ready    worker   4h35m   v1.20.0+9689d22
worker-01.sunilc020947.qe.devcluster.openshift.com   Ready    worker   4h36m   v1.20.0+9689d22
worker-02.sunilc020947.qe.devcluster.openshift.com   Ready    worker   4h35m   v1.20.0+9689d22
$ oc debug node/worker-00.sunilc020947.qe.devcluster.openshift.com
Starting pod/worker-00sunilc020947qedevclusteropenshiftcom-debug ...
...
sh-4.4# chroot /host
sh-4.4# echo b > /proc/sysrq-trigger
Removing debug pod ...
```
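Writing `b` to `/proc/sysrq-trigger` reboots the node immediately, without syncing or unmounting filesystems. CRI-O therefore gets no chance to shut down cleanly, which is the situation the 4.8 enhancement described earlier in this bug is meant to handle. A common way to detect an unclean stop is a marker file that is written only during an orderly shutdown. The sketch below illustrates that general technique. It is not CRI-O's code: `STORAGE_DIR`, `MARKER`, `on_start` and `on_clean_stop` are hypothetical names, and the default paths point at scratch locations rather than at CRI-O's real storage.

```shell
# Sketch of a "clean shutdown marker". The marker exists only if the previous
# run stopped cleanly. Hypothetical names and scratch paths, not CRI-O's code.
STORAGE_DIR=${STORAGE_DIR:-/tmp/demo-container-storage}
MARKER=${MARKER:-/tmp/demo-clean.shutdown}

# Called when the daemon starts.
on_start() {
    if [ ! -e "$MARKER" ]; then
        # No marker: the last run ended without a clean stop (power loss,
        # SysRq reset, ...), so on-disk state may be inconsistent.
        # Discard it instead of trying to repair it.
        rm -rf "$STORAGE_DIR"
    fi
    mkdir -p "$STORAGE_DIR"
    # Remove the marker while running and make the removal durable, so a
    # crash from here on is detected at the next start.
    rm -f "$MARKER"
    sync
}

# Called on an orderly shutdown, after all storage writes have finished.
on_clean_stop() {
    sync            # flush pending storage writes first
    : > "$MARKER"   # create the marker file
    sync            # make sure the marker itself reaches the disk
}
```

This sketch treats the very first start (no marker yet) as unclean. That is harmless while the storage directory is still empty, but a real implementation would need to distinguish "first start" from "crashed".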
```
$ oc get nodes
NAME                                                 STATUS     ROLES    AGE     VERSION
master-00.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h48m   v1.20.0+9689d22
master-01.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h49m   v1.20.0+9689d22
master-02.sunilc020947.qe.devcluster.openshift.com   Ready      master   4h49m   v1.20.0+9689d22
worker-00.sunilc020947.qe.devcluster.openshift.com   NotReady   worker   4h39m   v1.20.0+9689d22
worker-01.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h40m   v1.20.0+9689d22
worker-02.sunilc020947.qe.devcluster.openshift.com   Ready      worker   4h39m   v1.20.0+9689d22
$ oc get nodes
NAME                                                 STATUS   ROLES    AGE     VERSION
master-00.sunilc020947.qe.devcluster.openshift.com   Ready    master   4h59m   v1.20.0+9689d22
master-01.sunilc020947.qe.devcluster.openshift.com   Ready    master   5h      v1.20.0+9689d22
master-02.sunilc020947.qe.devcluster.openshift.com   Ready    master   4h59m   v1.20.0+9689d22
worker-00.sunilc020947.qe.devcluster.openshift.com   Ready    worker   4h49m   v1.20.0+9689d22
worker-01.sunilc020947.qe.devcluster.openshift.com   Ready    worker   4h50m   v1.20.0+9689d22
worker-02.sunilc020947.qe.devcluster.openshift.com   Ready    worker   4h49m   v1.20.0+9689d22
```
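`oc get nodes` shows `NotReady` whenever a node's `Ready` condition is anything other than `True`. After a hard reset the value is usually `Unknown` rather than `False`, because the kubelet simply stops reporting and the node controller marks the condition as unknown. When repeating the forced-reset check a few times, a polling helper saves watching `oc get nodes` by hand. The sketch below assumes that approach. `node_ready_status` and `wait_for_ready` are hypothetical helpers, not part of `oc`. The only real command used is `oc get node` with a JSONPath filter.

```shell
# Print the status of a node's Ready condition: "True", "False" or "Unknown".
node_ready_status() {
    oc get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# wait_for_ready NODE EXPECT TIMEOUT [INTERVAL]
#   EXPECT=ready     succeed once the Ready condition is "True"
#   EXPECT=notready  succeed once it is anything else ("False" or "Unknown")
# Polls every INTERVAL seconds (default 10); returns 1 after TIMEOUT seconds.
wait_for_ready() {
    node=$1 expect=$2 timeout=$3 interval=${4:-10}
    elapsed=0
    while :; do
        status=$(node_ready_status "$node")
        case "$expect:$status" in
            *:) ;;                  # query failed or returned nothing: keep polling
            ready:True) return 0 ;;
            notready:True) ;;       # still Ready: keep polling
            notready:*) return 0 ;;
        esac
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
```

Typical use after triggering the reset is `wait_for_ready worker-00.sunilc020947.qe.devcluster.openshift.com notready 300 && wait_for_ready worker-00.sunilc020947.qe.devcluster.openshift.com ready 1200`. The timeouts in that command are arbitrary.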
```
$ oc debug node/worker-00.sunilc020947.qe.devcluster.openshift.com
Starting pod/worker-00sunilc020947qedevclusteropenshiftcom-debug ...
...
sh-4.4# systemctl status crio
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
   Active: active (running) since Thu 2021-09-02 16:39:06 UTC; 7min ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 2869 (crio)
    Tasks: 49
   Memory: 2.7G
      CPU: 1min 51.829s
   CGroup: /system.slice/crio.service
           └─2869 /usr/bin/crio --enable-metrics=true --metrics-port=9537
...
sh-4.4# crictl pods
POD ID          CREATED          STATE      NAME                                                  NAMESPACE                                ATTEMPT   RUNTIME
9f68c39a30f87   28 seconds ago   Ready      worker-00sunilc020947qedevclusteropenshiftcom-debug   default                                  0         (default)
4a6a66db421b1   4 minutes ago    NotReady   community-operators-gn2c9                             openshift-marketplace                    0         (default)
370998fa67eaa   4 minutes ago    NotReady   community-operators-72lzd                             openshift-marketplace                    0         (default)
85183fcab710a   7 minutes ago    Ready      tuned-8dzdl                                           openshift-cluster-node-tuning-operator   0         (default)
b945ec336c835   7 minutes ago    Ready      node-ca-5rvwv                                         openshift-image-registry                 0         (default)
d9890b5439648   7 minutes ago    Ready      network-check-target-lk8fn                            openshift-network-diagnostics            0         (default)
847a64a273bcb   7 minutes ago    Ready      network-metrics-daemon-gc6qg                          openshift-multus                         0         (default)
7c04ad0538605   7 minutes ago    Ready      redhat-marketplace-v7vrv                              openshift-marketplace                    0         (default)
1a37f052e8fad   7 minutes ago    Ready      machine-config-daemon-vwwqc                           openshift-machine-config-operator        0         (default)
7092983a9cf33   7 minutes ago    Ready      community-operators-gmcc9                             openshift-marketplace                    0         (default)
2e2e850d0d469   7 minutes ago    Ready      multus-jnq2l                                          openshift-multus                         0         (default)
2efa4c2748458   7 minutes ago    Ready      node-exporter-657sw                                   openshift-monitoring                     0         (default)
81929f275b8b1   7 minutes ago    Ready      sdn-hr2qd                                             openshift-sdn                            0         (default)
1c6c6f9c5a6ac   7 minutes ago    Ready      dns-default-46gqg                                     openshift-dns                            0         (default)
4cc58a720a731   7 minutes ago    Ready      ingress-canary-k7l7t                                  openshift-ingress-canary                 0         (default)
```
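In the listing above, CRI-O came back and almost every pod sandbox is `Ready`. The exceptions are two `community-operators` pods in `openshift-marketplace`. When the test is repeated, a small filter over `crictl pods` output makes such pods easy to spot. The sketch below assumes the default table layout shown above. Because the `CREATED` column contains spaces ("7 minutes ago"), it counts fields from the right, which in turn assumes that pod names and namespaces contain no spaces. `summarize_pod_states` is a hypothetical helper name.

```shell
# Summarize `crictl pods` table output read from stdin: print how many pod
# sandboxes are Ready / NotReady, then namespace/name of each non-Ready pod.
# Columns: POD ID, CREATED, STATE, NAME, NAMESPACE, ATTEMPT, RUNTIME.
# CREATED contains spaces, so fields are counted from the right:
# STATE is $(NF-4), NAME is $(NF-3), NAMESPACE is $(NF-2).
summarize_pod_states() {
    awk 'NR > 1 && NF >= 7 {
             state = $(NF-4)
             count[state]++
             if (state != "Ready") notready[++n] = $(NF-2) "/" $(NF-3)
         }
         END {
             printf "Ready: %d\nNotReady: %d\n", count["Ready"], count["NotReady"]
             for (i = 1; i <= n; i++) print "  " notready[i]
         }'
}
```

On the node, run `crictl pods | summarize_pod_states` after `chroot /host`. For the listing above, it would report 13 `Ready` and 2 `NotReady` sandboxes and name the two `community-operators` pods.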
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.29 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3303
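The advisory above is the 4.7.29 z-stream release. A version comparison gives a quick check of whether a cluster's current version is at least that release. The sketch below uses a hypothetical `version_at_least` helper that relies on `sort -V` (GNU coreutils version sort). It is only meaningful for GA z-stream versions such as `4.7.29`. A nightly build such as the `4.7.0-0.nightly-2021-09-01-110541` used for verification in this bug sorts below `4.7.29`, even though that nightly was used to verify the fix.

```shell
# version_at_least MINIMUM CURRENT
# Exit status 0 when CURRENT >= MINIMUM in version-sort order.
# Requires GNU sort's -V (version sort) option.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# Example against a live cluster (needs a logged-in `oc`):
# current=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
# version_at_least 4.7.29 "$current" && echo "cluster is at or above 4.7.29"
```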