Bug 1922154
Summary: | Upon node reboot, crio and kubelet service fail to start | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | pdsilva
Component: | Node | Assignee: | Giuseppe Scrivano <gscrivan>
Node sub component: | CRI-O | QA Contact: | MinLi <minmli>
Status: | CLOSED UPSTREAM | Docs Contact: |
Severity: | high | |
Priority: | unspecified | CC: | aos-bugs, aprabhak, danili, mdunnett, nagrawal
Version: | 4.7 | |
Target Milestone: | --- | |
Target Release: | 4.8.0 | |
Hardware: | ppc64le | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-03-19 11:29:15 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
pdsilva
2021-01-29 11:18:24 UTC
Hi Peter and Node team, I just wanted to state that this behavior is randomly reproducible, but when it occurs, it renders the node unusable. It is also worth noting that there is a similar GitHub issue where the same thing happened with podman: https://github.com/containers/podman/issues/5986. Therefore, I wanted to give you this update and leave the "Blocker?" flag evaluation up to your team, as this is currently a "High" severity.

I think https://github.com/cri-o/cri-o/pull/3999 is a potential fix for such issues with the storage.

We just hit this trying to add a remote worker to a cluster. The RHEL node rebooted and kubelet and crio were dead. We are using 4.6.12.

Hi Giuseppe and node team, could your team let us know whether this fix will be in 4.7? If not, Archana and her team hope to notify the Power doc writers to include this bug in the release notes. Thank you!

@Dan Li, it seems unlikely that the fix will hit 4.7.

@Giuseppe Could you suggest any workaround that would help get the services back to a running state on the node? Thanks.

The workaround is to `rm -rf /var/lib/containers/storage` and reboot the node.

We have a potential fix upstream, but we are not backporting it to 4.7, so I am closing the issue for 4.7.
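The workaround mentioned at the end of the thread (`rm -rf /var/lib/containers/storage`, then reboot) can be sketched as a small script. This is a simulation, not the exact commands run on a node: `STORAGE_DIR` points at a scratch directory here so the sketch is safe to execute anywhere; on a real node the path is `/var/lib/containers/storage` and a reboot must follow so crio and kubelet recreate the storage from scratch.

```shell
# Simulated version of the workaround from the bug comments.
# Stand-in for /var/lib/containers/storage (assumption: scratch dir for safety).
STORAGE_DIR="$(mktemp -d)/storage"
mkdir -p "${STORAGE_DIR}/overlay" "${STORAGE_DIR}/overlay-containers"

# Wipe the (possibly corrupted) container storage tree, as suggested in the bug.
rm -rf "${STORAGE_DIR}"

# On an actual node, follow up with a reboot so the services start cleanly:
#   systemctl reboot
[ -d "${STORAGE_DIR}" ] || echo "storage removed"
```

After the reboot, CRI-O rebuilds its storage on first start, so previously pulled images are lost and will be re-pulled as pods are scheduled.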