Bug 1942536
Summary: | Corrupted image preventing containers from starting | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Matthew Robson <mrobson> |
Component: | Node | Assignee: | Peter Hunt <pehunt> |
Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | CC: | aos-bugs, apizarro, bbaude, dornelas, dwalsh, jkaur, jligon, jnovy, lsm5, mheon, pthomas, smccarty, steven.barre, tsweeney, umohnani, vrothber |
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
A sudden reboot while a container or image is being committed to disk
Consequence:
Corruption of container storage, causing failures to pull images or create containers
Fix:
Detect when a node has rebooted without a corresponding sync, and clear container storage if so
Result:
Container storage corrupted by a sudden reboot is cleared once the reboot is detected, so image pulls and container creation no longer fail
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:55:12 UTC | Type: | Bug |
Bug Blocks: | 1186913, 1995199 |
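The fix described in the Doc Text above (detect a reboot that happened without a sync, then clear container storage) can be illustrated with a marker file. This is only a minimal sketch of the general technique, not CRI-O's actual code: the function names, the marker path, and the storage layout are all hypothetical.

```python
import os
import shutil
from pathlib import Path


def mark_clean_shutdown(marker: Path) -> None:
    """Record that pending writes were flushed before the node went down."""
    # Flush first, so the marker can never exist without the data it vouches for.
    os.sync()
    marker.touch()


def recover_on_startup(marker: Path, storage_root: Path) -> bool:
    """Return True if storage was wiped because the last shutdown was unclean.

    Both paths are hypothetical placeholders chosen for this sketch.
    """
    if marker.exists():
        # Clean shutdown: consume the marker so a later crash is detectable.
        marker.unlink()
        return False
    # No marker: the node went down without syncing, so storage is suspect.
    if storage_root.exists():
        shutil.rmtree(storage_root)
    storage_root.mkdir(parents=True)
    return True
```

In this sketch a first-ever start has no marker and is therefore treated as unclean, which is harmless for an empty store. A real implementation would likely want to distinguish that case, and images removed by the wipe have to be pulled again.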
Description
Matthew Robson
2021-03-24 14:13:36 UTC
*** Bug 1950536 has been marked as a duplicate of this bug. ***

We have a fix incoming for this in 4.8 (attached) but it will require some soak time and testing to make sure it doesn't break things (it already has broken some things in 4.8) before we backport.

*** Bug 1918126 has been marked as a duplicate of this bug. ***

Followed reproducer steps from https://bugzilla.redhat.com/show_bug.cgi?id=1921128#c25 by hard rebooting all nodes a couple of times.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-21-231018   True        False         3h21m   Cluster version is 4.8.0-0.nightly-2021-04-21-231018

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-130-215.us-east-2.compute.internal   Ready    worker   3h41m   v1.21.0-rc.0+3ced7a9
ip-10-0-152-86.us-east-2.compute.internal    Ready    master   3h49m   v1.21.0-rc.0+3ced7a9
ip-10-0-178-57.us-east-2.compute.internal    Ready    worker   3h42m   v1.21.0-rc.0+3ced7a9
ip-10-0-184-90.us-east-2.compute.internal    Ready    master   3h49m   v1.21.0-rc.0+3ced7a9
ip-10-0-214-243.us-east-2.compute.internal   Ready    master   3h49m   v1.21.0-rc.0+3ced7a9
ip-10-0-221-20.us-east-2.compute.internal    Ready    worker   3h41m   v1.21.0-rc.0+3ced7a9

$ oc debug node/ip-10-0-152-86.us-east-2.compute.internal
Starting pod/ip-10-0-152-86us-east-2computeinternal-debug ...
...
sh-4.4# journalctl | grep -i "Error: readlink"
sh-4.4#

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
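The verification above greps the node journal for "Error: readlink" and treats empty output as a pass. When checking several nodes, the same test could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and it assumes it runs on the node itself (for example inside an `oc debug` shell) where `journalctl` is available.

```python
import subprocess

# Same pattern as the manual check, matched case-insensitively like `grep -i`.
SIGNATURE = "error: readlink"


def matching_lines(journal_text: str, signature: str = SIGNATURE) -> list[str]:
    """Return the journal lines that contain the signature, ignoring case."""
    needle = signature.lower()
    return [line for line in journal_text.splitlines() if needle in line.lower()]


def read_node_journal() -> str:
    """Dump the local journal as text (assumes journalctl is on PATH)."""
    result = subprocess.run(
        ["journalctl", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    hits = matching_lines(read_node_journal())
    # An empty list corresponds to the empty grep output in the verification.
    print(f"{len(hits)} matching line(s)")
    for line in hits:
        print(line)
```

Keeping the matching logic separate from the `journalctl` call makes it possible to run the filter against saved journal text as well.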