Bug 1999591
| Summary: | internal registry is rejecting the container creation due to sha256 layer mismatch | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Pamela Escorza <pescorza> |
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> |
| Status: | CLOSED ERRATA | QA Contact: | XiuJuan Wang <xiuwang> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.7 | CC: | aos-bugs, david.karlsen, dfuller, hchiramm, jack.ottofaro, jlayton, jsafrane, kgordeev, luaparicio, luca.mercuri, mhackett, mnunes, obulatov, pdonnell, sdodson, sostapov, vcojot, wking |
| Target Milestone: | --- | Keywords: | UpgradeBlocker |
| Target Release: | 4.7.z | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | UpdateRecommendationsBlocked | ||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-15 09:16:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Pamela Escorza
2021-08-31 11:42:59 UTC
Hi @jsafrane, thanks for your help. Let me confirm your doubt with CU.

We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
  example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
  example: Up to 2 minute disruption in edge routing
  example: Up to 90 seconds of API downtime
  example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  example: Issue resolves itself after five minutes
  example: Admin uses oc to fix things
  example: Admin must SSH to hosts, restore from backups, or other non standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  example: No, it’s always been like this we just never noticed
  example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1

Setting NEEDINFO for the impact statement in comment 13.

> Who is impacted?
Not only registry on cephfs is affected. Basically any data on cephfs may be corrupted. It could be a harmless log, but it could be a critical database too.
> How involved is remediation
For random corrupted data, restore it from backup. In addition, the cluster does not report any error. Users may find out pretty late that their data is corrupted (and maybe even backed up in its corrupted state).

(In reply to Oleg Bulatov from comment #17)
> Who is impacted? If we have to block upgrade edges based on this issue,
> which edges would need blocking?
>
> Customers, who use 4.7.24 and use PV with cephfs for the image registry.

Is this 4.7.24-only? 4.7.28 is not vulnerable? No vulnerable 4.8 releases?

AFAIK 4.7.24+ and 4.8.0+ are vulnerable.

Edits upon comment 17 from Oleg, since I happen to know this affects all 4.8 as well.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
Customers who use 4.7.24+ or 4.8.2+ and use a PV with cephfs for the image registry. At time of writing, this is not fixed in a later 4.7.z or 4.8.z.

What is the impact? Is it serious enough to warrant blocking edges?
The registry storage irreversibly corrupts container images. Corrupted layers cannot be pulled or re-pushed; manual intervention is required.

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
Admin must rsh into the registry container and delete corrupted blobs and layer links. Corrupted images can only be re-pushed/re-built.

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
Yes, the regression is introduced at 4.7.24 and 4.8.2. At time of writing, this is not fixed in a later 4.7.z or 4.8.z.

This is believed fixed with kernel-4.18.0-305.19.1.el8_4 in 4.7.30. The same kernel is in use in 4.8.30 and latest 4.9 nightlies; it would be wise to test there as well, but I'll direct this bug at 4.7 (not sure a clone is needed for other releases).
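The remediation in the impact statement above requires knowing which blobs are corrupted before deleting them. The following is a minimal, read-only sketch of how such blobs might be located, not a procedure taken from this bug. It assumes the registry stores blobs in the upstream distribution filesystem layout, `<blobs root>/sha256/<first two hex chars>/<digest>/data`, where the directory name is the expected SHA-256 of the `data` file. The function names `sha256_of` and `find_corrupted_blobs` are hypothetical, and the blobs root path inside the registry container is an assumption that should be checked against the actual storage mount.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted_blobs(blobs_root):
    """Return the sorted digests whose data file no longer hashes to its name.

    Assumes the layout <blobs_root>/sha256/<xx>/<digest>/data (an assumption,
    see the text above). Nothing is modified or deleted; this only reports.
    """
    corrupted = []
    for data_file in Path(blobs_root).glob("sha256/*/*/data"):
        expected = data_file.parent.name  # directory name is the expected digest
        if sha256_of(data_file) != expected:
            corrupted.append(expected)
    return sorted(corrupted)
```

Calling `find_corrupted_blobs` on the blobs root (for example, a path such as `/registry/docker/registry/v2/blobs`, which is an assumption here) would list the digests to clean up. Deleting the blobs and their layer links, and re-pushing the affected images, remains a manual step as described in the impact statement.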
(In reply to Luke Meyer from comment #31)
> The same kernel is in use in 4.8.30

:facepalm: I meant 4.8.11

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.30 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3422

Catching up here, two weeks ago we blocked edges into 4.7.29 and 4.8.10 (on top of some impacted edges that had already been blocked for other reasons) in [1,2], based on the impact statement from comment 23.

[1]: https://github.com/openshift/cincinnati-graph-data/pull/1033
[2]: https://github.com/openshift/cincinnati-graph-data/pull/1034
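The version ranges discussed in this thread can be summarized as a small check. This is a sketch under stated assumptions, not an official support matrix. It takes the regression points from the impact statement (4.7.24 and 4.8.2) and treats 4.7.30 and 4.8.11 as the first fixed releases, based on the "believed fixed" kernel comment. One commenter suggested that all of 4.8.0+ may be affected, which this sketch does not model. The names `AFFECTED_RANGES`, `parse_version` and `is_affected` are hypothetical, and the check only applies to clusters that actually use cephfs-backed storage.

```python
# Affected z-stream ranges per minor release: (first affected, first fixed).
# Values are read from the comments in this bug and are assumptions, not an
# authoritative list.
AFFECTED_RANGES = {
    (4, 7): (24, 30),  # regression at 4.7.24, believed fixed in 4.7.30
    (4, 8): (2, 11),   # regression at 4.8.2, same fixed kernel in 4.8.11
}


def parse_version(version):
    """Parse a plain 'x.y.z' release string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_affected(version):
    """Return True if the release falls inside an affected z-stream range."""
    major, minor, patch = parse_version(version)
    bounds = AFFECTED_RANGES.get((major, minor))
    if bounds is None:
        return False
    first_affected, first_fixed = bounds
    return first_affected <= patch < first_fixed
```

Under these assumptions, `is_affected("4.7.29")` and `is_affected("4.8.10")` are both true, which matches the two releases whose incoming edges were blocked. Nightly or pre-release version strings are not handled by this sketch.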