Bug 2115267

Summary: Failed to create clusters on AWS C2S/SC2S due to image-registry MissingEndpoint error
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Image Registry
Assignee: Oleg Bulatov <obulatov>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.11
CC: hongyli, jiazha, obulatov, sdodson, wking, yanyang, yunjiang
Target Milestone: ---
Target Release: 4.10.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: UpdateRecommendationsBlocked
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-23 18:29:03 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2083466
Bug Blocks:

Comment 4 W. Trevor King 2022-08-11 17:14:08 UTC
We're asking the following questions to evaluate whether or not this bug warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the ImpactStatementRequested label has been added to this bug. When responding, please remove ImpactStatementRequested and set the ImpactStatementProposed label. The expectation is that the assignee answers these questions.

Which 4.y.z to 4.y'.z' updates increase vulnerability? Which types of clusters?

reasoning: This allows us to populate from, to, and matchingRules in conditional update recommendations for "the $SOURCE_RELEASE to $TARGET_RELEASE update is not recommended for clusters like $THIS".
example: AWS clusters in region us-iso-east-1 (and other regions that don't have dualstack endpoints) updating from 4.10.z to 4.10.27.  Maybe also from 4.9.z to 4.10.27?  I dunno if this flavor of cluster was installable on 4.9.

What is the impact? Is it serious enough to warrant removing update recommendations?

reasoning: This allows us to populate name and message in conditional update recommendations for "...because if you update, $THESE_CONDITIONS may cause $THESE_UNFORTUNATE_SYMPTOMS".
example: Image registry goes Degraded=True with reason Unavailable, which blocks install from completing.  And also blocks updates into 4.10.27 from completing.

How involved is remediation?

reasoning: This allows administrators who are already vulnerable, or who chose to waive conditional-update risks, to recover their cluster. And even moderately serious impacts might be acceptable if they are easy to mitigate.
example: Issue resolves itself after five minutes.
example: Admin can run a single: oc ....
example: Admin must SSH to hosts, restore from backups, or perform other non-standard admin activities.

Is this a regression?

reasoning: Updating between two vulnerable releases may not increase exposure (unless rebooting during the update increases vulnerability, etc.). We only qualify update recommendations if the update increases exposure.
example: Yes, [1], which landed in 4.10.27 via bug 2110963, introduced the regression.

[1]: https://github.com/openshift/cluster-image-registry-operator/pull/793
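
The from/to/matchingRules idea in the first question can be illustrated with a minimal sketch. It assumes that `from` is a regular expression over source versions and that `to` is a single target version. The `regions` list is a hypothetical, simplified stand-in for matchingRules and is not the real schema; the risk name and the helper `edge_is_risky` are likewise made up for illustration.

```python
import re

# Hypothetical risk entry. The "from", "to", "name", and "message" keys mirror
# the fields named in the questions above; "regions" is a simplified stand-in
# for matchingRules, not the real schema.
risk = {
    "to": "4.10.27",
    "from": r"4\.(9|10)\..*",  # assumption: regular expression over source versions
    "name": "ImageRegistryMissingEndpoint",  # hypothetical risk name
    "message": "Image registry goes Degraded=True, which blocks the update from completing.",
    "regions": ["us-iso-east-1"],
}


def edge_is_risky(risk, source, target, region):
    """Return True when the source-to-target update matches the risk for this region."""
    if target != risk["to"]:
        return False
    if re.fullmatch(risk["from"], source) is None:
        return False
    return region in risk["regions"]


if __name__ == "__main__":
    print(edge_is_risky(risk, "4.10.26", "4.10.27", "us-iso-east-1"))
    print(edge_is_risky(risk, "4.10.26", "4.10.27", "us-east-1"))
```

The three checks are kept separate so that each one maps to one field of the recommendation: target release, source release pattern, and cluster type.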

Comment 7 wewang 2022-08-17 08:33:45 UTC
Successfully installed a cluster on AWS C2S.
Version: 4.10.0-0.nightly-2022-08-16-180211



https://mastern-jenkins-xxxxxx/job/ocp-common/job/Flexy-install/130500/parameters/

Comment 9 Oleg Bulatov 2022-08-17 10:34:39 UTC
Which 4.y.z to 4.y'.z' updates increase vulnerability?

Anything to 4.10.27.


Which types of clusters?

AWS clusters in region us-iso-east-1 (and other regions that don't have dualstack endpoints)


What is the impact?

The image-registry operator can neither apply configuration nor deploy anything in these AWS regions. The diagnostic messages look like:

    unable to sync storage configuration: MissingEndpoint: 'Endpoint' configuration is required for this service
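
A minimal sketch of how one might look for this message programmatically. It assumes the input is the parsed JSON of the image-registry ClusterOperator (for example from `oc get clusteroperator image-registry -o json`) and that the message text appears in one of its status conditions; which condition carries it is not stated in this bug, so the sketch checks all of them. The helper name is hypothetical.

```python
MISSING_ENDPOINT = "MissingEndpoint: 'Endpoint' configuration is required for this service"


def conditions_with_missing_endpoint(cluster_operator):
    """Return the types of all status conditions whose message mentions MissingEndpoint."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    return [
        condition.get("type")
        for condition in conditions
        if MISSING_ENDPOINT in condition.get("message", "")
    ]


if __name__ == "__main__":
    # Illustrative input only; not captured from a real cluster.
    sample = {
        "status": {
            "conditions": [
                {"type": "Available", "status": "False", "message": "not ready"},
                {
                    "type": "Degraded",
                    "status": "True",
                    "message": "unable to sync storage configuration: " + MISSING_ENDPOINT,
                },
            ]
        }
    }
    print(conditions_with_missing_endpoint(sample))
```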


How involved is remediation?

The operator may need to be switched into Unmanaged mode to unblock upgrades. Alternatively, regionEndpoint can be provided manually for these AWS regions.
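
The two remediation options could be sketched as merge patches against the image registry operator configuration. This assumes the configuration lives in `configs.imageregistry.operator.openshift.io/cluster`, with `spec.managementState` and `spec.storage.s3.regionEndpoint` as the relevant fields. The endpoint URL below is a placeholder, not a real endpoint, and the helper names are hypothetical. The script only prints the `oc patch` commands; it does not run them.

```python
import json
import shlex

# Assumption: the image registry operator configuration resource.
RESOURCE = "configs.imageregistry.operator.openshift.io/cluster"


def unmanaged_patch():
    """Merge patch that switches the operator into Unmanaged mode."""
    return {"spec": {"managementState": "Unmanaged"}}


def region_endpoint_patch(endpoint):
    """Merge patch that sets an explicit S3 regionEndpoint."""
    return {"spec": {"storage": {"s3": {"regionEndpoint": endpoint}}}}


def oc_patch_command(patch):
    """Build the oc patch argument list for a merge patch."""
    return ["oc", "patch", RESOURCE, "--type", "merge", "--patch", json.dumps(patch)]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    print(shlex.join(oc_patch_command(unmanaged_patch())))
    # Placeholder endpoint; substitute the S3 endpoint for the affected region.
    print(shlex.join(oc_patch_command(region_endpoint_patch("https://s3.example.invalid"))))
```

Printing rather than executing keeps the sketch safe to run anywhere and lets an administrator review the exact patch before applying it.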


Is this a regression?

Yes, [1], which landed in 4.10.27 via bug 2110963, introduced the regression. 4.10.26 and below don't have this problem.


[1]: https://github.com/openshift/cluster-image-registry-operator/pull/793

Comment 10 wewang 2022-08-17 12:54:55 UTC
Reverting to ON_QA; more upgrade testing will follow tomorrow.

How about testing the following scenarios?

1. Upgrade from 4.9.46 to 4.10.0-0.nightly-2022-08-16-180211

2. Upgrade from 4.9.0-0.nightly-2022-08-11-185711 to 4.10.0-0.nightly-2022-08-16-180211

Comment 11 W. Trevor King 2022-08-18 02:57:50 UTC
Based on comment 9, we've tombstoned 4.10.27 [1].

[1]: https://github.com/openshift/cincinnati-graph-data/pull/2361

Comment 12 wewang 2022-08-18 05:35:13 UTC
1. Upgrade from 4.9.46 to 4.10.0-0.nightly-2022-08-16-180211

2. Upgrade from 4.9.0-0.nightly-2022-08-11-185711 to 4.10.0-0.nightly-2022-08-16-180211

The image-registry did not hit the issue in either upgrade, so I have marked this bug as verified.

Comment 14 errata-xmlrpc 2022-08-23 18:29:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.28 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6095