Bug 1862991 - image-registry operator fails to come up
Summary: image-registry operator fails to come up
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Gal Zaidman
QA Contact: Wenjing Zheng
Depends On:
Reported: 2020-08-03 12:28 UTC by Jan Zmeskal
Modified: 2022-10-11 12:50 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-10-27 16:22:34 UTC
Target Upstream Version:


Links:
System: Github
ID: openshift cluster-image-registry-operator pull 585
Private: 0
Priority: None
Status: closed
Summary: BUG 1862991: Remove storageclass from PVC creation
Last Updated: 2021-02-08 09:25:42 UTC

System: Red Hat Product Errata
ID: RHBA-2020:4196
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2020-10-27 16:23:00 UTC

Description Jan Zmeskal 2020-08-03 12:28:20 UTC
Description of problem:
The OCP 4.6 installation almost completes successfully (this was tested using the workaround outlined in the description of BZ1862941), but one of the cluster operators (image-registry) never reaches the Available state.

Version-Release number of the following components:
$ oc get clusterversion
version             False       True          75m     Unable to apply 4.6.0-0.nightly-2020-08-03-054919: the cluster operator image-registry has not yet successfully rolled out

How reproducible:
Twice out of two attempts

Steps to Reproduce:
1. Run openshift-install create cluster
2. Wait for it to finish

Actual results:
image registry doesn't come up

Additional info:
oc get co: https://pastebin.com/SE3y2c2k
Installation log (debug verbosity): https://pastebin.com/Qkm4sgim
oc get po -n openshift-image-registry: https://pastebin.com/sAvxAGt8
oc describe pod/image-registry-8c6b76975-x6tb5 -n openshift-image-registry (warnings in Events section): https://pastebin.com/kSus7f9h 
- When running oc logs image-registry-8c6b76975-x6tb5 -n openshift-image-registry, nothing is returned

Comment 2 Gal Zaidman 2020-08-03 13:18:37 UTC
This is caused by the oVirt CSI operator being merged into the cluster storage operator.
The image registry is trying to create a PVC with the "standard" storage class, which is not available.
See the PR [1] for more information.

[1] https://github.com/openshift/cluster-image-registry-operator/pull/585#issuecomment-667973155

Comment 4 Gal Zaidman 2020-08-04 09:34:23 UTC
A bit more explanation about the problem:
The cluster image registry operator currently tries to create a PVC with the "standard" storage class, which is no longer available because the CSI operators for oVirt and OpenStack were merged into the cluster storage operator.
We can solve it by not specifying the storage class and using the default storage class instead, as suggested by the storage team in [2].
The problem is that both operators have the same run level, 0000_50_ (see [1] for reference), in the release image, so they start in parallel. That is a problem because the image registry wants to use a storage class that the storage operator has not created yet.
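
To illustrate the ordering problem, here is a minimal Python sketch. It assumes manifest names follow a 0000_<runlevel>_<component>_<file> pattern; the file names and the helper functions are hypothetical and are not taken from the cluster-version-operator code. Components that land in the same run level have no guaranteed ordering between them.

```python
from collections import defaultdict


def run_level(manifest_name):
    """Return the run-level prefix, e.g. '0000_50', of a manifest file name."""
    parts = manifest_name.split("_")
    return "_".join(parts[:2])


def components_by_run_level(manifest_names):
    """Group component names (third '_'-separated field) by run level."""
    groups = defaultdict(set)
    for name in manifest_names:
        component = name.split("_")[2]
        groups[run_level(name)].add(component)
    # Sort levels and components so the result is deterministic.
    return {level: sorted(groups[level]) for level in sorted(groups)}


# Hypothetical manifest names, for illustration only.
EXAMPLE_MANIFESTS = [
    "0000_50_cluster-image-registry-operator_07-operator.yaml",
    "0000_50_cluster-storage-operator_10-deployment.yaml",
    "0000_51_example-operator_01-deployment.yaml",
]

if __name__ == "__main__":
    for level, components in components_by_run_level(EXAMPLE_MANIFESTS).items():
        print(level, components)
```

In this sketch both the registry and the storage operator fall under 0000_50, which is why neither can rely on the other having finished first.
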
There are two ways of solving this issue:
1. Change the run level of the cluster image registry operator to something like 0000_51_, so that it runs after the storage operator is done and the storage class is available.
2. Hardcode the oVirt and OpenStack storage classes into the PVC (instead of the current hardcoded "standard" class). That will cause the PVC to reconcile once the storage class is available, but hardcoding a storage class name into the code is not recommended.

[1] https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level
[2] https://coreos.slack.com/archives/CBQHQFU0N/p1596452181198600
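
A minimal sketch of the approach named in the linked PR title (removing the storage class from PVC creation), written in Python for illustration rather than taken from the operator's code. The build_registry_pvc helper is hypothetical; the PVC name, namespace, size and access mode are the ones reported in Comment 9. When storageClassName is omitted from a PVC, Kubernetes assigns the cluster's default storage class, if one is configured.

```python
import json


def build_registry_pvc(storage_class=None):
    """Build a PVC manifest as a dict.

    If storage_class is None, the storageClassName field is left out
    entirely, so the cluster's default storage class is used.
    """
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "100Gi"}},
    }
    if storage_class is not None:
        # Hardcoding a class name here is what the comment above advises against.
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": "image-registry-storage",
            "namespace": "openshift-image-registry",
        },
        "spec": spec,
    }


if __name__ == "__main__":
    # Manifest without a storage class: relies on the cluster default.
    print(json.dumps(build_registry_pvc(), indent=2))
```

Note that leaving the field out differs from setting it to an empty string: an empty storageClassName requests no class at all, rather than the default one.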

Comment 9 Wenjing Zheng 2020-08-18 08:53:21 UTC
Verified with the version below on an IPI on RHV cluster:
$ oc get pvc -n openshift-image-registry
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
image-registry-storage   Bound    pvc-3cfe572a-a0a0-494d-a3e9-1783d2c83a65   100Gi      RWO            ovirt-csi-sc   5d23h
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-12-062953   True        False         5d23h   Cluster version is 4.6.0-0.nightly-2020-08-12-062953

Comment 12 errata-xmlrpc 2020-10-27 16:22:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

