Bug 1862991

Summary: image-registry operator fails to come up
Product: OpenShift Container Platform
Reporter: Jan Zmeskal <jzmeskal>
Component: Image Registry
Assignee: Gal Zaidman <gzaidman>
Status: CLOSED ERRATA
QA Contact: Wenjing Zheng <wzheng>
Severity: urgent
Priority: unspecified
Version: 4.6
CC: abhinkum, aos-bugs, gzaidman, jiazha, lmohanty, pasik, pelauter, wking, xiuwang
Keywords: TestBlockerForLayeredProduct, Upgrades
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-27 16:22:34 UTC
Type: Bug

Description Jan Zmeskal 2020-08-03 12:28:20 UTC
Description of problem:
The OCP 4.6 installation almost completes successfully (tested using the workaround outlined in the description of BZ1862941), but one of the cluster operators, image-registry, never reaches the Available state.

Version-Release number of the following components:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          75m     Unable to apply 4.6.0-0.nightly-2020-08-03-054919: the cluster operator image-registry has not yet successfully rolled out

How reproducible:
Twice out of two attempts

Steps to Reproduce:
1. Run openshift-install create cluster
2. Wait for it to finish

Actual results:
image registry doesn't come up

Additional info:
oc get co: https://pastebin.com/SE3y2c2k
Installation log (debug verbosity): https://pastebin.com/Qkm4sgim
oc get po -n openshift-image-registry: https://pastebin.com/sAvxAGt8
oc describe pod/image-registry-8c6b76975-x6tb5 -n openshift-image-registry (warnings in Events section): https://pastebin.com/kSus7f9h 
- When running oc logs image-registry-8c6b76975-x6tb5 -n openshift-image-registry, nothing is returned

Comment 2 Gal Zaidman 2020-08-03 13:18:37 UTC
This is caused by the oVirt CSI operator being merged into the cluster storage operator.
The image registry is trying to create a PVC with the "standard" storage class, which is no longer available.
See the PR [1] for more information.

[1] https://github.com/openshift/cluster-image-registry-operator/pull/585#issuecomment-667973155

Comment 4 Gal Zaidman 2020-08-04 09:34:23 UTC
A bit more explanation about the problem:
Currently, the cluster image registry operator tries to create a PVC with the "standard" storage class, which is no longer available because the CSI operators for oVirt and OpenStack were merged into the cluster storage operator.
We can solve this by not specifying a storage class and using the default storage class instead, as suggested by the storage team in [2].
The problem is that both operators have the same run level, 0000_50_ (see [1] for reference), in the release image, so they start in parallel. That is a problem because the image registry wants to use a storage class that the storage operator has not created yet.
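The race described above can be illustrated with a deliberately simplified model of how a PVC's storage class is resolved when the PVC is created. This is a sketch under stated assumptions, not the Kubernetes or registry operator code: the function name resolve_pvc_class is hypothetical, an empty first argument stands for an omitted storageClassName field (not an explicit empty string, which Kubernetes treats differently), ovirt-csi-sc (the class seen in the verification output later in this bug) is used as the example default, and the model assumes the default class is only filled in at PVC creation time, which is why ordering matters.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of PVC storage-class resolution at creation
# time. It is NOT the Kubernetes or registry-operator implementation.
#   $1  storageClassName requested by the PVC; "" here means "field omitted"
#   $2  space-separated names of the StorageClasses that exist right now
#   $3  name of the default StorageClass, or "" if none exists yet
resolve_pvc_class() {
  local requested="$1" existing="$2" default_class="$3"
  if [ -z "$requested" ]; then
    # Omitted field: the default class is filled in only if it exists now.
    if [ -n "$default_class" ]; then
      echo "assigned:$default_class"
    else
      echo "unassigned"   # no default class yet, so nothing is filled in
    fi
    return 0
  fi
  case " $existing " in
    *" $requested "*) echo "assigned:$requested" ;;
    *) echo "pending:$requested" ;;   # named class missing: the PVC waits for it
  esac
}

# Hardcoded "standard" after the CSI merge: that class no longer exists.
resolve_pvc_class standard "ovirt-csi-sc" "ovirt-csi-sc"   # prints pending:standard
# Omitted class, created after the storage operator is done.
resolve_pvc_class "" "ovirt-csi-sc" "ovirt-csi-sc"         # prints assigned:ovirt-csi-sc
# Omitted class, created before any default class exists (the race).
resolve_pvc_class "" "" ""                                 # prints unassigned
```

In this model, dropping the hardcoded name only helps if the PVC is created after the default class exists, which is exactly the ordering problem.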
There are 2 ways of solving this issue:
1. Change the run level of the cluster image registry operator to something like 0000_51_, so that it runs after the storage operator is done and the storage class is available.
2. Hardcode the oVirt and OpenStack storage classes into the PVC (instead of the currently hardcoded "standard" class). The PVC will then reconcile once the storage class becomes available, but hardcoding storage class names into the code is not recommended.
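Option 1 relies on the run-level ordering described in [1]. A toy model of that ordering: it groups manifest file names by their 0000_NN_ prefix and prints one group per line, assuming groups are applied in ascending order and manifests within a group roll out in parallel. The file names and the helper group_by_runlevel are hypothetical, and the model ignores everything in the real CVO ordering beyond the run-level prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print manifests grouped by run level (0000_NN), one
# group per line in ascending order. Names on the same line share a run level.
group_by_runlevel() {
  [ "$#" -gt 0 ] || return 0
  printf '%s\n' "$@" | LC_ALL=C sort | awk -F_ '
    {
      level = $1 "_" $2                  # e.g. "0000_50"
      if (level != prev) {
        if (NR > 1) printf "\n"
        printf "%s:", level
        prev = level
      }
      printf " %s", $0
    }
    END { if (NR > 0) printf "\n" }'
}

# Current situation: both operators share run level 50.
group_by_runlevel 0000_50_storage-operator.yaml 0000_50_image-registry-operator.yaml
# prints: 0000_50: 0000_50_image-registry-operator.yaml 0000_50_storage-operator.yaml

# Option 1: the registry operator moves to run level 51.
group_by_runlevel 0000_51_image-registry-operator.yaml 0000_50_storage-operator.yaml
# prints: 0000_50: 0000_50_storage-operator.yaml
#         0000_51: 0000_51_image-registry-operator.yaml
```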

[1] https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level
[2] https://coreos.slack.com/archives/CBQHQFU0N/p1596452181198600

Comment 9 Wenjing Zheng 2020-08-18 08:53:21 UTC
Verified with the version below on an IPI cluster on RHV:
$ oc get pvc -n openshift-image-registry
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
image-registry-storage   Bound    pvc-3cfe572a-a0a0-494d-a3e9-1783d2c83a65   100Gi      RWO            ovirt-csi-sc   5d23h
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-12-062953   True        False         5d23h   Cluster version is 4.6.0-0.nightly-2020-08-12-062953
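The verification above comes down to two checks: the PVC is Bound (here to ovirt-csi-sc rather than the missing "standard" class), and the cluster version reports AVAILABLE True. A rough sketch for scripting those checks against oc's default table output follows. The helper name table_cell is hypothetical, and it assumes oc aligns each value under its header, as in the output above; columns are located by character offset because values can be empty (VERSION in the original report) and headers can contain spaces ("ACCESS MODES"). For anything beyond a quick check, oc's structured output (-o json or -o jsonpath) is more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cell in column $2 for the row whose first
# field is $1, reading an oc-style table (header line first) from stdin.
# Exits non-zero if the column or the row is not found.
table_cell() {
  awk -v row="$1" -v col="$2" '
    NR == 1 {
      start = index($0, col)
      if (start == 0) exit 1
      # The column ends where the next header name begins, or at end of line.
      rest = substr($0, start + length(col))
      match(rest, /[^ ]/)
      width = (RSTART ? length(col) + RSTART - 1 : 0)
      next
    }
    $1 == row {
      cell = (width ? substr($0, start, width) : substr($0, start))
      sub(/^ +/, "", cell)
      sub(/ +$/, "", cell)
      print cell
      found = 1
      exit
    }
    END { if (!found) exit 1 }'
}
```

For example, `oc get pvc -n openshift-image-registry | table_cell image-registry-storage STATUS` should print Bound on a cluster in the state shown above, and `oc get clusterversion | table_cell version AVAILABLE` should print True.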

Comment 12 errata-xmlrpc 2020-10-27 16:22:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196