Description of problem:
OCP 4.6 installation almost completes successfully (this was tested using the workaround outlined in the description of BZ1862941), but one of the cluster operators (image-registry) never reaches the Available state.

Version-Release number of the following components:
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          75m     Unable to apply 4.6.0-0.nightly-2020-08-03-054919: the cluster operator image-registry has not yet successfully rolled out

How reproducible:
Twice out of two attempts

Steps to Reproduce:
1. Run openshift-install create cluster
2. Wait for it to finish

Actual results:
The image registry doesn't come up.

Additional info:
- oc get co: https://pastebin.com/SE3y2c2k
- Installation log (debug verbosity): https://pastebin.com/Qkm4sgim
- oc get po -n openshift-image-registry: https://pastebin.com/sAvxAGt8
- oc describe pod/image-registry-8c6b76975-x6tb5 -n openshift-image-registry (warnings in Events section): https://pastebin.com/kSus7f9h
- Running oc logs image-registry-8c6b76975-x6tb5 -n openshift-image-registry returns nothing.
This is caused by the oVirt CSI operator being merged into the cluster storage operator: the image registry is trying to create a PVC with the "standard" storage class, which is no longer available. See the PR [1] for more information.

[1] https://github.com/openshift/cluster-image-registry-operator/pull/585#issuecomment-667973155
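To illustrate why a hardcoded "standard" class fails while an unset class can work, here is a minimal sketch of storage class resolution for a new PVC. It is a simplified model, not Kubernetes code: the StorageClass type and resolveStorageClass function are hypothetical, and treating ovirt-csi-sc as the default class is an assumption. Only the storageclass.kubernetes.io/is-default-class annotation is the real Kubernetes marker for a default class. The last case (no default yet) matters because, in Kubernetes releases of that era, the default class is filled in when the PVC is created, so a PVC created before any default exists does not get one.

```go
package main

import "fmt"

// StorageClass is a hypothetical, heavily simplified stand-in for the
// Kubernetes StorageClass object: only the name and annotations are modeled.
type StorageClass struct {
	Name        string
	Annotations map[string]string
}

// defaultClassAnnotation is the annotation Kubernetes uses to mark a
// StorageClass as the cluster default.
const defaultClassAnnotation = "storageclass.kubernetes.io/is-default-class"

// resolveStorageClass sketches which class a newly created PVC is given.
// An empty requested name models a PVC that leaves storageClassName unset.
// It returns the class name and whether that class currently exists.
func resolveStorageClass(requested string, classes []StorageClass) (string, bool) {
	if requested != "" {
		// An explicitly named class is kept as-is; if no such class exists,
		// nothing can provision the volume and the PVC stays Pending.
		for _, sc := range classes {
			if sc.Name == requested {
				return requested, true
			}
		}
		return requested, false
	}
	// No class requested: fall back to a class annotated as the default.
	// (This sketch takes the first match and ignores multiple defaults.)
	for _, sc := range classes {
		if sc.Annotations[defaultClassAnnotation] == "true" {
			return sc.Name, true
		}
	}
	// No default class exists yet, so nothing is assigned.
	return "", false
}

func main() {
	// Assumption: after the CSI merge, the only class on RHV is ovirt-csi-sc,
	// and it is marked as the default.
	classes := []StorageClass{{
		Name:        "ovirt-csi-sc",
		Annotations: map[string]string{defaultClassAnnotation: "true"},
	}}

	name, ok := resolveStorageClass("standard", classes)
	fmt.Printf("hardcoded %q -> exists: %v\n", name, ok)

	name, ok = resolveStorageClass("", classes)
	fmt.Printf("unset -> %q, exists: %v\n", name, ok)
}
```

The first call models the current behavior (a class that no longer exists), the second models leaving the class unset once a default is present.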
A bit more explanation of the problem: the cluster image registry operator is trying to create a PVC with the "standard" storage class, which is no longer available because the CSI operators for oVirt and OpenStack were merged into the cluster storage operator. We can solve this by not specifying a storage class and using the default storage class instead, as suggested by the storage team in [2]. The problem is that both operators have the same run level, 0000_50_ (see [1] for reference), in the release image, so they start in parallel. That is a problem because the image registry wants to use a storage class that the storage operator has not created yet.

There are two ways of solving this issue:
1. Change the run level of the cluster image registry operator to something like 0000_51_, so that it runs after the storage operator is done and the storage class is available.
2. Hardcode the oVirt and OpenStack storage classes into the PVC (instead of the currently hardcoded "standard" class). That will cause the PVC to reconcile once the storage class is available, but hardcoding storage class names into the code is not recommended.

[1] https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level
[2] https://coreos.slack.com/archives/CBQHQFU0N/p1596452181198600
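Option 1 relies on run-level ordering. A rough sketch of that ordering, assuming (as described above) that the cluster version operator applies run levels in ascending order of their 0000_NN_ prefix and starts components that share a run level in parallel. The manifest file names and the runLevel, groupByRunLevel and printPlan helpers are hypothetical; this is not CVO code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// runLevel returns the "0000_NN" prefix of a manifest file name, or ""
// if the name does not have the expected underscore-separated form.
func runLevel(manifest string) string {
	parts := strings.SplitN(manifest, "_", 3)
	if len(parts) < 3 {
		return ""
	}
	return parts[0] + "_" + parts[1]
}

// groupByRunLevel buckets manifests by run level and returns the levels in
// ascending (apply) order together with the manifests in each bucket.
func groupByRunLevel(manifests []string) ([]string, map[string][]string) {
	groups := make(map[string][]string)
	for _, m := range manifests {
		lvl := runLevel(m)
		groups[lvl] = append(groups[lvl], m)
	}
	levels := make([]string, 0, len(groups))
	for lvl := range groups {
		levels = append(levels, lvl)
	}
	sort.Strings(levels)
	return levels, groups
}

// printPlan prints each run level in order with the manifests it contains.
func printPlan(title string, manifests []string) {
	fmt.Println(title)
	levels, groups := groupByRunLevel(manifests)
	for _, lvl := range levels {
		// Everything in one bucket is treated as starting together.
		fmt.Printf("  %s: %v\n", lvl, groups[lvl])
	}
}

func main() {
	// Hypothetical manifest names; only the run-level prefixes reflect the bug.
	current := []string{
		"0000_50_cluster-storage-operator_10_deployment.yaml",
		"0000_50_cluster-image-registry-operator_07-operator.yaml",
	}
	// Option 1: move the image registry operator to run level 0000_51_.
	option1 := []string{
		"0000_50_cluster-storage-operator_10_deployment.yaml",
		"0000_51_cluster-image-registry-operator_07-operator.yaml",
	}
	printPlan("current:", current)
	printPlan("option 1:", option1)
}
```

With the current prefixes both operators land in the same 0000_50 bucket; with option 1 the image registry operator moves to its own later bucket.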
Verified with the version below on an IPI on RHV cluster:

$ oc get pvc -n openshift-image-registry
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
image-registry-storage   Bound    pvc-3cfe572a-a0a0-494d-a3e9-1783d2c83a65   100Gi      RWO            ovirt-csi-sc   5d23h

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-12-062953   True        False         5d23h   Cluster version is 4.6.0-0.nightly-2020-08-12-062953
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196