Bug 1871820

Summary: Regression: image-registry fails to come up on RHV
Product: OpenShift Container Platform
Reporter: Jan Zmeskal <jzmeskal>
Component: Image Registry
Assignee: Oleg Bulatov <obulatov>
Status: CLOSED DUPLICATE
QA Contact: Wenjing Zheng <wzheng>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.6
CC: aos-bugs
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-24 14:14:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jan Zmeskal 2020-08-24 11:19:41 UTC
Description of problem:
This issue was originally reported in https://bugzilla.redhat.com/show_bug.cgi?id=1862991. A fix for that bug was provided by Gal Zaidman and verified by Wenjing Zheng. I have also deployed OCP 4.6 a couple of times since then and can confirm that I was not hitting this bug.

However, on August 24, 2020 I deployed two OCP 4.6 clusters on different RHV environments and both deployments failed with this issue. I therefore decided to open a new bug, because the root cause might be different.


Version-Release number of selected component (if applicable):
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          63m     Unable to apply 4.6.0-0.nightly-2020-08-24-034934: the cluster operator image-registry has not yet successfully rolled out

How reproducible:
100%

Steps to Reproduce:
1. Run openshift-installer on top of RHV (my install-config here: http://pastebin.test.redhat.com/895896)
2. Wait until the installation fails
3. Check the state of image-registry operator

Actual results:
oc status: http://pastebin.test.redhat.com/895931
oc get all -n openshift-image-registry: http://pastebin.test.redhat.com/895936
oc describe deployment.apps/image-registry: http://pastebin.test.redhat.com/895935
oc describe pod/image-registry-6464f46dc7-ckpb2: http://pastebin.test.redhat.com/895937
oc get events -n openshift-image-registry: http://pastebin.test.redhat.com/895940

Comment 1 Jan Zmeskal 2020-08-24 13:50:35 UTC
# oc get all -n openshift-cluster-csi-drivers
NAME                                               READY   STATUS    RESTARTS   AGE
pod/ovirt-csi-driver-controller-74994665c7-zdccv   3/3     Running   0          3h32m
pod/ovirt-csi-driver-node-2885v                    3/3     Running   1          3h32m
pod/ovirt-csi-driver-node-69vqk                    3/3     Running   0          3h32m
pod/ovirt-csi-driver-node-78hqn                    3/3     Running   0          3h32m
pod/ovirt-csi-driver-node-cm6hs                    3/3     Running   0          3h20m
pod/ovirt-csi-driver-node-gf9tt                    3/3     Running   0          3h19m
pod/ovirt-csi-driver-node-z6v6d                    3/3     Running   0          3h32m
pod/ovirt-csi-driver-operator-95477b8c7-gchlp      1/1     Running   0          3h51m

NAME                                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/ovirt-csi-driver-node   6         6         6       6            6           <none>          3h32m

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ovirt-csi-driver-controller   1/1     1            1           3h32m
deployment.apps/ovirt-csi-driver-operator     1/1     1            1           3h51m

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/ovirt-csi-driver-controller-74994665c7   1         1         1       3h32m
replicaset.apps/ovirt-csi-driver-operator-95477b8c7      1         1         1       3h51m

# oc logs pod/ovirt-csi-driver-controller-74994665c7-zdccv -c csi-driver -n openshift-cluster-csi-drivers
I0824 10:17:39.462782       1 ovirt-csi-driver.go:42] Driver vendor csi.ovirt.org 0.1.1
I0824 10:17:41.826417       1 driver.go:39] Setting the rpc server
I0824 10:17:50.208116       1 controller.go:36] Creating disk pvc-9bf197c7-1564-4a66-9136-0f67c3d20f3c

Comment 2 Jan Zmeskal 2020-08-24 14:00:30 UTC
# oc logs pod/ovirt-csi-driver-controller-74994665c7-zdccv -c csi-attacher -n openshift-cluster-csi-drivers | head -n 8
I0824 10:17:59.345293       1 main.go:91] Version: v4.6.0-202008210209.p0-0-g4c59d76-dirty
I0824 10:17:59.347404       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0824 10:17:59.351075       1 common.go:111] Probing CSI driver for readiness
I0824 10:17:59.432277       1 main.go:136] CSI driver name: "csi.ovirt.org"
W0824 10:17:59.432301       1 metrics.go:142] metrics endpoint will not be started because `metrics-address` was not specified.
I0824 10:17:59.433193       1 main.go:162] CSI driver supports ControllerPublishUnpublish, using real CSI handler
I0824 10:17:59.433609       1 controller.go:121] Starting CSI attacher
E0824 10:17:59.446020       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope

<the rest of the log consists of errors like the one above>

# oc logs pod/ovirt-csi-driver-controller-74994665c7-zdccv -c csi-attacher -n openshift-cluster-csi-drivers | tail -n 5
E0824 13:59:14.139761       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0824 13:59:15.143140       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0824 13:59:16.146062       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0824 13:59:17.148060       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0824 13:59:18.150623       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
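
Since the csi-attacher log repeats the same "forbidden" message once per second, it can help to collapse it into a per-permission count before comparing clusters. The following is only a minimal sketch, not part of any OpenShift tooling: the function name `summarize_forbidden` and the regular expression are assumptions made for illustration, and the sample lines are copied from the log output above.

```python
import re
from collections import Counter

# Assumed pattern for the Kubernetes RBAC "forbidden" message as it appears
# in the csi-attacher log above; other components may word it differently.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def summarize_forbidden(lines):
    """Count forbidden errors per (user, verb, resource, API group)."""
    counts = Counter()
    for line in lines:
        match = FORBIDDEN_RE.search(line)
        if match:
            key = (match["user"], match["verb"], match["resource"], match["group"])
            counts[key] += 1
    return counts


# Sample input: one informational line and two error lines from this report.
SAMPLE = [
    'I0824 10:17:59.433609       1 controller.go:121] Starting CSI attacher',
    'E0824 10:17:59.446020       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope',
    'E0824 13:59:14.139761       1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:ovirt-csi-driver-controller-sa" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope',
]

if __name__ == "__main__":
    for (user, verb, resource, group), count in summarize_forbidden(SAMPLE).items():
        print(f"{count}x {user} cannot {verb} {resource} ({group})")
```

In practice the input would be the saved output of the `oc logs ... -c csi-attacher` command shown above rather than the hard-coded sample. A single distinct entry in the summary would suggest that the service account is missing exactly one permission (here, `list` on `volumeattachments` in `storage.k8s.io`).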

Comment 3 Jan Zmeskal 2020-08-24 14:14:40 UTC

*** This bug has been marked as a duplicate of bug 1871051 ***