Bug 1690366

Summary: Fail to install cluster due to "Could not update rolebinding cluster-storage-operator"
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: apiserver-auth
Assignee: Erica von Buelow <evb>
Status: CLOSED DUPLICATE
QA Contact: Chuan Yu <chuyu>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.0
CC: aos-bugs, aos-storage-staff, decarr, jialiu, jokerman, mmccomas, wking
Target Milestone: ---
Keywords: BetaBlocker, Regression
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-03-26 14:11:08 UTC
Type: Bug

Description liujia 2019-03-19 10:40:45 UTC
Description of problem:
Triggered an installation against the 4.0.0-0.nightly-2019-03-18-200009 build. The install failed and exited because the cluster did not finish initializing.
WARNING Found override for ReleaseImage. Please be warned, this is not advised 
INFO Consuming "Install Config" from target directory 
INFO Creating infrastructure resources...         
INFO Waiting up to 30m0s for the Kubernetes API at https://api.jliu-demo.qe.devcluster.openshift.com:6443... 
INFO API v1.12.4+befe71b up                       
INFO Waiting up to 30m0s for the bootstrap-complete event... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 30m0s for the cluster at https://api.jliu-demo.qe.devcluster.openshift.com:6443 to initialize... 
FATAL failed to initialize the cluster: timed out waiting for the condition 

The detailed install log shows that initialization was blocked by this error: Could not update rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): the server has forbidden updates to this resource

time="2019-03-19T08:18:10Z" level=debug msg="Still waiting for the cluster to initialize..."
...
time="2019-03-19T08:42:36Z" level=debug msg="Still waiting for the cluster to initialize: Could not update rolebinding \"openshift-cluster-storage-operator/cluster-storage-operator\" (229 of 305): the server has forbidden updates to this resource"
...
time="2019-03-19T08:48:10Z" level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"
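
To pull the blocking reason and the total wait time out of installer debug lines like the ones above, something along these lines may help. It is only a sketch: it assumes the `time="..." level=... msg="..."` layout seen in this log, and the helper names (`parse_line`, `first_blocking_reason`, `wait_duration`) are made up for illustration and are not part of openshift-install.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt above:
#   time="<RFC3339 UTC>" level=<level> msg="<message with \" escapes>"
LINE = re.compile(
    r'time="(?P<time>[^"]+)" level=(?P<level>\w+) '
    r'msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)

WAIT_PREFIX = "Still waiting for the cluster to initialize: "


def parse_line(line):
    """Return (timestamp, level, message) or None if the line does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group("time"), "%Y-%m-%dT%H:%M:%SZ")
    message = match.group("msg").replace('\\"', '"')  # undo \" escaping
    return stamp, match.group("level"), message


def first_blocking_reason(lines):
    """Return (timestamp, reason) for the first wait message that names a reason."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2].startswith(WAIT_PREFIX):
            return parsed[0], parsed[2][len(WAIT_PREFIX):]
    return None


def wait_duration(lines):
    """Return the time between the earliest and latest parsed log lines."""
    stamps = [p[0] for p in map(parse_line, lines) if p is not None]
    return max(stamps) - min(stamps) if stamps else None
```

Fed the three log lines quoted above, `wait_duration` gives the span between the first "Still waiting" message and the fatal timeout, and `first_blocking_reason` returns the rolebinding error text.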

========== Further checks on the cluster status ==========
# oc get co storage
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
storage   4.0.0-0.nightly-2019-03-18-200009   True        False         False     57m

# oc get rolebindings -n openshift-cluster-storage-operator
NAME                       AGE
cluster-storage-operator   50m
system:deployers           74m
system:image-builders      74m
system:image-pullers       74m

The rolebinding was in fact created. The CVO log shows that applying the storage rolebinding failed at first and then succeeded.

E0319 08:19:42.969575       1 task.go:58] error running apply for rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): rolebindings.rbac.authorization.k8s.io "cluster-storage-operator" is forbidden: the server could not find the requested resource (get rolebindingrestrictions.authorization.openshift.io)
I0319 08:44:13.667047       1 task_graph.go:566] Result of work: [Could not update rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): the server has forbidden updates to this resource Could not update rolebinding "openshift-ingress-operator/ingress-operator" (189 of 305): the server has forbidden updates to this resource Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (292 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (298 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (304 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (301 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (295 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (286 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (289 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-image-registry/image-registry" (283 of 305): the server does not recognize this resource, check extension API servers Cluster operator authentication is still updating Cluster operator monitoring is still updating Cluster operator console has not yet reported success]
E0319 08:44:13.667137       1 sync_worker.go:257] unable to synchronize image (waiting 24.968400798s): Could not update rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): the server has forbidden updates to this resource
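
The "Result of work" line above packs many failures into one message. A rough way to split it into individual entries and group them by reason is sketched below. The regular expression is an assumption based only on the message shape seen in this log (`Could not update <kind> "<namespace>/<name>" (<n> of <total>): <reason>`), and the function names are hypothetical, not part of the cluster-version operator.

```python
import re
from collections import Counter

# Assumed shape of each entry, inferred from the CVO log line above.
# A reason is taken to end where the next entry, a "Cluster operator"
# status note, a closing bracket, or the end of the message begins.
FAILURE = re.compile(
    r'Could not update (?P<kind>\w+) "(?P<name>[^"]+)" '
    r'\((?P<index>\d+) of (?P<total>\d+)\): '
    r'(?P<reason>.*?)(?= Could not update | Cluster operator |\]|$)'
)


def parse_failures(message):
    """Return one dict per 'Could not update ...' entry in the message."""
    return [match.groupdict() for match in FAILURE.finditer(message)]


def count_by_reason(message):
    """Count how many manifests failed for each distinct reason."""
    return Counter(entry["reason"] for entry in parse_failures(message))


def failures_of_kind(message, kind):
    """Return the namespace/name of every failed manifest of the given kind."""
    return [e["name"] for e in parse_failures(message) if e["kind"] == kind]
```

Grouping this way separates the two rolebinding entries ("the server has forbidden updates to this resource") from the servicemonitor entries ("the server does not recognize this resource, check extension API servers"), which makes it easier to see that the rolebinding errors are a distinct failure.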



Version-Release number of the following components:
# ./openshift-install version
./openshift-install v4.0.22-201903161424-dirty
built from commit ea2fab4c886ad770b25619b734c6bcf54195b038

How reproducible:
Sometimes

Steps to Reproduce:
1. Trigger an installation with 4.0.0-0.nightly-2019-03-18-200009

Actual results:
Installation failed.

Expected results:
Installation succeeds.

Additional info:
Detailed storage operator and CVO logs are attached.

Comment 3 liujia 2019-03-25 02:52:19 UTC
Still hitting this on 4.0.0-0.nightly-2019-03-22-191219.

Comment 4 Johnny Liu 2019-03-26 06:28:14 UTC
I have been hitting the same issue often recently, on these builds:

4.0.0-0.nightly-2019-03-25-180911
4.0.0-0.nightly-2019-03-23-222829

In my experience the reproduction rate is increasing, so I am adding the BetaBlocker keyword.

Comment 7 W. Trevor King 2019-03-26 14:11:08 UTC
Dup of bug 1691513

*** This bug has been marked as a duplicate of bug 1691513 ***