Bug 1690366 - Fail to install cluster due to "Could not update rolebinding cluster-storage-operator"
Keywords:
Status: CLOSED DUPLICATE of bug 1691513
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Erica von Buelow
QA Contact: Chuan Yu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-19 10:40 UTC by liujia
Modified: 2019-03-26 14:11 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-26 14:11:08 UTC
Target Upstream Version:
Embargoed:


Attachments

Description liujia 2019-03-19 10:40:45 UTC
Description of problem:
Triggered an installation against the 4.0.0-0.nightly-2019-03-18-200009 build. The installation failed and exited because the cluster failed to initialize.
WARNING Found override for ReleaseImage. Please be warned, this is not advised 
INFO Consuming "Install Config" from target directory 
INFO Creating infrastructure resources...         
INFO Waiting up to 30m0s for the Kubernetes API at https://api.jliu-demo.qe.devcluster.openshift.com:6443... 
INFO API v1.12.4+befe71b up                       
INFO Waiting up to 30m0s for the bootstrap-complete event... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 30m0s for the cluster at https://api.jliu-demo.qe.devcluster.openshift.com:6443 to initialize... 
FATAL failed to initialize the cluster: timed out waiting for the condition 

The detailed install log shows that it failed with: Could not update rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): the server has forbidden updates to this resource

time="2019-03-19T08:18:10Z" level=debug msg="Still waiting for the cluster to initialize..."
...
time="2019-03-19T08:42:36Z" level=debug msg="Still waiting for the cluster to initialize: Could not update rolebinding \"openshift-cluster-storage-operator/cluster-storage-operator\" (229 of 305): the server has forbidden updates to this resource"
...
time="2019-03-19T08:48:10Z" level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"

========== Further checks of the cluster status ==========
# oc get co storage
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
storage   4.0.0-0.nightly-2019-03-18-200009   True        False         False     57m

# oc get rolebindings -n openshift-cluster-storage-operator
NAME                       AGE
cluster-storage-operator   50m
system:deployers           74m
system:image-builders      74m
system:image-pullers       74m

The rolebinding has in fact been created. The CVO log shows that applying the storage rolebinding failed at first and then succeeded.

E0319 08:19:42.969575       1 task.go:58] error running apply for rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): rolebindings.rbac.authorization.k8s.io "cluster-storage-operator" is forbidden: the server could not find the requested resource (get rolebindingrestrictions.authorization.openshift.io)
I0319 08:44:13.667047       1 task_graph.go:566] Result of work: [Could not update rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): the server has forbidden updates to this resource Could not update rolebinding "openshift-ingress-operator/ingress-operator" (189 of 305): the server has forbidden updates to this resource Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (292 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (298 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (304 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (301 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (295 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (286 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (289 of 305): the server does not recognize this resource, check extension API servers Could not update servicemonitor "openshift-image-registry/image-registry" (283 of 305): the server does not recognize this resource, check extension API servers Cluster operator authentication is still updating Cluster operator monitoring is still updating Cluster operator console has not yet reported success]
E0319 08:44:13.667137       1 sync_worker.go:257] unable to synchronize image (waiting 24.968400798s): Could not update rolebinding "openshift-cluster-storage-operator/cluster-storage-operator" (229 of 305): the server has forbidden updates to this resource



Version-Release number of the following components:
# ./openshift-install version
./openshift-install v4.0.22-201903161424-dirty
built from commit ea2fab4c886ad770b25619b734c6bcf54195b038

How reproducible:
Sometimes

Steps to Reproduce:
1. Trigger installation with 4.0.0-0.nightly-2019-03-18-200009

Actual results:
Installation failed.

Expected results:
Installation succeeds.

Additional info:
Detailed storage operator and CVO logs are attached.

Comment 3 liujia 2019-03-25 02:52:19 UTC
Still hitting this on 4.0.0-0.nightly-2019-03-22-191219.

Comment 4 Johnny Liu 2019-03-26 06:28:14 UTC
Recently I have often been hitting the same issue, on these builds:

4.0.0-0.nightly-2019-03-25-180911
4.0.0-0.nightly-2019-03-23-222829

In my experience the reproduction rate has increased, so I am adding the betablocker keyword.

Comment 7 W. Trevor King 2019-03-26 14:11:08 UTC
Dup of bug 1691513

*** This bug has been marked as a duplicate of bug 1691513 ***

