Description of problem:
I upgraded a cluster from OCP 3.4 to OCP 3.5 following this doc:
https://docs.openshift.com/container-platform/3.4/install_config/upgrading/manual_upgrades.html#upgrading-masters

After rebooting the master, I frequently see the following error messages in the logs:

Mar 1 15:19:29 ip-172-31-60-104 atomic-openshift-master: E0301 15:19:29.963222 1135 reflector.go:199] pkg/controller/disruption/disruption.go:329: Failed to list *apps.StatefulSet: User "system:serviceaccount:openshift-infra:disruption-controller" cannot list all apps.statefulsets in the cluster

Version-Release number of selected component (if applicable):
Before:
openshift v3.4.1.7
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

After:
openshift v3.5.0.37
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Install a 3.4 cluster
2. Create lots of projects/endpoints/services/routes
3. Upgrade the cluster using the docs

Actual results:
Frequent errors from atomic-openshift-master in the logs.

Expected results:
Should not have errors.

Additional info:
The list errors start after restarting atomic-openshift-master once the RPMs are updated (step 4 here: https://docs.openshift.com/container-platform/3.4/install_config/upgrading/manual_upgrades.html) and come in every 3-5 seconds even on a small cluster.
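To gauge how often the error fires, the master logs can be filtered directly. A minimal sketch, assuming syslog lands in /var/log/messages as in the snippet above and the systemd unit is named atomic-openshift-master:

# Count occurrences of the reflector error in the collected logs.
grep -c 'Failed to list \*apps.StatefulSet' /var/log/messages

# Or follow the unit live and watch for the denied service account.
journalctl -u atomic-openshift-master -f | grep disruption-controller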
Created attachment 1258932
Message logs showing before/after for the restart after the update to 3.5
Did you reconcile cluster roles with a 3.5 oadm client?
https://docs.openshift.com/container-platform/3.4/install_config/upgrading/manual_upgrades.html#updating-policy-definitions

Can you include the output of the following?

oadm version
oadm policy reconcile-cluster-roles -o name
oadm policy reconcile-cluster-roles -o name --confirm
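For context on what those commands do: without --confirm, reconcile-cluster-roles is a dry run that only prints the roles that differ from the shipped defaults; --confirm applies the defaults. A sketch of a cautious sequence, assuming any customized roles should keep their extra rules (the --additive-only flag is from the reconcile documentation for this release line):

# Dry run: print roles that drift from the 3.5 defaults, without changing anything.
oadm policy reconcile-cluster-roles -o name

# Apply, but only add missing permissions rather than stripping extra ones.
oadm policy reconcile-cluster-roles --additive-only=true --confirm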
Oh, missed that this was the service account. Fix in https://github.com/openshift/origin/pull/13187
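Until a build carrying that fix is available, the missing rule could in principle be added by hand. A hedged sketch only (untested here, and a later reconcile-cluster-roles run may rewrite the role); it mirrors the rule the fix adds, per the verification comment below:

# Append list/watch on apps/statefulsets to the disruption controller's
# cluster role. Sketch only; upgrading to a fixed build is the supported path.
oc patch clusterrole system:disruption-controller --type=json -p '[
  {"op": "add", "path": "/rules/-", "value": {
    "apiGroups": ["apps"],
    "resources": ["statefulsets"],
    "verbs": ["list", "watch"]
  }}
]'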
And the 1.5 PR is here: https://github.com/openshift/origin/pull/13199
This has been merged into OCP and is in OCP v3.5.0.38 or newer.
Verified in a clean-install environment with the latest OCP 3.5 puddle; no such error is logged.

# openshift version
openshift v3.5.0.39
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Verb/resource added to the system:disruption-controller ClusterRole:

{
    "apiGroups": [
        "apps"
    ],
    "attributeRestrictions": null,
    "resources": [
        "statefulsets"
    ],
    "verbs": [
        "list",
        "watch"
    ]
},

Setting the bug status to VERIFIED now; will continue to check it in the upgrade environment.
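For anyone re-verifying, the rule above can be read straight off the live role; a minimal check, assuming cluster-admin credentials:

# Confirm the apps/statefulsets rule with list and watch verbs is present.
oc get clusterrole system:disruption-controller -o json | grep -B2 -A6 statefulsets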
I also checked this with a scaled cluster upgrade; the problem is gone.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0884