Description of problem:

Upgrade from 4.1.9 to 4.1.11 failed with:

    replicaset-controller Error creating: pods "packageserver-6474c74cdd-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found

$ oc get clusterversions
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE    STATUS
version   4.1.11    True        False         2d22h    Error while reconciling 4.1.11: the update could not be applied

$ oc get clusteroperators
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.1.11    True        False         False      3d19h
cloud-credential                     4.1.11    True        False         False      3d20h
cluster-autoscaler                   4.1.11    True        False         False      3d20h
console                              4.1.11    True        False         False      3d20h
dns                                  4.1.11    True        False         False      3d20h
image-registry                       4.1.11    True        False         False      2d23h
ingress                              4.1.11    True        False         False      2d23h
kube-apiserver                       4.1.11    True        False         False      3d20h
kube-controller-manager              4.1.11    True        False         False      3d20h
kube-scheduler                       4.1.11    True        False         False      3d20h
machine-api                          4.1.11    True        False         False      3d20h
machine-config                       4.1.11    True        False         False      2d22h
marketplace                          4.1.11    True        False         False      2d17h
monitoring                           4.1.11    True        False         False      2d22h
network                              4.1.11    True        False         False      3d20h
node-tuning                          4.1.11    True        False         False      2d22h
openshift-apiserver                  4.1.11    True        False         False      2d22h
openshift-controller-manager         4.1.11    True        False         False      3d20h
openshift-samples                    4.1.11    True        False         False      3d
operator-lifecycle-manager           4.1.11    True        False         False      3d20h
operator-lifecycle-manager-catalog   4.1.11    True        False         False      3d20h
service-ca                           4.1.11    True        False         False      3d20h
service-catalog-apiserver            4.1.11    True        False         False      3d20h
service-catalog-controller-manager   4.1.11    True        False         False      3d20h
storage                              4.1.11    True        False         False      3d1h

$ oc get csv -n openshift-operator-lifecycle-manager
NAME                   DISPLAY          VERSION   REPLACES   PHASE
packageserver.v0.9.0   Package Server   0.9.0                Pending

$ oc describe replicaset.apps/packageserver-6474c74cdd -n openshift-operator-lifecycle-manager
Name:           packageserver-6474c74cdd
Namespace:      openshift-operator-lifecycle-manager
Selector:       app=packageserver,pod-template-hash=6474c74cdd
Labels:         app=packageserver
                pod-template-hash=6474c74cdd
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/packageserver
Replicas:       0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=packageserver
                    pod-template-hash=6474c74cdd
  Service Account:  packageserver
  Containers:
   packageserver:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:13f137db60c022872b97d76ebbc12a084fab413c171bae6a3b05ade368d6e7c9
    Port:       5443/TCP
    Host Port:  0/TCP
    Command:
      /bin/package-server
      -v=4
      --secure-port
      5443
      --global-namespace
      openshift-operator-lifecycle-manager
    Liveness:     http-get https://:5443/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get https://:5443/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type            Status  Reason
  ----            ------  ------
  ReplicaFailure  True    FailedCreate
Events:
  Type     Reason        Age                   From                   Message
  ----     ------        ----                  ----                   -------
  Warning  FailedCreate  63s (x54 over 3h51m)  replicaset-controller  Error creating: pods "packageserver-6474c74cdd-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
n/a

Steps to Reproduce:
1. not sure how to reproduce it yet.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
Hi Jun, thanks for reporting this issue. Was the packageserver working well before the upgrade? We haven't encountered this issue yet; could you help provide the steps to reproduce? Thanks very much!
Upgrade blocking issues are always high severity
Seeing the same issue on an OSD cluster after an upgrade from 4.1.9 to 4.1.13. Manually creating the ServiceAccount does not resolve the issue; it only leads to the next problem. Tracking this internally as https://jira.coreos.com/browse/SREP-2048
This issue can be worked around by deleting the following resources:

In the `openshift-operator-lifecycle-manager` namespace:
- `Kind: InstallPlan`, `name: packageserver-<something>`
- `Kind: Subscription`, `name: packageserver`

Cluster scoped (these may or may not exist):
- `Kind: ClusterRole`, `name: packageserver-<something>`
- `Kind: ClusterRoleBinding`, `name: packageserver-<something>`

You should find that once you do this, the Subscription gets recreated, followed by the InstallPlan, then the ClusterRole and ClusterRoleBinding, and the ServiceAccount as well. The packageserver deployment should then finish rolling out successfully.
The following objects may also need to be deleted, if they exist:

In the `openshift-operator-lifecycle-manager` namespace:
- `Kind: ClusterServiceVersion`, `name: packageserver.<something>`
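The deletion steps above can be sketched as `oc` commands. This is a sketch, not a definitive procedure: the `*_NAME` variables below are placeholders I've introduced, since the real names carry a generated suffix; look them up on your cluster first (e.g. `oc get installplan,csv,clusterrole -A | grep packageserver`).

```shell
# Hedged sketch of the workaround; fill in the *_NAME placeholders from
# `oc get` output on your own cluster before running anything.
NS=openshift-operator-lifecycle-manager
INSTALLPLAN_NAME=packageserver-xxxxx         # placeholder: actual InstallPlan name
CSV_NAME=packageserver.vX.Y.Z                # placeholder: actual CSV name, if present
CLUSTERROLE_NAME=packageserver-xxxxx         # placeholder: actual ClusterRole name, if present
CLUSTERROLEBINDING_NAME=packageserver-xxxxx  # placeholder: actual binding name, if present

# Namespaced objects
oc delete installplan "$INSTALLPLAN_NAME" -n "$NS"
oc delete subscription packageserver -n "$NS"
oc delete csv "$CSV_NAME" -n "$NS" --ignore-not-found

# Cluster-scoped objects (may or may not exist)
oc delete clusterrole "$CLUSTERROLE_NAME" --ignore-not-found
oc delete clusterrolebinding "$CLUSTERROLEBINDING_NAME" --ignore-not-found
```

After the deletions, per the comment above, the Subscription, InstallPlan, RBAC objects, and ServiceAccount should be recreated automatically; `oc get pods -n openshift-operator-lifecycle-manager -w` is a convenient way to watch the packageserver deployment roll out.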
I hit this same issue with my cluster that was originally installed on 4.1.0. My 4.1.13 to 4.1.14 upgrade was failing. I followed the fix Chris and Evan outlined above, deleting the relevant InstallPlan, Subscription, and ClusterServiceVersion, and the update eventually succeeded.
This issue is fixed in 4.1.15 and later, and we can't fix the intermediate 4.1.z releases. Closing as CANTFIX for the range of affected z-streams, even though it is fixed in the latest z-streams. Hopefully the manual steps in this thread are enough for the (hopefully small) number of remaining cluster upgrades that would be affected.
I am seeing this issue on a newly updated 4.1.17 -> 4.1.18 cluster. The remediation steps listed above (delete objects) did work.
Seeing this on an upgrade from 4.1.13 to 4.1.18. The remediation steps got me past the service account error, but now packageserver won't start, failing with:

    time="2019-10-02T20:10:59Z" level=fatal msg="error creating self-signed certificates: mkdir apiserver.local.config: permission denied"
It appears the service account gets created, but the role and role binding do not.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days