Bug 1743061
Summary: | upgrade from 4.1.9 to 4.1.11 failed by 'replicaset-controller Error creating: pods "packageserver-6474c74cdd-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found' | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | jwang | |
Component: | OLM | Assignee: | Evan Cordell <ecordell> | |
OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> | |
Status: | CLOSED CANTFIX | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | aos-bugs, aprajapa, bandrade, brad.williams, cblecker, ccoleman, chuo, ecordell, erich, jeder, jfan, jharlow, jiazha, jlucky, jokerman, rhowe, rrackow, rszumski | |
Version: | 4.1.0 | |||
Target Milestone: | --- | |||
Target Release: | 4.1.z | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1746159 | Environment: | |
Last Closed: | 2019-09-14 14:59:02 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1746159, 1755646 | |||
Bug Blocks: |
Description
jwang
2019-08-19 02:27:40 UTC
Hi Jun, thanks for reporting this issue. Did the packageserver work well before upgrading? We haven't encountered this issue yet; could you help provide the steps to reproduce it? Thanks very much!

Upgrade-blocking issues are always high severity.

Seeing the same issue on an OSD cluster after upgrade from 4.1.9 to 4.1.13. Manually creating the ServiceAccount does not resolve the issue but results in the next problem. Tracking this internally as https://jira.coreos.com/browse/SREP-2048

This issue can be worked around by deleting the following resources:

In the `openshift-operator-lifecycle-manager` namespace:
- `Kind: InstallPlan`, `name: packageserver-<something>`
- `Kind: Subscription`, `name: packageserver`

Cluster scoped (these may or may not exist):
- `Kind: ClusterRole`, `name: packageserver-<something>`
- `Kind: ClusterRoleBinding`, `name: packageserver-<something>`

Once you do that, you should find that the Subscription gets recreated, followed by the InstallPlan, followed by the CR and CRB, and the ServiceAccount as well. The packageserver deployment should then finish rolling out successfully.

The following objects may also need to be deleted, if they exist:

In the `openshift-operator-lifecycle-manager` namespace:
- `Kind: ClusterServiceVersion`, `name: packageserver.<something>`

I hit this same issue with my cluster that was originally installed on 4.1.0. My 4.1.13 to 4.1.14 upgrade was failing. I followed the fix Chris and Evan outlined above, deleting the relevant InstallPlan, Subscription, and ClusterServiceVersion, and the update eventually succeeded.

This issue is fixed in 4.1.15 and later, and we can't fix the intermediate 4.1.z releases. Closing as "cantfix" for the range of affected z-streams, even though it's fixed in the latest z-streams. Hopefully the manual steps here are enough for the (hopefully small number of) remaining cluster upgrades that would be affected.

I am seeing this issue on a newly updated 4.1.17 -> 4.1.18 cluster. The remediation steps listed above (delete objects) did work.

Seeing this on an upgrade from 4.1.13 to 4.1.18. The remediation steps got me past the service account error, but now packageserver won't start, failing with this error:

time="2019-10-02T20:10:59Z" level=fatal msg="error creating self-signed certificates: mkdir apiserver.local.config: permission denied"

It appears the service account gets created, but the role and role binding do not.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
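
For anyone scripting the workaround from the comments above, here is a minimal sketch in Python that only builds the `oc delete` command lines in the order described (InstallPlan, Subscription, then the optional cluster-scoped objects and ClusterServiceVersion). It does not run anything against a cluster. The helper name `build_delete_commands` and the example object names are hypothetical: the real `packageserver-<something>` names have to be looked up on the affected cluster. Treating the second cluster-scoped object as a ClusterRoleBinding is an assumption based on the "CR and CRB" wording in the workaround.

```python
import shlex

# Namespace named in the workaround comment.
NAMESPACE = "openshift-operator-lifecycle-manager"


def build_delete_commands(install_plan, cluster_roles=(),
                          cluster_role_bindings=(), csv=None):
    """Return `oc delete` argv lists for the workaround, in order.

    All object names are supplied by the caller; nothing is guessed here.
    """
    cmds = [
        ["oc", "delete", "installplan", install_plan, "-n", NAMESPACE],
        ["oc", "delete", "subscription", "packageserver", "-n", NAMESPACE],
    ]
    # Cluster-scoped objects "may or may not exist", so tolerate absence.
    for name in cluster_roles:
        cmds.append(["oc", "delete", "clusterrole", name,
                     "--ignore-not-found"])
    for name in cluster_role_bindings:
        cmds.append(["oc", "delete", "clusterrolebinding", name,
                     "--ignore-not-found"])
    # The ClusterServiceVersion only needs deleting in some cases.
    if csv is not None:
        cmds.append(["oc", "delete", "clusterserviceversion", csv,
                     "-n", NAMESPACE, "--ignore-not-found"])
    return cmds


if __name__ == "__main__":
    # "packageserver-EXAMPLE" is a placeholder, not a real object name.
    for argv in build_delete_commands("packageserver-EXAMPLE"):
        print(shlex.join(argv))
```

Printing the commands first, rather than executing them, leaves room to review the object names before deleting anything. Note that one later comment reports that after these deletions the ServiceAccount was recreated but the role and role binding were not, so the result is worth checking by hand.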