Bug 1743061 - upgrade from 4.1.9 to 4.1.11 failed by 'replicaset-controller Error creating: pods "packageserver-6474c74cdd-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found'
Summary: upgrade from 4.1.9 to 4.1.11 failed by 'replicaset-controller Error creating:...
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.z
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On: 1746159 1755646
Blocks:
Reported: 2019-08-19 02:27 UTC by jwang
Modified: 2023-09-14 05:41 UTC (History)
CC List: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1746159
Environment:
Last Closed: 2019-09-14 14:59:02 UTC
Target Upstream Version:
Embargoed:


Description jwang 2019-08-19 02:27:40 UTC
Description of problem:
upgrade from 4.1.9 to 4.1.11 failed by 'replicaset-controller Error creating: pods "packageserver-6474c74cdd-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found'

$ oc get clusterversions
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.11    True        False         2d22h   Error while reconciling 4.1.11: the update could not be applied

$ oc get clusteroperators                                                                                                                       
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.1.11    True        False         False      3d19h
cloud-credential                     4.1.11    True        False         False      3d20h
cluster-autoscaler                   4.1.11    True        False         False      3d20h
console                              4.1.11    True        False         False      3d20h
dns                                  4.1.11    True        False         False      3d20h
image-registry                       4.1.11    True        False         False      2d23h
ingress                              4.1.11    True        False         False      2d23h
kube-apiserver                       4.1.11    True        False         False      3d20h
kube-controller-manager              4.1.11    True        False         False      3d20h
kube-scheduler                       4.1.11    True        False         False      3d20h
machine-api                          4.1.11    True        False         False      3d20h
machine-config                       4.1.11    True        False         False      2d22h
marketplace                          4.1.11    True        False         False      2d17h
monitoring                           4.1.11    True        False         False      2d22h
network                              4.1.11    True        False         False      3d20h
node-tuning                          4.1.11    True        False         False      2d22h
openshift-apiserver                  4.1.11    True        False         False      2d22h
openshift-controller-manager         4.1.11    True        False         False      3d20h
openshift-samples                    4.1.11    True        False         False      3d
operator-lifecycle-manager           4.1.11    True        False         False      3d20h
operator-lifecycle-manager-catalog   4.1.11    True        False         False      3d20h
service-ca                           4.1.11    True        False         False      3d20h
service-catalog-apiserver            4.1.11    True        False         False      3d20h
service-catalog-controller-manager   4.1.11    True        False         False      3d20h
storage                              4.1.11    True        False         False      3d1h

$ oc get csv -n openshift-operator-lifecycle-manager
NAME                   DISPLAY          VERSION   REPLACES   PHASE
packageserver.v0.9.0   Package Server   0.9.0                Pending

$ oc describe replicaset.apps/packageserver-6474c74cdd -n openshift-operator-lifecycle-manager

Name:           packageserver-6474c74cdd
Namespace:      openshift-operator-lifecycle-manager
Selector:       app=packageserver,pod-template-hash=6474c74cdd
Labels:         app=packageserver
                pod-template-hash=6474c74cdd
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/packageserver
Replicas:       0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=packageserver
                    pod-template-hash=6474c74cdd
  Service Account:  packageserver
  Containers:
   packageserver:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:13f137db60c022872b97d76ebbc12a084fab413c171bae6a3b05ade368d6e7c9
    Port:       5443/TCP
    Host Port:  0/TCP
    Command:
      /bin/package-server
      -v=4
      --secure-port                
      5443                                                                                                                                                         
      --global-namespace                                                                                                                                           
      openshift-operator-lifecycle-manager                                                                                                                         
    Liveness:     http-get https://:5443/healthz delay=0s timeout=1s period=10s #success=1 #failure=3                                                              
    Readiness:    http-get https://:5443/healthz delay=0s timeout=1s period=10s #success=1 #failure=3                                                              
    Environment:  <none>                                                                                                                                           
    Mounts:       <none>                                                                                                                                           
  Volumes:        <none>                                                                                                                                           
Conditions:                                                                                                                                                        
  Type             Status  Reason                                                                                                                                  
  ----             ------  ------                                                                                                                                  
  ReplicaFailure   True    FailedCreate                                                                                                                            
Events:                                                                                                                                                            
  Type     Reason        Age                   From                   Message                                                                                      
  ----     ------        ----                  ----                   -------                                                                                      
  Warning  FailedCreate  63s (x54 over 3h51m)  replicaset-controller  Error creating: pods "packageserver-6474c74cdd-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found


Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
n/a

Steps to Reproduce:
1. not sure how to reproduce it yet.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Jian Zhang 2019-08-19 07:12:31 UTC
Hi, Jun

Thanks for reporting this issue. Was the packageserver working correctly before the upgrade?
We haven't encountered this issue yet. Could you provide the steps to reproduce it? Thanks very much!

Comment 7 Clayton Coleman 2019-08-29 20:48:49 UTC
Upgrade blocking issues are always high severity

Comment 9 Rick Rackow 2019-09-06 13:40:55 UTC
Seeing the same issue on an OSD cluster after upgrade from 4.1.9 to 4.1.13.
Manually creating the serviceAccount does not resolve the issue but results in the next problem.
tracking this internally as https://jira.coreos.com/browse/SREP-2048

Comment 13 Evan Cordell 2019-09-10 17:31:34 UTC
This issue can be worked around by deleting the following resources:

in the `openshift-operator-lifecycle-manager` namespace:
- `Kind: InstallPlan`, `name: packageserver-<something>`
- `Kind: Subscription`, `name: packageserver`

cluster scoped (these may or may not exist):
- `Kind: ClusterRole`, `name: packageserver-<something>`
- `Kind: ClusterRoleBinding`, `name: packageserver-<something>`

Once you do that, you should find that the Subscription gets recreated, followed by the InstallPlan, then the ClusterRole and ClusterRoleBinding, and the ServiceAccount as well. The packageserver deployment should then finish rolling out successfully.

Comment 14 Christoph Blecker 2019-09-10 17:44:16 UTC
The following objects may also need to be deleted, if they exist:

in the `openshift-operator-lifecycle-manager` namespace:
- `Kind: ClusterServiceVersion`, `name: packageserver.<something>`
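
For reference, the deletions described in comments 13 and 14 could be scripted roughly as follows. This is only a sketch, not a tested remediation: the helper names (`run`, `delete_packageserver`, `packageserver_workaround`) and the `OC`/`RUN` variables are made up for illustration, it assumes the affected objects all have names starting with `packageserver`, and it assumes the cluster-scoped objects are a ClusterRole and a ClusterRoleBinding (comment 13 refers to "the CR and CRB" being recreated). It prints the delete commands by default and only executes them when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch of the workaround from comments 13 and 14. Dry-run by default:
# commands are only printed unless RUN=1 is set.
NS=openshift-operator-lifecycle-manager
OC=${OC:-oc}   # hypothetical override so a stub can stand in for oc

# Echo a command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@" </dev/null
  fi
}

# Delete every object of kind $1 whose name starts with "packageserver".
# Remaining arguments (for example: -n "$NS") are passed through to oc.
delete_packageserver() {
  kind=$1
  shift
  "$OC" get "$kind" "$@" -o name | grep '/packageserver' | while read -r obj; do
    run "$OC" delete "$obj" "$@"
  done
}

packageserver_workaround() {
  # Namespaced objects (comments 13 and 14).
  delete_packageserver installplans.operators.coreos.com -n "$NS"
  delete_packageserver subscriptions.operators.coreos.com -n "$NS"
  delete_packageserver clusterserviceversions.operators.coreos.com -n "$NS"
  # Cluster-scoped objects; these may or may not exist.
  delete_packageserver clusterroles
  delete_packageserver clusterrolebindings
}
```

After sourcing the file, calling `packageserver_workaround` previews what would be deleted; setting `RUN=1` first performs the deletions. Review the preview before deleting anything, since the name match is a simple prefix check.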

Comment 15 Jacob Lucky 2019-09-13 15:00:58 UTC
I hit this same issue with my cluster that was originally installed on 4.1.0. My 4.1.13 to 4.1.14 upgrade was failing. I followed the fix Chris and Evan outlined above, deleting the relevant InstallPlan, Subscription, and ClusterServiceVersion, and the update eventually succeeded.

Comment 16 Evan Cordell 2019-09-14 14:59:02 UTC
This issue is fixed in 4.1.15 and later, and we can't fix the intermediate 4.1.z releases. Closing as "cantfix" for the range of affected z-streams, even though it's fixed in latest z-streams. Hopefully the manual steps here are enough for the (hopefully small number of) remaining cluster upgrades that would be affected.

Comment 17 Rob Szumski 2019-09-26 19:38:47 UTC
I am seeing this issue on a newly updated 4.1.17 -> 4.1.18 cluster. The remediation steps listed above (delete objects) did work.

Comment 18 Jason Harlow 2019-10-02 20:18:33 UTC
Seeing this on an upgrade from 4.1.13 to 4.1.18. The remediation steps got me past the service account error, but now packageserver won't start with this error:

time="2019-10-02T20:10:59Z" level=fatal msg="error creating self-signed certificates: mkdir apiserver.local.config: permission denied"

Comment 19 Jason Harlow 2019-10-02 20:28:02 UTC
It appears the service account gets created, but the role and role binding do not.

Comment 20 Red Hat Bugzilla 2023-09-14 05:41:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

