Bug 1678924 - OLM package server not running due to missing service account
Summary: OLM package server not running due to missing service account
Keywords:
Status: CLOSED DUPLICATE of bug 1678606
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-19 21:56 UTC by Derek Carr
Modified: 2019-03-12 14:24 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-20 05:28:38 UTC
Target Upstream Version:
Embargoed:



Description Derek Carr 2019-02-19 21:56:36 UTC
Description of problem:
Installed a cluster with version 4.0.0-0.alpha-2019-02-18-164603 and saw monitoring errors reporting that the package server API could not be reached.

Inspection of the OLM operator shows that it fails to deploy the replica set due to a missing service account:

oc get events -n openshift-operator-lifecycle-manager

Error creating: pods "packageserver-5567cd88c6-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. see above
2.
3.

Actual results:


Expected results:
Expected `oc get clusteroperators` to not report operator-lifecycle-manager as available when the package server is not actually running.
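
One way to express this expectation is to derive the reported availability from the state of the required deployments rather than from the operator process alone. The Go sketch below is illustrative only: `deploymentState` and `operatorAvailable` are hypothetical names, not OLM or Kubernetes API types, and the replica counts are made up to mirror the failure described above.

```go
package main

import "fmt"

// deploymentState is a hypothetical, simplified view of a deployment's status.
type deploymentState struct {
	Name            string
	DesiredReplicas int
	ReadyReplicas   int
}

// operatorAvailable returns true only if every required deployment wants at
// least one replica and all of its desired replicas are ready. Otherwise it
// returns false and a human-readable reason.
func operatorAvailable(required []deploymentState) (bool, string) {
	for _, d := range required {
		if d.DesiredReplicas == 0 || d.ReadyReplicas < d.DesiredReplicas {
			return false, fmt.Sprintf("deployment %q has %d/%d replicas ready",
				d.Name, d.ReadyReplicas, d.DesiredReplicas)
		}
	}
	return true, ""
}

func main() {
	// Assumed values: packageserver has no ready pods because pod creation
	// is forbidden while its service account is missing.
	ok, reason := operatorAvailable([]deploymentState{
		{Name: "olm-operator", DesiredReplicas: 1, ReadyReplicas: 1},
		{Name: "packageserver", DesiredReplicas: 1, ReadyReplicas: 0},
	})
	fmt.Println(ok, reason)
}
```

With a check of this shape, a package server that never becomes ready would keep the operator from being reported as available.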

Additional info:

Comment 1 Derek Carr 2019-02-19 21:57:52 UTC
Monitoring is invoking:

$ oc get --raw /apis/packages.apps.redhat.com/v1alpha1
Error from server (ServiceUnavailable): the server is currently unable to handle the request

I can confirm that I see the same error as well.

Comment 2 Derek Carr 2019-02-19 22:08:13 UTC
From the operator logs, I see the following:

time="2019-02-19T22:00:49Z" level=warning msg="needs reinstall: Timeout: deployment packageserver not ready before timeout: deployment \"packageserver\" exceeded its progress deadline" csv=packageserver.v0.8.1 id=qj5k6 namespace=openshift-operator-lifecycle-manager phase=Failed strategy=deployment


which I think corresponds to this code:
https://github.com/operator-framework/operator-lifecycle-manager/blob/ff0ea15c22d0a3099dcb5a9a00400864f52ff87e/pkg/controller/install/deployment.go#L167

I cannot tell whether the cluster operator status is written once or continuously synced:

https://github.com/operator-framework/operator-lifecycle-manager/blob/cce4af21efb662527a8f71d22f7f2c37007ea4bf/cmd/olm/main.go#L134

It must be synced continuously, and that should happen in a separate goroutine, if I am reading this correctly.

Comment 3 Jian Zhang 2019-02-20 05:28:38 UTC
Derek,

Many thanks for your report! We already have a bug tracking this issue.
Could you help add comments in bug 1678606? Thanks!

*** This bug has been marked as a duplicate of bug 1678606 ***

