Bug 1678924

Summary: OLM package server not running due to missing service account
Product: OpenShift Container Platform
Reporter: Derek Carr <decarr>
Component: OLM
Assignee: Evan Cordell <ecordell>
Status: CLOSED DUPLICATE
QA Contact: Jian Zhang <jiazha>
Severity: unspecified
Priority: high
Version: 4.1.0
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-02-20 05:28:38 UTC
Type: Bug

Description Derek Carr 2019-02-19 21:56:36 UTC
Description of problem:
Installed a cluster with version 4.0.0-0.alpha-2019-02-18-164603 and saw monitoring errors reporting that the package server API is not responding.

Inspection of the OLM operator shows that it fails to deploy the replica set because of a missing service account:

oc get events -n openshift-operator-lifecycle-manager

Error creating: pods "packageserver-5567cd88c6-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. see above

Actual results:


Expected results:
`oc get clusteroperators` should not report operator-lifecycle-manager as Available when the package server is not actually running.
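The expected behaviour amounts to deriving the Available condition from the packageserver deployment's readiness instead of reporting it unconditionally. The Go below is a minimal sketch under that assumption, not OLM's code: `DeploymentStatus`, `Condition` and `availableCondition` are hypothetical names, and the fields only loosely mirror a Deployment's `spec.replicas`, `status.availableReplicas` and its ProgressDeadlineExceeded condition.

```go
package main

import "fmt"

// DeploymentStatus is a hypothetical, trimmed-down view of a Deployment;
// it is not the real apps/v1 type.
type DeploymentStatus struct {
	DesiredReplicas          int32 // roughly spec.replicas
	AvailableReplicas        int32 // roughly status.availableReplicas
	ProgressDeadlineExceeded bool  // rollout stalled past its progress deadline
}

// Condition is a hypothetical stand-in for a ClusterOperator status condition.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Message string
}

// availableCondition reports Available=True only when every desired
// packageserver replica is available and the rollout has not stalled.
// Zero desired replicas is treated as unavailable, because the package
// server is expected to be running.
func availableCondition(s DeploymentStatus) Condition {
	switch {
	case s.ProgressDeadlineExceeded:
		return Condition{Type: "Available", Status: "False",
			Message: "deployment packageserver exceeded its progress deadline"}
	case s.DesiredReplicas == 0 || s.AvailableReplicas < s.DesiredReplicas:
		return Condition{Type: "Available", Status: "False",
			Message: fmt.Sprintf("%d/%d packageserver replicas available",
				s.AvailableReplicas, s.DesiredReplicas)}
	default:
		return Condition{Type: "Available", Status: "True",
			Message: "packageserver is running"}
	}
}
```

In the situation from this report, pods are never created because the service account is missing, so AvailableReplicas stays at 0 and the sketch returns Status "False" rather than "True".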

Additional info:

Comment 1 Derek Carr 2019-02-19 21:57:52 UTC
Monitoring is invoking:

$ oc get --raw /apis/packages.apps.redhat.com/v1alpha1
Error from server (ServiceUnavailable): the server is currently unable to handle the request

I can confirm the above as well.

Comment 2 Derek Carr 2019-02-19 22:08:13 UTC
The operator log shows the following:

time="2019-02-19T22:00:49Z" level=warning msg="needs reinstall: Timeout: deployment packageserver not ready before timeout: deployment \"packageserver\" exceeded its progress deadline" csv=packageserver.v0.8.1 id=qj5k6 namespace=openshift-operator-lifecycle-manager phase=Failed strategy=deployment


which I think corresponds to this code:
https://github.com/operator-framework/operator-lifecycle-manager/blob/ff0ea15c22d0a3099dcb5a9a00400864f52ff87e/pkg/controller/install/deployment.go#L167

I cannot tell whether the cluster operator status is written once or continuously synced:

https://github.com/operator-framework/operator-lifecycle-manager/blob/cce4af21efb662527a8f71d22f7f2c37007ea4bf/cmd/olm/main.go#L134

It must be continuously synced and, if I am reading this correctly, the sync should run in a separate goroutine.

Comment 3 Jian Zhang 2019-02-20 05:28:38 UTC
Derek,

Many thanks for your report! We already have a bug tracking this issue.
Could you add your comments to bug 1678606? Thanks!

*** This bug has been marked as a duplicate of bug 1678606 ***