Bug 1678924

Summary:	OLM package server not running due to missing service account
Product:	OpenShift Container Platform	Reporter:	Derek Carr <decarr>
Component:	OLM	Assignee:	Evan Cordell <ecordell>
Status:	CLOSED DUPLICATE	QA Contact:	Jian Zhang <jiazha>
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	4.1.0
Target Milestone:	---
Target Release:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Last Closed:	2019-02-20 05:28:38 UTC	Type:	Bug

Description Derek Carr 2019-02-19 21:56:36 UTC

Description of problem:
Installed a cluster with version 4.0.0-0.alpha-2019-02-18-164603 and saw monitoring errors reporting that the package server API is not responding.

Inspection of the OLM operator shows that it fails to deploy the replica set because of a missing service account:

oc get events -n openshift-operator-lifecycle-manager

Error creating: pods "packageserver-5567cd88c6-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. see above
2.
3.

Actual results:


Expected results:
Expected `oc get clusteroperators` not to report operator-lifecycle-manager as available when the package server is not actually running.
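
A minimal sketch of the expected behavior, in Go: the operator is reported as available only when every deployment it manages has all of its desired replicas available. The `Deployment` type and the `operatorAvailable` function are hypothetical names used for illustration; they are not OLM's actual types or code.

```go
package main

import "fmt"

// Deployment is a hypothetical, trimmed-down view of a deployment's status.
// It is not the real Kubernetes API type.
type Deployment struct {
	Name      string
	Desired   int
	Available int
}

// operatorAvailable returns false, with a reason, as soon as one deployment
// has fewer available replicas than desired.
func operatorAvailable(deps []Deployment) (bool, string) {
	for _, d := range deps {
		if d.Available < d.Desired {
			return false, fmt.Sprintf("deployment %q has %d/%d replicas available",
				d.Name, d.Available, d.Desired)
		}
	}
	return true, "all deployments available"
}

func main() {
	// Mirrors the situation in this report: packageserver has no running pods.
	deps := []Deployment{
		{Name: "olm-operator", Desired: 1, Available: 1},
		{Name: "packageserver", Desired: 1, Available: 0},
	}
	ok, reason := operatorAvailable(deps)
	fmt.Printf("available=%t reason=%s\n", ok, reason)
}
```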

Additional info:

Comment 1 Derek Carr 2019-02-19 21:57:52 UTC

Monitoring is invoking:

$ oc get --raw /apis/packages.apps.redhat.com/v1alpha1
Error from server (ServiceUnavailable): the server is currently unable to handle the request

I can confirm the failure described above as well.

Comment 2 Derek Carr 2019-02-19 22:08:13 UTC

The operator log shows the following:

time="2019-02-19T22:00:49Z" level=warning msg="needs reinstall: Timeout: deployment packageserver not ready before timeout: deployment \"packageserver\" exceeded its progress deadline" csv=packageserver.v0.8.1 id=qj5k6 namespace=openshift-operator-lifecycle-manager phase=Failed strategy=deployment


which I think corresponds to this code:
https://github.com/operator-framework/operator-lifecycle-manager/blob/ff0ea15c22d0a3099dcb5a9a00400864f52ff87e/pkg/controller/install/deployment.go#L167

I cannot tell whether the cluster operator status is written once or continuously synced:

https://github.com/operator-framework/operator-lifecycle-manager/blob/cce4af21efb662527a8f71d22f7f2c37007ea4bf/cmd/olm/main.go#L134

It must be synced continuously, and that sync should run in a separate goroutine, if I am reading this correctly.

Comment 3 Jian Zhang 2019-02-20 05:28:38 UTC

Derek,

Many thanks for your report! We already have a bug tracking this issue.
Could you add your comments to bug 1678606? Thanks!

*** This bug has been marked as a duplicate of bug 1678606 ***