1678924 – OLM package server not running due to missing service account

Keywords:
Status:	CLOSED DUPLICATE of bug 1678606
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Evan Cordell
QA Contact:	Jian Zhang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-02-19 21:56 UTC by Derek Carr
Modified:	2019-03-12 14:24 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-02-20 05:28:38 UTC
Target Upstream Version:
Embargoed:

Description Derek Carr 2019-02-19 21:56:36 UTC

Description of problem:
Installed a cluster with version 4.0.0-0.alpha-2019-02-18-164603 and saw monitoring errors reporting that the package server API could not be reached.

Inspecting the OLM operator shows that it fails to deploy the replica set because of a missing service account:

$ oc get events -n openshift-operator-lifecycle-manager

Error creating: pods "packageserver-5567cd88c6-" is forbidden: error looking up service account openshift-operator-lifecycle-manager/packageserver: serviceaccount "packageserver" not found


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. see above

Actual results:


Expected results:
Expected `oc get clusteroperators` not to report operator-lifecycle-manager as available when the package server is not actually running.
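
The expectation above can be illustrated with a minimal Go sketch. It is not OLM's code: the `deploymentState` type, the `operatorAvailable` function, and the rule that every operand deployment needs at least one ready replica are assumptions made for illustration, and the deployment names other than `packageserver` are only examples.

```go
package main

import (
	"fmt"
	"sort"
)

// deploymentState is a hypothetical, simplified view of a Deployment's status.
type deploymentState struct {
	ReadyReplicas int32
}

// operatorAvailable reports whether the operator should be marked available.
// Assumption: the operator is available only if every operand deployment
// (for example packageserver) has at least one ready replica.
func operatorAvailable(operands map[string]deploymentState) (bool, string) {
	// Sort the names so the reported reason is deterministic.
	names := make([]string, 0, len(operands))
	for name := range operands {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		if operands[name].ReadyReplicas < 1 {
			return false, fmt.Sprintf("deployment %q has no ready replicas", name)
		}
	}
	return true, "all operand deployments are ready"
}

func main() {
	// Illustrative input: packageserver cannot start its pods.
	ok, reason := operatorAvailable(map[string]deploymentState{
		"olm-operator":  {ReadyReplicas: 1},
		"packageserver": {ReadyReplicas: 0},
	})
	fmt.Println(ok, reason)
}
```

With a check of this kind, a package server that cannot schedule pods would keep the operator from reporting itself as available.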

Additional info:

Comment 1 Derek Carr 2019-02-19 21:57:52 UTC

monitoring is invoking:

$ oc get --raw /apis/packages.apps.redhat.com/v1alpha1
Error from server (ServiceUnavailable): the server is currently unable to handle the request

I can also confirm the events described above.

Comment 2 Derek Carr 2019-02-19 22:08:13 UTC

From the operator I see the following:

time="2019-02-19T22:00:49Z" level=warning msg="needs reinstall: Timeout: deployment packageserver not ready before timeout: deployment \"packageserver\" exceeded its progress deadline" csv=packageserver.v0.8.1 id=qj5k6 namespace=openshift-operator-lifecycle-manager phase=Failed strategy=deployment


which I think corresponds to this code:
https://github.com/operator-framework/operator-lifecycle-manager/blob/ff0ea15c22d0a3099dcb5a9a00400864f52ff87e/pkg/controller/install/deployment.go#L167

I cannot tell whether the cluster operator status is written once or continuously synced:

https://github.com/operator-framework/operator-lifecycle-manager/blob/cce4af21efb662527a8f71d22f7f2c37007ea4bf/cmd/olm/main.go#L134

It must be synced, and should be in a separate goroutine, if I am reading this correctly.

Comment 3 Jian Zhang 2019-02-20 05:28:38 UTC

Derek,

Many thanks for your report! We already have a bug tracking this issue.
Could you add your comments to bug 1678606? Thanks!

*** This bug has been marked as a duplicate of bug 1678606 ***
