1808419 – If OLM catalog operator cannot reach the API server, it does not seem to retry

Summary: If OLM catalog operator cannot reach the API server, it does not seem to retry

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	4.3.z
Assignee:	Ben Luddy
QA Contact:	Bruno Andrade
Docs Contact:
URL:
Whiteboard:
Depends On:	1808418
Blocks:	1808422

Reported:	2020-02-28 13:42 UTC by Ben Luddy
Modified:	2020-03-24 14:34 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-03-24 14:34:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	operator-framework operator-lifecycle-manager pull 1366	0	None	closed	[release-4.3] Bug 1808419: Don't block on ctx.Done() if startup fails.	2020-09-11 20:02:19 UTC
Red Hat Product Errata	RHBA-2020:0858	0	None	None	None	2020-03-24 14:34:44 UTC

Description Ben Luddy 2020-02-28 13:42:41 UTC

This bug was initially created as a copy of Bug #1808418, which was itself created as a copy of Bug #1807128.

I have an install stuck at: level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager-catalog has not yet reported success"

oc get clusteroperators does not show operator-lifecycle-manager-catalog... and the logs show:

$ oc logs $POD -n openshift-operator-lifecycle-manager
time="2020-02-25T14:58:32Z" level=info msg="log level info"
time="2020-02-25T14:58:32Z" level=info msg="TLS keys set, using https for metrics"
W0225 14:58:32.552916       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
W0225 14:58:32.557542       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
time="2020-02-25T14:58:32Z" level=info msg="operator not ready: communicating with server failed: Get https://172.30.0.1:443/version?timeout=32s: dial tcp 172.30.0.1:443: connect: connection refused"
time="2020-02-25T14:58:32Z" level=info msg="ClusterOperator api not present, skipping update (Get https://172.30.0.1:443/api?timeout=32s: dial tcp 172.30.0.1:443: connect: connection refused)"



However, the API is now reachable from inside the pod:

$ oc rsh -n openshift-operator-lifecycle-manager $POD                        
sh-4.2$ curl -k https://172.30.0.1:443/api?timeout=32s:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/api\"",
  "reason": "Forbidden",
  "details": {
  },
  "code": 403
}sh-4.2$ 



But it appears the operator is not retrying.

Comment 4 Bruno Andrade 2020-03-12 15:27:09 UTC

My apologies, I posted the previous comment in the wrong bug. Here is the verification for this one.

Installed a cluster and left it running for approximately one day; the OLM cluster operators are running as expected. Marking as VERIFIED.


OCP Cluster Version: 4.3.0-0.nightly-2020-03-12-085147

oc get clusteroperators | grep "operator-lifecycle-manager*"                                                           
operator-lifecycle-manager                 4.3.0-0.nightly-2020-03-12-085147   True        False         False      17h
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2020-03-12-085147   True        False         False      17h
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2020-03-12-085147   True        False         False      17h

oc get pods -n openshift-operator-lifecycle-manager                                                                    
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-7db788c658-gjdw7   1/1     Running   0          17h
olm-operator-68dd7d597f-wpd7j       1/1     Running   0          17h
packageserver-f9cfd58dd-4m9st       1/1     Running   0          17h
packageserver-f9cfd58dd-vkxh4       1/1     Running   0          17h

Comment 6 errata-xmlrpc 2020-03-24 14:34:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0858
