Bug 1807128 - If OLM catalog operator cannot reach the API server, it does not seem to retry
Summary: If OLM catalog operator cannot reach the API server, it does not seem to retry
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Ben Luddy
QA Contact: Bruno Andrade
URL:
Whiteboard:
Duplicates: 1810025
Depends On:
Blocks: 1808418
Reported: 2020-02-25 16:34 UTC by Stephen Benjamin
Modified: 2020-08-04 18:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-04 18:02:09 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-lifecycle-manager pull 1323 0 None closed Bug 1807128: Don't block on ctx.Done() if startup fails. 2021-02-13 02:20:42 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-08-04 18:02:13 UTC

Description Stephen Benjamin 2020-02-25 16:34:19 UTC
I have an install stuck at: level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager-catalog has not yet reported success"

oc get clusteroperators does not show operator-lifecycle-manager-catalog... and the logs show:

$ oc logs $POD -n openshift-operator-lifecycle-manager
time="2020-02-25T14:58:32Z" level=info msg="log level info"
time="2020-02-25T14:58:32Z" level=info msg="TLS keys set, using https for metrics"
W0225 14:58:32.552916       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
W0225 14:58:32.557542       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
time="2020-02-25T14:58:32Z" level=info msg="operator not ready: communicating with server failed: Get https://172.30.0.1:443/version?timeout=32s: dial tcp 172.30.0.1:443: connect: connection refused"
time="2020-02-25T14:58:32Z" level=info msg="ClusterOperator api not present, skipping update (Get https://172.30.0.1:443/api?timeout=32s: dial tcp 172.30.0.1:443: connect: connection refused)"



However, the API is now reachable from inside the pod:

$ oc rsh -n openshift-operator-lifecycle-manager $POD                        
sh-4.2$ curl -k https://172.30.0.1:443/api?timeout=32s:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/api\"",
  "reason": "Forbidden",
  "details": {
  },
  "code": 403
}sh-4.2$ 



But it appears the operator is not retrying.

Comment 1 Stephen Benjamin 2020-02-25 16:35:13 UTC
Similar BZ: BZ1798135

Comment 6 Bruno Andrade 2020-03-10 16:35:37 UTC
Installed a cluster and left it running for approximately one day; the OLM cluster operators are running as expected. Marking as VERIFIED.


OCP Cluster Version: 4.5.0-0.nightly-2020-03-06-190457

oc get clusteroperators | grep "operator-lifecycle-manager*"                                                           
operator-lifecycle-manager                 4.5.0-0.nightly-2020-03-06-190457   True        False         False      16h
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-03-06-190457   True        False         False      16h
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-03-06-190457   True        False         False      4h11m
                                 
oc get pods -n openshift-operator-lifecycle-manager                                                                    
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-6d54448f87-qktbj   1/1     Running   0          16h
olm-operator-7c876bcb96-rxsxq       1/1     Running   0          16h
packageserver-6dcdd88944-88tjg      1/1     Running   0          4h11m
packageserver-6dcdd88944-cwqnx      1/1     Running   0          4h11m

Comment 7 Evan Cordell 2020-03-12 14:30:15 UTC
*** Bug 1810025 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2020-08-04 18:02:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

