Bug 1808419 - If OLM catalog operator cannot reach the API server, it does not seem to retry
Summary: If OLM catalog operator cannot reach the API server, it does not seem to retry
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.z
Assignee: Ben Luddy
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On: 1808418
Blocks: 1808422
Reported: 2020-02-28 13:42 UTC by Ben Luddy
Modified: 2020-03-24 14:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-24 14:34:23 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github operator-framework operator-lifecycle-manager pull 1366 None closed [release-4.3] Bug 1808419: Don't block on ctx.Done() if startup fails. 2020-06-22 19:53:38 UTC
Red Hat Product Errata RHBA-2020:0858 None None None 2020-03-24 14:34:44 UTC

Description Ben Luddy 2020-02-28 13:42:41 UTC
This bug was initially created as a copy of Bug #1808418, which was itself created as a copy of Bug #1807128.

I have an install stuck at: level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager-catalog has not yet reported success"

oc get clusteroperators does not show operator-lifecycle-manager-catalog, and the catalog operator logs show:

$ oc logs $POD -n openshift-operator-lifecycle-manager
time="2020-02-25T14:58:32Z" level=info msg="log level info"
time="2020-02-25T14:58:32Z" level=info msg="TLS keys set, using https for metrics"
W0225 14:58:32.552916       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
W0225 14:58:32.557542       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-02-25T14:58:32Z" level=info msg="Using in-cluster kube client config"
time="2020-02-25T14:58:32Z" level=info msg="operator not ready: communicating with server failed: Get https://172.30.0.1:443/version?timeout=32s: dial tcp 172.30.0.1:443: connect: connection refused"
time="2020-02-25T14:58:32Z" level=info msg="ClusterOperator api not present, skipping update (Get https://172.30.0.1:443/api?timeout=32s: dial tcp 172.30.0.1:443: connect: connection refused)"



However, the API is now reachable from inside the pod (the 403 below shows the server is answering):

$ oc rsh -n openshift-operator-lifecycle-manager $POD                        
sh-4.2$ curl -k https://172.30.0.1:443/api?timeout=32s:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/api\"",
  "reason": "Forbidden",
  "details": {
  },
  "code": 403
}
sh-4.2$ 



But it appears the operator is not retrying.

Comment 4 Bruno Andrade 2020-03-12 15:27:09 UTC
My apologies, I posted the previous comment in the wrong bug. Here is the verification for this one.

Installed a cluster and left it running for approximately one day; the OLM cluster operators are running as expected. Marking as VERIFIED.


OCP Cluster Version: 4.3.0-0.nightly-2020-03-12-085147

oc get clusteroperators | grep "operator-lifecycle-manager*"                                                           
operator-lifecycle-manager                 4.3.0-0.nightly-2020-03-12-085147   True        False         False      17h
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2020-03-12-085147   True        False         False      17h
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2020-03-12-085147   True        False         False      17h

oc get pods -n openshift-operator-lifecycle-manager                                                                    
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-7db788c658-gjdw7   1/1     Running   0          17h
olm-operator-68dd7d597f-wpd7j       1/1     Running   0          17h
packageserver-f9cfd58dd-4m9st       1/1     Running   0          17h
packageserver-f9cfd58dd-vkxh4       1/1     Running   0          17h

Comment 6 errata-xmlrpc 2020-03-24 14:34:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0858

