Bug 1889526 - When the external http proxy is down, *marketplace pods still report healthy even though operators aren't being downloaded.
Keywords:
Status: NEW
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ISV Operators
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: tonyc
QA Contact: tonyc
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-19 20:59 UTC by Vincent S. Cojot
Modified: 2020-10-19 20:59 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Vincent S. Cojot 2020-10-19 20:59:08 UTC
Hi,

This is an environment where OCP was freshly deployed but the HTTP proxy used to reach the Internet is broken/down/unreachable:


$ oc get nodes
NAME                         STATUS     ROLES    AGE    VERSION
ocp4d-n5ktm-infra-0-j9hqh    NotReady   worker   40m    v1.18.3+47c0e71
ocp4d-n5ktm-infra-0-jmprk    Ready      worker   40m    v1.18.3+47c0e71
ocp4d-n5ktm-infra-0-sqn8z    Ready      worker   39m    v1.18.3+47c0e71
ocp4d-n5ktm-master-0         Ready      master   142m   v1.18.3+47c0e71
ocp4d-n5ktm-master-1         Ready      master   142m   v1.18.3+47c0e71
ocp4d-n5ktm-master-2         Ready      master   137m   v1.18.3+47c0e71
ocp4d-n5ktm-worker-0-jd242   Ready      worker   84m    v1.18.3+47c0e71
ocp4d-n5ktm-worker-0-nrfwn   Ready      worker   56m    v1.18.3+47c0e71
ocp4d-n5ktm-worker-0-v4n2j   Ready      worker   56m    v1.18.3+47c0e71

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.13    True        False         49m     Cluster version is 4.5.13

At this point, creating the OCS CR and its Subscription CR results in... nothing, since the operator cannot be downloaded:

$ oc project openshift-storage
Now using project "openshift-storage" on server "https://api.ocp4d.openshift.lasthome.solace.krynn:6443".

$ oc get all
No resources found in openshift-storage namespace.

Also, in the 'openshift-marketplace' project, all pods report as alive and well:

$ oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-ccd9ffb5c-55zzj     1/1     Running   0          52m
community-operators-b8c7b64ff-f82pw     1/1     Running   0          52m
marketplace-operator-6ff46f666b-xhx78   1/1     Running   0          96m
redhat-marketplace-867bd688f-rrntd      1/1     Running   0          52m
redhat-operators-79fbb8f4cc-wbjqt       1/1     Running   0          52m

But on further inspection, the following messages appear in the logs:
$ oc logs redhat-operators-79fbb8f4cc-wbjqt  |tail -5
time="2020-10-19T20:06:14Z" level=info msg="decoded 0 flattened and 0 nested operator manifest(s)" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=error msg="the following error(s) occurred while preparing the download list: Get https://quay.io/cnr/api/v1/packages?media_type=helm&namespace=redhat-operators: proxyconnect tcp: dial tcp 10.0.128.254:3128: i/o timeout" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=error msg="stat failed on target directory[downloaded] - stat downloaded: no such file or directory" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=warning msg="strict mode disabled" error="error loading manifests from appregistry: [error downloading manifests: Get https://quay.io/cnr/api/v1/packages?media_type=helm&namespace=redhat-operators: proxyconnect tcp: dial tcp 10.0.128.254:3128: i/o timeout, error loading operator manifests: stat downloaded: no such file or directory]" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=info msg="serving registry" port=50051 type=appregistry

$ oc logs certified-operators-ccd9ffb5c-55zzj    |tail -5
time="2020-10-19T20:06:14Z" level=info msg="No operator manifest decoded" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=info msg="decoded 0 flattened and 0 nested operator manifest(s)" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=error msg="stat failed on target directory[downloaded] - stat downloaded: no such file or directory" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=warning msg="strict mode disabled" error="error loading manifests from appregistry: [error downloading manifests: Get https://quay.io/cnr/api/v1/packages?media_type=helm&namespace=certified-operators: proxyconnect tcp: dial tcp 10.0.128.254:3128: i/o timeout, error loading operator manifests: stat downloaded: no such file or directory]" port=50051 type=appregistry
time="2020-10-19T20:06:14Z" level=info msg="serving registry" port=50051 type=appregistry

10.0.128.254 (port 3128) is the IPv4 address of the HTTP proxy in use (currently unreachable).
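The log lines show the registry failing at its very first step, opening a TCP connection to the proxy ("proxyconnect tcp: dial tcp 10.0.128.254:3128: i/o timeout"). A quick way to confirm this from a node or bastion is to attempt the same TCP connection. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and coreutils `timeout`; `proxy_reachable` is a hypothetical helper name, and it only tests TCP reachability, not whether the proxy actually forwards requests to quay.io.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether a TCP connection to HOST:PORT can be
# opened within SECS seconds (default 5). This mirrors the first step that
# fails in the logs above ("proxyconnect tcp: dial tcp ...: i/o timeout").
# Assumes bash (for /dev/tcp) and coreutils `timeout`.
# It does NOT verify that the proxy actually forwards HTTPS traffic.
proxy_reachable() {
  local host="$1" port="$2" secs="${3:-5}"
  # /dev/tcp/HOST/PORT is handled by bash itself; the child shell opens the
  # socket on fd 3 and closes it on exit. A refused connection returns a
  # non-zero status, and a hung one returns 124 once `timeout` fires.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example, using the proxy address from this report:
#   proxy_reachable 10.0.128.254 3128 || echo "proxy 10.0.128.254:3128 is unreachable"
```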

The fix is simple: switch to a working HTTP proxy. But why are the *operators pods in openshift-marketplace reporting alive and well?

Is this expected?
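The logs suggest why the pods look healthy: with strict mode disabled, the download error is only logged as a warning and the registry still reaches "serving registry" on port 50051, so a check that only looks at whether the server answers would presumably pass. As a sketch, the "healthy but empty" state can be spotted in the pod logs by looking for a manifest download error followed by "serving registry". `empty_registry_detected` is a hypothetical helper, and matching on these message strings assumes the log format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads appregistry pod logs on stdin and succeeds (exit 0)
# only if a manifest download error was logged and the registry then started
# serving anyway, i.e. the pod looks healthy but its catalog is empty.
# Assumes the log message format shown in this report.
empty_registry_detected() {
  awk '
    /error downloading manifests/ { failed = 1 }       # download failed
    /msg="serving registry"/ && failed { found = 1 }   # ...yet the server started
    END { exit (found ? 0 : 1) }
  '
}

# Example, against a pod from this report:
#   oc logs -n openshift-marketplace redhat-operators-79fbb8f4cc-wbjqt \
#     | empty_registry_detected && echo "registry is serving an empty catalog"
```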

