Bug 1815957

Summary: Disconnected community catalog pod always restarts because the health check fails
Product: OpenShift Container Platform
Component: OLM
OLM sub component: OLM
Version: 4.3.z
Target Release: 4.5.0
Hardware: x86_64
OS: All
Status: CLOSED ERRATA
Severity: high
Priority: high
Type: Bug
Reporter: zhengwan
Assignee: Evan Cordell <ecordell>
QA Contact: yhui
CC: dapark, jiazha, jritter, mfuruta, nhale, rheinzma, yhui
Last Closed: 2020-07-13 17:23:29 UTC
Bug Blocks: 1816184

Description zhengwan 2020-03-23 01:47:53 UTC
Description of problem:
After applying the community catalog source, the catalog pod restarts continuously because its health check fails.

Version-Release number of selected component (if applicable):
all versions in 4.3.*

How reproducible:
On AWS it seems to work. This is probably because hardware performance on AWS is good, so the health check on the catalog source returns OK.

Steps to Reproduce:
1. Create the community catalog source on a platform other than AWS:

cat <<EOF > community-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: community-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Community Operator Catalog
  sourceType: grpc
  image: docker.io/wangzheng422/operator-catalog:community-2020-02-29
  publisher: Community
EOF
oc create -f community-operator-catalog.yaml


2. Observe that the catalog pod restarts continuously.

Actual results:
The community operator catalog pod keeps restarting because its health check fails.
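
One way to confirm the restart loop is to filter the `oc get pods` table for non-zero restart counts. The helper below is only a sketch: the function name `restarting_pods` and the sample rows are made up for illustration, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# restarting_pods: read `oc get pods` table output on stdin and print the
# name and restart count of every pod whose RESTARTS column is non-zero.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
restarting_pods() {
    awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}

# Demo with hypothetical sample rows (not real cluster output).
restarting_pods <<'SAMPLE'
NAME                     READY   STATUS             RESTARTS   AGE
example-catalog-aaaaa    0/1     CrashLoopBackOff   7          10m
example-operator-bbbbb   1/1     Running            0          88m
SAMPLE
# prints: example-catalog-aaaaa 7
```

Against a live cluster (with a logged-in `oc`) the same filter would be used as `oc get pods -n openshift-marketplace | restarting_pods`.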

Expected results:
The community operator catalog pod runs without restarting.

Additional info:

Comment 1 zhengwan 2020-03-23 02:02:15 UTC
I have fixed it and submitted a GitHub pull request.

Comment 4 Nick Hale 2020-03-25 14:43:11 UTC
*** Bug 1816986 has been marked as a duplicate of this bug. ***

Comment 6 Evan Cordell 2020-04-12 17:44:07 UTC
Please note that the root cause of this issue has been addressed in a recent PR: https://github.com/operator-framework/operator-registry/pull/227

The issue is that on some systems the lack of an nsswitch config means that DNS lookups consult external sources before local files, so resolving `localhost` can take a long time.
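
For illustration only (this is not taken from the PR), here is a minimal sketch of how one might check the host lookup order in an nsswitch-style file, for example one extracted from a catalog image. The helper name `hosts_order` is hypothetical. The sample line `hosts: files dns` is standard nsswitch.conf syntax that puts local files ahead of DNS, and it is written to a scratch directory rather than `/etc`.

```shell
#!/bin/sh
# hosts_order FILE: print the lookup sources listed on the "hosts:" line of
# an nsswitch-style file, or "missing" if the file or the line is absent.
hosts_order() {
    if [ ! -f "$1" ]; then
        echo "missing"
        return 0
    fi
    order=$(sed -n 's/^hosts:[[:space:]]*//p' "$1" | head -n 1)
    echo "${order:-missing}"
}

# Write a sample config to a scratch directory and inspect it.
tmpdir=$(mktemp -d)
printf 'hosts: files dns\n' > "$tmpdir/nsswitch.conf"
hosts_order "$tmpdir/nsswitch.conf"    # prints: files dns
hosts_order "$tmpdir/does-not-exist"   # prints: missing
rm -rf "$tmpdir"
```

A result of `missing` corresponds to the situation described above, where the resolver falls back to its built-in default order instead of consulting local files first.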

Comment 9 yhui 2020-04-16 06:10:18 UTC
[hui@localhost work]$ oc version
Client Version: 4.5.0-202004062101-f2b01c4
Server Version: 4.5.0-0.nightly-2020-04-14-221451
Kubernetes Version: v1.18.0-rc.1

Tested the case on Azure.

Steps to test:
1. Create the community catalog source on Azure:

cat <<EOF > community-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: community-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Community Operator Catalog
  sourceType: grpc
  image: docker.io/wangzheng422/operator-catalog:community-2020-02-29
  publisher: Community
EOF
oc create -f community-operator-catalog.yaml


2. The community operator catalog pod is running.

[hui@localhost test]$ oc get CatalogSource -n openshift-marketplace
NAME                         DISPLAY                      TYPE   PUBLISHER   AGE
certified-operators          Certified Operators          grpc   Red Hat     100m
community-operator-catalog   Community Operator Catalog   grpc   Community   9s
community-operators          Community Operators          grpc   Red Hat     100m
redhat-marketplace           Red Hat Marketplace          grpc   Red Hat     100m
redhat-operators             Red Hat Operators            grpc   Red Hat     100m

[hui@localhost test]$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-789976dc4-5ffsw     1/1     Running   0          88m
community-operator-catalog-cksdq        1/1     Running   0          10m
community-operators-5b948bd55-sb76t     1/1     Running   0          88m
marketplace-operator-6f979dc485-sgq6d   1/1     Running   0          89m
redhat-marketplace-5c87b57d76-xh2lz     1/1     Running   0          88m
redhat-operators-7c8fb9bcfd-4nwk9       1/1     Running   0          88m

The result is as expected.

Comment 11 yhui 2020-04-17 16:24:32 UTC
The bug has been verified.

Comment 15 errata-xmlrpc 2020-07-13 17:23:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409