Bug 1815957 - disconnected community catalog always restart because healthcheck failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.3.z
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Evan Cordell
QA Contact: yhui
URL:
Whiteboard:
Duplicates: 1816986
Depends On:
Blocks: 1816184
Reported: 2020-03-23 01:47 UTC by zhengwan
Modified: 2023-09-07 22:30 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Clones: 1816184
Environment:
Last Closed: 2020-07-13 17:23:29 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 1404 | 0 | None | closed | Bug 1815957: increase grpc_health_probe timeout | 2020-12-21 09:47:43 UTC
Github | operator-framework operator-lifecycle-manager pull 1407 | 0 | None | closed | Bug 1815957: increase grpc_health_probe timeout | 2020-12-21 09:47:43 UTC
Red Hat Knowledge Base (Solution) | 4927461 | 0 | None | None | None | 2020-03-25 14:54:46 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:23:51 UTC

Description zhengwan 2020-03-23 01:47:53 UTC
Description of problem:
After applying a community catalog source, the catalog pod restarts continuously because its health check fails.

Version-Release number of selected component (if applicable):
All 4.3.* versions.

How reproducible:
On AWS it seems OK, apparently because the hardware performance there is good enough that the health check on the catalog source returns OK. On other platforms the problem reproduces.

Steps to Reproduce:
1. Create the community catalog source on a platform other than AWS:

cat <<EOF > community-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: community-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Community Operator Catalog
  sourceType: grpc
  image: docker.io/wangzheng422/operator-catalog:community-2020-02-29
  publisher: Community
EOF
oc create -f community-operator-catalog.yaml


2. Observe that the catalog pod restarts continuously.

Actual results:
The community operator catalog pod restarts continuously because its health check fails.

Expected results:
The community operator catalog pod runs without restarting.

Additional info:

Comment 1 zhengwan 2020-03-23 02:02:15 UTC
I have just fixed it and submitted a GitHub pull request.

Comment 4 Nick Hale 2020-03-25 14:43:11 UTC
*** Bug 1816986 has been marked as a duplicate of this bug. ***

Comment 6 Evan Cordell 2020-04-12 17:44:07 UTC
Please note that the root cause of this issue has been addressed in a recent PR: https://github.com/operator-framework/operator-registry/pull/227

The issue is that on some systems the lack of an nsswitch config means that name resolution tries external DNS sources before local files, so resolving `localhost` can take a long time.
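
To check the lookup order described above on a given system or image, a minimal sketch follows. It assumes the standard glibc-style `hosts:` line in `/etc/nsswitch.conf`. The function name `hosts_lookup_order` is hypothetical and is not part of OLM or operator-registry, and the check is not taken from the referenced PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of OLM or operator-registry): reports whether
# an nsswitch-style file lists "files" before "dns" on its "hosts:" line.
# Usage: hosts_lookup_order [path]   (defaults to /etc/nsswitch.conf)
hosts_lookup_order() {
    conf="${1:-/etc/nsswitch.conf}"
    if [ ! -f "$conf" ]; then
        # No config at all: the situation described in the comment above.
        echo "missing"
        return 0
    fi
    awk '
        $1 == "hosts:" {
            found = 1
            for (i = 2; i <= NF; i++) {
                # "files" counts as first only if "dns" has not appeared yet.
                if ($i == "files" && !dns) { files_first = 1 }
                if ($i == "dns") { dns = 1 }
            }
            exit
        }
        END {
            if (!found)           print "no-hosts-line"
            else if (files_first) print "files-first"
            else                  print "dns-first-or-no-files"
        }
    ' "$conf"
}
```

Calling `hosts_lookup_order` with no argument inspects `/etc/nsswitch.conf`. A result of `missing` or `dns-first-or-no-files` would be consistent with slow `localhost` resolution, though it does not prove that this is the cause on a particular cluster.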

Comment 9 yhui 2020-04-16 06:10:18 UTC
[hui@localhost work]$ oc version
Client Version: 4.5.0-202004062101-f2b01c4
Server Version: 4.5.0-0.nightly-2020-04-14-221451
Kubernetes Version: v1.18.0-rc.1

Tested on Azure.

Steps to test:
1. Create the community catalog source on Azure:

cat <<EOF > community-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: community-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: Community Operator Catalog
  sourceType: grpc
  image: docker.io/wangzheng422/operator-catalog:community-2020-02-29
  publisher: Community
EOF
oc create -f community-operator-catalog.yaml


2. The community operator catalog pod is running.

[hui@localhost test]$ oc get CatalogSource -n openshift-marketplace
NAME                         DISPLAY                      TYPE   PUBLISHER   AGE
certified-operators          Certified Operators          grpc   Red Hat     100m
community-operator-catalog   Community Operator Catalog   grpc   Community   9s
community-operators          Community Operators          grpc   Red Hat     100m
redhat-marketplace           Red Hat Marketplace          grpc   Red Hat     100m
redhat-operators             Red Hat Operators            grpc   Red Hat     100m

[hui@localhost test]$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-789976dc4-5ffsw     1/1     Running   0          88m
community-operator-catalog-cksdq        1/1     Running   0          10m
community-operators-5b948bd55-sb76t     1/1     Running   0          88m
marketplace-operator-6f979dc485-sgq6d   1/1     Running   0          89m
redhat-marketplace-5c87b57d76-xh2lz     1/1     Running   0          88m
redhat-operators-7c8fb9bcfd-4nwk9       1/1     Running   0          88m

The result is as expected.
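
The RESTARTS check above can also be scripted. This is a minimal sketch, assuming the default `oc get pods` table layout shown in this comment. The helper name `restarted_pods` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pods` table output on stdin and prints
# the name and restart count of every pod whose RESTARTS column is above 0.
restarted_pods() {
    awk '
        NR == 1 {
            # Locate the RESTARTS column from the header row.
            for (i = 1; i <= NF; i++) if ($i == "RESTARTS") col = i
            next
        }
        col && ($col + 0) > 0 { print $1, $col }
    '
}

# Example against a live cluster (requires a logged-in `oc`):
#   oc get pods -n openshift-marketplace | restarted_pods
```

Empty output means no pod in the namespace has restarted, which matches the table above, where every pod shows 0 restarts.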

Comment 11 yhui 2020-04-17 16:24:32 UTC
The bug has been verified.

Comment 15 errata-xmlrpc 2020-07-13 17:23:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

