1978404 – CI test failures due to catalog pod networking issues

Summary: CI test failures due to catalog pod networking issues

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Alexander Greene
QA Contact:	Jian Zhang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-07-01 17:51 UTC by Anik
Modified:	2021-08-04 13:43 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-04 13:43:13 UTC
Target Upstream Version:
Embargoed:

Description Anik 2021-07-01 17:51:01 UTC

Description of problem:

Several e2e-olm tests kept failing in https://github.com/openshift/operator-framework-olm/pull/108 because the catalog pod was unreachable.

e2e test logs: 

```
waiting for catalog pod mock-ocs-main-preexistingcrdownerisreplaced-645dr to be available (for sync) - CONNECTING
waiting for catalog pod mock-ocs-main-preexistingcrdownerisreplaced-645dr to be available (for sync) - CONNECTING
waiting for catalog pod mock-ocs-main-preexistingcrdownerisreplaced-645dr to be available (for sync) - CONNECTING
waiting for catalog pod mock-ocs-main-preexistingcrdownerisreplaced-645dr to be available (for sync) - CONNECTING
.
.
.
```

The following error appears repeatedly in the catalog pod logs:

```
time="2021-06-30T21:56:51Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.30.108.164:50051: connect: no route to host\"" catalog="{mock-ocs-main-grlrk openshift-operators}"
```

This may indicate that the registry-service was not accepting traffic, or that iptables needed to be flushed.
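
To help narrow this down, here is a minimal triage sketch in Python. It is not part of OLM or the e2e suite. The helper names `parse_dial_error` and `probe` are hypothetical, and the regex assumes the error keeps the `dial tcp <ip>:<port>: connect: <reason>` shape seen in the log line above. It pulls the target address out of a log line and then attempts a single TCP connection to classify the failure.

```python
import errno
import re
import socket

# Matches the dial failure embedded in the gRPC error text of the log line.
# Assumption: the address is IPv4 and the reason ends at a backslash or quote.
DIAL_ERROR = re.compile(
    r'dial tcp (?P<host>[0-9.]+):(?P<port>\d+): connect: (?P<reason>[^\\"]+)'
)


def parse_dial_error(line):
    """Return (host, port, reason) from a 'dial tcp' failure, or None if absent."""
    match = DIAL_ERROR.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("reason").strip()


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is the errno behind "no route to host".
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route"
        return "error: {}".format(exc)
```

The service IP is only meaningful inside the cluster, so the probe would have to run from a pod in the affected cluster. There, `probe("172.30.108.164", 50051)` returning "no route" would match the logged error, while "refused" would suggest the address is routable but nothing is listening on the port.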

Version-Release number of selected component (if applicable):


How reproducible:

Not reproducible locally using a kind cluster. Easily reproducible on 4.8 CI.

Steps to Reproduce:
1. Run tests on an OCP 4.8 cluster
2. Run tests on a PR

Actual results:


Expected results:


Additional info:

1. The main branch (4.9 CI) did not show the same issue, but it is worth confirming that the problem does not exist there.
2. Test runs history: https://prow.ci.openshift.org/pr-history/?org=openshift&repo=operator-framework-olm&pr=108

Comment 1 Alexander Greene 2021-08-02 21:19:43 UTC

Taking a look at this.

Comment 2 Alexander Greene 2021-08-04 13:43:13 UTC

Unable to reproduce the issue, as shown in [1]. Marking this as CLOSED INSUFFICIENT_DATA because the steps provided did not reproduce it.

Ref:
[1] https://github.com/openshift/operator-framework-olm/pull/146
