Description of problem:

Failed job:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/53
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/61

Failed error:
Aug 21 07:52:50.591 E kube-apiserver Kube API started failing: Get https://api.ci-op-5brzliy5-282fe.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/kube-system?timeout=3s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Version-Release number of selected component (if applicable):
redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-4.2

How reproducible:
Sometimes
Aug 21 07:40:45.998: INFO: stderr: "+ set -euo pipefail\n+ rc=0\n+ curl -X GET -H Host:www.google.com -s -S -o /tmp/body -D /tmp/headers https://13.89.139.152 -w '{\"code\":%{http_code}}' -k\n+ rc=7\n++ echo 7\n++ cat /tmp/output\n++ cat /tmp/error\n++ json_escape\n++ python -c 'import json,sys; print json.dumps(sys.stdin.read())'\n++ cat /tmp/body\ncat: /tmp/body: No such file or directory\n++ base64 -w 0 -\n++ cat /tmp/headers\n++ json_escape\n++ python -c 'import json,sys; print json.dumps(sys.stdin.read())'\n+ echo '{\"test\":0,\"rc\":7,\"curl\":{\"code\":000},\"error\":\"curl: (7) Failed connect to 13.89.139.152:443; Connection timed out\\n\",\"body\":\"\",\"headers\":\"\"}'\n"

Aug 21 07:40:45.998: INFO: stdout: "{\"test\":0,\"rc\":7,\"curl\":{\"code\":000},\"error\":\"curl: (7) Failed connect to 13.89.139.152:443; Connection timed out\\n\",\"body\":\"\",\"headers\":\"\"}\n"

There seem to be plenty of networking errors in those test runs; I wonder if Casey or somebody from the networking team can weigh in.
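For reference, here is a minimal sketch of the connectivity check the test pod appears to run, reconstructed from the stderr above; the target IP and Host header are taken from this failed run, and the simplified structure (no /tmp/output, /tmp/error, or base64 handling) is my own.

#!/bin/bash
# Reconstructed sketch of the test's curl probe (not the verbatim test script).
set -euo pipefail

target="13.89.139.152"        # address the test was handed in this run
host_header="www.google.com"  # Host header the test sends

rc=0
# -k: skip TLS verification; -D: dump response headers; -w: emit the HTTP code as JSON
curl -X GET -H "Host: ${host_header}" -s -S \
     -o /tmp/body -D /tmp/headers \
     "https://${target}" -w '{"code":%{http_code}}' -k || rc=$?

echo "curl exit code: ${rc}"

curl exit code 7 means the TCP connection never completed, which matches the "Failed connect to 13.89.139.152:443; Connection timed out" error above; this is a connectivity failure rather than an HTTP-level one.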
That's weird... the test should be going through the router's Service IP [1]. According to the logs, though, it's hitting 13.89.139.152, which doesn't look like a Service IP at all; presumably that's because it points to a load balancer. I remember Azure external-facing load balancers having issues when accessed from inside the cluster, but I can't find anything about that in the documentation now. So, is it possible that WaitForRouterServiceIP() is returning the wrong IP? Over to Routing.

[1]: https://github.com/openshift/origin/blob/master/test/extended/router/router.go#L52
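One quick way to check which address is in play (a sketch, assuming the default router Service is openshift-ingress/router-default as in 4.x; adjust the namespace/name if the cluster differs):

oc -n openshift-ingress get svc router-default \
  -o jsonpath='clusterIP: {.spec.clusterIP}{"\n"}loadBalancer: {.status.loadBalancer.ingress[0].ip}{"\n"}'

If 13.89.139.152 matches the loadBalancer ingress IP rather than the ClusterIP, the test is being pointed at the external Azure LB instead of the internal Service address, which would explain the intermittent timeouts if that LB can't reliably be reached from inside the cluster.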
Whatever's wrong here is probably the same issue as the other Azure flakes, but we can keep this issue open and distinct for now.
https://bugzilla.redhat.com/show_bug.cgi?id=1741532
https://bugzilla.redhat.com/show_bug.cgi?id=1741534
https://bugzilla.redhat.com/show_bug.cgi?id=1725259
We believe this was solved by https://github.com/openshift/origin/pull/23688. Will open new bugs as necessary.
Confirmed with the latest jobs; the same issue no longer appears, so moving this to verified:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/155
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/157
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/156
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/153
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922