Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1791905

Summary:

the server sometimes is unable to handle the get routes.route.openshift.io request

Product:

OpenShift Container Platform

Reporter:

Oleg Bulatov <obulatov>

Component:

openshift-apiserver

Assignee:

Lukasz Szaszkiewicz <lszaszki>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Xingxing Xia <xxia>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

4.4

CC:

adam.kaplan, aos-bugs, mfojtik, scuppett, sttts

Target Milestone:

---

Target Release:

4.4.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

No Doc Update

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-02-17 16:52:36 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
http request to route.openshift.io group	none
aggregator_unavailable_apiservice metrics	none

Description Oleg Bulatov 2020-01-16 16:48:14 UTC

Description of problem:

Recently I found that some of our tests fail with the following error:

unable to get route: the server is currently unable to handle the request (get routes.route.openshift.io testroute)

In our test we have disabled the openshift-apiserver operator, so openshift-apiserver shouldn't be interrupted and should stay available during the test.

Version-Release number of selected component (if applicable):

master

How reproducible:

Sometimes. Here are a few occurrences:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/428/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws-operator/2202
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/437/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws-operator/2196
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/428/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws-operator/2195

Expected results:

The apiserver stays highly available during the test suite and always handles `get routes.route.openshift.io` requests.

Comment 2 Lukasz Szaszkiewicz 2020-02-12 11:21:31 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/437/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws-operator/2196 seems to be a one-off issue.

The test tried to connect to the server but failed with "unable to get route: the server is currently unable to handle the request (get routes.route.openshift.io testroute)". The error message implies that the server should have returned an HTTP 503 status code. However, I didn't find a single request with that HTTP status (see the attached file). Additionally, the `aggregator_unavailable_apiservice` metric didn't report anything.
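
A check like this can be repeated with a small script. The sketch below assumes the requests are available as Kubernetes audit log events, one JSON object per line, with the usual `objectRef.apiGroup` and `responseStatus.code` fields. The format of the attached file is not shown in this report, and `find_unavailable` and the sample events are hypothetical.

```python
import json


def find_unavailable(lines, api_group="route.openshift.io", code=503):
    """Return audit events for api_group whose response status equals code.

    Assumes one JSON audit event per line; lines that are blank or not
    valid JSON objects are skipped.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not an audit event, ignore
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        status = event.get("responseStatus") or {}
        if ref.get("apiGroup") == api_group and status.get("code") == code:
            hits.append(event)
    return hits


# Hypothetical sample events, for illustration only (not taken from the CI run).
sample = [
    '{"verb": "get", "objectRef": {"apiGroup": "route.openshift.io", '
    '"resource": "routes", "name": "testroute"}, "responseStatus": {"code": 200}}',
    '{"verb": "get", "objectRef": {"apiGroup": "route.openshift.io", '
    '"resource": "routes", "name": "testroute"}, "responseStatus": {"code": 503}}',
    '{"verb": "list", "objectRef": {"apiGroup": "image.openshift.io", '
    '"resource": "imagestreams"}, "responseStatus": {"code": 503}}',
]

if __name__ == "__main__":
    for event in find_unavailable(sample):
        print(event["verb"], event["objectRef"]["name"], event["responseStatus"]["code"])
```

An empty result from such a filter matches what is described above: no request to the route.openshift.io group was answered with HTTP 503.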

Comment 3 Lukasz Szaszkiewicz 2020-02-12 11:22:37 UTC

Created attachment 1662648
http request to route.openshift.io group

Comment 4 Lukasz Szaszkiewicz 2020-02-12 11:23:18 UTC

Created attachment 1662649
aggregator_unavailable_apiservice metrics

Comment 5 Lukasz Szaszkiewicz 2020-02-12 11:54:27 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/428/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws-operator/2202 is similar. Additionally, one test reports `dial tcp 10.0.131.15:10250: connect: connection refused`.

Comment 7 Lukasz Szaszkiewicz 2020-02-12 12:49:53 UTC

I've briefly checked other runs to see whether they suffer from the same issue, and I haven't found any (https://prow.svc.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws-operator).

Comment 8 Lukasz Szaszkiewicz 2020-02-12 12:56:21 UTC

Assuming that the graceful mechanism we have works, you shouldn't see any interruptions, even during restarts.
Oleg, does the issue still occur?

Comment 9 Oleg Bulatov 2020-02-13 15:50:28 UTC

I haven't seen this issue for a while.