Bug 1623195 - Discovery clients are not tolerant of failures in discovery
Summary: Discovery clients are not tolerant of failures in discovery
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assignee: Stefan Schimanski
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-28 17:05 UTC by Clayton Coleman
Modified: 2018-09-06 11:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-06 11:44:23 UTC
Target Upstream Version:
Embargoed:



Description Clayton Coleman 2018-08-28 17:05:00 UTC
Adding and removing an apiservice can break other components that rely on discovery (because they can't access a particular endpoint).

The failure here is in oc new-app, which exits because the discovery client failed. In this case it should make forward progress: we don't need discovery info for every endpoint, only for those used in the new-app call (we would still need to error out if we ended up needing the missing info).
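
A minimal Go sketch of the tolerant behavior described above, offered as an illustration rather than the actual new-app code. PartialDiscoveryError and CheckDiscovery are hypothetical names; the error type only loosely mirrors the shape of client-go's discovery.ErrGroupDiscoveryFailed and is defined locally so the example stays self-contained. The "needed" group/versions in main are illustrative assumptions, not taken from the failing template.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// PartialDiscoveryError is a hypothetical error type that records which API
// group/versions could not be discovered. It is a stand-in for this sketch,
// not a type from client-go or origin.
type PartialDiscoveryError struct {
	Failed map[string]error // keyed by "group/version"
}

func (e *PartialDiscoveryError) Error() string {
	names := make([]string, 0, len(e.Failed))
	for gv := range e.Failed {
		names = append(names, gv)
	}
	sort.Strings(names)
	return "unable to retrieve the complete list of server APIs: " + strings.Join(names, ", ")
}

// CheckDiscovery decides whether a discovery error should abort the command.
// It returns nil when discovery succeeded, or when it failed only for
// group/versions the caller does not need.
func CheckDiscovery(err error, needed []string) error {
	if err == nil {
		return nil
	}
	var partial *PartialDiscoveryError
	if !errors.As(err, &partial) {
		return err // not a partial failure: nothing usable was discovered
	}
	for _, gv := range needed {
		if cause, ok := partial.Failed[gv]; ok {
			return fmt.Errorf("required API %s is unavailable: %w", gv, cause)
		}
	}
	return nil // only unrelated groups failed; keep going
}

func main() {
	err := &PartialDiscoveryError{Failed: map[string]error{
		"wardle.k8s.io/v1alpha1": errors.New("the server could not find the requested resource"),
	}}

	// Illustrative assumption: the template only touches these group/versions.
	needed := []string{"apps.openshift.io/v1", "route.openshift.io/v1"}
	fmt.Println("unrelated group failed:", CheckDiscovery(err, needed))

	// If the command actually needs the broken group, it must still fail.
	fmt.Println("needed group failed:", CheckDiscovery(err, []string{"wardle.k8s.io/v1alpha1"}))
}
```

The design choice is to treat a partial discovery failure as fatal only when it intersects the set of resources the command actually uses, so an unrelated aggregated API going away does not break the call.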

It failed at https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18816/pull-ci-origin-e2e-gcp/2909/#conformanceareanetworkingfeaturerouter-the-haproxy-router-should-serve-the-correct-routes-when-running-with-the-haproxy-config-manager-suiteopenshiftconformanceparallel because the API aggregation test, which registers wardle, was running at the same time.

/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/test/extended/router/config_manager.go:43
Expected error:
    <*util.ExitError | 0xc422547a70>: {
        Cmd: "oc new-app --config=/tmp/admin.kubeconfig --namespace=e2e-test-router-config-manager-77v9b -f /tmp/fixture-testdata-dir572013924/test/extended/testdata/router-config-manager.yaml -p IMAGE=registry.svc.ci.openshift.org/ci-op-y5k4hf9x/stable:haproxy-router",
        StdErr: "--> Deploying template \"e2e-test-router-config-manager-77v9b/\" for \"/tmp/fixture-testdata-dir572013924/test/extended/testdata/router-config-manager.yaml\" to project e2e-test-router-config-manager-77v9b\n\n     * With parameters:\n        * IMAGE=registry.svc.ci.openshift.org/ci-op-y5k4hf9x/stable:haproxy-router\n\nerror: Unable to to get list of available resources: unable to retrieve the complete list of server APIs: wardle.k8s.io/v1alpha1: the server could not find the requested resource",
        ExitError: {


The aggregation test will be disabled from openshift e2es until this bug is fixed.

The metrics server has also been a source of this failure: the installer e2es were blocked for over a week due to flakiness in the metrics server.

Comment 1 Clayton Coleman 2018-08-28 17:05:31 UTC
David preferred a bug to making the test not disruptive.  You can thank him later.

Comment 2 Maciej Szulik 2018-09-06 11:36:23 UTC
There was a problem with discovery in new-app, introduced in https://github.com/openshift/origin/pull/20020. That change was later reverted in https://github.com/openshift/origin/pull/20781 and needs a proper implementation, one that tolerates discovery failures.

Comment 3 Stefan Schimanski 2018-09-06 11:44:23 UTC
Per Maciej's comment, this was fixed by https://github.com/openshift/origin/pull/20781. Closing.

