Bug 1623195

Summary: Discovery clients are not tolerant of failures in discovery
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Master
Assignee: Stefan Schimanski <sttts>
Status: CLOSED WONTFIX
QA Contact: Xingxing Xia <xxia>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: aos-bugs, jokerman, maszulik, mmccomas
Target Milestone: ---
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-06 11:44:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Clayton Coleman 2018-08-28 17:05:00 UTC
Adding and removing an apiservice can break other components that rely on discovery (because they can't access a particular endpoint).

The failure here is in oc new-app, which exits because the discovery client failed. In this case it should keep making forward progress, because we don't need discovery info for all endpoints, just for those used in the new-app call (we would need to error only if we ended up needing the missing info).

Failed here https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18816/pull-ci-origin-e2e-gcp/2909/#conformanceareanetworkingfeaturerouter-the-haproxy-router-should-serve-the-correct-routes-when-running-with-the-haproxy-config-manager-suiteopenshiftconformanceparallel because the API aggregation test was running at the same time (registering wardle).

/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/test/extended/router/config_manager.go:43
Expected error:
    <*util.ExitError | 0xc422547a70>: {
        Cmd: "oc new-app --config=/tmp/admin.kubeconfig --namespace=e2e-test-router-config-manager-77v9b -f /tmp/fixture-testdata-dir572013924/test/extended/testdata/router-config-manager.yaml -p IMAGE=registry.svc.ci.openshift.org/ci-op-y5k4hf9x/stable:haproxy-router",
        StdErr: "--> Deploying template \"e2e-test-router-config-manager-77v9b/\" for \"/tmp/fixture-testdata-dir572013924/test/extended/testdata/router-config-manager.yaml\" to project e2e-test-router-config-manager-77v9b\n\n     * With parameters:\n        * IMAGE=registry.svc.ci.openshift.org/ci-op-y5k4hf9x/stable:haproxy-router\n\nerror: Unable to to get list of available resources: unable to retrieve the complete list of server APIs: wardle.k8s.io/v1alpha1: the server could not find the requested resource",
        ExitError: {


The aggregation test will be disabled from openshift e2es until this bug is fixed.

The metrics server has also been a source of this failure; the installer e2es were blocked for over a week due to flakiness in the metrics server.
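
The tolerant behavior asked for in this description can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the code used by oc or by any real discovery client. Every name in it (PartialDiscoveryError, discover, resources_for, the fetchers map) is hypothetical, and the group/version strings are only illustrative. The idea is to collect per-group failures instead of aborting on the first one, and to raise only when a group the caller actually needs is among the failures.

```python
class PartialDiscoveryError(Exception):
    """Hypothetical error: some *needed* API groups could not be discovered."""

    def __init__(self, failed):
        # failed maps "group/version" -> human-readable reason
        self.failed = dict(failed)
        details = ", ".join(f"{gv}: {why}" for gv, why in sorted(self.failed.items()))
        super().__init__(f"unable to retrieve the needed server APIs: {details}")


def discover(fetchers):
    """Query every group/version and record failures instead of stopping at the first."""
    resources, failed = {}, {}
    for gv, fetch in fetchers.items():
        try:
            resources[gv] = fetch()
        except Exception as exc:  # a broken aggregated API must not abort the loop
            failed[gv] = str(exc)
    return resources, failed


def resources_for(needed, fetchers):
    """Return resources for the needed groups; fail only if one of *those* is missing."""
    resources, failed = discover(fetchers)
    blocking = {
        gv: failed.get(gv, "group/version not served")
        for gv in needed
        if gv not in resources
    }
    if blocking:
        raise PartialDiscoveryError(blocking)
    return {gv: resources[gv] for gv in needed}


def broken():
    # Stands in for an apiservice that is registered but not reachable.
    raise RuntimeError("the server could not find the requested resource")


fetchers = {
    "apps/v1": lambda: ["deployments"],
    "route.openshift.io/v1": lambda: ["routes"],
    "wardle.k8s.io/v1alpha1": broken,
}

# The unrelated wardle failure does not block a caller that never needs it.
found = resources_for({"apps/v1", "route.openshift.io/v1"}, fetchers)
```

With this shape, a call that does need the broken group still fails, which matches the "we would need to error if we ended up needing that info" condition above.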

Comment 1 Clayton Coleman 2018-08-28 17:05:31 UTC
David preferred a bug to making the test not disruptive.  You can thank him later.

Comment 2 Maciej Szulik 2018-09-06 11:36:23 UTC
There was a problem with discovery in new-app, introduced in https://github.com/openshift/origin/pull/20020. That change was later reverted in https://github.com/openshift/origin/pull/20781 and still needs a proper implementation, one that tolerates discovery failures.

Comment 3 Stefan Schimanski 2018-09-06 11:44:23 UTC
Per Maciej's comment, this was fixed by https://github.com/openshift/origin/pull/20781. Closing.