Bug 2030972

Summary: TestAdminAck should succeed: vulnerable to API-server hiccups
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Test Framework
Assignee: W. Trevor King <wking>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: low
Docs Contact:
Priority: medium
Version: 4.8
CC: dgoodwin
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-04-27 03:01:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description W. Trevor King 2021-12-10 06:50:36 UTC
The test case:

  [sig-cluster-lifecycle] TestAdminAck should succeed [Suite:openshift/conformance/parallel]

is vulnerable to brief API-server hiccups like [1]:

  Dec  6 00:00:10.440: FAIL: Error accessing configmap openshift-config-managed/admin-gates: Get "https://api.ci-op-g2m38jp7-eafe9.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/openshift-config-managed/configmaps/admin-gates": dial tcp: lookup api.ci-op-g2m38jp7-eafe9.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host

and [2]:

  Dec  9 19:53:20.747: FAIL: Error accessing configmap openshift-config-managed/admin-gates: Get "https://api.ci-op-w5q90zpi-9278e.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/openshift-config-managed/configmaps/admin-gates": dial tcp 100.21.251.165:6443: i/o timeout

We should do something to make those non-fatal.  Logging the error and then bailing out to wait for the next poll round might work, but we need to ensure that at least one round actually succeeds, so we don't claim "success" if all our attempts were "I couldn't actually connect to the Kube API-server to check".

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2026806#c8
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=2027929#c1

Comment 2 W. Trevor King 2022-02-04 00:47:53 UTC
Moving back to NEW, because I haven't had time to work on it.  Leaving myself in as the assignee, because I don't want to dump fixing this on the Test Framework folks.