Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1845792

Summary:	Pods cannot access the /config/master API endpoint: csrapprover: timed out waiting for the condition
Product:	OpenShift Container Platform	Reporter:	W. Trevor King <wking>
Component:	Cloud Compute	Assignee:	Alberto <agarcial>
Cloud Compute sub component:	Other Providers	QA Contact:	Jianwei Hou <jhou>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	unspecified	CC:	behoward, mimccune
Version:	4.5	Keywords:	UpcomingSprint
Target Milestone:	---
Target Release:	4.6.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:	[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce multiple ingress policies with ingress allow-all policy taking precedence [Feature:NetworkPolicy]
Last Closed:	2020-10-27 16:06:07 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1846091

Description W. Trevor King 2020-06-10 05:12:04 UTC

test:

[sig-cluster-lifecycle] Pods cannot access the /config/master API endpoint [Suite:openshift/conformance/parallel]

is failing frequently in CI; see the search results:

$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?search=Pods+cannot+access+the+%2Fconfig%2Fmaster+API+endpoint&maxAge=48h&type=junit&name=release-openshift-' | grep 'failures match'
release-openshift-ocp-installer-e2e-aws-upi-4.5 - 24 runs, 50% failed, 8% of failures match
release-openshift-ocp-installer-e2e-aws-4.5 - 50 runs, 62% failed, 3% of failures match
release-openshift-ocp-installer-e2e-metal-4.5 - 26 runs, 50% failed, 8% of failures match
release-openshift-ocp-installer-e2e-vsphere-upi-4.5 - 24 runs, 79% failed, 5% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.5 - 11 runs, 64% failed, 14% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6 - 11 runs, 45% failed, 20% of failures match
release-openshift-ocp-installer-e2e-aws-4.6 - 71 runs, 73% failed, 4% of failures match
release-openshift-origin-installer-e2e-gcp-4.6 - 32 runs, 41% failed, 15% of failures match
release-openshift-ocp-installer-e2e-gcp-4.6 - 5 runs, 40% failed, 50% of failures match
release-openshift-ocp-installer-e2e-metal-4.6 - 5 runs, 40% failed, 50% of failures match
release-openshift-ocp-installer-e2e-metal-compact-4.6 - 5 runs, 20% failed, 100% of failures match
release-openshift-origin-installer-e2e-azure-shared-vpc-4.5 - 2 runs, 50% failed, 100% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.5-to-4.6-ci - 2 runs, 100% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-shared-vpc-4.5 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-aws-ovn-4.5 - 23 runs, 83% failed, 11% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 11 runs, 82% failed, 11% of failures match
release-openshift-origin-installer-e2e-aws-calico-4.5 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 12 runs, 100% failed, 17% of failures match
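
The summary percentages above can be converted into rough run counts with a small awk filter. This is a minimal sketch that assumes each line keeps the exact shape `<job> - <N> runs, <F>% failed, <M>% of failures match`. `estimate_matches` is a hypothetical helper name. Because the search output rounds its percentages, the counts are estimates only.

```shell
# Hypothetical helper: estimate failed runs and matching failures per job from
# CI search summary lines shaped like:
#   <job> - <N> runs, <F>% failed, <M>% of failures match
# The input percentages are already rounded, so the output is approximate.
estimate_matches() {
  awk '/failures match/ {
    runs = $3
    failed_pct = $5; sub(/%/, "", failed_pct)   # "50%" -> "50"
    match_pct = $7;  sub(/%/, "", match_pct)    # "8%"  -> "8"
    failures = runs * failed_pct / 100
    matches = failures * match_pct / 100
    printf "%s: ~%.0f failed, ~%.0f matching\n", $1, failures, matches
  }'
}
```

Appending `| estimate_matches` to the `w3m ... | grep 'failures match'` pipeline works out to roughly one or two matching runs per job over the 48h window. For example, the aws-upi-4.5 line (24 runs, 50% failed, 8% of failures match) yields `~12 failed, ~1 matching`.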

In [1], picked as an example job, the error message was:

Run #0: Failed	3m20s
fail [github.com/openshift/origin/test/extended/csrapprover/csrapprover.go:49]: Unexpected error:
    <*errors.errorString | 0xc0001d8970>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/1399

Comment 1 Alberto 2020-06-10 10:15:27 UTC

This error originates in the code at:
https://github.com/openshift/origin/blob/55f403906e9b5e66fab9b4afb19f40b2212f74b5/test/extended/csrapprover/csrapprover.go#L48
https://github.com/openshift/origin/blob/55f403906e9b5e66fab9b4afb19f40b2212f74b5/test/extended/util/framework.go#L1595

Looking at the "Monitor cluster while tests execute" output:

"registry.fedoraproject.org/fedora:30": rpc error: code = Unknown desc = Error reading manifest 30 in registry.fedoraproject.org/fedora: received unexpected HTTP status: 503 Service Temporarily Unavailable (2 times)

Jun 10 07:17:32.994 W ns/e2e-test-cluster-client-cert-qz5fk pod/get-bootstrap-creds node/ip-10-0-135-201.us-west-2.compute.internal reason/GracefulDelete in 30s

It seems the Fedora registry is actually not responding:

$ docker pull registry.fedoraproject.org/fedora:30
Error response from daemon: received unexpected HTTP status: 503 Service Temporarily Unavailable
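
To tell a registry outage apart from a cluster-side problem, the registry can be probed directly. This sketch assumes `curl` is available. It relies on `GET /v2/`, the version-check endpoint of the Docker Registry HTTP API V2, which answers 200 (or 401 when authentication is required) on a healthy registry. `check_registry` and `classify_status` are hypothetical helper names.

```shell
# Hypothetical helper: map an HTTP status from the registry's /v2/ endpoint
# to a coarse health verdict. 200/401 mean the API is up; 5xx means the
# registry itself is failing; curl reports 000 when no response arrived.
classify_status() {
  case "$1" in
    200|401) echo "up" ;;
    5??)     echo "down (HTTP $1)" ;;
    000)     echo "unreachable" ;;
    *)       echo "unexpected (HTTP $1)" ;;
  esac
}

# Hypothetical helper: probe https://<registry>/v2/ and classify the result.
# Needs network access; times out after 10 seconds.
check_registry() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$1/v2/")
  classify_status "$code"
}
```

During an outage like the one above, `check_registry registry.fedoraproject.org` should report a 5xx status rather than `up`.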

Targeting 4.6 so as not to block 4.5, since this issue is orthogonal to the release.

Comment 2 Sam Batschelet 2020-06-10 15:33:19 UTC

*** Bug 1845295 has been marked as a duplicate of this bug. ***

Comment 6 Michael McCune 2020-08-20 21:56:00 UTC

I think we will probably need to revert Alberto's change, given that the new target does not support multi-arch manifests. I did some investigation into the previous image spec, and it is working for me /with/ multi-arch:

```
$ podman pull registry.fedoraproject.org/fedora:32 --override-arch arm64
Trying to pull registry.fedoraproject.org/fedora:32...
Getting image source signatures
Copying blob 1bfcc9281f78 done
Copying config ef79e50227 done
Writing manifest to image destination
Storing signatures
ef79e5022740c1df693fafa7c666791adb6dabae9004ef5e46e21e8e75f33b1c
```

I'm not sure how to test these changes, but I will propose a PR to use the fedora:32 target from registry.fedoraproject.org if that supports our multi-arch builds. Any recommendations or advice on how to test?
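
One way to check whether a tag supports multi-arch pulls, without running a build, is to fetch its raw manifest and see whether it is a manifest list (Docker) or an image index (OCI). This is a minimal sketch that assumes `skopeo` is installed. `is_multi_arch` and `list_arches` are hypothetical helper names. The grep-based parsing is a rough stand-in for a real JSON parser such as `jq`.

```shell
# Hypothetical helper: succeed when the raw manifest JSON on stdin is a Docker
# manifest list or an OCI image index. Both carry a top-level "manifests"
# array; a single-architecture image manifest has "config" and "layers".
is_multi_arch() {
  grep -qE '"manifests"[[:space:]]*:'
}

# Hypothetical helper: print the architectures named in a manifest list,
# one per line. A single-arch manifest prints nothing, because its
# architecture lives in the config blob rather than the manifest.
list_arches() {
  grep -oE '"architecture"[[:space:]]*:[[:space:]]*"[^"]+"' \
    | sed -E 's/.*"([^"]+)"$/\1/' \
    | sort -u
}

# Example (needs skopeo and network access):
#   skopeo inspect --raw docker://registry.fedoraproject.org/fedora:32 | is_multi_arch && echo "multi-arch"
#   skopeo inspect --raw docker://registry.fedoraproject.org/fedora:32 | list_arches
```

Running both commands against the old and new image references would show whether the new target lacks the arm64 (or other) entries before any build is attempted.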

Comment 7 W. Trevor King 2020-08-20 23:28:30 UTC

Bug 1816812 is about decoupling the test suite from external registries; maybe just close this as a duplicate of that one?

Comment 9 errata-xmlrpc 2020-10-27 16:06:07 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196