Bug 1845792 - Pods cannot access the /config/master API endpoint: csrapprover: timed out waiting for the condition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Alberto
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks: 1846091
Reported: 2020-06-10 05:12 UTC by W. Trevor King
Modified: 2020-10-27 16:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce multiple ingress policies with ingress allow-all policy taking precedence [Feature:NetworkPolicy]
Last Closed: 2020-10-27 16:06:07 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 25087 | 0 | None | closed | Bug 1845792: Drop registry.fedoraproject.org/fedora:30 in favour of quay.io/fedora:32-x86_64 | 2021-01-27 05:05:35 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:06:38 UTC

Description W. Trevor King 2020-06-10 05:12:04 UTC
test:

[sig-cluster-lifecycle] Pods cannot access the /config/master API endpoint [Suite:openshift/conformance/parallel]

is failing frequently in CI, see search results:

$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?search=Pods+cannot+access+the+%2Fconfig%2Fmaster+API+endpoint&maxAge=48h&type=junit&name=release-openshift-' | grep 'failures match'
release-openshift-ocp-installer-e2e-aws-upi-4.5 - 24 runs, 50% failed, 8% of failures match
release-openshift-ocp-installer-e2e-aws-4.5 - 50 runs, 62% failed, 3% of failures match
release-openshift-ocp-installer-e2e-metal-4.5 - 26 runs, 50% failed, 8% of failures match
release-openshift-ocp-installer-e2e-vsphere-upi-4.5 - 24 runs, 79% failed, 5% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.5 - 11 runs, 64% failed, 14% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6 - 11 runs, 45% failed, 20% of failures match
release-openshift-ocp-installer-e2e-aws-4.6 - 71 runs, 73% failed, 4% of failures match
release-openshift-origin-installer-e2e-gcp-4.6 - 32 runs, 41% failed, 15% of failures match
release-openshift-ocp-installer-e2e-gcp-4.6 - 5 runs, 40% failed, 50% of failures match
release-openshift-ocp-installer-e2e-metal-4.6 - 5 runs, 40% failed, 50% of failures match
release-openshift-ocp-installer-e2e-metal-compact-4.6 - 5 runs, 20% failed, 100% of failures match
release-openshift-origin-installer-e2e-azure-shared-vpc-4.5 - 2 runs, 50% failed, 100% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.5-to-4.6-ci - 2 runs, 100% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-shared-vpc-4.5 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-aws-ovn-4.5 - 23 runs, 83% failed, 11% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 11 runs, 82% failed, 11% of failures match
release-openshift-origin-installer-e2e-aws-calico-4.5 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 12 runs, 100% failed, 17% of failures match
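
To compare jobs, the percentages above can be turned into a rough count of runs that hit this failure: runs × failed% × match%. The following is a minimal sketch, assuming the line format shown in the search output; the names `JobImpact`, `parse_line`, and `rank_jobs` are made up for illustration, and the result is only approximate because the search tool reports rounded percentages.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches lines such as:
#   release-...-4.5 - 24 runs, 50% failed, 8% of failures match
LINE = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match$"
)


class JobImpact(NamedTuple):
    """One parsed summary line (hypothetical helper type)."""
    job: str
    runs: int
    failed_pct: int
    match_pct: int

    @property
    def estimated_matching_runs(self) -> float:
        # runs * (failed% / 100) * (match% / 100); approximate, since the
        # input percentages are already rounded.
        return self.runs * self.failed_pct * self.match_pct / 10_000


def parse_line(line: str) -> Optional[JobImpact]:
    """Return a JobImpact for a summary line, or None if it does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return JobImpact(m["job"], int(m["runs"]), int(m["failed"]), int(m["match"]))


def rank_jobs(lines: Iterable[str]) -> List[JobImpact]:
    """Parse all matching lines and sort by estimated matching runs, highest first."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    return sorted(parsed, key=lambda p: p.estimated_matching_runs, reverse=True)


if __name__ == "__main__":
    sample = [
        "release-openshift-ocp-installer-e2e-aws-upi-4.5 - 24 runs, 50% failed, 8% of failures match",
        "release-openshift-origin-installer-e2e-gcp-4.6 - 32 runs, 41% failed, 15% of failures match",
    ]
    for impact in rank_jobs(sample):
        print(f"{impact.estimated_matching_runs:.2f}  {impact.job}")
```

Ranking by the estimate rather than by the raw match percentage keeps jobs with only a handful of runs (for example 2 runs, 100% of failures match) from dominating the list.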

Picking [1] as an example job, the error message was:

Run #0: Failed (3m20s)
fail [github.com/openshift/origin/test/extended/csrapprover/csrapprover.go:49]: Unexpected error:
    <*errors.errorString | 0xc0001d8970>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/1399
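
The string "timed out waiting for the condition" is the generic timeout text used by the polling helpers in Kubernetes' apimachinery `wait` package (`ErrWaitTimeout`), so on its own it does not say which condition was never met. The sketch below illustrates the pattern in Python rather than Go; `poll` and `WaitTimeout` are illustrative names, not the test's actual code, and the assumption is that the test polls for a pod state in a similar way.

```python
import time
from typing import Callable


class WaitTimeout(Exception):
    """Raised when the condition never became true before the deadline."""


def poll(
    condition: Callable[[], bool],
    interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Call condition() every `interval` seconds until it returns True.

    Raises WaitTimeout with a fixed, generic message once `timeout` seconds
    have elapsed. The message carries no detail about why the condition
    stayed false; that has to come from other logs or events.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise WaitTimeout("timed out waiting for the condition")
        sleep(interval)


if __name__ == "__main__":
    # Stand-in for "the pod is running"; it never becomes true here, as would
    # happen if the pod's image could not be pulled.
    def pod_is_running() -> bool:
        return False

    try:
        poll(pod_is_running, interval=0.01, timeout=0.05)
    except WaitTimeout as err:
        print(f"Unexpected error: {err}")
```

Because the helper only reports that the deadline passed, the underlying cause has to be found elsewhere, for example in the cluster events recorded while the tests ran.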

Comment 1 Alberto 2020-06-10 10:15:27 UTC
This error comes from the code at https://github.com/openshift/origin/blob/55f403906e9b5e66fab9b4afb19f40b2212f74b5/test/extended/csrapprover/csrapprover.go#L48
https://github.com/openshift/origin/blob/55f403906e9b5e66fab9b4afb19f40b2212f74b5/test/extended/util/framework.go#L1595

By looking at "Monitor cluster while tests execute":

"registry.fedoraproject.org/fedora:30": rpc error: code = Unknown desc = Error reading manifest 30 in registry.fedoraproject.org/fedora: received unexpected HTTP status: 503 Service Temporarily Unavailable (2 times)

Jun 10 07:17:32.994 W ns/e2e-test-cluster-client-cert-qz5fk pod/get-bootstrap-creds node/ip-10-0-135-201.us-west-2.compute.internal reason/GracefulDelete in 30s

It seems the Fedora registry is actually not responding:

Albertos-MacBook-Pro:enhancements@albertogarla $ docker pull registry.fedoraproject.org/fedora:30
Error response from daemon: received unexpected HTTP status: 503 Service Temporarily Unavailable
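
The same check can be made without a container runtime by requesting the manifest directly. This is a rough sketch, assuming the registry exposes the standard registry v2 manifest endpoint (`/v2/<repository>/manifests/<tag>`) and allows anonymous reads; `parse_image_ref`, `manifest_url`, `classify_status`, and `probe` are hypothetical helper names, and the reference parser is deliberately simplified (it requires an explicit registry host and tag).

```python
import urllib.error
import urllib.request
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository:tag' into its parts (simplified parser)."""
    registry, _, remainder = ref.partition("/")
    repository, _, tag = remainder.rpartition(":")
    if not registry or not repository or not tag:
        raise ValueError(f"expected registry/repository:tag, got {ref!r}")
    return ImageRef(registry, repository, tag)


def manifest_url(ref: str) -> str:
    """Build the registry v2 manifest URL for an image reference."""
    image = parse_image_ref(ref)
    return f"https://{image.registry}/v2/{image.repository}/manifests/{image.tag}"


def classify_status(status: int) -> str:
    """Map an HTTP status to a coarse diagnosis."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "auth required"
    if status == 404:
        return "image or tag missing"
    if status >= 500:
        return "registry-side failure"
    return "other"


def probe(ref: str, timeout: float = 10.0) -> str:
    """Request the manifest and classify the response (needs network access)."""
    try:
        with urllib.request.urlopen(manifest_url(ref), timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status code is in err.code.
        return classify_status(err.code)


if __name__ == "__main__":
    print(manifest_url("registry.fedoraproject.org/fedora:30"))
    print(classify_status(503))
```

A 5xx answer from this endpoint points at the registry itself rather than at the test or the cluster, which matches the 503 seen in the pod events above.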

Targeting 4.6 so as not to block 4.5, since this is orthogonal.

Comment 2 Sam Batschelet 2020-06-10 15:33:19 UTC
*** Bug 1845295 has been marked as a duplicate of this bug. ***

Comment 6 Michael McCune 2020-08-20 21:56:00 UTC
I think we will probably need to revert Alberto's change, given that the new target does not support multi-arch manifests. I did some investigation into the previous image spec, and it is working for me /with/ multi-arch:

```
$ podman pull registry.fedoraproject.org/fedora:32 --override-arch arm64
Trying to pull registry.fedoraproject.org/fedora:32...
Getting image source signatures
Copying blob 1bfcc9281f78 done
Copying config ef79e50227 done
Writing manifest to image destination
Storing signatures
ef79e5022740c1df693fafa7c666791adb6dabae9004ef5e46e21e8e75f33b1c
```

I'm not sure how to test these changes, but I will propose a PR to use the fedora:32 target from registry.fedoraproject.org if that will support our multi-arch builds. Any recommendations or advice on how to test?
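
One possible way to check multi-arch support without pulling for each architecture is to inspect the image's manifest list. The sketch below assumes the manifest list has already been fetched as JSON in the Docker manifest list / OCI image index layout, where each entry under `manifests` carries a `platform.architecture` field; `supported_architectures` is an illustrative helper name and the sample data is hypothetical, not taken from either registry.

```python
from typing import Any, Dict, Set


def supported_architectures(manifest_list: Dict[str, Any]) -> Set[str]:
    """Return the architectures named in a manifest list.

    A single-image manifest has no 'manifests' array, so the result is an
    empty set, meaning the tag is not a multi-arch manifest list.
    """
    return {
        entry["platform"]["architecture"]
        for entry in manifest_list.get("manifests", [])
        if "architecture" in entry.get("platform", {})
    }


if __name__ == "__main__":
    # Hypothetical manifest list, trimmed to the fields used here.
    sample = {
        "schemaVersion": 2,
        "manifests": [
            {"platform": {"architecture": "amd64", "os": "linux"}},
            {"platform": {"architecture": "arm64", "os": "linux"}},
        ],
    }
    archs = supported_architectures(sample)
    print(sorted(archs))
    print("arm64 supported:", "arm64" in archs)
```

An empty result for a tag would suggest it resolves to a single-architecture image, which is the concern raised about the new target.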

Comment 7 W. Trevor King 2020-08-20 23:28:30 UTC
Bug 1816812 is about decoupling the test suite from external registries; maybe just close as a dup of that?

Comment 9 errata-xmlrpc 2020-10-27 16:06:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

