Description of problem:

There are a lot of credential-related and access-denied errors in this job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn-upgrade/1530006293157253120

This search shows the "denied: requested access to the resource is denied" variety:
https://search.ci.openshift.org/?search=rpc+error%3A+code+%3D+Unknown+desc+%3D+reading+manifest+404+in+docker.io%2Flibrary%2Fwebserver&maxAge=80h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

This query will show a much larger list:

$ podman run -it corbinu/alpine-w3m -dump -cols 200 "https://search.ci.openshift.org/?search=rpc+error%3A+code+%3D+Unknown+desc+%3D+reading+manifest+404+in+docker.io%2Flibrary%2Fwebserver&maxAge=80h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job" | grep 'failures match' | sort

I'm wondering if we're still not out of the woods due to recent "rotated" credentials.

Here's a failure in the job mentioned above:

[sig-node] should not encounter ErrImagePull in non-openshift namespace pods
Run #0: Failed (0s)

{ Found 45 ErrImagePull intervals for: ns/e2e-container-runtime-6277 pod/image-pull-test67121c7e-681a-4cf5-bab5-78602d451c40 node/ip-10-0-139-5.us-west-2.compute.internal uid/7eee92c9-c850-4c7a-b99c-ec585e2b1a3b container/image-pull-test: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 3.7 in gcr.io/authenticated-image-pulling/alpine: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials.
To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication ns/e2e-container-runtime-6277 pod/image-pull-test67121c7e-681a-4cf5-bab5-78602d451c40 node/ip-10-0-139-5.us-west-2.compute.internal: reason/Failed Error: ErrImagePull ns/e2e-container-runtime-6277 pod/image-pull-test67121c7e-681a-4cf5-bab5-78602d451c40 uid/7eee92c9-c850-4c7a-b99c-ec585e2b1a3b container/image-pull-test: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 3.7 in gcr.io/authenticated-image-pulling/alpine: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication ns/e2e-container-runtime-6735 pod/image-pull-testc9d21750-50b0-491d-b2a9-c54d344ed9b6 node/ip-10-0-234-247.us-west-2.compute.internal uid/95a5aae6-d8c9-4e71-8b8b-cafaa51ba815 container/image-pull-test: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = pinging container registry invalid.com: Get "https://invalid.com/v2/": dial tcp 173.0.129.46:443: i/o timeout ns/e2e-container-runtime-6735 pod/image-pull-testc9d21750-50b0-491d-b2a9-c54d344ed9b6 node/ip-10-0-234-247.us-west-2.compute.internal: reason/Failed Error: ErrImagePull ns/e2e-container-runtime-6735 pod/image-pull-testc9d21750-50b0-491d-b2a9-c54d344ed9b6 uid/95a5aae6-d8c9-4e71-8b8b-cafaa51ba815 container/image-pull-test: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = pinging container registry invalid.com: Get "https://invalid.com/v2/": dial tcp 173.0.129.46:443: i/o timeout ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-8tjcs node/ip-10-0-234-247.us-west-2.compute.internal uid/4cb7aeeb-6ffe-42a0-b1c6-5110ced25f07 container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = 
reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-8tjcs uid/4cb7aeeb-6ffe-42a0-b1c6-5110ced25f07 container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-9q7rg node/ip-10-0-234-247.us-west-2.compute.internal uid/11bbbf50-57f2-47e5-bc50-46f25b0d7b76 container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-9q7rg uid/11bbbf50-57f2-47e5-bc50-46f25b0d7b76 container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-bb6h2 node/ip-10-0-139-5.us-west-2.compute.internal uid/c44b01fb-60b9-4d68-83b5-cd11b5d9315a container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-bb6h2 uid/c44b01fb-60b9-4d68-83b5-cd11b5d9315a container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied 
unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-bzjbt node/ip-10-0-139-5.us-west-2.compute.internal uid/4735dbfa-899b-47fc-b339-f66654cebf10 container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-bzjbt uid/4735dbfa-899b-47fc-b339-f66654cebf10 container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-dwxtm node/ip-10-0-166-234.us-west-2.compute.internal uid/87db9c5e-252c-4c4f-ac25-9c04760667f4 container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-dwxtm uid/87db9c5e-252c-4c4f-ac25-9c04760667f4 container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-gwf2b node/ip-10-0-234-247.us-west-2.compute.internal uid/a67114e3-5ef7-4a9b-81b5-0e21567bd8df container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 
pod/webserver-deployment-57ccb67bb8-gwf2b uid/a67114e3-5ef7-4a9b-81b5-0e21567bd8df container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-h4wks node/ip-10-0-166-234.us-west-2.compute.internal uid/066ed5c7-27ba-47ca-b762-fe6a6bdfc23a container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-h4wks uid/066ed5c7-27ba-47ca-b762-fe6a6bdfc23a container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-mhd9c node/ip-10-0-234-247.us-west-2.compute.internal uid/dd614cc0-b084-42fd-8db7-5de98f6ab2ba container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-mhd9c uid/dd614cc0-b084-42fd-8db7-5de98f6ab2ba container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-pkj2l node/ip-10-0-139-5.us-west-2.compute.internal 
uid/2aec4233-9a84-42c8-a4e4-bc68083edb1c container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-pkj2l uid/2aec4233-9a84-42c8-a4e4-bc68083edb1c container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-qjvfc node/ip-10-0-166-234.us-west-2.compute.internal uid/b70f4e1f-4f17-4e1b-a0bb-8cd71defc9cd container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-qjvfc uid/b70f4e1f-4f17-4e1b-a0bb-8cd71defc9cd container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-rlnvg node/ip-10-0-166-234.us-west-2.compute.internal uid/7675d482-274c-481c-ab5a-44ca3329d2ae container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-rlnvg node/ip-10-0-166-234.us-west-2.compute.internal: reason/Failed Error: ErrImagePull ns/e2e-deployment-2940 
pod/webserver-deployment-57ccb67bb8-rlnvg uid/7675d482-274c-481c-ab5a-44ca3329d2ae container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-vtjbl node/ip-10-0-139-5.us-west-2.compute.internal uid/3bcb549e-b5be-45d0-9414-e0d0fcdbc2b1 container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-vtjbl uid/3bcb549e-b5be-45d0-9414-e0d0fcdbc2b1 container/httpd: constructed/true reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-xz8ht node/ip-10-0-234-247.us-west-2.compute.internal uid/94bf446c-c85d-4bc7-88f7-2c145ea36978 container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required
...

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:

payload rejection: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.11.0-0.ci/release/4.11.0-0.ci-2022-05-27-015910
aggregated job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-aws-ovn-upgrade-4.11-micro-release-openshift-release-analysis-aggregator/1530006294512013312
the failed job (one of two): https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn-upgrade/1530006293157253120
relevant slack thread with more info: https://coreos.slack.com/archives/C01CQA76KMX/p1653664318699599

trying to determine where docker.io is being used ...

this code has been there for a while to avoid docker.io:
https://github.com/openshift/release/blob/8bbb676c5f8b158c2c8d69f3ea11bbefcc9fbb4f/ci-operator/step-registry/openshift/e2e/test/openshift-e2e-test-commands.sh#L45

    # Override the upstream docker.io registry due to issues with rate limiting
    # https://bugzilla.redhat.com/show_bug.cgi?id=1895107
    # sjenning: TODO: use of personal repo is temporary; should find long term location for these mirrored images
    export KUBE_TEST_REPO_LIST=${HOME}/repo_list.yaml
    cat <<EOF > ${KUBE_TEST_REPO_LIST}
    dockerLibraryRegistry: quay.io/sjenning
    dockerGluster: quay.io/sjenning
    EOF

but is overridden by:
https://github.com/openshift/origin/blob/28cbfd55f5bda6a814d98569906dbabf5f00b68c/cmd/openshift-tests/openshift-tests.go#L44-L46

    if len(os.Getenv("KUBE_TEST_REPO_LIST")) > 0 {
        fmt.Fprintln(os.Stderr, "warning: KUBE_TEST_REPO_LIST may not be set when using openshift-tests and will be ignored")
        os.Setenv("KUBE_TEST_REPO_LIST", "")
    }

which has also been there for a while.
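To make the interaction between the two snippets concrete, here is a minimal Go sketch of why the mirror setting never takes effect. The check in `clearRepoListOverride` follows the quoted openshift-tests code; everything else is an assumption made for illustration: `dockerLibraryRegistry`, `registryAfterStartup`, and `registryWithoutOverride` are hypothetical helpers, and the real test framework reads the YAML file rather than just checking whether the variable is set. The `docker.io/library` default is taken from the image references in the error messages above.

```go
package main

import (
	"fmt"
	"os"
)

const repoListEnv = "KUBE_TEST_REPO_LIST"

// clearRepoListOverride mirrors the quoted openshift-tests check: if the
// variable is set, print a warning and clear it. It returns true if it cleared it.
func clearRepoListOverride() bool {
	if len(os.Getenv(repoListEnv)) > 0 {
		fmt.Fprintln(os.Stderr, "warning: KUBE_TEST_REPO_LIST may not be set when using openshift-tests and will be ignored")
		os.Setenv(repoListEnv, "")
		return true
	}
	return false
}

// dockerLibraryRegistry is a hypothetical stand-in for the framework's lookup:
// it returns the mirror only while the variable is still set.
func dockerLibraryRegistry(mirror string) string {
	if os.Getenv(repoListEnv) == "" {
		return "docker.io/library"
	}
	return mirror
}

// registryAfterStartup simulates the CI step exporting the variable and
// openshift-tests then clearing it before any test resolves an image.
func registryAfterStartup(repoList, mirror string) string {
	os.Setenv(repoListEnv, repoList)
	clearRepoListOverride()
	return dockerLibraryRegistry(mirror)
}

// registryWithoutOverride shows, for contrast, what the lookup would return
// if the variable were left alone.
func registryWithoutOverride(repoList, mirror string) string {
	os.Setenv(repoListEnv, repoList)
	return dockerLibraryRegistry(mirror)
}

func main() {
	fmt.Println(registryAfterStartup("/tmp/repo_list.yaml", "quay.io/sjenning"))
	fmt.Println(registryWithoutOverride("/tmp/repo_list.yaml", "quay.io/sjenning"))
}
```

Under these assumptions the first call resolves to docker.io/library even though the CI step exported a repo list, which would be consistent with the docker.io/library/webserver references in the failures.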
Just noting that this test has been flaking for a long time which is why we haven't really noticed it.
See the flake here (look for [sig-node] should not encounter ErrImagePull in openshift namespace pods): https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-e2e-aws-arm64
Two sets of the failures in the original description are created on purpose by tests:

https://github.com/openshift/origin//blob/bba29efd04e32ca75a62d2a117c6e9ce333e6e77/vendor/k8s.io/kubernetes/test/e2e/common/node/runtime.go#L379
https://github.com/openshift/origin//blob/bba29efd04e32ca75a62d2a117c6e9ce333e6e77/vendor/k8s.io/kubernetes/test/e2e/common/node/runtime.go#L390

So I will ensure we skip them in this PR: https://github.com/openshift/origin/pull/27202
In this thread: https://coreos.slack.com/archives/C01CQA76KMX/p1654051864855859?thread_ts=1653664318.699599&cid=C01CQA76KMX , JustinP pointed out that for at least one of those tests, the ErrImagePull was expected. Specifically, this:

ns/e2e-deployment-2940 pod/webserver-deployment-57ccb67bb8-8tjcs node/ip-10-0-234-247.us-west-2.compute.internal uid/4cb7aeeb-6ffe-42a0-b1c6-5110ced25f07 container/httpd: reason/ContainerWait cause/ErrImagePull: rpc error: code = Unknown desc = reading manifest 404 in docker.io/library/webserver: errors: denied: requested access to the resource is denied unauthorized: authentication required

is expected. I misinterpreted the message: it is trying to convey that the test is looking for webserver:404, which translates to docker.io/library/webserver:404. That image is documented in the origin repo (look for webserver:404) as an image that does not exist.

So I'm pivoting on this to modify the test to take this into account and not flag those ErrImagePull events as an error/failure; see https://github.com/openshift/origin/pull/27202
The tests now show as either flaky or successful and no longer fail: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-e2e-aws-arm64&include-filter-by-regex=should%20not%20encounter%20ErrImagePull%20in%20openshift%20namespace%20pods