Description of problem:
Docker recently started rate limiting image pulls [0] (Nov. 2nd), and CI jobs are seeing image pull failures because of it. You will see something like this in the build log of this example job [1]:

* 3x kubelet: Failed to pull image "busybox": rpc error: code = Unknown desc = Error reading manifest latest in docker.io/library/busybox: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
* 3x kubelet: Error: ErrImagePull
* 5x kubelet: Back-off pulling image "busybox"
* 5x kubelet: Error: ImagePullBackOff

[0] https://www.docker.com/blog/what-you-need-to-know-about-upcoming-docker-hub-rate-limiting/
[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/852/pull-ci-openshift-cluster-network-operator-master-e2e-azure-ovn/1324047536943534080

Version-Release number of selected component (if applicable):

How reproducible:
Very frequently. A search over a 2-day period at the time this BZ was created showed 313 affected jobs:
https://search.ci.openshift.org/?search=Failed+to+pull+image+%22busybox%22&maxAge=48h&context=1&type=build-log&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Additional info:
Discussions in Slack mentioned two alternatives:
gcr.io/google-containers/busybox
quay.io/quay/busybox
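For anyone triaging build logs by hand, here is a minimal sketch of how the rate-limit failures quoted above could be picked out of a log. It relies only on the "toomanyrequests" marker and the 'Failed to pull image "..."' wording from the kubelet message in the description. The names `rate_limited_images`, `RATE_LIMIT_MARKER`, and `FAILED_PULL` are hypothetical and not part of any existing CI tooling.

```python
import re

# Marker taken from the kubelet error quoted in the description.
RATE_LIMIT_MARKER = "toomanyrequests"

# Captures the image name in messages like: Failed to pull image "busybox": ...
FAILED_PULL = re.compile(r'Failed to pull image "([^"]+)"')


def rate_limited_images(log_lines):
    """Return the set of image names whose pull failed with a rate-limit error."""
    images = set()
    for line in log_lines:
        # Skip generic pull failures (ErrImagePull, ImagePullBackOff, ...).
        if RATE_LIMIT_MARKER not in line:
            continue
        match = FAILED_PULL.search(line)
        if match:
            images.add(match.group(1))
    return images


sample = [
    'kubelet: Failed to pull image "busybox": rpc error: code = Unknown desc = '
    "Error reading manifest latest in docker.io/library/busybox: toomanyrequests: "
    "You have reached your pull rate limit.",
    "kubelet: Error: ErrImagePull",
    'kubelet: Back-off pulling image "busybox"',
]
print(rate_limited_images(sample))
```

Only the "Failed to pull image" line carries the rate-limit marker, so the follow-up ErrImagePull and ImagePullBackOff events are deliberately ignored rather than counted.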
Looking in openshift/origin (openshift-tests), these are some of the files I found referencing the docker.io busybox image:

test/extended/builds/hooks.go
test/extended/builds/multistage.go
test/extended/cli/compat.go
test/extended/images/extract.go
test/extended/images/layers.go
test/extended/images/mirror.go
test/extended/images/oc_tag.go
test/extended/images/resolve.go
test/extended/testdata/bindata.go
test/extended/testdata/builds/build-postcommit/docker.yaml
test/extended/testdata/builds/test-cds-dockerbuild.json
test/extended/testdata/builds/test-docker-build.json
test/extended/testdata/cmd/test/cmd/testdata/test-docker-build.json
test/extended/testdata/test-cli-debug.yaml

Most tests that use the docker.io busybox image are in the Builds and Images tests.
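A rough sketch of how a list like the one above could be regenerated from a checkout. It assumes the references are fully qualified (docker.io/busybox or docker.io/library/busybox); bare "busybox" short names are not matched, because which registry they resolve to depends on the container runtime configuration. `find_references` and `scan_tree` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: only fully qualified Docker Hub references are of interest here.
DOCKER_HUB_BUSYBOX = re.compile(r"docker\.io/(?:library/)?busybox\b")


def find_references(text):
    """Return (line_number, line) pairs that mention the Docker Hub busybox image."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if DOCKER_HUB_BUSYBOX.search(line)
    ]


def scan_tree(root):
    """Map each file under root to its matching lines; unreadable files are skipped."""
    hits = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        matches = find_references(text)
        if matches:
            hits[str(path)] = matches
    return hits
```

For example, `scan_tree("test/extended")` run from the repository root would return a dictionary keyed by file path. Generated files such as bindata.go will show up alongside the testdata they embed, so the sources are the ones to fix.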
Being tracked for upstream e2e tests:
https://github.com/kubernetes/test-infra/issues/19477
https://github.com/kubernetes/kubernetes/issues/94018
Meant to assign this to ImageStream, though some Build tests also need to be changed.
Trying to address this for upstream e2e https://github.com/openshift/release/pull/13460
Still a popular failure mode:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=www.docker.com/increase-rate-limit' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ocp-4.5-e2e-vsphere-upi - 6 runs, 100% failed, 17% of failures match
periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-proxy - 7 runs, 86% failed, 17% of failures match
periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere - 14 runs, 100% failed, 14% of failures match
pull-ci-cri-o-cri-o-master-e2e-aws - 29 runs, 41% failed, 8% of failures match
...
pull-ci-openshift-sriov-network-operator-master-e2e-aws - 22 runs, 100% failed, 5% of failures match
release-openshift-ocp-installer-e2e-aws-4.7 - 11 runs, 45% failed, 20% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.7 - 12 runs, 67% failed, 25% of failures match
release-openshift-okd-installer-e2e-aws-4.6 - 8 runs, 63% failed, 80% of failures match
release-openshift-okd-installer-e2e-aws-upgrade - 7 runs, 14% failed, 100% of failures match
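The percentages in these summary lines can be turned back into an approximate count of affected runs per job. The sketch below assumes the "N runs, X% failed, Y% of failures match" layout shown in the output above. Because the search page reports rounded percentages, the result is an estimate only. `SUMMARY` and `estimate_matching_runs` are hypothetical names.

```python
import re

# Layout assumed from the search output above:
#   <job> - <N> runs, <X>% failed, <Y>% of failures match
SUMMARY = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match$"
)


def estimate_matching_runs(line):
    """Return (job, estimated matching runs), or None if the line does not parse."""
    m = SUMMARY.match(line.strip())
    if m is None:
        return None
    # runs * failed% * match%, kept in integers until the final division.
    product = int(m["runs"]) * int(m["failed"]) * int(m["match"])
    return m["job"], round(product / 10000)


line = "release-openshift-okd-installer-e2e-aws-4.6 - 8 runs, 63% failed, 80% of failures match"
print(estimate_matching_runs(line))
```

For the okd 4.6 job this works out to 8 x 0.63 x 0.80, or roughly 4 runs hitting the rate limit. Lines that do not follow the layout, such as the "..." elision, return None.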
Clayton's PR has landed, but we still have this problem:
https://search.ci.openshift.org/?search=You+have+reached+your+pull+rate+limit&maxAge=48h&context=1&type=build-log&name=&maxMatches=5&maxBytes=20971520&groupBy=job

I created a few BZs for tests that still use docker.io:
BZ 1904679
BZ 1904682
BZ 1904683
BZ 1904684
https://search.ci.openshift.org/?search=You+have+reached+your+pull+rate+limit&maxAge=48h&context=1&type=build-log&name=&maxMatches=5&maxBytes=20971520&groupBy=job
"matched 1.56% of failing runs"

I think we've mostly mitigated the problem, but there are some tests that still use Docker Hub (mostly it's e2e-cmd). Let's create additional BZs for the remaining tests as they are owned by different teams.