Description of problem:

The error:

```
could not wait for build: the build machine-os-content failed with reason DockerBuildFailed: Docker build strategy has failed
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Error downloading packages:
  Curl error (6): Couldn't resolve host name for https://m...=x86_64 [Could not resolve host: mirrors.fedoraproject.org]
error: build error: running 'set -x && yum install -y ostr...erlay RPMs" --branch=origin-ci-dev' failed with exit code 1
```

has happened approximately 1% of the time. See example:
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22653/pull-ci-openshift-origin-master-e2e-aws-serial/5299

It seems unlikely that we should have issues hitting mirrors.fedoraproject.org. It's not clear from the logs whether a retry is already built in.
This is coming from
https://github.com/openshift/origin/blob/master/images/os/Dockerfile

We are working towards replacing this with the use of `coreos-assembler` to perform the `machine-os-content` builds. See this related PR which helps enable this work - https://github.com/coreos/coreos-assembler/pull/489
Not blocking the release for this, but if it gets fixed, so be it.
Created attachment 1558927
Instances of this error over the past 24 hours

Five jobs failed with this error message today, but all of them were from a single PR [1]. The first failure was slow (16+ minutes [2]). The remaining failures were all under one minute [3,4,5,6]. I'm pretty convinced that you got unlucky with one mirrors.fedoraproject.org connection, and then CI keeps replaying that cached failure (bug 1695507). You should be able to recover by removing the project to clear the cache:

  $ oc delete project ci-op-n34w1184

and kicking the test again.

[1]: https://github.com/openshift/origin/pull/22653
[2]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22653/pull-ci-openshift-origin-master-e2e-aws-serial/5286
[3]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22653/pull-ci-openshift-origin-master-e2e-aws-serial/5290
[4]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22653/pull-ci-openshift-origin-master-e2e-aws-serial/5295
[5]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22653/pull-ci-openshift-origin-master-e2e-aws-serial/5296
[6]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22653/pull-ci-openshift-origin-master-e2e-aws-serial/5299
(In reply to Micah Abbott from comment #1)
> This is coming from
> https://github.com/openshift/origin/blob/master/images/os/Dockerfile
>
> We are working towards replacing this with the use of `coreos-assembler` to
> perform the `machine-os-content` builds. See this related PR which helps
> enable this work - https://github.com/coreos/coreos-assembler/pull/489

I believe this is due to faux `machine-os-content` being generated for testing using https://github.com/openshift/imagebuilder to create an image. Is the idea to replace the use of `imagebuilder` in origin's context with `cosa` and the dev-overlay command?
I don't think the problem is related to imagebuilder, it's a generic DNS failure which could happen with any tool; we're currently running `yum` at build time and particularly with Fedora infrastructure that is known to be flaky.

Using cosa would avoid doing `yum install ostree`, though at a notable cost of downloading a rather larger image.

Then bigger picture the idea indeed is to use dev-overlay for this, but that's not directly related to the build flake.
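Since this is a transient DNS failure, one generic mitigation would be to retry the package install a few times before failing the build. A minimal sketch, assuming a hypothetical `retry` helper (it is not in the current Dockerfile) and placeholder attempt counts and delays:

```shell
#!/bin/sh
# retry: hypothetical helper, not part of the existing Dockerfile.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached,
# returning the exit status of the last failed attempt.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # On success, stop immediately.
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            return "$status"
        fi
        echo "attempt $attempt/$max_attempts failed (exit $status), retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example (assumed values: 3 attempts, 10 seconds apart):
#   retry 3 10 yum install -y ostree
```

This only papers over short-lived resolver or mirror outages. It would not help with the cached-failure replay described earlier, where the same failed build result is reused.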
(In reply to Colin Walters from comment #7)
> I don't think the problem is related to imagebuilder, it's a generic DNS
> failure which could happen with any tool; we're currently running `yum` at
> build time and particularly with Fedora infrastructure that is known to be
> flaky.
>
> Using cosa would avoid doing `yum install ostree`, though at a notable cost
> of downloading a rather larger image.

That's where I was heading with my question :-)

> Then bigger picture the idea indeed is to use dev-overlay for this, but
> that's not directly related to the build flake.

For the time being would moving the CI image off of Fedora and on to RHEL make sense? With the bigger picture having origin folks take advantage of dev-overlay _or_ RHCOS folks helping origin developers utilize cosa?
Pushing to 4.4 and reassigning to Vadim since he owns GRPA-392
Not worth tracking as a bug