Description of problem:
Seeing failure to bootstrap in 4.3 branch tests:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.3/706
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.3/705
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.3/719
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.3/720

Version-Release number of selected component (if applicable):
Branch 4.3 latest

How reproducible:
Often

Steps to Reproduce:
See tests above

Actual results:
Cluster fails to bootstrap

Expected results:
Cluster starts

Additional info:
Error snippet: container=\"cluster-policy-controller-2\" is waiting: \"CreateContainerError\" - \"container create failed: time=\\\"2020-01-24T13:52:43Z\\\" level=error msg=\\\"container_linux.go:346: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"cluster-policy-controller\\\\\\\\\\\\\\\": executable file not found in $PATH\\\\\\\"\\\"
This is an issue with the 3.x version of imagebuilder used in OSBS.
The copy in question was copying files (non-existent files, in this case) from an earlier build stage.
Setting the reported version to 3.11, as I understand this to be occurring on OSBS, which uses OCP 3.11, which in turn uses imagebuilder for multistage Dockerfile builds. Once this is root-caused, we should confirm that the same issue does not manifest in OCP 4.x with buildah, since buildah is partially based on imagebuilder.
It looks like we've got two problems.

First, using registry.svc.ci.openshift.org/ocp/builder:golang-1.12 as the base for building the cluster-policy-controller image causes `go list -mod=vendor ./cmd/...` to fail, because in Go 1.12 the `-mod=vendor` build flag is only valid when using modules (1.13 doesn't seem to mind). That leaves `GO_BUILD_PACKAGES_EXPANDED` empty, so `make build` decides that there's nothing to do: no binary is built, and `make` appears to succeed.

Second, when imagebuilder master attempts to copy the binary that wasn't built, it doesn't detect an error when the source file is not found. The logic in buildah master that handles COPY instructions appears to notice the error.

A Dockerfile which I think matches the second error looks like:

    FROM busybox AS builder
    FROM scratch
    COPY --from=builder /bin/-no-such-file-error- /usr/bin
Yeah, I believe the go build failure was already resolved, so they've successfully built a working image; we just want to fix imagebuilder so this doesn't happen again.
Mrunal, this looks like it got stuck/abandoned. Can you get it moving again?
Opened https://github.com/openshift/imagebuilder/pull/162 for merging the changes onto imagebuilder's openshift-3.11 branch, which we should be able to vendor into the origin release-3.11 branch.
Assigning to Jindrich for packaging needs.
Still working on testing for the PR that pulls this fix in for OpenShift 3.11. I'm not certain that it'll happen during this sprint.
Rebased my pull request on top of other changes that have landed in the 3.11 branch. One of the other dependencies is no longer available from its original hosting site, so I had to change where we're getting it, which could complicate things a bit. The OpenShift organization might want to fork a copy of https://bitbucket-archive.softwareheritage.org/projects/ww/ww/goautoneg.html and use that instead.
The imagebuilder bump was merged as part of https://github.com/openshift/origin/pull/25416, so I'll link to that PR instead.
Verified on v3.11.286. It is now detecting when there is no file and reporting no such file or directory during build.

# oc version
oc v3.11.286
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-12-20.ec2.internal:8443
openshift v3.11.286
kubernetes v1.11.0+d4cacc0

# oc get nodes
NAME                           STATUS    ROLES     AGE       VERSION
ip-172-18-10-59.ec2.internal   Ready     <none>    3h        v1.11.0+d4cacc0
ip-172-18-12-20.ec2.internal   Ready     master    3h        v1.11.0+d4cacc0
ip-172-18-5-226.ec2.internal   Ready     compute   3h        v1.11.0+d4cacc0

# docker version
Client:
 Version:         1.13.1
 API version:     1.26
 Package version: docker-1.13.1-162.git64e9980.el7_8.x86_64
 Go version:      go1.10.3
 Git commit:      64e9980/1.13.1
 Built:           Mon Jun 22 03:20:20 2020
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: docker-1.13.1-162.git64e9980.el7_8.x86_64
 Go version:      go1.10.3
 Git commit:      64e9980/1.13.1
 Built:           Mon Jun 22 03:20:20 2020
 OS/Arch:         linux/amd64
 Experimental:    false

# cat Dockerfile
FROM busybox
COPY /nosuch-file /
RUN stat /nosuch-file

# docker build -t image .
Sending build context to Docker daemon 2.048 kB
Step 1/3 : FROM busybox
 ---> 6858809bf669
Step 2/3 : COPY /nosuch-file /
lstat nosuch-file: no such file or directory
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.286 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3695
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days