Bug 1794768 - [buildcop] imagebuilder does not report an error when failing to copy files from an earlier build stage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.11.0
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Nalin Dahyabhai
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-24 15:43 UTC by Cesar Wong
Modified: 2023-09-14 05:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-16 07:46:49 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 25416 | 0 | None | closed | Bug 1865441: UPSTREAM: 286: Bump github.com/google/certificate-transparency-go to v1.0.20 to compile with golang >= 1.11 | 2020-09-29 03:14:04 UTC
Red Hat Product Errata | RHBA-2020:3695 | 0 | None | None | None | 2020-09-16 07:47:04 UTC

Description Cesar Wong 2020-01-24 15:43:05 UTC
Description of problem:
Seeing failure to bootstrap in 4.3 branch tests
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.3/706
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.3/705
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.3/719
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.3/720

Version-Release number of selected component (if applicable):
Branch 4.3 latest

How reproducible:
Often

Steps to Reproduce:
See tests above

Actual results:
Cluster fails to bootstrap

Expected results:
Cluster starts


Additional info:

Comment 1 Cesar Wong 2020-01-24 15:44:21 UTC
Error snippet:

container=\"cluster-policy-controller-2\" is waiting: \"CreateContainerError\" - \"container create failed: time=\\\"2020-01-24T13:52:43Z\\\" level=error msg=\\\"container_linux.go:346: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"cluster-policy-controller\\\\\\\\\\\\\\\": executable file not found in $PATH\\\\\\\"\\\"

Comment 3 Cesar Wong 2020-01-24 21:50:33 UTC
This is an issue with the 3.x version of imagebuilder used in OSBS

Comment 4 Ben Parees 2020-01-24 21:53:07 UTC
The copy in question was copying files (non-existent files, in this case) from an earlier build stage.

Comment 5 Ben Parees 2020-01-24 22:59:29 UTC
Setting reported version to 3.11, as I understand this to be occurring on OSBS, which uses OCP 3.11, which in turn uses imagebuilder when doing multistage Dockerfile builds.

But after this is root-caused, we should confirm that the same issue does not manifest itself in OCP 4.x with buildah, since buildah is partially imagebuilder-based.

Comment 6 Nalin Dahyabhai 2020-01-25 00:53:27 UTC
It looks like we've got two problems. First, using registry.svc.ci.openshift.org/ocp/builder:golang-1.12 as the base for building the cluster-policy-controller image causes `go list -mod=vendor ./cmd/...` to fail, because in Go 1.12 the `-mod=vendor` build flag is only valid when using modules (1.13 doesn't seem to mind). That leaves `GO_BUILD_PACKAGES_EXPANDED` empty, so `make build` decides that there's nothing to do: no binary is built, and `make` appears to succeed. Second, when imagebuilder master attempts to copy the binary that wasn't built, it doesn't detect an error when the source file is not found. The logic in buildah master that handles COPY instructions appears to notice the error.
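
A build script can guard against the first problem, an empty package list turning the build into a silent no-op. The following is only a minimal sketch: the helper name `expand_packages` is hypothetical, and it is not taken from the cluster-policy-controller Makefile. It runs the listing command and fails if that command errors or prints nothing.

```shell
# Hypothetical guard; the function name is illustrative, not from any real Makefile.
# expand_packages COMMAND [ARGS...]
# Runs COMMAND and prints its output, but fails if COMMAND exits non-zero
# or prints nothing, so an empty package list stops the build.
expand_packages() {
    pkgs=$("$@") || {
        echo "package expansion failed: $*" >&2
        return 1
    }
    if [ -z "$pkgs" ]; then
        echo "package expansion produced no packages: $*" >&2
        return 1
    fi
    printf '%s\n' "$pkgs"
}

# Example use in a build script (assumes a Go toolchain and a ./cmd tree):
#   GO_BUILD_PACKAGES_EXPANDED=$(expand_packages go list -mod=vendor ./cmd/...) || exit 1
```

With a guard like this, a failing `go list` would abort the build stage rather than leave a later COPY with nothing to copy.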

A Dockerfile which I think matches the second error looks like:

 FROM busybox AS builder
 FROM scratch
 COPY --from=builder /bin/-no-such-file-error- /usr/bin
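
The behavior one would expect from a COPY out of an earlier stage can be illustrated with a small shell sketch. This is not imagebuilder's code (imagebuilder is written in Go); the function name `copy_from_stage` and the idea of a stage being a plain directory are assumptions made for illustration only.

```shell
# Hypothetical illustration only; not imagebuilder's implementation.
# copy_from_stage STAGE_ROOT SRC DEST
# Copies SRC (a path inside an earlier stage's root filesystem) into DEST,
# and fails loudly when SRC does not exist instead of copying nothing.
copy_from_stage() {
    stage_root=$1
    src=$2
    dest=$3
    # -e is false for a missing path; -L additionally accepts dangling symlinks.
    if [ ! -e "$stage_root/$src" ] && [ ! -L "$stage_root/$src" ]; then
        echo "COPY failed: $src: no such file or directory" >&2
        return 1
    fi
    mkdir -p "$dest" && cp -Rp "$stage_root/$src" "$dest/"
}

# Example (paths are placeholders):
#   copy_from_stage /tmp/builder-rootfs /bin/-no-such-file-error- /tmp/final/usr/bin
```

The point of the sketch is the early return: a missing source is reported as an error at the COPY step, not discovered later when the container fails to start.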

Comment 7 Ben Parees 2020-01-25 01:47:51 UTC
Yeah, I believe the go build failure was already resolved, so they've successfully built a working image; we just want to fix imagebuilder so this doesn't happen again.

Comment 13 Ben Parees 2020-04-28 16:04:36 UTC
Mrunal, this looks like it got stuck/abandoned; can you get it moving again?

Comment 15 Nalin Dahyabhai 2020-05-19 19:09:21 UTC
Opened https://github.com/openshift/imagebuilder/pull/162 for merging the changes onto imagebuilder's openshift-3.11 branch, which we should be able to vendor into the origin release-3.11 branch.

Comment 17 Tom Sweeney 2020-05-29 18:02:27 UTC
Assigning to Jindrich for packaging needs.

Comment 22 Nalin Dahyabhai 2020-07-30 22:01:01 UTC
Still working on testing for the PR that pulls this fix in for OpenShift 3.11.  I'm not certain that it'll happen during this sprint.

Comment 23 Nalin Dahyabhai 2020-08-21 19:05:16 UTC
Rebased my pull request on top of other changes that have landed in the 3.11 branch.  One of the other dependencies is no longer available from its original hosting site, so I had to change where we're getting it, which could complicate things a bit.  The OpenShift organization might want to fork a copy of https://bitbucket-archive.softwareheritage.org/projects/ww/ww/goautoneg.html and use that instead.

Comment 24 Nalin Dahyabhai 2020-08-26 20:30:56 UTC
The imagebuilder bump was merged as part of https://github.com/openshift/origin/pull/25416, so I'll link to that PR instead.

Comment 27 Sunil Choudhary 2020-09-15 09:28:35 UTC
Verified on v3.11.286. The build now detects a missing source file and reports "no such file or directory".

# oc version
oc v3.11.286
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-12-20.ec2.internal:8443
openshift v3.11.286
kubernetes v1.11.0+d4cacc0

# oc get nodes
NAME                           STATUS    ROLES     AGE       VERSION
ip-172-18-10-59.ec2.internal   Ready     <none>    3h        v1.11.0+d4cacc0
ip-172-18-12-20.ec2.internal   Ready     master    3h        v1.11.0+d4cacc0
ip-172-18-5-226.ec2.internal   Ready     compute   3h        v1.11.0+d4cacc0

# docker version
Client:
 Version:         1.13.1
 API version:     1.26
 Package version: docker-1.13.1-162.git64e9980.el7_8.x86_64
 Go version:      go1.10.3
 Git commit:      64e9980/1.13.1
 Built:           Mon Jun 22 03:20:20 2020
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: docker-1.13.1-162.git64e9980.el7_8.x86_64
 Go version:      go1.10.3
 Git commit:      64e9980/1.13.1
 Built:           Mon Jun 22 03:20:20 2020
 OS/Arch:         linux/amd64
 Experimental:    false

# cat Dockerfile 
FROM busybox
COPY /nosuch-file /
RUN stat /nosuch-file

# docker build -t image .
Sending build context to Docker daemon 2.048 kB
Step 1/3 : FROM busybox
 ---> 6858809bf669
Step 2/3 : COPY /nosuch-file /
lstat nosuch-file: no such file or directory
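
A manual check like this one could be scripted. The helper below is a hypothetical sketch, not part of any OpenShift test suite: `expect_failure_with` is an invented name, and it passes only when the given command both exits non-zero and prints the expected message.

```shell
# Hypothetical helper: expect_failure_with MESSAGE COMMAND [ARGS...]
# Succeeds only if COMMAND exits non-zero AND its combined output contains MESSAGE.
expect_failure_with() {
    message=$1
    shift
    if output=$("$@" 2>&1); then
        echo "expected failure, but command succeeded: $*" >&2
        return 1
    fi
    case $output in
        *"$message"*) return 0 ;;
    esac
    echo "command failed without the expected message: $message" >&2
    return 1
}

# Example, run from the directory holding the Dockerfile shown above
# (assumes a working docker daemon):
#   expect_failure_with "no such file or directory" docker build -t image .
```

Checking the exit status as well as the message matters here, because the original bug was a build that appeared to succeed.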

Comment 29 errata-xmlrpc 2020-09-16 07:46:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.286 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3695

Comment 30 Red Hat Bugzilla 2023-09-14 05:50:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

