Bug 1710008 - [4.1.z] image build doesn't handle COPY correctly in some cases
Summary: [4.1.z] image build doesn't handle COPY correctly in some cases
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Adam Kaplan
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On: 1707941
Blocks:
Reported: 2019-05-14 17:46 UTC by Adam Kaplan
Modified: 2019-06-04 10:49 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1707941
Environment:
Last Closed: 2019-06-04 10:48:48 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:49:49 UTC

Description Adam Kaplan 2019-05-14 17:46:18 UTC
+++ This bug was initially created as a clone of Bug #1707941 +++

Description of problem:
In a Dockerfile, the instruction

COPY . .

fails in some cases.


How reproducible:
always

Steps to Reproduce:
1. git clone git:operator-framework/helm.git
2. buildah bud .
3. see failure:
error building at STEP "COPY . .": error copying "/home/bparees/git/gocode/src/github.com/openshift/helm/pkg/chartutil/testdata/joonix/charts/frobnitz" to "/home/bparees/.local/share/containers/storage/vfs/dir/a95fa17f13262c63706f22e35a8c0186a522bff0df57c97028c88867df39bd02/go/src/k8s.io/helm": Can't copy a directory

4. docker build .
5. see success


Actual results:
buildah bud fails, docker build succeeds. 

Expected results:
both should succeed

Additional info:

This is a blocker for OCP 4.1 because OCP image builds hit the same failure; buildah is just an easy reproducer.

There are also similar-looking COPY cases that work fine:

git clone git:openshift/elasticsearch-operator.git
buildah bud .

succeeds despite doing pretty much the same COPY operation: https://github.com/openshift/elasticsearch-operator/blob/master/Dockerfile#L3

--- Additional comment from Ben Parees on 2019-05-08 18:46:40 UTC ---

Full list of GitHub repos where I am seeing this issue:
operator-framework/helm
openshift/multus-admission-controller
openshift/node_exporter
openshift/grafana


I am also seeing a slightly different issue on the following repos, but the overall effect is the same: docker builds them fine and buildah fails.

operator-framework/operator-registry fails with:
STEP 13: RUN mkdir /registry
STEP 14: WORKDIR /registry
STEP 15: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/initializer /bin/initializer
STEP 16: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/registry-server /bin/registry-server
STEP 17: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/configmap-server /bin/configmap-server
STEP 18: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/appregistry-server /bin/appregistry-server
STEP 19: COPY --from=builder /go/bin/grpc_health_probe /bin/grpc_health_probe
STEP 20: RUN chgrp -R 0 /registry &&     chgrp -R 0 /dev &&     chmod -R g+rwx /registry &&     chmod -R g+rwx /dev
chgrp: changing group of '/dev/urandom': Permission denied
chgrp: changing group of '/dev/zero': Permission denied
chgrp: changing group of '/dev/tty': Permission denied
chgrp: changing group of '/dev/full': Permission denied
chgrp: changing group of '/dev/random': Permission denied
chgrp: changing group of '/dev/null': Permission denied
error building at STEP "RUN chgrp -R 0 /registry &&     chgrp -R 0 /dev &&     chmod -R g+rwx /registry &&     chmod -R g+rwx /dev": error while running runtime: exit status 1
ERRO[0260] exit status 1                                



openshift/cluster-api-provider-azure fails with:
STEP 1: FROM registry.svc.ci.openshift.org/openshift/release:golang-1.10 AS builder
STEP 2: WORKDIR /go/src/sigs.k8s.io/cluster-api-provider-azure
STEP 3: COPY pkg/    pkg/
STEP 4: COPY cmd/    cmd/
STEP 5: COPY vendor/ vendor/
error building at STEP "COPY vendor/ vendor/": error copying "/home/bparees/git/gocode/src/github.com/openshift/cluster-api-provider-azure/vendor/k8s.io/kubernetes/.bazelrc" to "/home/bparees/.local/share/containers/storage/vfs/dir/b2e6a7668c62fa0e1d9ac68cb38bf1bf367131424c88cfaef259cf7861a8b264/go/src/sigs.k8s.io/cluster-api-provider-azure/vendor": stat /home/bparees/git/gocode/src/github.com/openshift/cluster-api-provider-azure/vendor/k8s.io/kubernetes/.bazelrc: no such file or directory
ERRO[0115] exit status 1

--- Additional comment from Nalin Dahyabhai on 2019-05-08 19:43:06 UTC ---

It looks like the handling of .dockerignore files has difficulty with symbolic links (and probably other non-directory, non-regular items).
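To check whether a given build context contains the kind of entries mentioned above, something like the following sketch may help. It only lists entries that are neither regular files nor directories (symlinks, FIFOs, sockets, device nodes); it does not reproduce buildah's .dockerignore logic. The helper name list_unusual is hypothetical and not part of buildah.

```shell
#!/bin/sh
# Sketch: list entries in a build context that are neither regular files
# nor directories. list_unusual is a hypothetical helper name.
set -u

list_unusual() {
    # find does not follow symlinks by default, so a symlink that points
    # at a directory is still reported as a symlink and printed here.
    find "$1" ! -type f ! -type d -print
}

# Default to the current directory when no context path is given.
list_unusual "${1:-.}"
```

Running it at the root of one of the affected repositories should show whether symlinks are present in the context.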

--- Additional comment from chris alfonso on 2019-05-08 20:22:05 UTC ---

Based upon your investigation, I'd like to move this to 4.2 as we wouldn't hold the GA release for this fix.

--- Additional comment from Ben Parees on 2019-05-08 22:45:17 UTC ---

Just to clarify the impact of this bug, based on my understanding from Nalin:

If you have an image build context directory containing:

1) a .dockerignore
2) a symlink (or other "unusual" file type)

and your Dockerfile does a

COPY . /somedir

then your build will fail. It does not matter whether the .dockerignore references the symlink.
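The conditions above can be sketched as a small script that creates such a build context. This is an illustration under assumptions: the file names, the FROM scratch base, and the .dockerignore pattern are made up for the example and are not taken from the affected repositories.

```shell
#!/bin/sh
# Sketch: create a hypothetical minimal build context matching the
# conditions described above. All names here are illustrative.
set -eu

make_context() {
    ctx=$(mktemp -d)
    mkdir "$ctx/subdir"
    echo "hello" > "$ctx/subdir/file.txt"
    # A symlink anywhere in the context; it is not listed in .dockerignore.
    ln -s subdir "$ctx/symlink"
    # The .dockerignore only has to exist; this pattern matches nothing above.
    echo "*.tmp" > "$ctx/.dockerignore"
    printf 'FROM scratch\nCOPY . /somedir\n' > "$ctx/Dockerfile"
    echo "$ctx"
}

ctx=$(make_context)
echo "build context created in $ctx"
# To exercise the bug, build the context, for example:
#   buildah bud "$ctx"
```

On an affected buildah version the build of this context would be expected to fail at the COPY step, while docker build would be expected to succeed.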

The RUN issue should be split out into a separate bug targeted at 4.1.z and 4.2.0, since it is unrelated and likely affects fewer users.

--- Additional comment from Nalin Dahyabhai on 2019-05-09 15:30:30 UTC ---

https://github.com/containers/buildah/pull/1583 should fix the issues with symbolic links.

--- Additional comment from Nalin Dahyabhai on 2019-05-13 14:15:17 UTC ---

https://github.com/openshift/builder/pull/72 should merge the fix into the builder.

Comment 1 Adam Kaplan 2019-05-14 18:02:41 UTC
release-4.1 PR: https://github.com/openshift/builder/pull/73

Comment 4 wewang 2019-05-17 08:42:23 UTC
Verified on the image build side with version:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-16-223922   True        False         5h58m   Cluster version is 4.1.0-0.nightly-2019-05-16-223922

Steps:
1. Create a new build from a repository whose directory contains a symlink and a .dockerignore file.
  $ oc new-build https://github.com/wewang58/dockerignore2
2. The build completes:
[wewang@Desktop dockerignore2]$ oc get builds 
NAME              TYPE     FROM          STATUS     STARTED          DURATION
dockerignore2-1   Docker   Git@831c29a   Complete   23 seconds ago   18s


[wewang@Desktop dockerignore2]$ ls -al
total 44
drwxrwxr-x.  4 wewang wewang  4096 May 17 16:31 .
drwx------. 39 wewang wewang 20480 May 17 16:31 ..
-rw-rw-r--.  1 wewang wewang    22 May 17 16:31 Dockerfile
-rw-rw-r--.  1 wewang wewang    10 May 17 16:29 .dockerignore
drwxrwxr-x.  8 wewang wewang  4096 May 17 16:32 .git
-rw-rw-r--.  1 wewang wewang    16 May 16 16:53 README.md
drwxrwxr-x.  3 wewang wewang  4096 May 17 10:52 subdir
lrwxrwxrwx.  1 wewang wewang     6 May 17 10:23 symlink -> subdir

Comment 6 errata-xmlrpc 2019-06-04 10:48:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

