Bug 1710008

Summary: [4.1.z] image build doesn't handle COPY correctly in some cases
Product: OpenShift Container Platform
Reporter: Adam Kaplan <adam.kaplan>
Component: Containers
Assignee: Adam Kaplan <adam.kaplan>
Status: CLOSED ERRATA
QA Contact: weiwei jiang <wjiang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: adam.kaplan, aos-bugs, bparees, calfonso, dwalsh, jokerman, mmccomas, nalin, wewang, wjiang, wzheng, xtian
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1707941
Environment:
Last Closed: 2019-06-04 10:48:48 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1707941
Bug Blocks:

Description Adam Kaplan 2019-05-14 17:46:18 UTC
+++ This bug was initially created as a clone of Bug #1707941 +++

Description of problem:
In a Dockerfile, the instruction
COPY . .

fails in some cases.


How reproducible:
Always

Steps to Reproduce:
1. git clone git@github.com:operator-framework/helm.git
2. buildah bud .
3. see failure:
error building at STEP "COPY . .": error copying "/home/bparees/git/gocode/src/github.com/openshift/helm/pkg/chartutil/testdata/joonix/charts/frobnitz" to "/home/bparees/.local/share/containers/storage/vfs/dir/a95fa17f13262c63706f22e35a8c0186a522bff0df57c97028c88867df39bd02/go/src/k8s.io/helm": Can't copy a directory

4. docker build .
5. see success


Actual results:
buildah bud fails; docker build succeeds.

Expected results:
Both should succeed.

Additional info:

This is a blocker for OCP 4.1 because OCP image builds hit the same failure; buildah is just an easy reproducer.

There are also similar looking cases of COPY that seem to work fine:

git clone git@github.com:openshift/elasticsearch-operator.git
buildah bud .

succeeds despite doing pretty much the same COPY operation: https://github.com/openshift/elasticsearch-operator/blob/master/Dockerfile#L3

--- Additional comment from Ben Parees on 2019-05-08 18:46:40 UTC ---

Full list of GitHub repos where I'm seeing this issue:
operator-framework/helm
openshift/multus-admission-controller
openshift/node_exporter
openshift/grafana


I am also seeing slightly different issues on the following repos, but the overall effect is the same: docker builds them fine and buildah fails.

operator-framework/operator-registry fails with:
STEP 13: RUN mkdir /registry
STEP 14: WORKDIR /registry
STEP 15: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/initializer /bin/initializer
STEP 16: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/registry-server /bin/registry-server
STEP 17: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/configmap-server /bin/configmap-server
STEP 18: COPY --from=builder /go/src/github.com/operator-framework/operator-registry/bin/appregistry-server /bin/appregistry-server
STEP 19: COPY --from=builder /go/bin/grpc_health_probe /bin/grpc_health_probe
STEP 20: RUN chgrp -R 0 /registry &&     chgrp -R 0 /dev &&     chmod -R g+rwx /registry &&     chmod -R g+rwx /dev
chgrp: changing group of '/dev/urandom': Permission denied
chgrp: changing group of '/dev/zero': Permission denied
chgrp: changing group of '/dev/tty': Permission denied
chgrp: changing group of '/dev/full': Permission denied
chgrp: changing group of '/dev/random': Permission denied
chgrp: changing group of '/dev/null': Permission denied
error building at STEP "RUN chgrp -R 0 /registry &&     chgrp -R 0 /dev &&     chmod -R g+rwx /registry &&     chmod -R g+rwx /dev": error while running runtime: exit status 1
ERRO[0260] exit status 1                                



openshift/cluster-api-provider-azure fails with:
STEP 1: FROM registry.svc.ci.openshift.org/openshift/release:golang-1.10 AS builder
STEP 2: WORKDIR /go/src/sigs.k8s.io/cluster-api-provider-azure
STEP 3: COPY pkg/    pkg/
STEP 4: COPY cmd/    cmd/
STEP 5: COPY vendor/ vendor/
error building at STEP "COPY vendor/ vendor/": error copying "/home/bparees/git/gocode/src/github.com/openshift/cluster-api-provider-azure/vendor/k8s.io/kubernetes/.bazelrc" to "/home/bparees/.local/share/containers/storage/vfs/dir/b2e6a7668c62fa0e1d9ac68cb38bf1bf367131424c88cfaef259cf7861a8b264/go/src/sigs.k8s.io/cluster-api-provider-azure/vendor": stat /home/bparees/git/gocode/src/github.com/openshift/cluster-api-provider-azure/vendor/k8s.io/kubernetes/.bazelrc: no such file or directory
ERRO[0115] exit status 1

--- Additional comment from Nalin Dahyabhai on 2019-05-08 19:43:06 UTC ---

It looks like the handling of .dockerignore files has difficulty with symbolic links (and probably other non-directory, non-regular items).
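
To check whether a given build context contains the kinds of entries mentioned above, something like the following sketch can help. It lists every entry that is neither a regular file nor a directory (symlinks, FIFOs, sockets, device nodes). The function name list_special_entries is hypothetical and not part of buildah or docker; it only relies on standard find behavior.

```shell
#!/bin/sh
# Hypothetical helper (not part of buildah): list entries in a build
# context that are neither regular files nor directories.
list_special_entries() {
    ctx="${1:-.}"
    # By default find does not follow symlinks, so a symlink is tested
    # as a symlink rather than as the file or directory it points to.
    find "$ctx" ! -type f ! -type d -print
}

# Example (run from the root of a cloned repo):
#   list_special_entries .
```

If the function prints anything and the context also has a .dockerignore, the repo is a candidate for the COPY failure discussed in this bug.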

--- Additional comment from chris alfonso on 2019-05-08 20:22:05 UTC ---

Based upon your investigation, I'd like to move this to 4.2 as we wouldn't hold the GA release for this fix.

--- Additional comment from Ben Parees on 2019-05-08 22:45:17 UTC ---

Just to clarify the impact of this bug, based on my understanding from Nalin:

If you have an image build context directory containing:

1) a .dockerignore
2) a symlink (or other "unusual" file type)

and your Dockerfile does a
COPY . /somedir

then your build will fail. It does not matter whether the .dockerignore references the symlink or not.
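
The conditions above can be turned into a small reproducer. This is a minimal sketch, not the layout of any of the affected repos: the file names, the .dockerignore pattern, and the busybox base image are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: create a minimal build context with a .dockerignore and a
# symlink. All names and file contents here are illustrative assumptions.
make_repro_context() {
    ctx="$1"
    mkdir -p "$ctx/subdir"
    echo "hello" > "$ctx/subdir/file.txt"
    # The symlink is the "unusual" file type.
    ln -s subdir "$ctx/symlink"
    # The ignore pattern deliberately does not mention the symlink.
    echo "README.md" > "$ctx/.dockerignore"
    echo "placeholder" > "$ctx/README.md"
    # Base image is an assumption; any small image should do.
    printf 'FROM busybox\nCOPY . /somedir\n' > "$ctx/Dockerfile"
}

ctx="$(mktemp -d)"
make_repro_context "$ctx"
echo "Build context created in $ctx"
```

Going by the description above, running buildah bud . in that directory with an affected buildah would be expected to fail at the COPY step, while docker build . succeeds.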

For the RUN issue, we should split it out into a separate (4.1.z+4.2.0) targeted bug as it's an unrelated issue and less severe in terms of likely users impacted.

--- Additional comment from Nalin Dahyabhai on 2019-05-09 15:30:30 UTC ---

https://github.com/containers/buildah/pull/1583 should fix the issues with symbolic links.

--- Additional comment from Nalin Dahyabhai on 2019-05-13 14:15:17 UTC ---

https://github.com/openshift/builder/pull/72 should merge the fix into the builder.

Comment 1 Adam Kaplan 2019-05-14 18:02:41 UTC
release-4.1 PR: https://github.com/openshift/builder/pull/73

Comment 4 wewang 2019-05-17 08:42:23 UTC
Verified on the image build side in version:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-16-223922   True        False         5h58m   Cluster version is 4.1.0-0.nightly-2019-05-16-223922

Steps:
1. Create a new build from a directory that contains a symlink and a .dockerignore file.
  $ oc new-build https://github.com/wewang58/dockerignore2
2. The build completes:
[wewang@Desktop dockerignore2]$ oc get builds 
NAME              TYPE     FROM          STATUS     STARTED          DURATION
dockerignore2-1   Docker   Git@831c29a   Complete   23 seconds ago   18s


[wewang@Desktop dockerignore2]$ ls -al
total 44
drwxrwxr-x.  4 wewang wewang  4096 May 17 16:31 .
drwx------. 39 wewang wewang 20480 May 17 16:31 ..
-rw-rw-r--.  1 wewang wewang    22 May 17 16:31 Dockerfile
-rw-rw-r--.  1 wewang wewang    10 May 17 16:29 .dockerignore
drwxrwxr-x.  8 wewang wewang  4096 May 17 16:32 .git
-rw-rw-r--.  1 wewang wewang    16 May 16 16:53 README.md
drwxrwxr-x.  3 wewang wewang  4096 May 17 10:52 subdir
lrwxrwxrwx.  1 wewang wewang     6 May 17 10:23 symlink -> subdir

Comment 6 errata-xmlrpc 2019-06-04 10:48:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758