Bug 1699129

Summary: COPY operation significantly slower in buildah
Product: OpenShift Container Platform Reporter: Ben Parees <bparees>
Component: ContainersAssignee: Daniel Walsh <dwalsh>
Status: CLOSED WORKSFORME QA Contact: weiwei jiang <wjiang>
Severity: high Docs Contact:
Priority: medium    
Version: 4.1.0CC: adam.kaplan, aos-bugs, ccoleman, dwalsh, fshaikh, gblomqui, jokerman, mfiedler, mifiedle, mmccomas, nalin, tsweeney
Target Milestone: ---Keywords: BetaBlocker, Reopened
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: aos41-scalability
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-21 21:28:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
ocp3 build log
none
ocp4 build log none

Description Ben Parees 2019-04-11 20:52:43 UTC
Description of problem:
Comparing ocp3 docker build performance with ocp4 buildah performance, when doing a COPY of a large directory structure (origin repo), it is much slower in buildah.

I am attaching a build log w/ timestamps from ocp3 and ocp4 so you can see where the time is being lost, but specifically:

OCP3 takes about 8 seconds:
2019-04-11T18:11:51.670842313Z --> FROM registry.svc.ci.openshift.org/ocp/builder:golang-1.10 as builder
2019-04-11T18:11:54.538474068Z --> WORKDIR /go/src/github.com/openshift/origin
2019-04-11T18:11:54.538498312Z --> COPY . .
2019-04-11T18:12:06.69885534Z --> RUN make build WHAT=cmd/hypershift

OCP4 takes about 1.5 minutes:
2019-04-11T16:01:32.110985181Z STEP 3: FROM 1fbf7c9856feb0ebd72822e5f0be7df3174077065fe94d692006895d6d2df6fe AS builder
2019-04-11T16:01:32.126446422Z STEP 4: COPY . .
2019-04-11T16:02:57.341556985Z --> 78d1796d5c10d149175adaeb337145eeb399e3eb5e95d0f513b990e1c5b57e7e
2019-04-11T16:02:57.341556985Z STEP 5: FROM 78d1796d5c10d149175adaeb337145eeb399e3eb5e95d0f513b990e1c5b57e7e AS builder
2019-04-11T16:02:57.527249493Z STEP 6: RUN make build WHAT=cmd/hypershift

Comment 1 Ben Parees 2019-04-11 20:53:16 UTC
Created attachment 1554648 [details]
ocp3 build log

Comment 2 Ben Parees 2019-04-11 20:53:33 UTC
Created attachment 1554649 [details]
ocp4 build log

Comment 3 Daniel Walsh 2019-04-12 01:49:52 UTC
Docker Does:

2019-04-11T18:11:54.538498312Z --> COPY . .
2019-04-11T18:12:06.69885534Z --> RUN make build WHAT=cmd/hypershift


Buildah Does:
2019-04-11T16:01:32.126446422Z STEP 4: COPY . .
2019-04-11T16:02:57.341556985Z --> 78d1796d5c10d149175adaeb337145eeb399e3eb5e95d0f513b990e1c5b57e7e
2019-04-11T16:02:57.341556985Z STEP 5: FROM 78d1796d5c10d149175adaeb337145eeb399e3eb5e95d0f513b990e1c5b57e7e AS builder
2019-04-11T16:02:57.527249493Z STEP 6: RUN make build WHAT=cmd/hypershift

Nalin any idea why we are doing this differently.  Are we committing and Docker is not?


Ben where can I get the Directory that is being copied in?

Comment 4 Ben Parees 2019-04-12 01:54:37 UTC
> Ben where can I get the Directory that is being copied in?


git clone https://github.com/openshift/origin

Comment 5 Daniel Walsh 2019-04-12 15:52:40 UTC
From this crude test it looks like buildah is faster the docker.

# pwd
/home/dwalsh/go/src/github.com/openshift/origin

# cat Dockerfile
FROM fedora
RUN date
COPY . .
RUN date

# buildah bud -t origin .
STEP 1: FROM fedora
STEP 2: RUN date
Fri Apr 12 15:50:26 UTC 2019
STEP 3: COPY . .
STEP 4: RUN date
Fri Apr 12 15:50:37 UTC 2019
STEP 5: COMMIT containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/containers/shared,overlay.mountopt=nodev]localhost/origin:latest
Getting image source signatures
Skipping blob b7ba3be6a0d6 (already present): 269.94 MiB / 269.94 MiB [=====] 0s
Skipping blob b7ba3be6a0d6 (already present): 269.94 MiB / 269.94 MiB [====] 16s
Skipping blob b7ba3be6a0d6 (already present): 269.94 MiB / 269.94 MiB [====] 16s
Copying blob b3177bfd89c8: 1.94 GiB / 1.94 GiB [===========================] 16s
Copying config cf7d482c4e6a: 1.09 KiB / 1.09 KiB [==========================] 0s
Writing manifest to image destination
Storing signatures

11 seconds

# docker build -t origin .
Sending build context to Docker daemon 2.089 GB
Step 1/4 : FROM fedora
 ---> d09302f77cfc
Step 2/4 : RUN date
 ---> Running in 1b6b874220d8
Fri Apr 12 15:46:36 UTC 2019
 ---> 8277a4d0db85
Removing intermediate container 1b6b874220d8
Step 3/4 : COPY . .
 ---> 2d3282c61cb9
Removing intermediate container 535c5f3e7da6
Step 4/4 : RUN date
 ---> Running in 3418afc50d63
Fri Apr 12 15:47:22 UTC 2019
 ---> f59d1106c910
Removing intermediate container 3

46 seconds 

docker 1.13.1
buildah 1.7

Comment 6 Ben Parees 2019-04-12 17:55:36 UTC
possibly related to the storage/storage driver we are using when we run in the build pod?

Comment 7 Clayton Coleman 2019-04-15 14:20:39 UTC
Raising severity, perf regressions from 3.11 are a top issue

Comment 8 Daniel Walsh 2019-04-15 14:28:21 UTC
Ben what storage driver are you using?  Is it overlay?

Comment 9 Nalin Dahyabhai 2019-04-15 14:45:42 UTC
(In reply to Daniel Walsh from comment #3)
> Docker Does:
> 
> 2019-04-11T18:11:54.538498312Z --> COPY . .
> 2019-04-11T18:12:06.69885534Z --> RUN make build WHAT=cmd/hypershift
> 
> Buildah Does:
> 2019-04-11T16:01:32.126446422Z STEP 4: COPY . .
> 2019-04-11T16:02:57.341556985Z -->
> 78d1796d5c10d149175adaeb337145eeb399e3eb5e95d0f513b990e1c5b57e7e
> 2019-04-11T16:02:57.341556985Z STEP 5: FROM
> 78d1796d5c10d149175adaeb337145eeb399e3eb5e95d0f513b990e1c5b57e7e AS builder
> 2019-04-11T16:02:57.527249493Z STEP 6: RUN make build WHAT=cmd/hypershift
> 
> Nalin any idea why we are doing this differently.  Are we committing and
> Docker is not?

Both _should_ be committing after COPY instructions, but I didn't think openshift/builder in 4.x was doing multi-layer builds (the image optimization policy is always ImageOptimizationSkipLayers), so it shouldn't have been committing at that point.

Comment 10 Nalin Dahyabhai 2019-04-15 14:53:36 UTC
Ah, it's multiple stages.  We currently commit to create an image at the end of a stage, even if it's not used as the basis for a later stage.  That's useful as a cache item, but if that isn't happening under docker build, that would cost us some time.

Comment 11 Ben Parees 2019-04-15 14:55:58 UTC
Should be overlay storage, yes.  Nalin can confirm.

Comment 12 Nalin Dahyabhai 2019-04-15 17:29:15 UTC
Yes, this should be using overlay since we're a privileged container.  If this is https://github.com/openshift/origin/blob/master/images/hypershift/Dockerfile.rhel, though, I don't understand why it's committing between the COPY on line 3 and the RUN on line 4, since we should only be committing at the end of the stage with the current optimization policy.

Comment 13 Daniel Walsh 2019-04-30 13:06:00 UTC
I don't see this as an issue with 1.7.3 release.

Comment 14 Nalin Dahyabhai 2019-05-02 06:21:38 UTC
https://github.com/openshift/builder/pull/67

Comment 17 weiwei jiang 2019-05-07 07:29:59 UTC
    Checked withed bumped version v1.8.2 and can not reproduce this issue, so move to verified.

    # ../../containers/buildah/buildah --version
    buildah version 1.8.2 (image-spec 1.0.0, runtime-spec 1.0.0)

    # ../../containers/buildah/buildah  bud -t origin -f ./images/hypershift/Dockerfile.rhel . 2>&1 | ts 
    May 07 15:07:02 STEP 1: FROM registry.svc.ci.openshift.org/ocp/builder:golang-1.11 AS builder
    May 07 15:07:02 STEP 2: WORKDIR /go/src/github.com/openshift/origin
    May 07 15:07:02 STEP 3: COPY . .
    May 07 15:07:23 STEP 4: RUN make build WHAT=cmd/hypershift;     mkdir -p /tmp/build;     cp /go/src/github.com/openshift/origin/_output/local/bin/linux/$(go env GOARCH)/hypershift /tmp/build/hypershift
    May 07 15:07:24 hack/build-go.sh cmd/hypershift

Comment 19 errata-xmlrpc 2019-06-04 10:47:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 24 Tom Sweeney 2019-11-20 15:03:50 UTC
Fatima, 

This looks like you're using Buildah directly rather than through OpenShift?  Can I ask you to instead open an issue on GitHub for this if so (https://github.com/containers/buildah), or if it's an issue you're seeing when running Buildah under OpenShift, then create a new BZ and add a link to it here please?

I don't have a quick answer to your question at the moment, we'll need to dig and do some comparisions.

Comment 26 Daniel Walsh 2019-11-21 13:54:53 UTC
Do we know what version of buildah they are using in OpenSHift 4.1?  Newer versions of buildah have fixed issues in this area.

That version of buildah should be available in RHEL8.1 release.

Comment 27 Tom Sweeney 2019-11-21 13:59:01 UTC
Dan IDK and have talked a little bit with Adam Kaplan about getting a way via some kind of 'oc status' command to list the Buildah version on board.  I think that will be important later on down the road.  If you want to push on that too, I think that would be good.

Comment 28 Daniel Walsh 2019-11-21 14:19:22 UTC
Why are doing questions in PRIVATE.  let's do everything in the open.  

Is my understanding incorrect, the problem is with the OpenShift Builder (Which vendors in Buildah), or is the 
issue with the actual Buildah package being used on OpenShift?

Comment 29 Tom Sweeney 2019-11-21 21:28:02 UTC
As Fatima agreed to start a new BZ or a Buildah issue to handdle this, I'm setting the status back to Closed for this BZ.

Comment 30 Red Hat Bugzilla 2023-09-18 00:15:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days