Bug 1481693

Summary: overcloud container image upload is unefficient
Product: Red Hat OpenStack
Reporter: Attila Fazekas <afazekas>
Component: openstack-tripleo-common
Assignee: Steve Baker <sbaker>
Status: CLOSED ERRATA
QA Contact: Artem Hrechanychenko <ahrechan>
Severity: medium
Priority: medium
Version: 13.0 (Queens)
CC: emacchi, hbrock, jschluet, jslagle, m.andre, mburns, mcornea, ohochman, rhel-osp-director-maint, sbaker, slinaber
Target Milestone: beta
Keywords: Triaged
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: openstack-tripleo-common-8.3.1-0.20180123050219.el7ost
Last Closed: 2018-06-27 13:33:53 UTC
Type: Bug

Description Attila Fazekas 2017-08-15 13:15:01 UTC
Description of problem:

openstack overcloud container image upload --config-file ./container_images.yaml

takes more than 10 minutes on a high-core-count system,
even with extremely fast I/O (SSD, >= 10GbE network).

Version-Release number of selected component (if applicable):
docker-rhel-push-plugin-1.12.6-48.git0fdc778.el7.x86_64
docker-client-1.12.6-48.git0fdc778.el7.x86_64
python-docker-py-1.10.6-1.el7.noarch
docker-1.12.6-48.git0fdc778.el7.x86_64
python-docker-pycreds-1.10.6-1.el7.noarch
docker-distribution-2.6.1-1.1.gita25b9ef.el7.x86_64
docker-common-1.12.6-48.git0fdc778.el7.x86_64

openstack-tripleo-common-7.4.1-0.20170807001945.8c46306.el7ost.noarch
python-tripleoclient-7.2.1-0.20170807222309.a731597.el7ost.noarch


The original problem that `overcloud container image upload` was intended to solve is providing a user-controllable registry where the user can adjust the container images when it is really necessary.

However, if no modification is needed, a registry in proxy mode would be more efficient to use.

The issue with proxy registries is that they would try to forward pushes to the origin registry, which is not necessarily what the user wants.

The `overcloud container image upload` command does:
foreach {image_to_manage}:
  pull from origin registry, unpack and store locally
  tag locally
  push (compress) to user registry (undercloud)
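
The loop above can be sketched as a plan of docker CLI commands. This is only an illustration of the pull / tag / push sequence described in this report, not the code used by tripleo-common (which may use the Python docker bindings instead of the CLI). The registry hosts and image names are hypothetical placeholders.

```python
def upload_steps(image, origin, undercloud):
    """Return the docker CLI commands for one image, in execution order."""
    source = f"{origin}/{image}"
    target = f"{undercloud}/{image}"
    return [
        ["docker", "pull", source],         # download, unpack, store locally
        ["docker", "tag", source, target],  # local retag, no data transfer
        ["docker", "push", target],         # compress and upload
    ]


def upload_plan(images, origin, undercloud):
    """Sequential plan: finish all three steps for one image before the next."""
    plan = []
    for image in images:
        plan.extend(upload_steps(image, origin, undercloud))
    return plan


# Hypothetical names, for illustration only.
plan = upload_plan(
    ["nova-api:latest", "nova-compute:latest"],
    origin="registry.example.com/rhosp13",
    undercloud="undercloud.example.com:8787/rhosp13",
)
for command in plan:
    print(" ".join(command))
```

Every image passes through the local docker storage on its way to the undercloud registry, which is the side effect questioned below.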

Docker does minimal parallelism in these steps.
For example, if an image has multiple new layers it tries to operate on them in parallel, but after the first images are there we usually have just 1~2 new layers and just 1~2 cores are utilized. The operation is clearly CPU intensive; I/O is not the bottleneck.

Populating the local docker image cache is just a side effect.
I wonder whether there is a better way to bypass this step and upload the downloaded layers directly to the undercloud registry.

The task can be made more efficient by using parallel loops; this can easily
make the operation 3 times faster, even on just a 4-core system.
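
A minimal sketch of such a parallel loop, assuming the per-image work (pull, tag, push) is wrapped in a single callable. The names `upload_all` and `upload_one` are hypothetical and not part of tripleo-common. Threads are sufficient here because the CPU-heavy work happens in the docker daemon or in child processes, not in the Python interpreter.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(images, upload_one, workers=4):
    """Run upload_one(image) for every image, at most `workers` at a time.

    Returns {image: result}. An exception raised by any upload is
    re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(upload_one, images)  # yields in input order
        return dict(zip(images, results))


def fake_upload(image):
    """Stand-in for the real pull/tag/push of one image."""
    return f"uploaded {image}"


done = upload_all(["nova-api", "nova-compute", "keystone"], fake_upload)
print(done)
```

The worker count is a tuning knob; the report's measurements suggest a gain even at 4 workers, but the best value depends on the host.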

The issue with simply executing things in parallel is that we will try to upload
the common layers multiple times in the first iteration; however, the end result
was better in all measured cases.
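
The duplicated upload of common layers can be illustrated with a deliberately simplified model. It assumes that images are pushed in waves of `workers`, and that every image in a wave uploads each layer the registry did not hold when the wave started. Real docker and registry behaviour is more nuanced, and the layer names are made up, so the numbers only show the shape of the trade-off.

```python
def count_layer_uploads(images, workers):
    """Count layer uploads when `workers` images are pushed at the same time.

    `images` is a list of layer-name lists. Within one wave no image
    benefits from layers uploaded by the other images of that wave.
    """
    registry = set()
    uploads = 0
    for start in range(0, len(images), workers):
        wave = images[start:start + workers]
        for layers in wave:
            uploads += sum(1 for layer in set(layers) if layer not in registry)
        for layers in wave:
            registry.update(layers)  # visible only to later waves
    return uploads


# Four images sharing one base layer, each with one layer of its own.
images = [["base", f"service-{i}"] for i in range(4)]
sequential = count_layer_uploads(images, workers=1)
parallel = count_layer_uploads(images, workers=4)
print(sequential, parallel)
```

In this model the shared base layer is uploaded once sequentially and four times with four workers, so the parallel run moves more data in total while still finishing in fewer waves.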

Comment 1 Omri Hochman 2017-08-16 13:18:54 UTC
Steve, can it be improved by removing containers that are not going to be used during the overcloud deployment?

Comment 2 Steve Baker 2017-10-29 21:56:47 UTC
Omri, sure, using the --service-environment-file and --roles-file arguments is always recommended when calling "openstack overcloud container image prepare" to minimise uploads to the containers actually used.

However this change is more about parallelising the uploads, and avoiding the unnecessary transfer to the local docker cache.

I have a change upstream to switch to skopeo to do the transfers, which avoids the local docker cache.

I'll use this bz to track skopeo landing, and also adding some concurrency to the transfers.
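
For illustration, a registry-to-registry transfer with skopeo can be sketched as below. This is a hedged example built on the generic `skopeo copy <source> <destination>` form with the `docker://` transport, not the upstream change itself. The helper names and registry hosts are hypothetical, and TLS and authentication options are left out.

```python
import subprocess


def skopeo_copy_command(image, origin, undercloud):
    """Build a `skopeo copy` command that moves one image between registries."""
    return [
        "skopeo", "copy",
        f"docker://{origin}/{image}",      # source registry
        f"docker://{undercloud}/{image}",  # destination registry
    ]


def copy_images(images, origin, undercloud, run=subprocess.check_call):
    """Copy each image in turn; `run` is injectable so the plan can be dry-run."""
    for image in images:
        run(skopeo_copy_command(image, origin, undercloud))


# Dry run: collect the commands instead of executing them.
planned = []
copy_images(
    ["nova-api:latest"],
    origin="registry.example.com/rhosp13",
    undercloud="undercloud.example.com:8787/rhosp13",
    run=planned.append,
)
print(planned)
```

Because each command is independent and never touches the local docker storage, the copies are natural candidates for concurrent execution.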

Comment 9 Omri Hochman 2018-06-06 15:01:45 UTC
The use of skopeo seems to improve the processing time; please re-open if you discover other inefficiencies.

Also, using -e in the prepare command ensures that only the containers that
will actually be used are downloaded.

Comment 11 errata-xmlrpc 2018-06-27 13:33:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086