Bug 1456986
Summary: | [OSP12][openstack containers]: Errors while pulling images from remote registry causes overcloud deployment to fail. | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Omri Hochman <ohochman> |
Component: | openstack-containers | Assignee: | Jiri Stransky <jstransk> |
Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> |
Severity: | high | Docs Contact: | Andrew Burden <aburden> |
Priority: | high | ||
Version: | 12.0 (Pike) | CC: | bdobreli, dprince, dyasny, jcoufal, jstransk, lruzicka, m.andre, ohochman, rhallise, sasha, tvignaud |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | 12.0 (Pike) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-12-13 19:14:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1434060 |
Description
Omri Hochman
2017-05-30 21:19:57 UTC
This bugzilla has been removed from the release and needs to be reviewed and triaged for another Target Release.

We should check whether the issue is related to bug 1455683: https://bugzilla.redhat.com/show_bug.cgi?id=1455683

Another question: why do we need to have the remote registry configured during overcloud deployment if the images have already been downloaded and pushed to the local registry on the undercloud?

local: 192.168.24.1:8787
remote: docker-registry.engineering.redhat.com

(In reply to Omri Hochman from comment #2)
> why do we need to have the remote registry configured during overcloud
> deployment, if the images are already downloaded and pushed on the local
> registry on the undercloud ?

This all depends on the DockerNamespace parameter value you provided to the deployment. From the logs above it looks like you pointed DockerNamespace at docker-registry.engineering.redhat.com directly (I guess as an attempt to work around bug 1455683?). In this case it would download the images from that registry instead of using the one on the undercloud.

If we went with a solution (such as a remote SAT) or anything else that does not involve building the local registry on the undercloud, then we would need to check again whether connection timeouts during overcloud deployment break the deployment. In that case it would be a blocker.

Omri, I believe intermittent download issues cannot be treated as blockers. The same type of issue may be faced when installing packages. How should these situations be handled? Are automatic retry steps required, like "yum clean expire-cache && yum history redo last" for the latter example? Would manual recovery steps make it a blocker as well?

Are there any recent occurrences of this bug?

I'm hitting this as well now. One solution I found was to set docker-puppet.py's PROCESS_COUNT env variable to 3. This matches the docker daemon's default pull count as well and might be a safer default.
I've proposed this in the upstream patch here: https://bugs.launchpad.net/tripleo/+bug/1713188

(In reply to Jiri Stransky from comment #8)
> Are there any recent occurrences of this bug?

According to comment #9, it seems that Dan just reproduced it on the latest version, and changing the setting fixed the issue.

In addition to the PROCESS_COUNT patch above, there is also a docker pull retry we added in:

commit 9a1015581d59a7a38e1bdb2ff97da1161123e05c
Author: Dan Prince <dprince>
Date: Thu Sep 7 16:48:28 2017 -0400

    Add a docker pull retry to docker-puppet.py

    Co-Authored-By: Ian Main <imain>

    Change-Id: Iad6d38690340f4a064a4527c58ed439d91fa5188
    Closes-bug: #1715136
    (cherry picked from commit d3b3361a76c2e8b188fa8e586d9fb7f3c60bb66f)

Both of these are in the latest puddles. Curious to see if anyone from QE is still hitting this.

*** Bug 1483756 has been marked as a duplicate of this bug. ***

And I've also added the same retry loop to the docker pull in the 'overcloud container image upload' command in https://review.openstack.org/#/c/505681/ to cope with occasional I/O failures.

Environment:
instack-undercloud-7.4.2-0.20171010064304.el7ost.noarch
openstack-puppet-modules-11.0.0-0.20170828113154.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-0.20171022171808.el7ost.noarch
openstack-tripleo-common-7.6.3-0.20171022171808.el7ost.noarch

Able to deploy successfully with remote registry.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3457
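
For readers who want a concrete picture of the two mitigations discussed in the comments (capping concurrent pulls at PROCESS_COUNT=3 and retrying a failed docker pull), here is a minimal sketch. It is not the code from docker-puppet.py or from the referenced commits. The function names `run_pull`, `pull_with_retry` and `pull_all`, the `MAX_ATTEMPTS` value, and the delay between attempts are all assumptions made for illustration. Only the `docker pull IMAGE` command and the worker count of 3 come from the thread.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Worker count of 3 is the value suggested in the thread.
PROCESS_COUNT = 3
# Assumption for this sketch: the thread does not say how many retries were added.
MAX_ATTEMPTS = 3


def run_pull(image):
    """Run `docker pull IMAGE` and return its exit status (0 means success)."""
    return subprocess.call(["docker", "pull", image])


def pull_with_retry(image, pull=run_pull, attempts=MAX_ATTEMPTS, delay=3.0):
    """Return True once a pull succeeds, or False after `attempts` failures."""
    for attempt in range(1, attempts + 1):
        if pull(image) == 0:
            return True
        if attempt < attempts:
            # Pause before retrying so a transient registry error can clear.
            time.sleep(delay)
    return False


def pull_all(images, pull=run_pull, workers=PROCESS_COUNT,
             attempts=MAX_ATTEMPTS, delay=3.0):
    """Pull images with at most `workers` pulls in flight; map image -> success."""
    images = list(images)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda image: pull_with_retry(image, pull, attempts, delay),
            images,
        )
        return dict(zip(images, results))
```

The `pull` argument is injectable so the retry logic can be exercised without a docker daemon. In a real deployment the default `run_pull` would be used, and any image that maps to False in the result of `pull_all` would still need to fail the deployment step explicitly.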