Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1622256

Summary: Master api does not start due to slow image pull
Product: OpenShift Container Platform
Reporter: Michael Gugino <mgugino>
Component: Cluster Version Operator
Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: high
Docs Contact:
Priority: high
Version: 3.11.0
CC: aos-bugs, jokerman, mmccomas, wsun
Target Milestone: ---
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 07:25:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Michael Gugino 2018-08-24 22:43:18 UTC
Description of problem:
Upgrading from OpenShift 3.10 to 3.11 forces registry.access.redhat.com users to migrate to registry.redhat.io. During upgrades, we update the static pod definitions so that the API service is created from the new registry. We do not pre-pull the image or otherwise ensure it is present on the host before attempting to start the API service. The timeout of ~120 seconds (120 retries with a 1 second delay) may not be enough given the size of the image.
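The failure mode comes down to a fixed polling budget racing an image download. The following is a minimal Python sketch, assuming a simple poll-and-sleep loop rather than the actual openshift-ansible task; `wait_for`, `FakeClock`, and `api_ready_after` are hypothetical names, and the 240-second ready time stands in for the ~4 minute pull observed in this report.

```python
from typing import Callable


def wait_for(check: Callable[[], bool], retries: int, delay: float,
             sleep: Callable[[float], None]) -> bool:
    """Poll `check` up to `retries` times, sleeping `delay` seconds between tries.

    Returns True as soon as `check` succeeds and False once the retries are
    exhausted, so the total wait budget is roughly retries * delay seconds.
    """
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False


class FakeClock:
    """Hypothetical simulated clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def api_ready_after(clock: FakeClock, ready_at: float) -> Callable[[], bool]:
    """Model an API that only responds once its image is available at `ready_at` seconds."""
    return lambda: clock.now >= ready_at


if __name__ == "__main__":
    # Image must be downloaded first: ~4 minutes, as observed in this report.
    slow = FakeClock()
    print(wait_for(api_ready_after(slow, 240.0), retries=120, delay=1.0,
                   sleep=slow.sleep))  # False: gives up after ~2 minutes

    # Image already pulled: the API comes up inside the budget (30 s is illustrative).
    fast = FakeClock()
    print(wait_for(api_ready_after(fast, 30.0), retries=120, delay=1.0,
                   sleep=fast.sleep))  # True
```

Raising the retry count only widens the window; pre-pulling the image removes the download from the critical path entirely.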

How reproducible: Depends on network/registry speed, but should happen with some regularity.

Steps to Reproduce:
1. Upgrade OpenShift 3.10 -> 3.11 using upgrade_control_plane.yml

Actual results:
Timeout waiting for the API to start, primarily because the image is not present on the host and takes ~4 minutes to download.

Expected results:
We should pre-pull the image across all masters before attempting to upgrade.
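The expected behavior can be sketched in Python as a pre-flight step: pull the image on every master in parallel and only start the control plane upgrade if every pull succeeds. This is a minimal sketch, not the actual change in PR 9779 (which lives in the openshift-ansible playbooks); the host names, the image reference, and the `pull` callback are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def prepull_on_all_masters(masters: List[str], image: str,
                           pull: Callable[[str, str], bool],
                           max_workers: int = 4) -> Dict[str, bool]:
    """Pull `image` on every master in parallel and report per-host success.

    `pull(host, image)` is a hypothetical callback; in a real deployment it
    might wrap the container runtime's pull command on that host.
    """
    def try_pull(host: str) -> bool:
        try:
            return pull(host, image)
        except Exception:
            # Treat any error during the pull as a failed pre-pull for that host.
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(masters, pool.map(try_pull, masters)))


def safe_to_upgrade(pull_results: Dict[str, bool]) -> bool:
    """Proceed with the control plane upgrade only if every master has the image."""
    return bool(pull_results) and all(pull_results.values())


if __name__ == "__main__":
    # Hypothetical hosts and image reference, for illustration only.
    masters = ["master-0", "master-1", "master-2"]
    image = "registry.redhat.io/example/control-plane:v3.11"

    def fake_pull(host: str, img: str) -> bool:
        return host != "master-2"  # simulate one failed pull

    results = prepull_on_all_masters(masters, image, fake_pull)
    print(results)                   # {'master-0': True, 'master-1': True, 'master-2': False}
    print(safe_to_upgrade(results))  # False
```

Failing fast before any static pod definition is changed keeps a slow or unreachable registry from leaving a master half-upgraded.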

Comment 1 Michael Gugino 2018-08-27 15:12:26 UTC
PR created against master: https://github.com/openshift/openshift-ansible/pull/9779

Comment 3 liujia 2018-09-05 05:58:18 UTC
Version: openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch

Checked that PR 9779 is merged. The upgrade succeeded with the control plane images pulled in advance.

Comment 5 errata-xmlrpc 2018-10-11 07:25:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652