Bug 1575898

Summary: docker_image_availability check fails during upgrade because the ose-control-plane:latest and ose-node:latest images are unavailable
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: Cluster Version Operator
Assignee: Scott Dodson <sdodson>
Status: CLOSED WONTFIX
QA Contact: liujia <jiajliu>
Severity: low
Priority: low
Version: 3.10.0
CC: aos-bugs, jokerman, mmccomas, sdodson, wmeng
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-02-18 18:00:31 UTC
Type: Bug

Description liujia 2018-05-08 08:30:31 UTC
Description of problem:
The upgrade failed at TASK [Run health checks (upgrade)] because the ose-control-plane:latest and ose-node:latest images were unavailable. The openshift_health_checker role runs before the openshift_version role, so openshift_release and openshift_image_tag are not yet set to any value. As a result, every image tag falls back to the default "latest" during the health check.

# vim roles/openshift_health_checker/openshift_checks/docker_image_availability.py
image_tag = self.get_var("openshift_image_tag", default="latest")
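The effect of this fallback can be illustrated with a minimal Python sketch. It assumes the inventory variables are a plain dict and uses a hypothetical get_var helper that only mimics the default= behaviour of the line above; it is not the openshift-ansible implementation. When the health checker runs before openshift_version, the variable is absent and every required image silently gets the "latest" tag.

```python
from typing import Any, Dict, List


def get_var(task_vars: Dict[str, Any], name: str, default: Any = None) -> Any:
    """Hypothetical stand-in for the checker's get_var(): the variable, else the default."""
    return task_vars.get(name, default)


def required_images(task_vars: Dict[str, Any], images: List[str]) -> List[str]:
    """Build image references the way the failing check effectively did."""
    # Falls back to "latest" whenever openshift_image_tag has not been set yet.
    image_tag = get_var(task_vars, "openshift_image_tag", default="latest")
    return [f"{image}:{image_tag}" for image in images]


IMAGES = ["openshift3/ose-control-plane", "openshift3/ose-node"]

# Health checker runs first: openshift_version has not set the tag yet.
print(required_images({}, IMAGES))
# → ['openshift3/ose-control-plane:latest', 'openshift3/ose-node:latest']

# After openshift_version has run (the tag value here is assumed for illustration).
print(required_images({"openshift_image_tag": "v3.10"}, IMAGES))
# → ['openshift3/ose-control-plane:v3.10', 'openshift3/ose-node:v3.10']
```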

Failure summary:
  1. Hosts:    x.x.x.x
     Play:     OpenShift Health Checks
     Task:     Run health checks (upgrade)
     Message:  One or more checks failed
     Details:  check "docker_image_availability":
               One or more required container images are not available:
                   openshift3/ose-control-plane:latest,
                   openshift3/ose-node:latest
               Checked with: skopeo inspect [--tls-verify=false] [--creds=<user>:<pass>] docker://<registry>/<image>
               Default registries searched: registry.reg-aws.openshift.com:443, registry.access.redhat.com
               Blocked registries: registry.hacker.com
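The "Checked with" line can be read as one skopeo inspect call per searched registry. The sketch below only builds those argument lists (it does not run skopeo); the function name skopeo_commands and the choice to skip blocked registries while iterating are assumptions for illustration, not the checker's actual code. It shows why ":latest" fails: every candidate reference carries the same fallback tag, so no registry that lacks that tag can satisfy the check.

```python
from typing import List, Optional


def skopeo_commands(image: str, registries: List[str], blocked: List[str],
                    insecure: bool = False, creds: Optional[str] = None) -> List[List[str]]:
    """Hypothetical helper: one 'skopeo inspect' argv per searched, non-blocked registry."""
    commands: List[List[str]] = []
    for registry in registries:
        if registry in blocked:
            continue  # assumption: blocked registries are never queried
        argv = ["skopeo", "inspect"]
        if insecure:
            argv.append("--tls-verify=false")
        if creds:
            argv.append(f"--creds={creds}")
        argv.append(f"docker://{registry}/{image}")
        commands.append(argv)
    return commands


for argv in skopeo_commands(
        "openshift3/ose-node:latest",
        ["registry.reg-aws.openshift.com:443", "registry.access.redhat.com"],
        ["registry.hacker.com"],
        insecure=True):
    print(" ".join(argv))
```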

# docker pull openshift3/ose-control-plane:latest
Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/ose-control-plane ... 
Pulling repository registry.reg-aws.openshift.com:443/openshift3/ose-control-plane
Trying to pull repository registry.access.redhat.com/openshift3/ose-control-plane ... 
Trying to pull repository registry.access.redhat.com/openshift3/ose-control-plane ... 
Trying to pull repository docker.io/openshift3/ose-control-plane ... 
repository docker.io/openshift3/ose-control-plane not found: does not exist or no pull access


# docker pull openshift3/ose-control-plane:v3.10
Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/ose-control-plane ... 
v3.10: Pulling from registry.reg-aws.openshift.com:443/openshift3/ose-control-plane
d1fe25896eb5: Already exists 
001d79f68470: Already exists 
51c5e732a200: Pull complete 
4d0779510506: Pull complete 
Digest: sha256:47b10c1856fdde3d3af8bb6759e19127608e99e5ac6e37a653b556fb5e408890
Status: Downloaded newer image for registry.reg-aws.openshift.com:443/openshift3/ose-control-plane:v3.10


Version-Release number of the following components:
openshift-ansible-3.10.0-0.36.0.git.0.521f0ef.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run an upgrade against a Docker-containerized OCP cluster on RHEL.

Actual results:
The upgrade fails at the docker_image_availability check.

Expected results:
The docker_image_availability check should use the correct image tag.

Additional info:
Workaround: set openshift_disable_check=docker_image_availability

Comment 1 Scott Dodson 2018-05-11 19:18:15 UTC
This issue affects only the internal registry, and it has been fixed there by pushing the latest tag. It shouldn't happen with registry.access.redhat.com. Let's verify that this fixes the problem, and then we can go ahead and close this bug.

Comment 2 liujia 2018-05-15 01:38:05 UTC
Hi, Scott

Yes, once the image is tagged "latest" in the internal registry, this issue can be worked around.

However, I filed this bug because I think the root cause is the role ordering: the openshift_version role should run before the openshift_health_checker role, so that openshift_release and openshift_image_tag are set to the correct values. I also think docker_image_availability needs to check the image (v3.10/v3.10.0-x.x.x.x) that will actually be used later in the upgrade, not the default (latest) one, which will never be pulled at all.
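The tag selection suggested above can be sketched in Python under stated assumptions: an explicitly set openshift_image_tag wins, otherwise a tag is derived from openshift_release, and "latest" is used only as a last resort. The function name resolve_image_tag and the rule that release "3.10" or "v3.10" maps to tag "v3.10" are hypothetical conventions chosen for illustration, not openshift-ansible's actual logic.

```python
from typing import Mapping, Optional


def resolve_image_tag(task_vars: Mapping[str, str]) -> str:
    """Hypothetical resolution order: explicit tag, then release-derived tag, then 'latest'."""
    tag: Optional[str] = task_vars.get("openshift_image_tag")
    if tag:
        return tag
    release: Optional[str] = task_vars.get("openshift_release")
    if release:
        # Assumed convention: "3.10" and "v3.10" both map to image tag "v3.10".
        return "v" + release.lstrip("v")
    # Last resort only; this fallback is what produced the failure in this bug.
    return "latest"


print(resolve_image_tag({"openshift_release": "3.10"}))  # → v3.10
print(resolve_image_tag({}))                             # → latest
```

Resolving from openshift_release keeps the check aligned with what the upgrade will pull, but it still depends on openshift_release being known when the check runs, which is why the role ordering matters as well.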

BTW, although this wouldn't happen with registry.access.redhat.com, I'm not sure the "latest" tag always exists for users who prefer an internal registry (like our registry.reg-aws.openshift.com:443). If it doesn't, the workaround won't work.

Changing the status back to await a more reasonable solution. Of course, this issue does not block anything.

Comment 3 Scott Dodson 2019-02-18 18:00:31 UTC
There appear to be no active cases related to this bug. As such, we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if you feel it was closed in error or a new active case is attached.