Description of problem:
Ansible cannot accurately judge whether the images already exist.

Version-Release number of selected component (if applicable):
https://github.com/sdodson/openshift-ansible containers

How reproducible:
Always

Steps to Reproduce:
1. Install a containerized env with external etcd.

vim hosts_test
<--snip-->
osm_etcd_image=virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd:latest
<--snip-->

Actual results:
TASK: [etcd | Pull etcd container] ********************************************
changed: [10.66.79.125] => {"changed": true, "cmd": ["docker", "pull", "virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd:latest"], "delta": "0:00:00.086483", "end": "2015-12-18 10:19:49.823635", "rc": 0, "start": "2015-12-18 10:19:49.737152", "stderr": "", "stdout": "Trying to pull repository virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd ... latest: Pulling from rhel7/etcd\n6c3a84d798dc: Already exists\na15079cec631: Already exists\na15079cec631: Already exists\nDigest: sha256:43af248c2a7e60290a24a0c7d8a48042c63136491ef9218cd2ff50f43a3ade93\nStatus: Image is up to date for virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd:latest", "warnings": []}

This step still pulls the image even though the etcd image already exists.

Expected results:
Pulling the image should be skipped when it is already present.

Additional info:
After pulling the images, the 'wait images' step times out:

msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting
Personally, my main concern is the experience around the 'wait images' timeout. Obviously, a timeout needs to be set because we can't have it waiting forever. We just need to make sure the admin knows how they can manually download the images to debug whatever latency or connection problem may trigger the issue. It seems reasonable to always attempt pulling the images. We don't really have any other way to ensure the latest one exists on disk. It's not going to re-download any layers that are already on disk so I don't see this as a huge problem.
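For illustration only, an unconditional pull along those lines might look like the sketch below (hypothetical task and variable names, not necessarily what the role uses); the changed_when guard keys off docker's "Downloaded newer image" status line so the task only reports a change when something was actually fetched:

- name: Pull etcd container
  command: docker pull {{ etcd_image }}
  register: pull_result
  # Layers already on disk are not re-downloaded, so running this
  # unconditionally is cheap when the image is already up to date.
  changed_when: "'Downloaded newer image' in pull_result.stdout"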
Hmm, looking at https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_master/tasks/main.yml#L99 it seems Ansible is supposed to avoid the re-download. There may be a bug in the playbook, then.
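As a point of comparison, a conditional pull of the sort that task appears to intend would look roughly like this sketch (illustrative task and variable names, not the actual contents of the playbook):

- name: Check whether the image is already present
  command: docker images -q {{ openshift_image }}
  register: image_check
  changed_when: false

- name: Pull master container
  command: docker pull {{ openshift_image }}
  when: image_check.stdout == ""

If the name handed to docker images does not exactly match the repository:tag that was pulled (for example because of a registry prefix), the check always comes back empty and the pull runs every time, which would explain the behavior above.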
I was working on fixing this here: https://github.com/openshift/openshift-ansible/pull/1097 I'm going to refactor it per Jason's suggestion to use a filter rather than awk, and I'll probably drop the wait loop altogether. I think the behavior I was working around via the wait loop was actually a symptom of poor image name matching rather than a problem where docker pull returns success prematurely.
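For context, the 'wait images' failure in the description is the standard Ansible retries/until timeout; a loop of that shape (illustrative names, not the exact task being removed) looks like:

- name: Wait for etcd image
  command: docker images -q {{ etcd_image }}
  register: etcd_image_check
  until: etcd_image_check.stdout != ""
  retries: 30
  delay: 10
  changed_when: false
  # If {{ etcd_image }} never matches what "docker images" reports
  # (e.g. the registry prefix differs), the condition can never pass and
  # Ansible fails with "Task failed as maximum retries was encountered".

Dropping the loop in favor of an unconditional pull sidesteps that failure mode entirely.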
I've switched to always pulling the images now: https://github.com/openshift/openshift-ansible/pull/1097

You may test by running the following on your checkout:

git pull https://github.com/sdodson/openshift-ansible containers
Checked on the openshift-ansible master branch. Checked the oc and oadm commands with:

docker run -i --privileged --net=host --user=${user}:${group} -v ~/.kube:/root/.kube -v /tmp:/tmp -v /etc/origin:/etc/origin -e KUBECONFIG=/root/.kube/config --entrypoint ${cmd} --rm rcm-img-docker01.build.eng.bos.redhat.com:5001/openshift3/ose:v3.1.1.1 "${@}"

The specified image is used. Moving it to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:0075