Description of problem:
Ansible cannot accurately judge whether the images already exist.

Version-Release number of selected component (if applicable):
https://github.com/sdodson/openshift-ansible containers

How reproducible:
Always

Steps to Reproduce:
1. Install a containerized env with external etcd.

vim hosts_test
<--snip-->
osm_etcd_image=virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd:latest
<--snip-->

Actual results:
TASK: [etcd | Pull etcd container] ********************************************
changed: [10.66.79.125] => {"changed": true, "cmd": ["docker", "pull", "virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd:latest"], "delta": "0:00:00.086483", "end": "2015-12-18 10:19:49.823635", "rc": 0, "start": "2015-12-18 10:19:49.737152", "stderr": "", "stdout": "Trying to pull repository virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd ... latest: Pulling from rhel7/etcd\n6c3a84d798dc: Already exists\na15079cec631: Already exists\na15079cec631: Already exists\nDigest: sha256:43af248c2a7e60290a24a0c7d8a48042c63136491ef9218cd2ff50f43a3ade93\nStatus: Image is up to date for virt-openshift-05.lab.eng.nay.redhat.com:5000/rhel7/etcd:latest", "warnings": []}

This step still pulls the image even though the etcd image already exists.

Expected results:
Pulling the image should be skipped when it is already present.

Additional info:
After pulling the images, the 'wait images' step times out:

msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting
Personally, my main concern is the experience around the 'wait images' timeout. Obviously, a timeout needs to be set because we can't have it waiting forever. We just need to make sure the admin knows how they can manually download the images to debug whatever latency or connection problem may trigger the issue. It seems reasonable to always attempt pulling the images. We don't really have any other way to ensure the latest one exists on disk. It's not going to re-download any layers that are already on disk so I don't see this as a huge problem.
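For illustration only, an unconditional pull along those lines might look like the sketch below (hypothetical task and variable names, not necessarily what the role uses); the changed_when guard keys off docker's "Downloaded newer image" status line so the task only reports a change when something was actually fetched:

- name: Pull etcd container
  command: docker pull {{ etcd_image }}
  register: pull_result
  # Layers already on disk are not re-downloaded, so running this
  # unconditionally is cheap when the image is already up to date.
  changed_when: "'Downloaded newer image' in pull_result.stdout"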
Hmm, looking at https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_master/tasks/main.yml#L99 it seems Ansible is supposed to avoid the re-download. There may be a bug in the playbook, then.
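As a point of comparison, a conditional pull of the sort that task appears to intend would look roughly like this sketch (illustrative task and variable names, not the actual contents of the playbook):

- name: Check whether the image is already present
  command: docker images -q {{ openshift_image }}
  register: image_check
  changed_when: false

- name: Pull master container
  command: docker pull {{ openshift_image }}
  when: image_check.stdout == ""

If the name handed to docker images does not exactly match the repository:tag that was pulled (for example because of a registry prefix), the check always comes back empty and the pull runs every time, which would explain the behavior above.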
I was working on fixing this here: https://github.com/openshift/openshift-ansible/pull/1097 I'm going to refactor it per Jason's suggestion to use a filter rather than awk, and I'll probably drop the wait loop altogether. I think the behavior I was working around via the wait loop was actually a symptom of poor image name matching rather than a problem where docker pull returns success prematurely.
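For context, the 'wait images' failure in the description is the standard Ansible retries/until timeout; a loop of that shape (illustrative names, not the exact task being removed) looks like:

- name: Wait for etcd image
  command: docker images -q {{ etcd_image }}
  register: etcd_image_check
  until: etcd_image_check.stdout != ""
  retries: 30
  delay: 10
  changed_when: false
  # If {{ etcd_image }} never matches what "docker images" reports
  # (e.g. the registry prefix differs), the condition can never pass and
  # Ansible fails with "Task failed as maximum retries was encountered".

Dropping the loop in favor of an unconditional pull sidesteps that failure mode entirely.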
I've switched to always pulling the images now: https://github.com/openshift/openshift-ansible/pull/1097

You may test by running the following on your checkout:

git pull https://github.com/sdodson/openshift-ansible containers
Checked on the openshift-ansible master branch. Checked the oc and oadm commands with:

docker run -i --privileged --net=host --user=${user}:${group} -v ~/.kube:/root/.kube -v /tmp:/tmp -v /etc/origin:/etc/origin -e KUBECONFIG=/root/.kube/config --entrypoint ${cmd} --rm rcm-img-docker01.build.eng.bos.redhat.com:5001/openshift3/ose:v3.1.1.1 "${@}"

The specified image is used. Moving it to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:0075