This is happening in 3.1.1.6 with the docker build strategy. Here is what happens (via OSE's docker builder; I'm listing the rough CLI equivalents):

1. docker build -t $registry/$project/$image:latest
2. docker push $registry/$project/$image:latest
3. In parallel:
3a. An image change trigger kicks off a deployment that happens to land on the same node, which runs 'docker pull $registry/$project/$image@sha256:...'
3b. docker rmi $registry/$project/$image:latest

The removal of the image tagged :latest happens at about the same time the image is being pulled by its sha256 digest. The journalctl output for docker shows the removal being issued slightly before the pull by digest. The removal deletes any layers not in use by another image or container, while the pull by digest is trying to fetch those same layers.
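A minimal sketch of the race outside OpenShift, for anyone trying to reproduce it. The registry/project/image names are placeholders, and it assumes 'docker push' prints a 'digest: sha256:...' line (true when pushing to a v2 registry):

  # Placeholder names; substitute your registry/project/image.
  REG=registry.example.com:5000
  IMG=$REG/myproject/myimage

  docker build -t $IMG:latest .
  # Capture the sha256 digest reported by the push.
  DIGEST=$(docker push $IMG:latest | awk '/digest:/ {print $3}')

  # Start the pull by digest (as the deployment landing on the
  # same node would) ...
  docker pull $IMG@$DIGEST &
  # ... while removing the :latest tag, which deletes any layers
  # not held by another image or container.
  docker rmi $IMG:latest
  wait

Because the rmi and the pull race, the pull can ask for layers the removal is deleting at that moment, which matches the journalctl ordering described above.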
Version is 3.1.1
Verified with:
openshift v3.1.1.6-43-gf583589
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1038
Brenton, sorry, I don't have a theory or explanation for why restarting the node in the TSI case could have made the patch start working. While debugging with Matt, we did verify two things:

1) The builds were using the image that contained the fix. We confirmed this by checking the output of /usr/bin/origin version from the image of one of the completed build containers (see the sketch below).
2) The symptoms we were seeing were consistent with the bug that was fixed in the new builder image: after a build completed, the image was no longer present in local Docker storage, stalling the pre-deployment pod.
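For the record, a sketch of that version check, assuming a hypothetical container ID for one of the completed build containers:

  # Find the image the completed build container was created from
  # (<completed-build-container> is a placeholder ID).
  IMAGE=$(docker inspect --format '{{.Image}}' <completed-build-container>)
  # Print the origin version baked into that builder image.
  docker run --rm --entrypoint /usr/bin/origin $IMAGE version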