Bug 1331038 - Pods are stuck in pending state due to failed image pulling
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.1.0
Hardware/OS: Unspecified
Priority: unspecified  Severity: urgent
Assigned To: Cesar Wong
QA Contact: Wang Haoran
Blocks: 1267746
Reported: 2016-04-27 09:56 EDT by Miheer Salunke
Modified: 2016-05-27 10:51 EDT
CC: 14 users

Doc Type: Bug Fix
Last Closed: 2016-05-11 09:33:19 EDT
Type: Bug


External Trackers:
Red Hat Product Errata RHSA-2016:1038 (priority normal, SHIPPED_LIVE): Moderate: openshift security update. Last updated 2016-05-11 13:32:46 EDT.
Red Hat Product Errata RHSA-2016:1064 (priority normal, SHIPPED_LIVE): Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update. Last updated 2016-05-12 16:19:17 EDT.

Comment 5 Andy Goldstein 2016-04-27 14:58:59 EDT
This is happening in 3.1.1.6 with the docker build strategy. Here is the sequence, via OSE's docker builder (I'm listing the rough equivalent CLI steps):

1. docker build -t $registry/$project/$image:latest .
2. docker push $registry/$project/$image:latest
3. in parallel:
  3a. an image change trigger kicks off a deployment that happens to land on the same node, which does 'docker pull $registry/$project/$image@sha256:...'
  3b. docker rmi $registry/$project/$image:latest

The removal of the image tagged :latest happens at about the same time that the image is being pulled by its sha256 digest. We see in the journalctl output for docker that the image removal is issued a bit before the pull by digest occurs. The image removal removes layers not in use by any other image/container, and the pull by digest is trying to pull them down at the same time.
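A minimal shell sketch of that race, assuming a reachable registry and hypothetical project/image names (in the real failure the pull by digest is issued by the deployment landing on the node, not by a manual background job):

# Hypothetical names; substitute a registry/project/image you can push to.
REGISTRY=registry.example.com:5000
IMAGE=$REGISTRY/myproject/myapp

# Steps 1-2: build and push, as the OSE docker builder does.
docker build -t "$IMAGE:latest" .
docker push "$IMAGE:latest"

# The deployment pulls by digest; RepoDigests is populated by the push.
DIGEST_REF=$(docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE:latest")

# Step 3: race the pull-by-digest against the builder's cleanup rmi.
# If the rmi removes layers the concurrent pull still needs, the pull
# fails and the deployment pod stays Pending.
docker pull "$DIGEST_REF" &
docker rmi "$IMAGE:latest"
wait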
Comment 8 Nicolas Dordet 2016-04-28 12:43:19 EDT
Version is 3.1.1
Comment 20 Wang Haoran 2016-05-03 01:51:49 EDT
Verified with:
openshift v3.1.1.6-43-gf583589
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2
Comment 27 errata-xmlrpc 2016-05-11 09:33:19 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1038
Comment 32 Cesar Wong 2016-05-23 08:36:33 EDT
Brenton, sorry, I don't have a theory or explanation for why, in the TSI case, restarting the node could have made the patch start working.

While debugging with Matt, we did verify two things:

1) The image that the builds were using was the image that contained the fix. We did this by looking at the output of /usr/bin/origin version using the image of one of the completed build containers.

2) The symptoms we were seeing were consistent with the bug that was fixed in the new builder image. After a build completed, the image was no longer present in the local Docker, stalling the pre-deployment pod.
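A hedged sketch of those two checks (the container ID and image names below are placeholders; on a real node they come from 'docker ps -a' and the project's build output):

# Placeholder: ID of one of the completed build containers, from 'docker ps -a'.
BUILD_CONTAINER=abc123def456

# 1) Confirm the builder image used by that container contains the fix by
#    checking the origin version baked into it.
BUILD_IMAGE=$(docker inspect --format '{{.Image}}' "$BUILD_CONTAINER")
docker run --rm "$BUILD_IMAGE" /usr/bin/origin version

# 2) After a build completes, check whether the output image is still
#    present in the local Docker; with the bug, it is gone and the
#    pre-deployment pod stalls waiting for it.
docker images registry.example.com:5000/myproject/myapp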
