Bug 1574379 - Unable to pull image with error "Manifest unknown" when docker-registry memory is near pod memory limit.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.5.z
Assignee: Alexey Gladkov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-03 07:32 UTC by Jaspreet Kaur
Modified: 2018-08-22 02:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-22 02:39:15 UTC
Target Upstream Version:



Description Jaspreet Kaur 2018-05-03 07:32:46 UTC
Description of problem: Pods are not starting because the image cannot be pulled. The pull fails with the error:

Failed to pull image "172.30.150.150:5000/ci/mac-slave2.nexus:1.0": manifest unknown: manifest unknown

We have confirmed that the image is valid and has worked previously. The image is hosted in an external docker registry and is used via the image stream "mac-slave2.nexus" which has the docker pull spec: "172.30.150.150:5000/ci/mac-slave2.nexus". 

We have confirmed that the tag 1.0 has been resolved in the image stream, as we can see the image hash as well as the pull spec defined (previously, when we had an invalid image name or tag, the image hash and pull spec were not populated after creating the image stream).
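
To make the parts of the failing pull spec explicit, here is a minimal sketch that splits it into registry, project, image stream name and tag. The names PullSpec and parse_pull_spec are hypothetical helpers for illustration only, not part of OpenShift or docker. It assumes the three-part form seen in this report (registry/namespace/name[:tag]) and does not handle digest references such as name@sha256:...

```python
from typing import NamedTuple


class PullSpec(NamedTuple):
    registry: str   # host[:port] of the registry
    namespace: str  # project that owns the image stream
    name: str       # image stream name
    tag: str        # tag requested by the pod


def parse_pull_spec(spec: str) -> PullSpec:
    """Split 'registry/namespace/name[:tag]' into its parts (illustrative only)."""
    registry, namespace, rest = spec.split("/", 2)
    # Only a colon after the last slash separates the tag; the colon in the
    # registry part belongs to the port number.
    name, sep, tag = rest.partition(":")
    # Docker convention: a spec without a tag refers to "latest".
    return PullSpec(registry, namespace, name, tag if sep else "latest")


if __name__ == "__main__":
    print(parse_pull_spec("172.30.150.150:5000/ci/mac-slave2.nexus:1.0"))
```

For the spec in the error message this yields the project "ci", the image stream "mac-slave2.nexus" and the tag "1.0", which are the values to compare against the image stream when checking that the tag has been resolved.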

This issue only happens when the memory usage of the integrated docker-registry pod is near the pod's memory limit.

Re-deploying the docker-registry pod resolves the issue.

1. Memory usage on the docker-registry pod reaches the limit in a day or two, depending on the number of pods deployed in that period. More pod deployments cause the limit to be reached even earlier.
2. The memory used is never freed unless the pod is restarted.
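
Since the failure correlates with the registry pod approaching its memory limit, a small check like the one below could flag the condition before pulls start failing. This is only a sketch: parse_memory and near_limit are hypothetical helper names, the 0.9 threshold is an assumption and not a value from this report, and the usage and limit strings are assumed to come from whatever monitoring is already in place. Only a subset of Kubernetes quantity suffixes is handled.

```python
# Subset of the suffixes Kubernetes accepts for memory quantities.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '2G' to a number of bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: already a plain byte count


def near_limit(usage: str, limit: str, threshold: float = 0.9) -> bool:
    """Return True when usage is at or above threshold * limit.

    The default threshold of 0.9 is an assumed value for illustration.
    """
    return parse_memory(usage) >= threshold * parse_memory(limit)


if __name__ == "__main__":
    # Example values only; they are not measurements from this bug.
    print(near_limit("950Mi", "1Gi"))
    print(near_limit("500Mi", "1Gi"))
```

A check of this kind does not address the underlying memory growth; it only indicates when a re-deployment of the registry pod, the workaround described above, is likely to be needed.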



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results: Image pulls fail once the registry pod reaches its memory limit, which happens within 1 or 2 days.


Expected results: Deployments should not run into this issue within such a short span.


Additional info:

Comment 14 Venkata Tadimarri 2018-07-16 02:29:40 UTC
That's right. Restarting the docker-registry pod (and essentially flushing the memory) resolves the issue, without having to rebuild the image or recreate the image stream.

