Description of problem: Pods fail to start because the image cannot be pulled. The pull fails with the error:
Failed to pull image "172.30.150.150:5000/ci/mac-slave2.nexus:1.0": manifest unknown: manifest unknown
We have confirmed that the image is valid and has worked previously. The image is hosted in an external docker registry and is used via the image stream "mac-slave2.nexus" which has the docker pull spec: "172.30.150.150:5000/ci/mac-slave2.nexus".
We have confirmed that tag 1.0 is resolved in the image stream: both the image hash and the pull spec are populated. (Previously, when we used an invalid image name or tag, the image hash and pull spec were not populated after creating the image stream.)
This issue only happens when the memory usage of the integrated docker-registry pod is near the pod's memory limit.
Re-deploying the docker-registry pod resolves the issue.
1. Memory usage on the docker-registry pod reaches the limit within a day or two, depending on the number of pods deployed in that period. More pod deployments exhaust the memory even sooner.
2. The memory used is never freed unless the pod is restarted.
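As a stopgap until the leak itself is fixed, the "near the memory limit" condition could be checked before pulls start failing. The following is only a minimal sketch: `parse_quantity`, `near_limit`, and the 0.9 threshold are hypothetical names and values chosen for illustration, not part of OpenShift or the registry, and the usage and limit figures are assumed to be obtained separately as Kubernetes-style quantity strings.

```python
# Hypothetical helpers for illustration; not part of OpenShift or the registry.

# Kubernetes-style quantity suffixes: binary first, then decimal.
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1Gi' to bytes."""
    quantity = quantity.strip()
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: already a plain byte count


def near_limit(used: str, limit: str, threshold: float = 0.9) -> bool:
    """Return True when usage is at or above `threshold` of the limit.

    The 0.9 default is an assumed value, not one taken from this report.
    """
    return parse_quantity(used) >= threshold * parse_quantity(limit)


if __name__ == "__main__":
    # Example figures are made up; real values would come from pod metrics.
    print(near_limit("950Mi", "1Gi"))
    print(near_limit("512Mi", "1Gi"))
```

A check like this only signals when a re-deploy of the docker-registry pod is likely to be needed; it does not address why the memory is never freed.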
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Actual results: Image pulls fail once the registry pod reaches its memory limit, which happens within 1 or 2 days.
Expected results: Deployments should not start failing within such a short span.
That's right. Restarting the docker-registry pod (and essentially flushing the memory) resolves the issue, without having to rebuild the image or recreate the image stream.