Bug 1948441 - ImagePullBackOff: Source image rejected: Too many open files
Summary: ImagePullBackOff: Source image rejected: Too many open files
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.z
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On: 1953071
Blocks:
Reported: 2021-04-12 07:51 UTC by Andy Bartlett
Modified: 2024-10-01 17:53 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1953071
Environment:
Last Closed: 2021-05-12 12:18:10 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                          Private  Priority  Status  Summary                                                                 Last Updated
Github                  containers image pull 1210  0        None      open    release-5.5: fix docker.GetDigest docker.makeRequestToResolvedURL doc…  2021-04-23 20:27:41 UTC
Github                  cri-o cri-o pull 4800       0        None      open    [release-1.19] hostport manager clean up host ports                     2021-04-23 19:47:33 UTC
Red Hat Product Errata  RHBA-2021:1487              0        None      None    None                                                                    2021-05-12 12:18:29 UTC

Description Andy Bartlett 2021-04-12 07:51:28 UTC
Description of problem:

My customer is seeing the following error when pulling images from Artifactory:

OpenShift reports that it fails to pull images from their Artifactory. The events in the namespace show:

[root@control-host-01 ~]# oc get events -n argocd
LAST SEEN   TYPE      REASON              OBJECT                                      MESSAGE
46m         Normal    Pulling             pod/argocd-secret-hook-kjghj                Pulling image "<fqdn>/gitlab/ubi8m-oc:latest"
46m         Warning   Failed              pod/argocd-secret-hook-kjghj                Failed to pull image "<fqdn>/gitlab/ubi8m-oc:latest": rpc error: code = Unknown desc = Source image rejected: Too many open files
46m         Warning   Failed              pod/argocd-secret-hook-kjghj                Error: ErrImagePull

The customer does not see anything unusual in the OpenShift logs. Image pulls fail only on one particular cluster; other clusters pulling from the same Artifactory do not seem to have the same issue.

In addition, testing showed:

podman pull works
crictl pull fails with the above error

Version-Release number of selected component (if applicable):

OCP 4.6.15


How reproducible:

Intermittent: the failure occurs randomly, not on every pull.

Actual results:


Expected results:


Additional info:

Comment 18 Peter Hunt 2021-04-23 19:47:37 UTC
The configured ulimits should be able to handle the number of open FDs CRI-O has, but I've also discovered a leak in CRI-O whose fix we forgot to backport to 4.6:
https://github.com/cri-o/cri-o/pull/4800

This should mitigate the situation (these connections would have been cleaned up eventually, but it takes a while).

I believe upgrading to a version of CRI-O with this patch will make this situation not happen anymore (or be *much* harder to reproduce). As such, moving this to POST.
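
One way to check how close a process such as CRI-O is to its file descriptor limit is to compare the number of entries in /proc/<pid>/fd with the "Max open files" row of /proc/<pid>/limits. The Python sketch below does this on a Linux node. It is an illustration only: the helper names parse_max_open_files and fd_usage are hypothetical and not part of CRI-O or OpenShift, and reading another process's /proc entries typically requires root.

```python
import os
from pathlib import Path


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid, proc_root="/proc"):
    """Return (open_fd_count, soft_limit) for the given PID (Linux only)."""
    base = Path(proc_root) / str(pid)
    open_fds = len(os.listdir(base / "fd"))  # one entry per open descriptor
    soft, _hard = parse_max_open_files((base / "limits").read_text())
    return open_fds, soft
```

Calling fd_usage with the PID of the container runtime at intervals gives a rough trend. A count that climbs steadily toward the soft limit while the node is otherwise idle would be consistent with a descriptor leak, though it does not prove one.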

Comment 19 Peter Hunt 2021-04-23 20:27:44 UTC
Here's another PR that *may* help once integrated (it fixes a leak regardless, so it is worth picking up); see the containers/image pull 1210 entry under Links.

Comment 21 Peter Hunt 2021-04-28 13:45:34 UTC
Both attached PRs have merged and will be in the next z-stream release.

Comment 24 Sunil Choudhary 2021-05-06 04:20:54 UTC
Tried to trigger the issue locally by setting the ulimit on a node just above the number of FDs currently in use and then pulling an image.
Could not reproduce the issue. The bug description also says the issue happened randomly. I will mark it verified based on comments 18 and 19.
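
The effect of a descriptor limit set just above current usage can be illustrated in isolation. The Python sketch below is not the QA procedure itself and does not involve CRI-O: it lowers RLIMIT_NOFILE for the current process only, opens /dev/null until the kernel refuses, and returns the resulting errno. The function name open_until_limit and the default limit of 64 are assumptions chosen for illustration.

```python
import errno
import os
import resource


def open_until_limit(soft_limit=64):
    """Lower the soft RLIMIT_NOFILE, open files until failure, then restore.

    Returns (number_of_files_opened, errno_of_failure).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if old_hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old_hard))
    fds = []
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return len(fds), exc.errno
    finally:
        # Release the descriptors and put the original limit back.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


opened, err = open_until_limit()
print(opened, errno.errorcode.get(err), os.strerror(err))
```

The failure is reported as EMFILE, the errno behind "Too many open files". A process that leaks descriptors reaches the same state gradually, which would fit a failure that appears only intermittently.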

Comment 26 errata-xmlrpc 2021-05-12 12:18:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.28 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1487

