Description of problem:
My customer is seeing the following error when pulling images from Artifactory. OpenShift reports that it fails to pull images from their Artifactory instance. Looking at the events in the namespace, we see:

[root@control-host-01 ~]# oc get events -n argocd
LAST SEEN   TYPE      REASON    OBJECT                         MESSAGE
46m         Normal    Pulling   pod/argocd-secret-hook-kjghj   Pulling image "<fqdn>/gitlab/ubi8m-oc:latest"
46m         Warning   Failed    pod/argocd-secret-hook-kjghj   Failed to pull image "<fqdn>/gitlab/ubi8m-oc:latest": rpc error: code = Unknown desc = Source image rejected: Too many open files
46m         Warning   Failed    pod/argocd-secret-hook-kjghj   Error: ErrImagePull

The customer is not able to see anything odd in the OpenShift logs. Pulling the images fails only from one particular cluster; other clusters that pull from the same Artifactory do not seem to have the same issue.

During testing it was also noted that:
podman pull works
crictl pull fails with the above error

Version-Release number of selected component (if applicable):
OCP 4.6.15

How reproducible:
Randomly reproducible; this does not happen all the time.

Actual results:

Expected results:

Additional info:
The configured ulimits should be able to handle the number of open FDs CRI-O has, but I've also discovered a leak in CRI-O whose fix we forgot to backport to 4.6: https://github.com/cri-o/cri-o/pull/4800

This should mitigate the situation (these connections would have been cleaned up eventually, but it takes a while). I believe upgrading to a version of CRI-O with this patch will make this situation not happen anymore (or be *much* harder to reproduce). As such, moving this to POST.
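To check the comment above against a running process, the number of open descriptors can be compared with the process's "Max open files" limit. The following is a minimal sketch, assuming a Linux node where /proc is readable. The helper names (parse_nofile_limit, count_open_fds, fd_usage) are hypothetical and are not part of CRI-O or OpenShift tooling.

```python
import os


def parse_nofile_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max" "open" "files" <soft> <hard> "files"
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid):
    # Each entry in /proc/<pid>/fd is one open descriptor.
    # Reading another user's process usually requires root.
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for the given PID."""
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_nofile_limit(f.read())
    return count_open_fds(pid), soft, hard
```

Calling fd_usage with the PID of the crio process on an affected node, sampled a few times, would show whether the descriptor count keeps climbing toward the soft limit, which is what a leak of this kind would look like.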
Here's another PR that *may* help once integrated (it fixes a leak regardless, so it is worth picking up).
Both attached PRs have merged and will be in the next z-stream release.
Tried to trigger the issue locally by setting the ulimit on a node to just above what was currently in use and then pulling an image. Could not reproduce the issue. The bug description also says the issue happened randomly. I will mark it verified based on comments 18 and 19.
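For reference, the mechanism behind the reproduction attempt above can be illustrated in a single process. This is a sketch only: it does not involve CRI-O or an image pull, and hit_fd_limit is a hypothetical helper. It lowers the process's own RLIMIT_NOFILE soft limit, opens /dev/null until the kernel refuses, and then restores the original limit.

```python
import errno
import os
import resource


def hit_fd_limit(soft_limit):
    """Lower this process's RLIMIT_NOFILE soft limit, then open /dev/null
    until the kernel refuses. Returns (descriptors_opened, errno_seen)."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit.
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    opened = []
    seen = None
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    try:
        # Safety cap: a process can never hold more than soft_limit descriptors.
        while len(opened) <= soft_limit:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        seen = exc.errno
    finally:
        # Release everything and put the original limits back.
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(opened), seen
```

On Linux the errno returned here is EMFILE, whose message text is "Too many open files", the same string that appears in the pull error. How close a real CRI-O process gets to its limit depends on how many descriptors it already holds, which this sketch does not model.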
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.28 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1487