Description of problem:
Same problem as described in bug 1987112, but happening on an Image Registry Operator with around 1.5 million images. My understanding is that the workarounds implemented and described for the previous bug cannot be applied to the Image Registry Operator unless it is in the Unmanaged (and therefore unsupported) state. If this is wrong, please let me know.

Version-Release number of selected component (if applicable):
4.8

How reproducible:
Always, whenever the Image Registry pods are starting.

Actual results:
The pods stay in ContainerCreating until runtime-request-timeout expires. Even a timeout of 30 minutes is not enough in our case.

Expected results:
The pods start.
Verified on a 4.11.0-0.nightly-2022-04-06-213816 cluster.

$ oc rsh image-registry-7b5787bb5-krhmq
df -i
/dev/sdb    13107200 13106107  1093  100% /registry

With about 13 million files, the registry pod is recreated within 3 minutes:

$ oc get pods -w
NAME                                               READY   STATUS              RESTARTS   AGE
cluster-image-registry-operator-6d87668f65-s4l9z   1/1     Running             0          4h36m
image-registry-5d48768f6-25gvw                     0/1     ContainerCreating   0          62s
node-ca-9bcbj                                      1/1     Running             0          4h24m
node-ca-czm2d                                      1/1     Running             0          4h24m
node-ca-kkwbk                                      1/1     Running             0          4h24m
node-ca-l9gv6                                      1/1     Running             0          4h24m
node-ca-w6fn6                                      1/1     Running             0          4h24m
node-ca-zk4ts                                      1/1     Running             0          4h24m
image-registry-5d48768f6-25gvw                     0/1     Running             0          2m42s
image-registry-5d48768f6-25gvw                     1/1     Running             0          3m

Then override the annotation and the runtime class:

$ oc edit config.image
unsupportedConfigOverrides:
  deployment:
    annotations:
      io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: "true"
    runtimeClassName: selinux

With the override in place, the registry recreate time drops to 2 minutes with the same 13 million files:

$ oc get pods -w
NAME                                               READY   STATUS              RESTARTS   AGE
cluster-image-registry-operator-6d87668f65-s4l9z   1/1     Running             0          8h
image-registry-7b5787bb5-krhmq                     0/1     ContainerCreating   0          16s
node-ca-9bcbj                                      1/1     Running             1          7h58m
node-ca-czm2d                                      1/1     Running             0          7h58m
node-ca-kkwbk                                      1/1     Running             0          7h58m
node-ca-l9gv6                                      1/1     Running             1          7h58m
node-ca-w6fn6                                      1/1     Running             0          7h58m
node-ca-zk4ts                                      1/1     Running             1          7h58m
image-registry-7b5787bb5-krhmq                     0/1     Running             0          96s
image-registry-7b5787bb5-krhmq                     1/1     Running             0          2m

$ oc rsh image-registry-7b5787bb5-krhmq
ls -lZ /registry/docker/registry/v2/repositories/default
total 4
drwxr-sr-x. 5 1000330000 1000330000 system_u:object_r:container_file_t:s0:c12,c18 4096 Apr  7 03:28 httpd-ex

$ oc get pods image-registry-7b5787bb5-krhmq -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    imageregistry.operator.openshift.io/dependencies-checksum: sha256:0a825836652a80e42b437e0cd9fd7f9a3b3585717c05600c853ad7251dbc936b
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: "true"
====================snip====================
  runtimeClassName: selinux
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000330000
    fsGroupChangePolicy: OnRootMismatch
    seLinuxOptions:
      level: s0:c18,c12
====================snip====================
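
For anyone who prefers not to use an interactive `oc edit`, the same override could probably be applied as a merge patch. This is only a sketch under assumptions that do not come from this bug: that `unsupportedConfigOverrides` sits under `spec` of the `configs.imageregistry.operator.openshift.io` resource named `cluster`, and that `runtimeClassName` is nested under `deployment` as in the verification above. The `APPLY` variable is a hypothetical guard added for illustration, so the script only prints the patch by default.

```shell
#!/bin/sh
# Merge patch mirroring the override used in the verification above.
# Assumption: the override lives under .spec of the registry operator config.
PATCH='{"spec":{"unsupportedConfigOverrides":{"deployment":{"annotations":{"io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel":"true"},"runtimeClassName":"selinux"}}}}'

# Print the patch so it can be reviewed before anything is changed.
printf '%s\n' "$PATCH"

# Apply only when explicitly requested (APPLY=yes) and when oc is on PATH.
if [ "${APPLY:-no}" = "yes" ] && command -v oc >/dev/null 2>&1; then
  oc patch configs.imageregistry.operator.openshift.io/cluster \
    --type=merge -p "$PATCH"
fi
```

Running the script with `APPLY=yes` should trigger a new rollout of the registry deployment, which can be followed with `oc get pods -w` as in the verification steps. The pod can only start if a RuntimeClass named `selinux` already exists on the cluster.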
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069