2067995 – Internal registries with a big number of images delay pod creation due to recursive SELinux file context relabeling

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Image Registry
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Ricardo Maraschini
QA Contact:	XiuJuan Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2074050

Reported:	2022-03-24 08:18 UTC by Lucas López Montero
Modified:	2022-10-12 02:47 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-10 11:01:08 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-image-registry-operator pull 763	None	open	Bug 2067995: Deployment annotations, runtimeClassName override and fs policy change	2022-03-31 13:16:09 UTC
Red Hat Issue Tracker	SUPPORTEX-8460	None	None	None	2022-03-25 12:24:09 UTC
Red Hat Product Errata	RHSA-2022:5069	None	None	None	2022-08-10 11:02:25 UTC

Description Lucas López Montero 2022-03-24 08:18:37 UTC

Description of problem:

Same problem as described in bug 1987112, but happening on an Image Registry Operator with around 1.5 million images.

My understanding is that the workarounds implemented and described for the previous bug cannot be applied to the Image Registry Operator unless it is in unmanaged (and therefore unsupported) status. If this is wrong, please kindly let me know.


Version-Release number of selected component (if applicable): 4.8


How reproducible:

The problem always happens when the Image Registry pods try to start.


Actual results:

The pods stay in ContainerCreating until runtime-request-timeout expires. Even a timeout of 30 minutes is not enough in our case.


Expected results:

The pods start.

Comment 18 XiuJuan Wang 2022-04-07 10:25:22 UTC

Verified on a 4.11.0-0.nightly-2022-04-06-213816 cluster:
$ oc rsh image-registry-7b5787bb5-krhmq
df -i
/dev/sdb       13107200 13106107     1093  100% /registry

With 13 million files, the registry pod is recreated in 3 minutes:
oc get pods -w
NAME                                               READY   STATUS              RESTARTS   AGE
cluster-image-registry-operator-6d87668f65-s4l9z   1/1     Running             0          4h36m
image-registry-5d48768f6-25gvw                     0/1     ContainerCreating   0          62s
node-ca-9bcbj                                      1/1     Running             0          4h24m
node-ca-czm2d                                      1/1     Running             0          4h24m
node-ca-kkwbk                                      1/1     Running             0          4h24m
node-ca-l9gv6                                      1/1     Running             0          4h24m
node-ca-w6fn6                                      1/1     Running             0          4h24m
node-ca-zk4ts                                      1/1     Running             0          4h24m
image-registry-5d48768f6-25gvw                     0/1     Running             0          2m42s
image-registry-5d48768f6-25gvw                     1/1     Running             0          3m

Then override the annotation and runtimeClassName:

oc edit config.image
  unsupportedConfigOverrides:
    deployment:
      annotations:
        io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: "true"
      runtimeClassName: selinux
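
The same override could also be applied without an interactive editor. The following is only a sketch: it builds a merge patch from the fields shown above and prints an `oc patch` command line. It assumes that `unsupportedConfigOverrides` sits under `spec` and that the registry config resource is named `cluster`; neither is stated in this comment. The helper name `build_override_patch` is hypothetical.

```python
import json
import shlex


def build_override_patch(annotations, runtime_class_name):
    """Build a merge patch carrying the deployment overrides.

    Assumption: unsupportedConfigOverrides lives under .spec of the
    image registry operator config.
    """
    return {
        "spec": {
            "unsupportedConfigOverrides": {
                "deployment": {
                    "annotations": dict(annotations),
                    "runtimeClassName": runtime_class_name,
                }
            }
        }
    }


patch = build_override_patch(
    {"io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel": "true"},
    "selinux",
)

# Assumption: the config resource is the cluster-scoped object "cluster".
cmd = [
    "oc", "patch",
    "configs.imageregistry.operator.openshift.io/cluster",
    "--type=merge",
    "-p", json.dumps(patch),
]

# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run on a machine without cluster access. These overrides are unsupported, as the field name says.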

With the override in place, the registry pod recreation time drops to 2 minutes with 13 million files:
oc get pods -w
NAME                                               READY   STATUS              RESTARTS   AGE
cluster-image-registry-operator-6d87668f65-s4l9z   1/1     Running             0          8h
image-registry-7b5787bb5-krhmq                     0/1     ContainerCreating   0          16s
node-ca-9bcbj                                      1/1     Running             1          7h58m
node-ca-czm2d                                      1/1     Running             0          7h58m
node-ca-kkwbk                                      1/1     Running             0          7h58m
node-ca-l9gv6                                      1/1     Running             1          7h58m
node-ca-w6fn6                                      1/1     Running             0          7h58m
node-ca-zk4ts                                      1/1     Running             1          7h58m
image-registry-7b5787bb5-krhmq                     0/1     Running             0          96s
image-registry-7b5787bb5-krhmq                     1/1     Running             0          2m
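
To put the two runs side by side, the AGE values at which each pod reached 1/1 Running (3m without the override, 2m with it) can be converted to seconds. This is a rough sketch only; `parse_age` is a hypothetical helper that handles just the `d`/`h`/`m`/`s` units seen in the output above.

```python
import re

# Seconds per unit for the age suffixes that appear in `oc get pods` output.
_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(age):
    """Convert an AGE string such as '2m42s' or '4h36m' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject strings that contain anything other than number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


before = parse_age("3m")  # pod became ready without the override
after = parse_age("2m")   # pod became ready with the override
reduction = (before - after) / before

print(f"{before}s -> {after}s ({reduction:.0%} shorter)")
```

The AGE column is rounded by the client, so the result is only as precise as the watch output itself.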


oc rsh image-registry-7b5787bb5-krhmq
ls -lZ /registry/docker/registry/v2/repositories/default
total 4
drwxr-sr-x. 5 1000330000 1000330000 system_u:object_r:container_file_t:s0:c12,c18 4096 Apr  7 03:28 httpd-ex

oc get pods image-registry-7b5787bb5-krhmq -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    imageregistry.operator.openshift.io/dependencies-checksum: sha256:0a825836652a80e42b437e0cd9fd7f9a3b3585717c05600c853ad7251dbc936b
    io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: "true"
====================snip====================
  runtimeClassName: selinux
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000330000
    fsGroupChangePolicy: OnRootMismatch
    seLinuxOptions:
      level: s0:c18,c12
====================snip====================

Comment 20 errata-xmlrpc 2022-08-10 11:01:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
