1995779 – pipelines-scc fsGroup.type set to MustRunAs which is causing Pipeline pods timeouts. It should be set to RunAsAny

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat OpenShift Pipelines
Classification:	Red Hat
Component:	Operator
Sub Component:
Version:	1.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Nikhil Thomas
QA Contact:	Nobody
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-08-19 18:21 UTC by Novonil Choudhuri
Modified:	2025-04-04 13:07 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-10-27 09:53:08 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHDEVDOCS-3243	0	None	None	None	2021-08-26 12:30:02 UTC

Description Novonil Choudhuri 2021-08-19 18:21:42 UTC

Description of problem:

Users report that pipelines fail intermittently because mounts of volumes of type "ReadWriteMany" fail. Further investigation surfaced this warning in the logs:

"Aug 18 14:57:48 pd103-7h7tj-worker-f-2ft2l hyperkube[2771]: W0818 14:57:48.856233 2771 volume_linux.go:51] Setting volume ownership for /var/lib/kubelet/pods/ddee86f0-e50a-4af0-a794-b7200c7e99e8/volumes/kubernetes.io~portworx-volume/pvc-a6dbd1bd-1c6f-42af-98f1-7c8bf3a316b7 and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699"

This happens because Kubernetes/OpenShift attempts a recursive chmod/chown of the volume to match the random UID assigned by OpenShift, since "pipelines-scc" has "fsGroup.type" set to "MustRunAs" instead of "RunAsAny".

It looks like Kubernetes/OpenShift allows a fixed time of 2 minutes for a volume to mount, and whenever it sees fsGroup set it does a recursive chmod+chown. As a result, volumes with many files hit mount timeouts.

- https://github.com/kubernetes/kubernetes/issues/69699

- https://github.com/openshift/origin/blob/release-4.7/vendor/k8s.io/kubernetes/pkg/kubelet/volumemanager/volume_manager.go#L71-L80

- https://access.redhat.com/solutions/4900491

- https://bugzilla.redhat.com/show_bug.cgi?id=1503906
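
The cost of the recursive ownership change described above grows with the number of entries on the volume. A minimal dry-run sketch, assuming a hypothetical per-entry cost supplied by the caller (the function names and the cost figure are illustrative, not taken from the kubelet source; the 2-minute budget is the value stated in this report):

```python
import os

# Mount budget of 2 minutes, as stated in this report.
MOUNT_BUDGET_SECONDS = 120.0


def count_ownership_ops(root):
    """Count the entries a recursive chown/chmod pass over `root` would touch.

    Dry run only: nothing on disk is modified.
    """
    entries = 1  # the volume root itself
    for _dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
    return entries


def exceeds_mount_budget(entries, seconds_per_entry, budget=MOUNT_BUDGET_SECONDS):
    """Return True if touching every entry would take longer than the budget.

    `seconds_per_entry` is a hypothetical, caller-measured cost; it is an
    assumption of this sketch, not a number from the report.
    """
    return entries * seconds_per_entry > budget
```

For example, at an assumed 1 ms per entry, a volume with 200,000 entries would need about 200 seconds, which is over the 120-second budget.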

This issue does not occur if we switch to a PVC with accessMode set to "ReadWriteOnce". In addition, the pipeline completes substantially faster, since "pipelines-scc" has "supplementalGroups.type" set to "RunAsAny".

This leads me to the conclusion that the "pipelines-scc" installed by the OpenShift Pipelines operator must be updated to "fsGroup.type=RunAsAny".
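
The proposed change could be expressed as a JSON Patch against the SCC. The sketch below only builds the patch and the corresponding `oc patch` argument list; it does not run anything. The helper names are hypothetical, and it assumes the SCC exposes the setting at the top-level `fsGroup.type` field, as named in this report. It illustrates the reporter's proposal and is not a confirmed fix.

```python
import json


def build_fsgroup_patch(fs_group_type="RunAsAny"):
    """Return a JSON Patch document that replaces fsGroup.type."""
    return [{"op": "replace", "path": "/fsGroup/type", "value": fs_group_type}]


def build_oc_patch_command(scc_name="pipelines-scc", fs_group_type="RunAsAny"):
    """Return the argv list for an `oc patch` call; nothing is executed here."""
    patch = json.dumps(build_fsgroup_patch(fs_group_type))
    return ["oc", "patch", "scc", scc_name, "--type=json", "-p", patch]
```

The returned list can be passed to `subprocess.run` by someone with cluster-admin rights. Since the operator installs this SCC, a manual change may be reverted by the operator.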

Version-Release number of selected component (if applicable):

OpenShift v4.7.21
OpenShift Pipelines v1.4

How reproducible:

1. Run Pipeline tasks in parallel, with some of the tasks running chmod/chown on files from within the running pods.
2. Kubernetes/OpenShift attempts a recursive chmod/chown to match the random UID assigned by OpenShift, since "pipelines-scc" has "fsGroup.type" set to "MustRunAs" instead of "RunAsAny".
3. The message shown in the description appears in the event logs.

Actual results: Pipelines fail as described in the description.

Expected results: The "pipelines-scc" installed by the OpenShift Pipelines operator must be updated to "fsGroup.type=RunAsAny".

Additional info:
