Bug 1995779 - pipelines-scc fsGroup.type set to MustRunAs which is causing Pipeline pods timeouts. It should be set to RunAsAny
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenShift Pipelines
Classification: Red Hat
Component: Operator
Version: 1.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Nikhil Thomas
QA Contact: Nobody
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-19 18:21 UTC by Novonil Choudhuri
Modified: 2025-04-04 13:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-27 09:53:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHDEVDOCS-3243 0 None None None 2021-08-26 12:30:02 UTC

Description Novonil Choudhuri 2021-08-19 18:21:42 UTC
Description of problem: 

Users are reporting that pipelines fail at random because volume mounts of type "ReadWriteMany" fail. Further investigation surfaced this error in the pod logs:

 

"Aug 18 14:57:48 pd103-7h7tj-worker-f-2ft2l hyperkube[2771]: W0818 14:57:48.856233    2771 volume_linux.go:51] Setting volume ownership for /var/lib/kubelet/pods/ddee86f0-e50a-4af0-a794-b7200c7e99e8/volumes/kubernetes.io~portworx-volume/pvc-a6dbd1bd-1c6f-42af-98f1-7c8bf3a316b7 and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699"
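
To correlate warnings like this one with a specific pod and PVC, the volume path can be split into its parts. The following is a minimal Python sketch; the function name parse_volume_path and the regular expression are illustrative assumptions based only on the path layout visible in the message above, not on any kubelet API.

```python
import re

# Path layout assumed from the log line above:
# /var/lib/kubelet/pods/<pod-uid>/volumes/<plugin>/<volume-name>
VOLUME_PATH = re.compile(
    r"/var/lib/kubelet/pods/(?P<pod_uid>[0-9a-f-]+)"
    r"/volumes/(?P<plugin>[^/\s]+)/(?P<volume>[^/\s]+)"
)


def parse_volume_path(log_line):
    """Return pod UID, volume plugin and volume name from a log line, or None."""
    match = VOLUME_PATH.search(log_line)
    return match.groupdict() if match else None


line = (
    "W0818 14:57:48.856233    2771 volume_linux.go:51] Setting volume ownership for "
    "/var/lib/kubelet/pods/ddee86f0-e50a-4af0-a794-b7200c7e99e8/volumes/"
    "kubernetes.io~portworx-volume/pvc-a6dbd1bd-1c6f-42af-98f1-7c8bf3a316b7 and fsGroup set."
)
info = parse_volume_path(line)
print(info)
```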

 

This happens because Kubernetes/OpenShift tries to recursively chmod/chown the volume to match the random UID assigned by OpenShift, since "pipelines-scc" has "fsGroup.type" set to "MustRunAs" instead of "RunAsAny".

 

It looks like Kubernetes/OpenShift allows a fixed time of 2 minutes for a volume to mount, and whenever it sees fsGroup it does a recursive chmod+chown. Thus, if the volume holds many files, we see volume mount timeouts.
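
The scaling argument above can be made concrete with a rough estimate. This is a sketch under stated assumptions: the 120-second budget is the 2-minute figure from this report, while per_entry_seconds is a hypothetical cost per file that would have to be measured on the actual storage backend.

```python
import os

MOUNT_BUDGET_SECONDS = 120  # the 2-minute figure quoted in this report


def count_entries(root):
    """Count the files and directories a recursive chown/chmod would touch, including root."""
    total = 1  # root itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def fits_in_budget(entries, per_entry_seconds, budget=MOUNT_BUDGET_SECONDS):
    """True if touching every entry once is estimated to finish within the budget."""
    return entries * per_entry_seconds <= budget


# Hypothetical numbers: 1 ms per entry on a network-backed volume.
print(fits_in_budget(50_000, 0.001))   # about 50 s estimated
print(fits_in_budget(500_000, 0.001))  # about 500 s estimated
```

Running count_entries against a copy of the workspace contents gives the entries figure to plug in.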

 

- https://github.com/kubernetes/kubernetes/issues/69699

- https://github.com/openshift/origin/blob/release-4.7/vendor/k8s.io/kubernetes/pkg/kubelet/volumemanager/volume_manager.go#L71-L80

- https://access.redhat.com/solutions/4900491

- https://bugzilla.redhat.com/show_bug.cgi?id=1503906

 

This issue does not occur if we switch to a PVC with accessMode set to "ReadWriteOnce". In addition, the pipeline completes substantially faster, since "pipelines-scc" has "supplementalGroups.type" set to "RunAsAny".

 

This leads me to the conclusion that the "pipelines-scc" installed by the OpenShift Pipelines operator must be updated to "fsGroup.type=RunAsAny".
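
For checking a cluster against this conclusion, a small sketch follows. It assumes the SCC has been exported as JSON (for example with "oc get scc pipelines-scc -o json") and loaded into a dict. The sample dict below is a hypothetical excerpt showing only the two fields discussed in this report, and the helper names are illustrative.

```python
import json


def forces_fsgroup(scc):
    """True if the SCC's fsGroup strategy is MustRunAs."""
    return scc.get("fsGroup", {}).get("type") == "MustRunAs"


def fsgroup_patch(new_type="RunAsAny"):
    """Build a JSON Patch (RFC 6902) document that replaces fsGroup.type."""
    return json.dumps(
        [{"op": "replace", "path": "/fsGroup/type", "value": new_type}]
    )


# Hypothetical excerpt of pipelines-scc as described in this report.
scc = {
    "metadata": {"name": "pipelines-scc"},
    "fsGroup": {"type": "MustRunAs"},
    "supplementalGroups": {"type": "RunAsAny"},
}

if forces_fsgroup(scc):
    print(fsgroup_patch())
```

The printed document has the shape accepted by "oc patch scc pipelines-scc --type=json -p '<patch>'". Whether the operator later reconciles the SCC back to its original value is not established in this report and would need to be verified.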

 

Version-Release number of selected component (if applicable): 

OpenShift v4.7.21
OpenShift Pipelines v1.4


How reproducible: 

1. Run parallel Pipeline tasks, with some of the running pods performing random chmod/chown operations on files.
2. Kubernetes/OpenShift tries to recursively chmod/chown the volume to match the random UID assigned by OpenShift, since "pipelines-scc" has "fsGroup.type" set to "MustRunAs" instead of "RunAsAny".
3. The error appears in the event logs, as shown in the description.
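
Since the slowdown in the steps above depends on how many files the volume holds, a task step could first populate the workspace with many small files. The sketch below is one way to do that; the directory layout, the file counts, and the populate helper are assumptions for illustration and were not part of the original reproduction.

```python
import os
import tempfile


def populate(root, dirs=10, files_per_dir=100):
    """Create dirs * files_per_dir small files under root and return how many were written."""
    written = 0
    for d in range(dirs):
        subdir = os.path.join(root, f"dir-{d:04d}")
        os.makedirs(subdir, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(subdir, f"file-{f:05d}.txt"), "w") as handle:
                handle.write("x\n")
            written += 1
    return written


# Demo against a temporary directory; in a Task, root would be the workspace mount path.
with tempfile.TemporaryDirectory() as tmp:
    created = populate(tmp, dirs=3, files_per_dir=5)
    print(created)
```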


Actual results: Pipelines fail as described above.


Expected results: The "pipelines-scc" installed by the OpenShift Pipelines operator is updated to "fsGroup.type=RunAsAny".

Additional info:

