Bug 2042960 - azure-file CI fails with "gid(0) in storageClass and pod fsgroup(1000) are not equal"
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Tomas Smetana
QA Contact: Wei Duan
Reported: 2022-01-20 12:18 UTC by Jan Safranek
Modified: 2022-03-10 16:41 UTC (History)

Last Closed: 2022-03-10 16:40:58 UTC




Links:
- GitHub: openshift/azure-file-csi-driver-operator pull 23 — "Bug 2042960: Remove UID, GID from StorageClass asset" (open; last updated 2022-01-24 16:16:56 UTC)
- Red Hat Product Errata: RHSA-2022:0056 (last updated 2022-03-10 16:41:09 UTC)

Description Jan Safranek 2022-01-20 12:18:38 UTC
Recent azure-file CI job runs fail with this event when mounting a volume:

MountVolume.MountDevice failed for volume "pvc-be6c6935-9b3b-4160-beb8-536ae2969a8b" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_azure-file-csi-driver-operator/22/pull-ci-openshift-azure-file-csi-driver-operator-main-e2e-azure-file-csi/1483743420659798016

It could be related to the Kubernetes rebase.

Version-Release number of selected component (if applicable):
4.10 CI

How reproducible:
always?
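
For context, the sketch below is a hypothetical reconstruction of the pre-fix StorageClass asset, inferred from the error message and the fix PR title ("Remove UID, GID from StorageClass asset"); the exact pre-fix asset is not shown in this bug. The azure-file CSI driver compares a gid supplied via the StorageClass against the pod's fsGroup and rejects the mount on mismatch, which is why a hard-coded gid=0 breaks any pod running with fsGroup 1000.

```yaml
# Hypothetical pre-fix StorageClass (sketch, not the verbatim asset).
# The pinned uid/gid mount options conflict with any pod whose
# securityContext sets fsGroup to a different value (e.g. 1000),
# producing: "gid(0) in storageClass and pod fsgroup(1000) are not equal".
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS
mountOptions:
- uid=0
- gid=0          # compared against pod fsGroup by the driver; 0 != 1000 -> mount fails
- dir_mode=0777
- file_mode=0777
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Removing the uid/gid options (as in the fixed StorageClass shown in comment 4) leaves fsGroup handling to kubelet, so any fsGroup value is accepted.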

Comment 4 Wei Duan 2022-01-25 13:33:06 UTC
Before the fix:

Jan 25 11:02:24.765: INFO: At 2022-01-25 11:01:15 +0000 UTC - event for pod-189b165b-1680-443f-8ab3-7c07d5f347d5: {kubelet wduan-0125c-w27vs-worker-centralus2-kbcd8} FailedMount: MountVolume.MountDevice failed for volume "pvc-79c15966-6464-45e9-8415-71b395e8188c" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal

failed: (6m58s) 2022-01-25T11:02:25 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on the same node"



Jan 25 11:02:58.765: INFO: At 2022-01-25 11:01:47 +0000 UTC - event for pod-226ebb13-b9b2-402e-b36b-e4731a3560ab: {kubelet wduan-0125c-w27vs-worker-centralus2-9n87q} FailedMount: MountVolume.MountDevice failed for volume "pvc-f6506dd0-e635-4749-875d-b55fb4a6e9b0" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal

failed: (6m59s) 2022-01-25T11:02:59 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single read-only volume from pods on the same node"

$ egrep "fsgroup\(1000\)" result_20220125_105520.log  |wc -l
108


=================================================

Checking the storageclass with the fix:
$ oc get sc azurefile-csi -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-01-25T12:35:28Z"
  name: azurefile-csi
  resourceVersion: "40778"
  uid: 6948ed10-fc23-465a-8f2d-9bd5dc7e60b1
mountOptions:
- dir_mode=0777
- file_mode=0777
- mfsymlinks
- cache=strict
- nosharesock
- actimeo=30
parameters:
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
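
A minimal pod that exercises the fixed class might look like the hypothetical sketch below (names and image are illustrative, not from this bug; fsGroup 1000 matches the failing tests). With uid/gid gone from the StorageClass mount options, the driver no longer rejects the mount.

```yaml
# Hypothetical verification pod: with the fixed StorageClass (no gid mount
# option), an fsGroup of 1000 no longer conflicts and the volume mounts.
apiVersion: v1
kind: Pod
metadata:
  name: azurefile-fsgroup-test
spec:
  securityContext:
    fsGroup: 1000
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: azurefile-pvc   # a PVC with storageClassName: azurefile-csi
```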


After the fix these cases passed:

$ grep "multiVolume" result_20220125_125232.log  | grep "^pass"
passed: (48.8s) 2022-01-25T12:53:26 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on different node"
passed: (48.3s) 2022-01-25T12:53:57 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should access to two volumes with the same volume mode and retain data across pod recreation on different node"
passed: (53.1s) 2022-01-25T12:54:54 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should access to two volumes with the same volume mode and retain data across pod recreation on the same node"
passed: (51.7s) 2022-01-25T12:54:55 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single read-only volume from pods on the same node"
passed: (28.6s) 2022-01-25T12:54:55 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on the same node"

$ egrep "fsgroup\(1000\)" result_20220125_125232.log  |wc -l
0


Marking as VERIFIED.

Comment 7 errata-xmlrpc 2022-03-10 16:40:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

