Recent azure-file CI job runs fail with this event when mounting a volume:

MountVolume.MountDevice failed for volume "pvc-be6c6935-9b3b-4160-beb8-536ae2969a8b" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal

Example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_azure-file-csi-driver-operator/22/pull-ci-openshift-azure-file-csi-driver-operator-main-e2e-azure-file-csi/1483743420659798016

It could be related to the Kubernetes rebase.

Version-Release number of selected component (if applicable):
4.10 CI

How reproducible:
Always, so far.
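For context, the error comes from a validation in the CSI driver that compares the gid configured on the StorageClass against the pod's fsGroup and rejects the mount when they differ. A minimal Python sketch of that check (simplified; the real driver returns a gRPC InvalidArgument status, and the function name here is hypothetical):

```python
# Simplified sketch of the driver-side validation that produces the
# error above; NOT the actual azurefile-csi-driver code. The real check
# returns codes.InvalidArgument over gRPC rather than raising.

def validate_gid(storage_class_gid: int, pod_fs_group: int) -> None:
    """Reject the mount when the StorageClass gid and pod fsGroup differ."""
    if storage_class_gid != pod_fs_group:
        raise ValueError(
            f"gid({storage_class_gid}) in storageClass and "
            f"pod fsgroup({pod_fs_group}) are not equal"
        )

# The failing combination seen in the CI logs:
try:
    validate_gid(0, 1000)
except ValueError as e:
    print(e)  # gid(0) in storageClass and pod fsgroup(1000) are not equal
```

This illustrates why every test pod with fsGroup 1000 failed against a StorageClass that effectively pinned gid to 0.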
We use the StorageClass from https://github.com/openshift/azure-file-csi-driver-operator/blob/main/assets/storageclass.yaml. I think we just need to include the upstream change: https://github.com/kubernetes-sigs/azurefile-csi-driver/commit/451d5776b17791de2a7c2640d4dcfab2f658ecd0.
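Based on the verified `oc get sc azurefile-csi -o yaml` output below, the patched assets/storageclass.yaml should carry mount options along these lines (a sketch reconstructed from that output, not the exact manifest in the repository):

```yaml
# Sketch of the expected StorageClass after the fix; field values taken
# from the verified cluster output below. The exact manifest may differ.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - mfsymlinks
  - cache=strict
  - nosharesock
  - actimeo=30
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

The important part is that no gid mount option is set, so the driver no longer rejects pods whose fsGroup differs from 0.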
Before the fix:

Jan 25 11:02:24.765: INFO: At 2022-01-25 11:01:15 +0000 UTC - event for pod-189b165b-1680-443f-8ab3-7c07d5f347d5: {kubelet wduan-0125c-w27vs-worker-centralus2-kbcd8} FailedMount: MountVolume.MountDevice failed for volume "pvc-79c15966-6464-45e9-8415-71b395e8188c" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal
failed: (6m58s) 2022-01-25T11:02:25 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on the same node"

Jan 25 11:02:58.765: INFO: At 2022-01-25 11:01:47 +0000 UTC - event for pod-226ebb13-b9b2-402e-b36b-e4731a3560ab: {kubelet wduan-0125c-w27vs-worker-centralus2-9n87q} FailedMount: MountVolume.MountDevice failed for volume "pvc-f6506dd0-e635-4749-875d-b55fb4a6e9b0" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal
failed: (6m59s) 2022-01-25T11:02:59 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single read-only volume from pods on the same node"

$ egrep "fsgroup\(1000\)" result_20220125_105520.log | wc -l
108

=================================================

Checking the storageclass with the fix:

$ oc get sc azurefile-csi -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-01-25T12:35:28Z"
  name: azurefile-csi
  resourceVersion: "40778"
  uid: 6948ed10-fc23-465a-8f2d-9bd5dc7e60b1
mountOptions:
- dir_mode=0777
- file_mode=0777
- mfsymlinks
- cache=strict
- nosharesock
- actimeo=30
parameters:
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

After the fix, these cases passed:

$ grep "multiVolume" result_20220125_125232.log | grep "^pass"
passed: (48.8s) 2022-01-25T12:53:26 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on different node"
passed: (48.3s) 2022-01-25T12:53:57 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should access to two volumes with the same volume mode and retain data across pod recreation on different node"
passed: (53.1s) 2022-01-25T12:54:54 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should access to two volumes with the same volume mode and retain data across pod recreation on the same node"
passed: (51.7s) 2022-01-25T12:54:55 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single read-only volume from pods on the same node"
passed: (28.6s) 2022-01-25T12:54:55 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on the same node"

$ egrep "fsgroup\(1000\)" result_20220125_125232.log | wc -l
0

Marking as VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056