Bug 2042960 - azure-file CI fails with "gid(0) in storageClass and pod fsgroup(1000) are not equal"
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Tomas Smetana
QA Contact: Wei Duan
Reported: 2022-01-20 12:18 UTC by Jan Safranek
Modified: 2022-03-10 16:41 UTC (History)

Last Closed: 2022-03-10 16:40:58 UTC




Links:
- GitHub: openshift/azure-file-csi-driver-operator pull 23 — "Bug 2042960: Remove UID, GID from StorageClass asset" (open; last updated 2022-01-24 16:16:56 UTC)
- Red Hat Product Errata: RHSA-2022:0056 (last updated 2022-03-10 16:41:09 UTC)

Description Jan Safranek 2022-01-20 12:18:38 UTC
Recent azure-file CI job runs fail with this event when mounting a volume:

MountVolume.MountDevice failed for volume "pvc-be6c6935-9b3b-4160-beb8-536ae2969a8b" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_azure-file-csi-driver-operator/22/pull-ci-openshift-azure-file-csi-driver-operator-main-e2e-azure-file-csi/1483743420659798016

It could be related to the Kubernetes rebase.

Version-Release number of selected component (if applicable):
4.10 CI

How reproducible:
always?
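
For context, the sketch below is a hypothetical reconstruction of the pre-fix StorageClass asset, inferred from the error message and the fix PR title ("Remove UID, GID from StorageClass asset"); the exact pre-fix asset is not shown in this bug. The azure-file CSI driver compares a gid supplied via the StorageClass against the pod's fsGroup and rejects the mount on mismatch, which is why a hard-coded gid=0 breaks any pod running with fsGroup 1000.

```yaml
# Hypothetical pre-fix StorageClass (sketch, not the verbatim asset).
# The pinned uid/gid mount options conflict with any pod whose
# securityContext sets fsGroup to a different value (e.g. 1000),
# producing: "gid(0) in storageClass and pod fsgroup(1000) are not equal".
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS
mountOptions:
- uid=0
- gid=0          # compared against pod fsGroup by the driver; 0 != 1000 -> mount fails
- dir_mode=0777
- file_mode=0777
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Removing the uid/gid options (as in the fixed StorageClass shown in comment 4) leaves fsGroup handling to kubelet, so any fsGroup value is accepted.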

Comment 4 Wei Duan 2022-01-25 13:33:06 UTC
Before the fix:

Jan 25 11:02:24.765: INFO: At 2022-01-25 11:01:15 +0000 UTC - event for pod-189b165b-1680-443f-8ab3-7c07d5f347d5: {kubelet wduan-0125c-w27vs-worker-centralus2-kbcd8} FailedMount: MountVolume.MountDevice failed for volume "pvc-79c15966-6464-45e9-8415-71b395e8188c" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal

failed: (6m58s) 2022-01-25T11:02:25 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on the same node"



Jan 25 11:02:58.765: INFO: At 2022-01-25 11:01:47 +0000 UTC - event for pod-226ebb13-b9b2-402e-b36b-e4731a3560ab: {kubelet wduan-0125c-w27vs-worker-centralus2-9n87q} FailedMount: MountVolume.MountDevice failed for volume "pvc-f6506dd0-e635-4749-875d-b55fb4a6e9b0" : rpc error: code = InvalidArgument desc = gid(0) in storageClass and pod fsgroup(1000) are not equal

failed: (6m59s) 2022-01-25T11:02:59 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single read-only volume from pods on the same node"

$ egrep "fsgroup\(1000\)" result_20220125_105520.log  |wc -l
108


=================================================

Checking the storageclass with the fix:
$ oc get sc azurefile-csi -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-01-25T12:35:28Z"
  name: azurefile-csi
  resourceVersion: "40778"
  uid: 6948ed10-fc23-465a-8f2d-9bd5dc7e60b1
mountOptions:
- dir_mode=0777
- file_mode=0777
- mfsymlinks
- cache=strict
- nosharesock
- actimeo=30
parameters:
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
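
A minimal pod that exercises the fixed class might look like the hypothetical sketch below (names and image are illustrative, not from this bug; fsGroup 1000 matches the failing tests). With uid/gid gone from the StorageClass mount options, the driver no longer rejects the mount.

```yaml
# Hypothetical verification pod: with the fixed StorageClass (no gid mount
# option), an fsGroup of 1000 no longer conflicts and the volume mounts.
apiVersion: v1
kind: Pod
metadata:
  name: azurefile-fsgroup-test
spec:
  securityContext:
    fsGroup: 1000
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: azurefile-pvc   # a PVC with storageClassName: azurefile-csi
```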


After the fix these cases passed:

$ grep "multiVolume" result_20220125_125232.log  | grep "^pass"
passed: (48.8s) 2022-01-25T12:53:26 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on different node"
passed: (48.3s) 2022-01-25T12:53:57 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should access to two volumes with the same volume mode and retain data across pod recreation on different node"
passed: (53.1s) 2022-01-25T12:54:54 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should access to two volumes with the same volume mode and retain data across pod recreation on the same node"
passed: (51.7s) 2022-01-25T12:54:55 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single read-only volume from pods on the same node"
passed: (28.6s) 2022-01-25T12:54:55 "External Storage [Driver: file.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on the same node"

$ egrep "fsgroup\(1000\)" result_20220125_125232.log  |wc -l
0


Marking as VERIFIED.

Comment 7 errata-xmlrpc 2022-03-10 16:40:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

