Bug 2102308

Summary: ARO 4.11 Unable to provision PVCs from AzureFile CSI StorageClasses
Product: OpenShift Container Platform
Component: Storage
Storage sub component: Kubernetes External Components
Version: 4.11
Reporter: bbergen
Assignee: Fabio Bertinatto <fbertina>
QA Contact: Wei Duan <wduan>
CC: jdobson, jsafrane
Severity: high
Priority: unspecified
Status: CLOSED NOTABUG
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2022-07-06 19:20:53 UTC

Description bbergen 2022-06-29 15:50:35 UTC
On a new install of ARO that was upgraded to 4.11.0-fc.3, neither the azurefile-csi nor the azurefile-csi-nfs StorageClass can provision new volumes, and all PVCs that use these StorageClasses are stuck in Pending.

Version-Release number of selected component (if applicable): 4.11.0-fc.3

How reproducible:

Steps to Reproduce:
1. Upgrade an ARO cluster to 4.11
2. Create a PVC using the StorageClass azurefile-csi or azurefile-csi-nfs and a Pod that mounts the resulting PV (see the sketch below)
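
A minimal reproduction sketch, assuming the PVC/Pod manifests from the dumps below are saved to a file (the file name here is illustrative):

```
# Apply the PVC and Pod manifests shown in the PVC Dump section below
$ oc apply -f azurefile-csi-repro.yaml

# The PVC stays Pending and no PV is created
$ oc get pvc,pv

# Inspect the external provisioner logs for the failure
$ oc logs -n openshift-cluster-csi-drivers -l app=azure-file-csi-driver-controller -c csi-provisioner
```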

Actual results:

1. No PV is created for the PVC
2. The azure-file-csi-driver-controller csi-provisioner logs indicate 404 errors when attempting to fetch a token to authenticate and interact with Azure

Expected results:

1. PV is created and mounted successfully into the Pod

Master Log:

Node Log (of failed Pods):

```
$ oc logs -n openshift-cluster-csi-drivers -l app=azure-file-csi-driver-controller -c csi-provisioner
...
I0629 15:40:52.803985       1 controller.go:1337] provision "default/azurefile-csi" class "azurefile-csi": started
I0629 15:40:52.804229       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"azurefile-csi", UID:"a3d6fdd0-4a7f-4d10-a2ac-6e2f686a3854", APIVersion:"v1", ResourceVersion:"663365", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/azurefile-csi"
I0629 15:40:53.095272       1 controller.go:1075] Final error received, removing PVC a3d6fdd0-4a7f-4d10-a2ac-6e2f686a3854 from claims in progress
W0629 15:40:53.095707       1 controller.go:934] Retrying syncing claim "a3d6fdd0-4a7f-4d10-a2ac-6e2f686a3854", failure 286
E0629 15:40:53.095742       1 controller.go:957] error syncing claim "a3d6fdd0-4a7f-4d10-a2ac-6e2f686a3854": failed to provision volume with StorageClass "azurefile-csi": rpc error: code = Internal desc = failed to ensure storage account: could not list storage accounts for account type Standard_LRS: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts?api-version=2021-02-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body:  Endpoint https://login.microsoftonline.com/oauth2/token
I0629 15:40:53.095441       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"azurefile-csi", UID:"a3d6fdd0-4a7f-4d10-a2ac-6e2f686a3854", APIVersion:"v1", ResourceVersion:"663365", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "azurefile-csi": rpc error: code = Internal desc = failed to ensure storage account: could not list storage accounts for account type Standard_LRS: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts?api-version=2021-02-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body:  Endpoint https://login.microsoftonline.com/oauth2/token
```

PV Dump:

N/A

PVC Dump:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile-csi
spec:
  storageClassName: azurefile-csi
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: azurefile-csi
spec:
  containers:
    - name: bb
      image: busybox:1.28
      volumeMounts:
      - mountPath: "/var/www/html"
        name: www
  volumes:
    - name: www
      persistentVolumeClaim:
        claimName: azurefile-csi
```

and

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile-csi-nfs
spec:
  storageClassName: azurefile-csi-nfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: azurefile-csi-nfs
spec:
  containers:
    - name: bb
      image: busybox:1.28
      volumeMounts:
      - mountPath: "/var/www/html"
        name: www
  volumes:
    - name: www
      persistentVolumeClaim:
        claimName: azurefile-csi-nfs
```

StorageClass Dump (if StorageClass used by PV/PVC):

```
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-06-28T15:42:47Z"
  name: azurefile-csi
  resourceVersion: "530399"
  uid: 48d97a21-fa1a-414c-9294-0ed8f9255526
mountOptions:
- mfsymlinks
- cache=strict
- nosharesock
- actimeo=30
parameters:
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

and

```
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-06-28T15:42:47Z"
  name: azurefile-csi-nfs
  resourceVersion: "479684"
  uid: 99893980-0683-44d0-8ab6-1b7e07e8b6ae
parameters:
  protocol: nfs
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Additional info:

It's uncertain whether this is an identical bug, but the behavior is very similar to what was seen in https://bugzilla.redhat.com/show_bug.cgi?id=2095049, just with different StorageClasses.

Comment 1 Jan Safranek 2022-06-30 08:30:18 UTC
This indeed looks like a dup of bug #2095049. Can you please test with 4.11.0-rc.0?

Comment 2 bbergen 2022-07-01 17:08:05 UTC
Thanks for taking another look. I've upgraded to 4.11.0-rc.0 and am seeing a little progress, but new hurdles too.

On 4.11.0-rc.0, PVs for all StorageClasses are created successfully:

```
$ oc get pvc,pv
NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
persistentvolumeclaim/azurefile-csi       Bound    pvc-173f056f-815c-4d23-8366-7aa7bd0970fe   1Gi        RWO            azurefile-csi       59m
persistentvolumeclaim/azurefile-csi-nfs   Bound    pvc-8340391a-7931-46e4-ada3-48b8d2cb6e12   1Gi        RWO            azurefile-csi-nfs   59m
persistentvolumeclaim/managed-csi         Bound    pvc-01f9c544-97d6-41a9-8934-fcb2d6ff897e   1Gi        RWO            managed-csi         59m
persistentvolumeclaim/managed-premium     Bound    pvc-42c9b721-f128-431b-b3fe-ceef2a5c32ad   1Gi        RWO            managed-premium     59m

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS        REASON   AGE
persistentvolume/pvc-01f9c544-97d6-41a9-8934-fcb2d6ff897e   1Gi        RWO            Delete           Bound    default/managed-csi         managed-csi                  59m
persistentvolume/pvc-173f056f-815c-4d23-8366-7aa7bd0970fe   1Gi        RWO            Delete           Bound    default/azurefile-csi       azurefile-csi                59m
persistentvolume/pvc-42c9b721-f128-431b-b3fe-ceef2a5c32ad   1Gi        RWO            Delete           Bound    default/managed-premium     managed-premium              59m
persistentvolume/pvc-8340391a-7931-46e4-ada3-48b8d2cb6e12   1Gi        RWO            Delete           Bound    default/azurefile-csi-nfs   azurefile-csi-nfs            59m
```

However, only the Azure Disk PVs can be mounted into Pods (using the Pod specs from the original report above):

```
$ oc get po
NAME                    READY   STATUS              RESTARTS   AGE
pod/azurefile-csi       0/1     ContainerCreating   0          51s
pod/azurefile-csi-nfs   0/1     ContainerCreating   0          51s
pod/managed-csi         1/1     Running             0          51s
pod/managed-premium     1/1     Running             0          51s
```

When I investigated the Nodes these Pods were on, I found...

"File Not Found" errors for the Azure File CSI PV being mounted:

```
Jul 01 16:50:00 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: E0701 16:50:00.240380    1839 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^aro-bbergen-411gap2#clusteryjv8c#pvc-78b603c9-4c76-4dbc-a797-ef37f6183f66# podName: nodeName:}" failed. No retries permitted until 2022-07-01 16:52:02.240351151 +0000 UTC m=+77323.456591930 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-78b603c9-4c76-4dbc-a797-ef37f6183f66" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^aro-bbergen-411gap2#clusteryjv8c#pvc-78b603c9-4c76-4dbc-a797-ef37f6183f66#") pod "azurefile-csi" (UID: "28169470-4a24-43af-825d-3eb6801d7120") : rpc error: code = Internal desc = volume(aro-bbergen-411gap2#clusteryjv8c#pvc-78b603c9-4c76-4dbc-a797-ef37f6183f66#) mount "//clusteryjv8c.file.core.windows.net/pvc-78b603c9-4c76-4dbc-a797-ef37f6183f66" on "/var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/17a990b2bf5b65f8076c450c2aaf882acaac5219e662d2ce174c97642ee3a9e1/globalmount" failed with mount failed: exit status 32
Jul 01 16:50:00 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Mounting command: mount
Jul 01 16:50:00 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Mounting arguments: -t cifs -o mfsymlinks,cache=strict,nosharesock,actimeo=30,file_mode=0777,dir_mode=0777,<masked> //clusteryjv8c.file.core.windows.net/pvc-78b603c9-4c76-4dbc-a797-ef37f6183f66 /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/17a990b2bf5b65f8076c450c2aaf882acaac5219e662d2ce174c97642ee3a9e1/globalmount
Jul 01 16:50:00 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Output: mount error(2): No such file or directory
Jul 01 16:50:00 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
```

And "Unhandled error -121" on the Azure File CSI NFS PV:

```
Jul 01 16:50:26 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: I0701 16:50:26.687270    1839 reconciler.go:245] "operationExecutor.MountVolume started for volume \"pvc-aa13c116-9cb8-48cb-80a2-03d4e61491a0\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^aro-bbergen-411gap2#f2d601f249cdb49c58027b7#pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0#\") pod \"azurefile-csi-nfs\" (UID: \"b79b3e49-ff28-4013-8546-90fad4057698\") " pod="default/azurefile-csi-nfs"
Jul 01 16:50:26 bbergen-411gap2-j86fl-worker-eastus3-zjk45 kernel: NFS: nfs4_discover_server_trunking unhandled error -121. Exiting with error EIO
Jul 01 16:50:26 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: E0701 16:50:26.745954    1839 csi_attacher.go:344] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc = volume(aro-bbergen-411gap2#f2d601f249cdb49c58027b7#pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0#) mount "f2d601f249cdb49c58027b7.file.core.windows.net:/f2d601f249cdb49c58027b7/pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0" on "/var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/e02d15467d4a5d33d5790a033242212556da909ce977cb75ebd101cb47ff1122/globalmount" failed with mount failed: exit status 32
Jul 01 16:50:26 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Mounting command: mount
Jul 01 16:50:26 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Mounting arguments: -t nfs -o vers=4,minorversion=1,sec=sys f2d601f249cdb49c58027b7.file.core.windows.net:/f2d601f249cdb49c58027b7/pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0 /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/e02d15467d4a5d33d5790a033242212556da909ce977cb75ebd101cb47ff1122/globalmount
Jul 01 16:50:26 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Output: mount.nfs: mount system call failed
n-411gap2#f2d601f249cdb49c58027b7#pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0# podName: nodeName:}" failed. No retries permitted until 2022-07-01 16:52:28.746161516 +0000 UTC m=+77349.962402195 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-aa13c116-9cb8-48cb-80a2-03d4e61491a0" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^aro-bbergen-411gap2#f2d601f249cdb49c58027b7#pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0#") pod "azurefile-csi-nfs" (UID: "b79b3e49-ff28-4013-8546-90fad4057698") : rpc error: code = Internal desc = volume(aro-bbergen-411gap2#f2d601f249cdb49c58027b7#pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0#) mount "f2d601f249cdb49c58027b7.file.core.windows.net:/f2d601f249cdb49c58027b7/pvcn-aa13c116-9cb8-48cb-80a2-03d4e61491a0" on "/var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/e02d15467d4a5d33d5790a033242212556da909ce977cb75ebd101cb47ff1122/globalmount" failed with mount failed: exit status 32
```
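
For reference, the node-level kubelet messages above can be gathered with something like the following (the ticket doesn't state how they were collected, so this is only a sketch; the node name is the one that appears in the log lines):

```
# Stream the kubelet journal from the affected node
$ oc adm node-logs bbergen-411gap2-j86fl-worker-eastus3-zjk45 -u kubelet

# Or open a debug shell on the node and read the journal directly
$ oc debug node/bbergen-411gap2-j86fl-worker-eastus3-zjk45
sh-4.4# chroot /host journalctl -u kubelet | grep MountVolume
```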

When I updated the mount path in the Pod spec to a directory I knew existed in the container filesystem (/tmp), the Azure File CSI NFS volume mounted successfully, but the Azure File CSI volume still fails:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile-csi
spec:
  storageClassName: azurefile-csi
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: azurefile-csi
spec:
  containers:
    - name: bb
      image: busybox:1.28
      command: ["sleep"]
      args: ["5000"]
      volumeMounts:
        - mountPath: "/tmp"
          name: www
        # - mountPath: "/var/www/html"
        #   name: www
  volumes:
    - name: www
      persistentVolumeClaim:
        claimName: azurefile-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile-csi-nfs
spec:
  storageClassName: azurefile-csi-nfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: azurefile-csi-nfs
spec:
  containers:
    - name: bb
      image: busybox:1.28
      command: ["sleep"]
      args: ["5000"]
      volumeMounts:
        - mountPath: "/tmp"
          name: www
        # - mountPath: "/var/www/html"
        #   name: www
  volumes:
    - name: www
      persistentVolumeClaim:
        claimName: azurefile-csi-nfs
```

```
NAME                    READY   STATUS              RESTARTS   AGE
pod/azurefile-csi       0/1     ContainerCreating   0          41s
pod/azurefile-csi-nfs   1/1     Running             0          41s
pod/managed-csi         1/1     Running             0          40s
pod/managed-premium     1/1     Running             0          41s

NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
persistentvolumeclaim/azurefile-csi       Bound    pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d   1Gi        RWO            azurefile-csi       42s
persistentvolumeclaim/azurefile-csi-nfs   Bound    pvc-af7b6e95-da56-4c60-ac3a-910b9489ba34   1Gi        RWO            azurefile-csi-nfs   42s
persistentvolumeclaim/managed-csi         Bound    pvc-21bd0a40-b4fc-43e7-a86a-1955073a48ea   1Gi        RWO            managed-csi         42s
persistentvolumeclaim/managed-premium     Bound    pvc-1d923352-9422-4ff3-9c15-ff651029b372   1Gi        RWO            managed-premium     42s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS        REASON   AGE
persistentvolume/pvc-1d923352-9422-4ff3-9c15-ff651029b372   1Gi        RWO            Delete           Bound    default/managed-premium     managed-premium              40s
persistentvolume/pvc-21bd0a40-b4fc-43e7-a86a-1955073a48ea   1Gi        RWO            Delete           Bound    default/managed-csi         managed-csi                  39s
persistentvolume/pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d   1Gi        RWO            Delete           Bound    default/azurefile-csi       azurefile-csi                41s
persistentvolume/pvc-af7b6e95-da56-4c60-ac3a-910b9489ba34   1Gi        RWO            Delete           Bound    default/azurefile-csi-nfs   azurefile-csi-nfs            22s
```

Node-level logs for Azure File CSI now show authentication errors:

```
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: I0701 17:00:10.682777    1839 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^aro-bbergen-411gap2#clusteryjv8c#pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d#\") pod \"azurefile-csi\" (UID: \"c42c2967-7101-4363-b35c-ce53b9b00447\") " pod="default/azurefile-csi"
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: I0701 17:00:10.682854    1839 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-r5s5l\" (UniqueName: \"kubernetes.io/projected/c42c2967-7101-4363-b35c-ce53b9b00447-kube-api-access-r5s5l\") pod \"azurefile-csi\" (UID: \"c42c2967-7101-4363-b35c-ce53b9b00447\") " pod="default/azurefile-csi"
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: I0701 17:00:10.783847    1839 reconciler.go:245] "operationExecutor.MountVolume started for volume \"kube-api-access-r5s5l\" (UniqueName: \"kubernetes.io/projected/c42c2967-7101-4363-b35c-ce53b9b00447-kube-api-access-r5s5l\") pod \"azurefile-csi\" (UID: \"c42c2967-7101-4363-b35c-ce53b9b00447\") " pod="default/azurefile-csi"
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: I0701 17:00:10.783934    1839 reconciler.go:245] "operationExecutor.MountVolume started for volume \"pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^aro-bbergen-411gap2#clusteryjv8c#pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d#\") pod \"azurefile-csi\" (UID: \"c42c2967-7101-4363-b35c-ce53b9b00447\") " pod="default/azurefile-csi"
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: I0701 17:00:10.812007    1839 operation_generator.go:703] "MountVolume.SetUp succeeded for volume \"kube-api-access-r5s5l\" (UniqueName: \"kubernetes.io/projected/c42c2967-7101-4363-b35c-ce53b9b00447-kube-api-access-r5s5l\") pod \"azurefile-csi\" (UID: \"c42c2967-7101-4363-b35c-ce53b9b00447\") " pod="default/azurefile-csi"
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 kernel: CIFS: Attempting to mount \\clusteryjv8c.file.core.windows.net\pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 kernel: CIFS: VFS: Could not allocate crypto hmac(md5)
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 kernel: CIFS: VFS: Error -2 during NTLMSSP authentication
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 kernel: CIFS: VFS: \\clusteryjv8c.file.core.windows.net Send error in SessSetup = -2
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 kernel: CIFS: VFS: cifs_mount failed w/return code = -2
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: E0701 17:00:10.991557    1839 csi_attacher.go:344] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc = volume(aro-bbergen-411gap2#clusteryjv8c#pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d#) mount "//clusteryjv8c.file.core.windows.net/pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d" on "/var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/b0fcc3716650847aa8657e579934ff6484704512e95c6a745be9518dadf84226/globalmount" failed with mount failed: exit status 32
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Mounting command: mount
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Mounting arguments: -t cifs -o mfsymlinks,cache=strict,nosharesock,actimeo=30,file_mode=0777,dir_mode=0777,<masked> //clusteryjv8c.file.core.windows.net/pvc-7392617a-4b7b-4601-a891-f6e9a01f1d8d /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/b0fcc3716650847aa8657e579934ff6484704512e95c6a745be9518dadf84226/globalmount
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Output: mount error(2): No such file or directory
Jul 01 17:00:10 bbergen-411gap2-j86fl-worker-eastus3-zjk45 hyperkube[1839]: Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
```

SUMMARY:

PVs are now created, but it looks like there are a few bugs when mounting Azure File CSI PVs:

1. Neither of the Azure File CSI StorageClasses creates the directory in the container filesystem if it does not already exist before attempting to mount (I feel like this is standard behavior, no?)
2. Azure File CSI is unable to mount volumes due to authentication issues

Comment 3 Fabio Bertinatto 2022-07-05 22:51:18 UTC
According to our documentation [0][1], Azure File does not work with FIPS mode.

Also, it seems that FIPS mode was disabled in openshift-azure some time ago [2][3] for this reason. However, I can see that this particular cluster has it enabled:

$ oc debug node/bbergen-411gap2-j86fl-ephemos-eastus1-rvl95
$ sysctl crypto.fips_enabled
crypto.fips_enabled = 1
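
(An alternative check, assuming the install-time configuration is still stored in the cluster; using cluster-config-v1 for this is an assumption on my part:)

```
# The install-config stored in cluster-config-v1 records whether FIPS was requested at install time
$ oc get cm cluster-config-v1 -n kube-system -o yaml | grep fips
# expect "fips: true" on this cluster
```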

@Brendan, is this specific to this cluster, or does every ARO cluster have FIPS enabled? We tested the Azure File CSI driver in bug #2095049 and it worked OK, so I suspect it's the former.

[0] https://access.redhat.com/solutions/256053
[1] https://docs.openshift.com/container-platform/4.10/installing/installing-fips.html#installing-fips-mode_installing-fips
[2] https://github.com/openshift/openshift-azure/pull/1773
[3] https://github.com/openshift/openshift-azure/issues/1772

Comment 4 Fabio Bertinatto 2022-07-06 12:43:25 UTC
> CIFS: VFS: Could not allocate crypto hmac(md5)

MD5 is not FIPS-compliant, which is why CIFS doesn't work here.

Microsoft documents [1] this limitation and suggests either scheduling pods on non-FIPS nodes or using NFS.
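
A sketch of the first workaround (keeping CIFS but pinning the Pod to non-FIPS nodes), assuming such nodes exist and carry a distinguishing label; the label name and PVC name below are hypothetical:

```
apiVersion: v1
kind: Pod
metadata:
  name: app-cifs
spec:
  # Hypothetical label; use whatever label actually identifies non-FIPS worker nodes
  nodeSelector:
    fips-mode: "false"
  containers:
  - name: app
    image: fedora
    command: ["/bin/sh", "-c", "sleep infinity"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: azurefile-pvc   # a CIFS-backed Azure File PVC (name illustrative)
```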

I tested with the following objects and it worked:

$ cat fips.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-sc-fips
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
parameters:
  skuName: Premium_LRS
  protocol: nfs

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile-pvc-fips
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-sc-fips
  resources:
    requests:
      storage: 100Gi

---

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: fedora
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: azurefile-pvc-fips

$ oc apply -f fips.yaml

storageclass.storage.k8s.io/azurefile-sc-fips created
persistentvolumeclaim/azurefile-pvc-fips created

$ oc get pod/app
NAME   READY   STATUS    RESTARTS   AGE
app    1/1     Running   0          61s

[1] https://docs.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/fail-to-mount-azure-file-share#fipsnodepool

Comment 5 bbergen 2022-07-06 19:10:57 UTC
Great catch! I didn't realize that our dev workflow had me creating FIPS-compliant clusters... I recreated the cluster, and all PVs are mounting as expected!

NAME                    READY   STATUS    RESTARTS   AGE
pod/azurefile-csi       1/1     Running   0          2m53s
pod/azurefile-csi-nfs   1/1     Running   0          2m53s
pod/managed-csi         1/1     Running   0          2m53s
pod/managed-premium     1/1     Running   0          2m53s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS        REASON   AGE
persistentvolume/pvc-48da782a-dc56-410c-843c-45cedf008d73   1Gi        RWO            Delete           Bound    default/managed-csi         managed-csi                  2m50s
persistentvolume/pvc-80fa0d7c-ca05-4c81-a261-f75c58535cad   1Gi        RWO            Delete           Bound    default/managed-premium     managed-premium              2m50s
persistentvolume/pvc-848783cf-7a9a-4f32-a6c4-1e7a761e91ec   1Gi        RWO            Delete           Bound    default/azurefile-csi       azurefile-csi                2m52s
persistentvolume/pvc-cb21e791-e1c4-4f3d-b410-bbb71e5b6993   1Gi        RWO            Delete           Bound    default/azurefile-csi-nfs   azurefile-csi-nfs            2m33s

NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
persistentvolumeclaim/azurefile-csi       Bound    pvc-848783cf-7a9a-4f32-a6c4-1e7a761e91ec   1Gi        RWO            azurefile-csi       2m53s
persistentvolumeclaim/azurefile-csi-nfs   Bound    pvc-cb21e791-e1c4-4f3d-b410-bbb71e5b6993   1Gi        RWO            azurefile-csi-nfs   2m53s
persistentvolumeclaim/managed-csi         Bound    pvc-48da782a-dc56-410c-843c-45cedf008d73   1Gi        RWO            managed-csi         2m53s
persistentvolumeclaim/managed-premium     Bound    pvc-80fa0d7c-ca05-4c81-a261-f75c58535cad   1Gi        RWO            managed-premium     2m54s

Please close this ticket. Not a bug :)