Bug 2275977

Summary: Pods using NFS PVCs created from restricted NFS StorageClasses are stuck at 'ContainerCreating'
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Amrita Mahapatra <ammahapa>
Component: csi-driver
Assignee: Niels de Vos <ndevos>
Status: ASSIGNED
QA Contact: Neha Berry <nberry>
Severity: high
Priority: unspecified
Version: 4.16
CC: brgardne, kramdoss, mrajanna, muagarwa, nberry, odf-bz-bot, rar
Target Milestone: ---
Keywords: TestBlocker
Target Release: ---
Flags: muagarwa: needinfo? (ammahapa)
Hardware: Unspecified
OS: Unspecified

Description Amrita Mahapatra 2024-04-18 19:30:51 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Pods using NFS PVCs created from restricted NFS StorageClasses are stuck at 'ContainerCreating'.
The error message is:

"Generated from kubelet on ip-10-0-85-151.us-east-2.compute.internal

MountVolume.SetUp failed for volume "pvc-406b305d-3731-41e1-8603-e44b03d23130" : rpc error: code = Internal desc = nfs: failed to mount "ocs-storagecluster-cephnfs-service:/0001-0011-openshift-storage-0000000000000001-0b58eb88-db27-4f3d-b8dc-3e232864a06b" to "/var/lib/kubelet/pods/58785246-977c-4746-a800-92cd2a337c84/volumes/kubernetes.io~csi/pvc-406b305d-3731-41e1-8603-e44b03d23130/mount" : mount failed: exit status 32 Mounting command: mount Mounting arguments: -t nfs ocs-storagecluster-cephnfs-service:/0001-0011-openshift-storage-0000000000000001-0b58eb88-db27-4f3d-b8dc-3e232864a06b /var/lib/kubelet/pods/58785246-977c-4746-a800-92cd2a337c84/volumes/kubernetes.io~csi/pvc-406b305d-3731-41e1-8603-e44b03d23130/mount Output: mount.nfs: mounting ocs-storagecluster-cephnfs-service:/0001-0011-openshift-storage-0000000000000001-0b58eb88-db27-4f3d-b8dc-3e232864a06b failed, reason given by server: No such file or directory stderr: ""


nfs-ganesha log:

18/04/2024 19:24:53 : epoch 66216cf8 : openshift-storage-ocs-storagecluster-cephnfs : nfs-ganesha-1[svc_95] nfs4_export_check_access :NFS4 :INFO :Access not allowed on Export_Id 1 /0001-0011-openshift-storage-0000000000000001-45e2a520-1af9-4684-90f7-21afbbb798a3 for client ::ffff:100.64.0.5


Version of all relevant components (if applicable):
OCP: 4.16.0-0.nightly-2024-04-15-184947
ODF: 4.16.0-75.stable

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)? Yes


Is there any workaround available to the best of your knowledge? NA


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)? 3


Is this issue reproducible? Yes


Can this issue be reproduced from the UI? Yes


If this is a regression, please provide more details to justify this: NA


Steps to Reproduce:
1. Create an ODF cluster with the NFS feature enabled
2. Create a restricted NFS StorageClass with clients: <supported hosts>
3. Create a PVC with the restricted NFS StorageClass
4. Create a pod with the NFS PVC (a hedged example of steps 2-4 is sketched below)
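
A hedged sketch of steps 2-4, for illustration only: the provisioner name is assumed to be the ODF NFS CSI driver, the StorageClass parameters (other than "clients") should be copied from the default NFS StorageClass that ODF creates (ocs-storagecluster-ceph-nfs), and the client IP is just the worker node address from the kubelet event below, not a value from a real cluster.

# Sketch of steps 2-4; placeholder names/values, not taken verbatim from the affected cluster.
cat <<'EOF' | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-restricted
provisioner: openshift-storage.nfs.csi.ceph.com   # assumed ODF NFS CSI driver name
parameters:
  # ... copy the parameters of the default ODF NFS StorageClass here ...
  clients: "10.0.85.151"   # <supported hosts>, e.g. the worker node IP(s)
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-restricted-pvc
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs-restricted
---
apiVersion: v1
kind: Pod
metadata:
  name: nfs-test-pod
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/nfs
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: nfs-restricted-pvc
EOF

# The pod then stays in ContainerCreating:
oc get pod nfs-test-pod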


Actual results:
Pods using NFS PVCs created from restricted NFS StorageClasses are stuck at 'ContainerCreating'.

Expected results:
Pods should be Running successfully.

Additional info:

Comment 6 Niels de Vos 2024-04-19 12:05:30 UTC
The NFS-Ganesha logs contain "Access not allowed" for "client ::ffff:100.64.0.5". This is the IPv4-mapped IPv6 notation for the IPv4 address 100.64.0.5.

The worker node is only expected to be able to mount the NFS export when 100.64.0.5 is included in the "clients:" parameter of the StorageClass.

It is unclear where the IPv4 address 100.64.0.5 comes from when the node connects. The nodes (and the Ceph-CSI pods, which use host networking) are in a different IP range.

The main question is why that IP address is used instead of the real IP address of the worker node.
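
A few commands that might help narrow this down (a sketch only; it assumes the Ceph toolbox deployment rook-ceph-tools is enabled, and the NFS cluster name and pseudo-path are taken from the logs in this bug):

# Compare the client address Ganesha logged (100.64.0.5) with the node and
# NFS-related pod addresses:
oc get nodes -o wide
oc -n openshift-storage get pods -o wide | grep -i nfs

# Check which clients the CSI-created export actually allows:
oc -n openshift-storage exec deploy/rook-ceph-tools -- \
    ceph nfs export ls ocs-storagecluster-cephnfs
oc -n openshift-storage exec deploy/rook-ceph-tools -- \
    ceph nfs export info ocs-storagecluster-cephnfs \
    /0001-0011-openshift-storage-0000000000000001-0b58eb88-db27-4f3d-b8dc-3e232864a06b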

Comment 15 Niels de Vos 2024-05-07 11:37:35 UTC
Hi Blaine!

Is it possible that the Rook-configured NFS-Ganesha server does not have the `HAProxy_Hosts` configuration option set? Without that option, NFS-Ganesha might not try to detect/parse the HAProxy PROXY-protocol header, which would cause the "Permission Denied" errors.

The option is documented here:
https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/doc/man/ganesha-core-config.rst#nfs_core_param-
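
For reference, the generated Ganesha config would presumably need something like the snippet below; the value is only illustrative and would have to cover whatever address the proxy actually connects from (e.g. the 100.64.0.5 seen in the log above):

NFS_CORE_PARAM {
    # Hosts/networks trusted to send the HAProxy PROXY-protocol header.
    # Without this, Ganesha presumably uses the connection's source address
    # (the proxy) as the client address for the export access check.
    HAProxy_Hosts = 100.64.0.0/16;
}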