Bug 1898917

Summary: [CephFS NFS] Unable to access a mount point after denying access to a share during a write operation.
Product: Red Hat OpenStack
Reporter: lkuchlan <lkuchlan>
Component: openstack-manila
Assignee: OpenStack Manila Bugzilla Bot <openstack-manila-bugs>
Status: CLOSED NOTABUG
QA Contact: lkuchlan <lkuchlan>
Severity: medium
Docs Contact: RHOS Documentation Team <rhos-docs>
Priority: medium
Version: 16.1 (Train)
CC: gouthamr, rraja, vhariria, vimartin
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-04 18:20:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description lkuchlan 2020-11-18 10:24:09 UTC
Description of problem:
After denying access to a share while a write operation is in progress,
on CephFS-NFS (Ganesha) we get "Stale file handle" when trying to
ls the mount point, whereas on NetApp and CephFS native we get
"Permission denied".
To access the mounted directory again we have to unmount and remount it.

Version-Release number of selected component (if applicable):
puppet-manila-15.4.1-1.20200818131916.6c1e210.el8ost.noarch
python3-manila-tests-tempest-1.1.0-0.20200728083439.eba8fa9.el8ost.noarch
python3-manilaclient-1.29.0-0.20200310223441.1b2cafb.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Launch an instance
2. Create a share 
3. Grant RW access to the share
4. Perform ssh to instance
5. Mount the share
6. Write to the share and, while the write is in progress, change the access level to "ro"
7. Run ls on the mount point

Actual results:
Unable to access the mount point:
"ls: cannot access '/mnt': Stale file handle"

Expected results:
We should be able to access the mount point and see that the file is there.

Additional info:

Adding a breakpoint right after step 6
=====================================


(Pdb) remote_client_inst.exec_command("ls -l /mnt")
*** tempest.lib.exceptions.SSHExecCommandFailed: Command 'ls -l /mnt', exit status: 2, stderr:
ls: cannot access '/mnt': Stale file handle

stdout:

(overcloud) [stack@undercloud-0 ~]$ manila list --all
+--------------------------------------+-----------------------------------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+----------------------------------+
| ID                                   | Name                              | Size | Share Proto | Status    | Is Public | Share Type Name | Host                    | Availability Zone | Project ID                       |
+--------------------------------------+-----------------------------------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+----------------------------------+
| 5960fa55-222f-4c9c-9ea6-814ef833a3b4 | tempest-manila-scenario-116671568 | 1    | NFS         | available | False     | default         | hostgroup@cephfs#cephfs | nova              | f8315334c1d1400d995b2d5f35528217 |
+--------------------------------------+-----------------------------------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+----------------------------------+
                
(overcloud) [stack@undercloud-0 ~]$ manila share-export-location-list 5960fa55-222f-4c9c-9ea6-814ef833a3b4
+--------------------------------------+--------------------------------------------------------------------+-----------+
| ID                                   | Path                                                               | Preferred |
+--------------------------------------+--------------------------------------------------------------------+-----------+
| ffb2cd84-36bf-4ea4-b833-db041fb28ce6 | 172.17.5.70:/volumes/_nogroup/185de532-998e-4ebd-93ad-e76f7d9fbf83 | False     |
+--------------------------------------+--------------------------------------------------------------------+-----------+


# Check the share is mounted:

(Pdb) remote_client_inst.exec_command("sudo mount | grep 172.17.5.70")
'172.17.5.70:/volumes/_nogroup/185de532-998e-4ebd-93ad-e76f7d9fbf83 on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.17.5.230,local_lock=none,addr=172.17.5.70)\n'

(overcloud) [stack@undercloud-0 ~]$ manila access-list 5960fa55-222f-4c9c-9ea6-814ef833a3b4
+----+-------------+-----------+--------------+-------+------------+------------+------------+
| id | access_type | access_to | access_level | state | access_key | created_at | updated_at |
+----+-------------+-----------+--------------+-------+------------+------------+------------+
+----+-------------+-----------+--------------+-------+------------+------------+------------+

# Grant RW access to the share:

(Pdb) n
> /home/stack/tempest-auto/manila-tempest-plugin/manila_tempest_tests/tests/scenario/test_share_async_actions.py(105)test_share_read_write_async()
-> locations=location, access_level='rw')

(overcloud) [stack@undercloud-0 ~]$ manila access-list 5960fa55-222f-4c9c-9ea6-814ef833a3b4
+--------------------------------------+-------------+--------------+--------------+--------+------------+----------------------------+------------+
| id                                   | access_type | access_to    | access_level | state  | access_key | created_at                 | updated_at |
+--------------------------------------+-------------+--------------+--------------+--------+------------+----------------------------+------------+
| a1d05087-8828-4217-ac34-1391fcd9c478 | ip          | 172.17.5.230 | rw           | active | None       | 2020-11-17T15:33:36.000000 | None       |
+--------------------------------------+-------------+--------------+--------------+--------+------------+----------------------------+------------+

# Try to see the file under the mount point (this is not part of the test):

(Pdb) remote_client_inst.exec_command("ls -l /mnt")
*** tempest.lib.exceptions.SSHExecCommandFailed: Command 'ls -l /mnt', exit status: 2, stderr:
ls: cannot access '/mnt': Stale file handle

stdout:

(Pdb) remote_client_inst.exec_command("sudo mount | grep 172.17.5.70")
'172.17.5.70:/volumes/_nogroup/185de532-998e-4ebd-93ad-e76f7d9fbf83 on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.17.5.230,local_lock=none,addr=172.17.5.70)\n'

# Try to umount and remount the share:

(Pdb) remote_client_inst.exec_command("sudo umount /mnt")
''
(Pdb) remote_client_inst.exec_command("ls -l /mnt")

'total 0\n'
(Pdb) remote_client_inst.exec_command("sudo mount -t nfs 172.17.5.70:/volumes/_nogroup/185de532-998e-4ebd-93ad-e76f7d9fbf83 /mnt")

''

# After umount and remount we can see the file:

(Pdb) remote_client_inst.exec_command("ls -l /mnt")
'total 262144\n-rw-r--r--. 1 nobody nobody 268435456 Nov 17 10:09 t1\n'

(Pdb) remote_client_inst.exec_command("ls -lh /mnt")
'total 256M\n-rw-r--r--. 1 nobody nobody 256M Nov 17 10:09 t1\n'

Comment 1 lkuchlan 2020-11-19 14:47:34 UTC
It also reproduces with CephFS native, with a different error message:

(overcloud) [stack@undercloud-0 ~]$ manila share-export-location-list 4bf36c00-a6f0-48b4-89fb-e656d847c4a5
+--------------------------------------+------------------------------------------------------------------------------------------------------------+-----------+
| ID                                   | Path                                                                                                       | Preferred |
+--------------------------------------+------------------------------------------------------------------------------------------------------------+-----------+
| febaaf88-98e2-43be-a286-a58abbe684f9 | 172.17.3.68:6789,172.17.3.84:6789,172.17.3.141:6789:/volumes/_nogroup/cc34857d-3d13-4871-930b-10a0ae6174c3 | False     |
+--------------------------------------+------------------------------------------------------------------------------------------------------------+-----------+

(overcloud) [stack@undercloud-0 ~]$ manila access-list 4bf36c00-a6f0-48b4-89fb-e656d847c4a5
+--------------------------------------+-------------+---------------------------------------------------------+--------------+--------+------------------------------------------+----------------------------+----------------------------+
| id                                   | access_type | access_to                                               | access_level | state  | access_key                               | created_at                 | updated_at                 |
+--------------------------------------+-------------+---------------------------------------------------------+--------------+--------+------------------------------------------+----------------------------+----------------------------+
| ecdd50f7-1fd6-499c-b732-e1360d7bbec9 | cephx       | tempest-TestShareAsyncActionsCEPHFS-cephx-id-1156475563 | ro           | active | AQDTgrZfEEBpKhAAbc6GjHpoHjSkLp6mOfccfw== | 2020-11-19T14:36:03.000000 | 2020-11-19T14:36:03.000000 |
+--------------------------------------+-------------+---------------------------------------------------------+--------------+--------+------------------------------------------+----------------------------+----------------------------+

# The share is mounted:

(Pdb) remote_client_inst.exec_command("mount | grep cc34857d-3d13-4871-930b-10a0ae6174c3")
'172.17.3.68:6789,172.17.3.84:6789,172.17.3.141:6789:/volumes/_nogroup/cc34857d-3d13-4871-930b-10a0ae6174c3 on /mnt type ceph (rw,relatime,name=tempest-TestShareAsyncActionsCEPHFS-cephx-id-241655770,secret=<hidden>,acl)\n'

# We get "Permission denied" even though we have "ro" permission.
# We regain access to the directory after umount and remount.

(Pdb) remote_client_inst.exec_command("sudo ls -l /mnt")
*** tempest.lib.exceptions.SSHExecCommandFailed: Command 'sudo ls -l /mnt', exit status: 2, stderr:
ls: cannot access '/mnt': Permission denied

Comment 4 Ram Raja 2022-01-12 22:08:09 UTC
I don't think a client access-level change should require remounting on the client. That defeats the purpose of dynamic access-level updates. Can we test this out manually? I expect the `ls` to work after a few retries or a time delay. For how long was the mount unusable?

Comment 6 Ram Raja 2022-01-17 23:08:17 UTC
> 6. Writing to the share and during the writing operation changing the access level to "ro".

How do you change the access level to 'ro'? Do you remove the 'rw' access rule using the 'access-deny' manila API and then create a new 'ro' access rule using the 'access-allow' manila API?


You don't face this issue with the CephFS native driver? You can list the contents of the share from the Ceph mount, without remounting, even after changing the access level of the cephx ID from 'rw' to 'ro' using the manila APIs?
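
As an aside, the deny-then-allow sequence asked about here is the only way to change a rule's level today, since manila has no in-place access-rule update. A minimal sketch of that sequence, assuming a python-manilaclient-style object with `shares.deny`/`shares.allow` methods (the wrapper function itself is hypothetical):

```python
def change_access_level(client, share_id, rule_id, access_to, new_level):
    """Change a share's access level via deny + allow (no in-place update)."""
    # Step 1: remove the existing 'rw' rule. For CephFS-NFS this is the
    # step after which the client's NFS file handles go stale.
    client.shares.deny(share_id, rule_id)
    # Step 2: create a replacement rule at the new level. Until the new
    # rule becomes 'active', the client has no access at all.
    return client.shares.allow(share_id, "ip", access_to, new_level)
```

The window between the two calls, in which no rule exists, is part of why the behaviour observed in this bug is hard to avoid with the current API.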

Comment 7 lkuchlan 2022-01-18 09:20:22 UTC
(In reply to Ram Raja from comment #6)
> > 6. Writing to the share and during the writing operation changing the access level to "ro".
> 
> How do you change the access level to 'ro'? Do you remove the 'rw' access
> rule using the 'access-deny' manila API and then create a new 'ro' access rule
> using the 'access-allow' manila API?

  Yes, I know of no other way. In my opinion, there should be a way to update an
  existing access rule. It doesn't make sense to remove 'rw' access and then create
  a new one with 'ro'.
> 
> 
> You don't face this issue with the CephFS native driver? You can list the
> contents of the share from the Ceph mount without a re-mount even after
> changing the access-level of the cephx ID from 'rw' to 'ro' using manila
> APIs?

  I don't remember about CephFS native, but since I opened this bug specifically
  for CephFS NFS, I probably didn't face the issue there.
  Currently I don't have an environment with CephFS native;
  I'll try to get a free server for testing it.

Comment 8 lkuchlan 2022-01-19 12:49:04 UTC
Hi Ramana,

On CephFS native the mount operation is different. Each access rule has a unique
keyring, so I guess remounting the share is required.

Anyway, as I mentioned before, I think we should allow updating an existing access rule.

Comment 9 Ram Raja 2022-01-19 18:31:41 UTC
(In reply to lkuchlan from comment #8)
> Hi Ramana,
> 
> On CephFS native the mount operation is different. Each access rule has
> a unique keyring,
> so I guess remounting the share is required.
> 
> Anyway, as I mentioned before, I think we should allow an update to an
> existing access.

Good point! Yes, in the case of the CephFS native driver, when we try to remove access we call _deny_access(), which internally evicts the CephFS client that mounted the share: https://github.com/openstack/manila/blob/stable/xena/manila/share/drivers/cephfs/driver.py#L850 . Even after the new 'ro' manila access rule is created, the client will need to remount to get 'ro' access to the share.

Suppose there were a new user-facing manila API that allowed updating access rules/levels without removing the previous ones; the CephFS native driver still could not change a client's access level without the existing client needing to remount the share. Changes to the cephx ID's subvolume access level (MDS capability changes) will not take effect unless the client using that cephx ID remounts. So the driver will need to evict the existing clients, update the cephx ID's capabilities, and the user's CephFS client will need to remount the share.
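
The deny flow described above can be sketched roughly as follows. All class and method names here are illustrative, not the actual manila driver code (see the linked driver.py for the real implementation); the point is only the ordering: capabilities are dropped and the live session is evicted, so only a fresh mount picks up a later access level.

```python
class CephfsDenyFlowSketch:
    """Illustrative deny-access ordering; not the actual manila driver."""

    def __init__(self, volume_client):
        self.vc = volume_client  # stand-in for the Ceph volume client

    def deny_access(self, share, cephx_id):
        # Drop the cephx ID's MDS capabilities for the share's subvolume...
        self.vc.deauthorize(share, cephx_id)
        # ...then evict any live client session using that ID. The evicted
        # mount cannot recover; new capabilities (e.g. a later 'ro' rule)
        # only take effect on a fresh mount.
        self.vc.evict(cephx_id)
```

This is why an access-rule "update" API alone would not be enough for CephFS native: the eviction step would still disconnect the mounted client.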

Comment 10 Goutham Pacha Ravi 2022-05-04 18:20:37 UTC
Thanks for all the input, Liron and Ramana.
It's evident that Ceph enforces an "evict" operation that causes client disconnections, and that manila (and in consequence the CephFS driver) does not support online changes to the access level. Perhaps we should open an upstream RFE for this, as Liron notes.

Since this is not a "bug", we'll close this request; if upstream has a solution, we can pursue a new feature/RFE BZ.

Thanks!