Bug 2118263

Summary: NFS client unable to see newly created files when listing directory contents in a FS subvolume clone
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: lkuchlan <lkuchlan>
Component: CephFS
Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA
QA Contact: Hemanth Kumar <hyelloji>
Severity: high
Docs Contact:
Priority: unspecified
Version: 5.2
CC: ceph-eng-bugs, cephqe-warriors, gfarnum, gouthamr, hyelloji, jdurgin, lhh, mhicks, pasik, tserlin, vdas, vhariria, vshankar
Target Milestone: ---
Target Release: 5.3z1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-16.2.10-122.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2135573
Environment:
Last Closed: 2023-02-28 10:05:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2135573

Description lkuchlan 2022-08-15 09:25:47 UTC
Description of problem:
When creating a file in a mounted manila share that was created
from a snapshot, the file is created but is not shown in the directory
listing, as if it had not been saved or flushed to disk.

Version-Release number of selected component (if applicable):
cephadm-16.2.8-79.el9cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Launch an instance
2. Create share
3. Allow access
4. Create snapshot from the share
5. Create share from the snapshot
6. Allow access
7. SSH into the instance
8. Mount the share
9. Try to create a file with the touch command

[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ mount | grep 172.17.5.118
172.17.5.118:/volumes/_nogroup/ee22821d-217f-4448-9c0d-7affd8016c83/bbc37679-dbc0-415a-a94c-dc1b10303ec6 on /mnt/parent type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.17.5.193,local_lock=none,addr=172.17.5.118)
172.17.5.118:/volumes/_nogroup/55e5ae8b-8a0a-41b5-a6de-621d553a295f/5dde0706-e0e9-4067-a4c8-967e9520f617 on /mnt/child type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.17.5.193,local_lock=none,addr=172.17.5.118)
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ ll
total 0
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ sudo touch file1
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ ll
total 0
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ sudo touch file2
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ ll
total 0
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ sudo touch file3
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ ll
total 0

10. Create a file with the vi command

[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ sudo vi file4
[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ ll
total 0
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:19 file1
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:19 file2
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:19 file3
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:20 file4

11. Try again to create a file with the touch command

[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ sudo touch file5

All created files are displayed

[manila@tempest-testsharebasicopsnfs-server-880731321 child]$ ll
total 0
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:19 file1
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:19 file2
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:19 file3
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:20 file4
-rw-r--r--. 1 nobody nobody 0 Aug 15 05:20 file5


Actual results:
The file isn't displayed

Expected results:
The file should be displayed

Comment 1 RHEL Program Management 2022-08-15 16:51:59 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Goutham Pacha Ravi 2022-08-15 20:33:05 UTC
Ganesha Configuration:

# Ansible managed


NFS_Core_Param
{
       Bind_Addr=172.17.5.111;
}

EXPORT_DEFAULTS {
        Attr_Expiration_Time = 0;
}

CACHEINODE {
        Dir_Chunk = 0;

        NParts = 1;
        Cache_Size = 1;
}

RADOS_URLS {
   ceph_conf = '/etc/ceph/ceph.conf';
   userid = "manila";
}
%url rados://manila_data/ganesha-export-index

NFSv4 {
        RecoveryBackend = 'rados_kv';
        IdmapConf = "/etc/ganesha/idmap.conf";
}
RADOS_KV {
        ceph_conf = '/etc/ceph/ceph.conf';
        userid = "manila";
        pool = "manila_data";
}


LOG {
        Facility {
                name = FILE;
                destination = "/var/log/ganesha/ganesha.log";
                enable = active;
        }
}

Comment 3 vhariria 2022-08-16 18:08:00 UTC
The same scenario test passed in a manila-native-ceph environment.
Compose tested with RHOS-17.0-RHEL-9-20220811.

Comment 5 Ram Raja 2022-08-17 19:24:21 UTC
I have a few questions on https://bugzilla.redhat.com/show_bug.cgi?id=2118263#c0


Were there contents in the snapshot before a share was created from it? The output of the first `ll` within the share is empty. 


Are you able to see the snapshotted contents from within the share that was created from the snapshot?


When using a regular CephFS NFS share, do you observe the same issue of newly created files sometimes not being listed by an NFS client? It's not obvious to me why a share created from a snapshot (a new CephFS subvolume) would behave differently from a regular share (another CephFS subvolume).


Is this a regression? Was this test performed on earlier releases?

Comment 9 Ram Raja 2022-08-19 22:46:30 UTC
Created tracker ticket https://tracker.ceph.com/issues/57210 with reproducer steps on a Ceph dev vstart cluster. I can observe this issue in the Ceph main branch as well.

Comment 10 Ram Raja 2022-08-24 23:54:08 UTC
Goutham, did the test "test_write_data_to_share_created_from_snapshot" ever pass in the upstream CI for the CephFS NFS driver?

In my testing, I found that directly listing the newly created file in a cloned subvolume (share created from snapshot) works. See the description in https://tracker.ceph.com/issues/57210

To unblock your scenario testing upstream, you can disable the test "test_write_data_to_share_created_from_snapshot". You could also copy the test and modify it to check for the existence or absence of the file in a subvolume clone by listing the file directly, instead of checking the parent directory's listing. What do you think?

Once we root-cause and fix the issue, we can re-enable the failed test.
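
The two kinds of check mentioned in this comment (looking the file up directly versus searching the parent directory's listing) could be sketched in shell roughly as follows. The helper names `visible_by_lookup` and `visible_in_listing` are hypothetical and are not taken from manila-tempest-plugin; this is only an illustration of the difference, not the actual test code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are assumptions, not part of any real test suite.

# Succeeds if the file can be looked up directly by its path.
visible_by_lookup() {
    [ -e "$1/$2" ]
}

# Succeeds if the file name appears in the parent directory's listing.
visible_in_listing() {
    ls -A -- "$1" | grep -Fxq -- "$2"
}
```

On an affected clone mount, something like `visible_by_lookup /mnt/child file1` would be expected to succeed while `visible_in_listing /mnt/child file1` fails, which matches the behaviour described in the tracker ticket.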

Comment 11 Goutham Pacha Ravi 2022-08-25 23:45:20 UTC
(In reply to Ram Raja from comment #10)
> Goutham, did this test "test_write_data_to_share_created_from_snapshot" 
> ever pass in the upstream CI for the CephFS NFS driver? 
> 
> In my testing, I found that directly listing the newly created file in a
> cloned subvolume (share created from snapshot) works. See the description in
> https://tracker.ceph.com/issues/57210
> 
> To unblock your scenario testing upstream, you can disable the test 
> "test_write_data_to_share_created_from_snapshot" . And also copy over the
> test, modify it to check for the existence/absence of file in a subvolume
> clone by listing the file directly, instead of checking the contents of
> parent directory's listing. What do you think?
> 
> Once we root cause and fix the issue, we can re-enable the failed test.

Hey Ramana, 

Yes, it did. We first enabled this test with CephFS-NFS and Ceph Nautilus with OpenStack Wallaby: https://review.opendev.org/c/openstack/manila-tempest-plugin/+/778188
The test passed until very recently, including with Ceph Octopus and Pacific, so the breakage seems recent. We can certainly skip the test for now while we wait for this issue to be root-caused. Thanks Ramana!

Comment 14 Ram Raja 2022-10-18 02:44:10 UTC
The fix is merged in Ceph main branch.

For now, I've created an upstream Pacific (16.2.x) backport PR, https://github.com/ceph/ceph/pull/48521, for the Pacific backport tracker ticket, https://tracker.ceph.com/issues/57880

Comment 15 Ram Raja 2022-10-21 01:22:37 UTC
Hemanth, I'm copying over the steps from https://tracker.ceph.com/issues/57210 that I used to reproduce this issue in a Ceph cluster without needing OpenStack manila. I used a vstart cluster, but the steps should be the same in a QE test cluster:
```
$ ./bin/ceph fs volume create a
$ ./bin/ceph fs subvolume create a subvol01
$ ./bin/ceph fs subvolume getpath a subvol01

$ ./bin/ceph nfs cluster create nfs-ganesha
$ ./bin/ceph nfs export create cephfs nfs-ganesha /cephfs3 a `./bin/ceph fs subvolume getpath a subvol01`
$ sudo mount.nfs4 127.0.0.1:/cephfs3 /mnt/nfs1/
$ pushd /mnt/nfs1/
$ sudo touch file00
$ # can see newly created file when listing directory contents
$ ls
file00
$ popd

$ ./bin/ceph fs subvolume snapshot create a subvol01 snap01
$ ./bin/ceph fs subvolume snapshot clone a subvol01 snap01 clone01
$ ./bin/ceph nfs export create cephfs nfs-ganesha /cephfs4 a `./bin/ceph fs subvolume getpath a clone01`
$ sudo mount.nfs4 127.0.0.1:/cephfs4 /mnt/nfs2/
$ pushd /mnt/nfs2/
$ ls
file00
$ sudo touch file01
$ # can see cloned 'file00' but cannot see the newly created file 'file01' when reading the directory contents within the clone
$ ls
file00
```

With this fix, the newly created 'file01' should also be visible in the FS subvolume clone when listing the directory from the NFS client.
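
A minimal sketch of how that verification might be scripted on the NFS client: create a file, then confirm it shows up in the directory listing. The function name `check_new_file_listed` and its arguments are hypothetical; `/mnt/nfs2` is the clone mount from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical verification helper; the name and arguments are assumptions.
# Creates a file in the given directory and reports whether it then
# appears in that directory's listing.
check_new_file_listed() {
    local dir=$1 name=$2
    touch -- "$dir/$name" || return 2   # could not create the file
    if ls -A -- "$dir" | grep -Fxq -- "$name"; then
        echo "OK: $name is listed in $dir"
    else
        echo "FAIL: $name was created but is not listed in $dir"
        return 1
    fi
}
```

It could be invoked as `check_new_file_listed /mnt/nfs2 file01`, run with enough privileges to write to the mount (the steps above use `sudo touch`).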

Comment 26 errata-xmlrpc 2023-02-28 10:05:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 5.3 Bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0980