Bug 1975689

Summary: Listing of snapshots is not always successful on NFS exports
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Hemanth Kumar <hyelloji>
Component: CephFS Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA QA Contact: Hemanth Kumar <hyelloji>
Severity: high Docs Contact: Akash Raj <akraj>
Priority: high    
Version: 5.0 CC: akraj, ceph-eng-bugs, gfarnum, kdreyer, kkeithle, mbenjamin, pdonnell, vshankar
Target Milestone: ---   
Target Release: 6.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-17.2.6-9.el9cp Doc Type: Bug Fix
Doc Text:
._mtime_ and _change_attr_ are now updated for the snapshot directory when snapshots are created
Previously, `libcephfs` clients would not update _mtime_ and _change_attr_ when snaps were created or deleted. Due to this, NFS clients could not correctly list CephFS snapshots within a CephFS NFS-Ganesha export. With this fix, _mtime_ and _change_attr_ are updated for the snapshot directory, `.snap`, when snapshots are created, deleted, or renamed. Correct _mtime_ and _change_attr_ values ensure that listing snapshots does not return stale snapshot entries.
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-06-15 09:15:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2192813    

Comment 2 Patrick Donnelly 2021-06-25 15:24:53 UTC
> We do not have support for creating snapshots on NFS mounts (BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1974416); because of this, replication of NFS exports to the remote cluster is also not possible.
>
>I feel that replication support should be enabled for NFS exports too.

It's possible to snapshot the NFS shares, just not via the NFS client. You could instead set up a snapshot schedule using the "snap_schedule" module.
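
For reference, a minimal sketch of that approach, reusing the module and CLI syntax that appear later in this BZ (the filesystem name "a" and the 1h interval are only examples):

```
# Enable the snap_schedule mgr module if it is not already enabled
ceph mgr module enable snap_schedule

# Schedule hourly snapshots of the root of filesystem "a"
ceph fs snap-schedule add --path=/ 1h --fs=a

# Confirm the schedule is active
ceph fs snap-schedule status /
```

The snapshots are then created server-side and show up under the export's .snap directory.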

Comment 11 Varsha 2021-07-08 05:48:49 UTC
Hi Hemanth,

I don't see snaps created even with ceph-fuse and the kernel client.

# ceph nfs export create cephfs a vstart /cephfs 
{
    "bind": "/cephfs",
    "fs": "a",
    "path": "/",
    "cluster": "vstart",
    "mode": "RW"
}

# ceph fs snap-schedule add --path=/ 1h fs=a
Schedule set for path /

# date
Thu Jul  8 10:48:43 AM IST 2021
# ./bin/ceph fs snap-schedule status /
{"fs": "a", "subvol": null, "path": "/", "rel_path": "/", "schedule": "1h", "retention": {}, "start": "2021-07-08T00:00:00", "created": "2021-07-08T05:18:38", "first": null, "last": null, "last_pruned": null, "created_count": 0, "pruned_count": 0, "active": true}

# mount -t nfs -o port=2049 192.168.0.138:/cephfs /mnt
# ls /mnt
testfile  testfile2

# date
Thu Jul  8 11:00:13 AM IST 2021
# ls -a /mnt/.snap
.  ..
# ./bin/ceph fs snap-schedule status /
{"fs": "a", "subvol": null, "path": "/", "rel_path": "/", "schedule": "1h", "retention": {}, "start": "2021-07-08T00:00:00", "created": "2021-07-08T05:18:38", "first": null, "last": null, "last_pruned": null, "created_count": 0, "pruned_count": 0, "active": true}

# umount /mnt

# ./bin/ceph-fuse -m 192.168.0.138:40705 /mnt
ceph-fuse[62657]: starting ceph client
ceph-fuse[62657]: starting fuse
[root@localhost build]# ls /mnt
testfile  testfile2
[root@localhost build]# ls /mnt/.snap

# mount -t ceph 192.168.0.138:40705:/ /mnt -o name=admin,secret=AQDpieZg6KUiIxAAFqgwdEknjmA1/p7XXWSPDw==
# ls /mnt
testfile  testfile2
# ls -a /mnt/.snap
.  ..

Comment 14 Varsha 2021-07-12 11:30:35 UTC
I can also see the snapshots now. The steps are similar to Hemanth's; I was able to see the snapshots after one hour. The following line in the doc [1] caused confusion:
`So when a snapshot schedule with repeat interval 1h is added at 13:50 with the default start time, the first snapshot will be taken at 14:00.`

Also, this throws an error:
[root@localhost build]# ./bin/ceph config set mgr allow_m_granularity true
Error EINVAL: unrecognized config option 'allow_m_granularity'

[1] https://docs.ceph.com/en/latest/cephfs/snap-schedule/#usage

Comment 15 Patrick Donnelly 2021-07-12 14:38:08 UTC
(In reply to Varsha from comment #14)
> Also this throws error
> [root@localhost build]# ./bin/ceph config set mgr allow_m_granularity true
> Error EINVAL: unrecognized config option 'allow_m_granularity'

Should be:

> ceph config set mgr mgr/snap_schedule/allow_m_granularity true
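
In the vstart environment used in comment 14, that would presumably be run as:

```
./bin/ceph config set mgr mgr/snap_schedule/allow_m_granularity true
```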

Comment 19 Jeff Layton 2021-07-14 17:05:49 UTC
Not sure why we're keeping all of these comments private. I think this is just a case where the metadata (in particular, the mtime and change attribute) on the .snap directory isn't changing. The client did a full readdir earlier and since nothing appears to have changed in the directory inode metadata, it's satisfying readdir out of the cache.

This probably requires a fix in libcephfs to ensure that the mtime and change attribute get bumped when a snap is created or deleted.
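
For illustration, one rough way to observe this from a native CephFS mount (kernel or FUSE), where a snapshot can be created with mkdir under .snap; the mount point and snapshot name below are only placeholders:

```
# Note the current mtime of the snapshot directory
stat -c 'mtime=%y' /mnt/cephfs/.snap

# Create a snapshot directly on the CephFS mount
mkdir /mnt/cephfs/.snap/testsnap

# Check the mtime again: without the libcephfs fix it stays unchanged, so an
# NFS client that already cached the .snap readdir results sees no reason to
# invalidate its cache
stat -c 'mtime=%y' /mnt/cephfs/.snap
```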

Comment 20 Patrick Donnelly 2021-07-14 23:52:13 UTC
I thought we specifically use configs in nfs-ganesha to prevent using its dcache?

Comment 21 Jeff Layton 2021-07-14 23:57:51 UTC
We do. This is the NFS client's caching. The cache is invalidated if the metadata changes, but otherwise it's trusted.

The bottom line is that we're adding to and removing dentries from the .snap/ directory without ensuring that the mtime and change attribute are updated. If we fix the client to update the mtime and change attr when snaps are created and deleted, this should work as expected.

Note that I see the same lack of change to the metadata with ceph-fuse too; it's just that ceph-fuse never caches the .snap directory contents because .snap directories are "special".

This should probably be transitioned to a libcephfs bug, IMO.

Comment 22 Patrick Donnelly 2021-07-15 00:15:44 UTC
Very well, thanks for explaining Jeff.

Comment 24 Patrick Donnelly 2021-09-16 14:10:16 UTC
Ramana, can you look into this?

Comment 32 Ram Raja 2022-08-25 00:55:08 UTC
(In reply to Patrick Donnelly from comment #20)
> I thought we specifically use configs in nfs-ganesha to prevent using its
> dcache?

We don't disable dirent and attribute caching by default in cephadm-deployed NFS-Ganesha. See https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/templates/services/nfs/ganesha.conf.j2

In the Ansible-managed ganesha.conf that we use for OpenStack Manila, we have the following settings to disable attribute and dentry caching in NFS-Ganesha:

``` 
EXPORT_DEFAULTS {
        Attr_Expiration_Time = 0;
}

CACHEINODE {
        Dir_Chunk = 0;
        NParts = 1;
        Cache_Size = 1;
}
```

For the Ceph FSAL, we recommend turning off attribute and dentry caching.
See
https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf#L58
and
https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf#L97
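
(As an aside, newer NFS-Ganesha releases use the MDCACHE block name instead of CACHEINODE; a sketch of the equivalent dirent-cache setting, not a verbatim copy of the linked sample, would be:)

```
MDCACHE {
        # Size the dirent cache down as small as possible.
        Dir_Chunk = 0;
}
```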


I don't see the ganesha.conf used for this setup in the BZ. We could have checked whether QE turned off caching using a custom configuration.


I also found this old BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1429347, where listing CephFS snapshots didn't work unless dirent caching was disabled in NFS-Ganesha.

Comment 33 Ram Raja 2022-08-25 17:17:28 UTC
(In reply to Ram Raja from comment #32)
> > I thought we specifically use configs in nfs-ganesha to prevent using its
> > dcache?
>
> We don't disable dirent and attribute caching by default in cephadm-deployed
> NFS-Ganesha. [...]

So attribute caching is disabled at the export level, rather than at the Ganesha instance/cluster level, by the mgr/nfs module.

https://github.com/ceph/ceph/pull/41574/commits/283a8c184e748a0af5a5deedc2d80b855d747924

Example of a CephFS NFS export stored in a RADOS object by the mgr/nfs module, with the 'attr_expiration_time = 0' config setting:

$ ./bin/rados -p .nfs -N nfs-ganesha get export-1 - | grep 'attr_expiration_time'

    attr_expiration_time = 0;


Dirent caching was re-enabled because the RGW FSAL relies on NFS-Ganesha's dirent caching for RGW directory listings.
See https://www.spinics.net/lists/dev-ceph/msg03232.html
https://github.com/ceph/ceph/pull/41574/commits/9d114e910e953c0adf4ba5d31ab1d17bdfa1f638


We've tested CephFS/NFS-Ganesha only with dirent caching *disabled*. I don't think we investigated the effects of enabling NFS-Ganesha dirent caching on CephFS. This BZ could be one of the side-effects of enabling Ganesha dirent caching.
See https://www.spinics.net/lists/dev-ceph/msg03233.html

It doesn't look like dirent caching can be disabled per export. This means that an NFS-Ganesha instance/cluster with dirent caching disabled can only be used for CephFS exports.
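
If one did want to disable dirent caching cluster-wide for a CephFS-only NFS cluster, a possible approach (a sketch only; the file name is a placeholder, the exact block name depends on the Ganesha version, and I haven't verified this on this setup) would be a user-supplied Ganesha config fragment such as:

```
# disable-dirent-cache.conf
MDCACHE {
        Dir_Chunk = 0;
}
```

applied with something like `ceph nfs cluster config set <cluster_id> -i disable-dirent-cache.conf` via the mgr/nfs module.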

Comment 34 Ram Raja 2022-08-26 16:39:24 UTC
(In reply to Patrick Donnelly from comment #20)
> I thought we specifically use configs in nfs-ganesha to prevent using its
> dcache?

(In reply to Ram Raja from comment #32)
> (In reply to Patrick Donnelly from comment #20)
> > I thought we specifically use configs in nfs-ganesha to prevent using its
> > dcache?
> 
> We don't disable dirent and attribute caching by default in cephadm-deployed
> NFS-Ganesha. See
> https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/templates/
> services/nfs/ganesha.conf.j2

Looks like Ganesha dirent caching was re-enabled in Pacific only in Sept 2021 [1] and became available in release 16.2.7 (released Dec 2021). Before that, as Jeff mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1975689#c21, Ganesha dirent caching had been disabled since 16.1.0 [2]. Most likely QE tested this BZ with Ganesha dirent caching disabled in July 2021.

[1] https://github.com/ceph/ceph/pull/43075/commits/db053d0056d231a7dd430c6cdb335f95aa5d50e7
[2] https://github.com/ceph/ceph/pull/34382/commits/25f4dedd3e75c81b19911fd33b171e613ab1c559

Comment 46 Venky Shankar 2023-03-30 13:46:35 UTC
https://github.com/ceph/ceph/pull/50730

Comment 51 Hemanth Kumar 2023-04-18 11:51:50 UTC
All the snapshots are listed inside the .snap dir under an NFS mount.

[root@magna049 .snap]# ls -ltr
total 25
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_43_01_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_42_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_41_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_40_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_39_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_38_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_37_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_36_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_35_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_34_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_33_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_32_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_31_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_30_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_29_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_28_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_27_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_26_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_25_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_24_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_23_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_22_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_21_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_20_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_19_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_18_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_17_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_16_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_15_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_14_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_13_01_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_12_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_11_01_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_10_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_09_01_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_08_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_07_01_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_06_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_05_01_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_04_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_03_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_02_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_01_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-11_00_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-10_59_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-10_58_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-10_57_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-10_56_01_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-10_55_00_UTC
drwxr-xr-x. 3 root root 124 Apr 18 10:39 scheduled-2023-04-18-10_54_00_UTC
[root@magna049 .snap]#

------------------------------------------------------------------------------------

But I am unable to list or get the status of the snapshot schedules using "ceph fs snap-schedule list" and "ceph fs snap-schedule status". Filed a separate BZ for this regression: https://bugzilla.redhat.com/show_bug.cgi?id=2187659

[root@magna021 ~]# ceph fs snap-schedule add --path=/ 1M --fs=cephfs
Error ENOTSUP: Module 'snap_schedule' is not enabled (required by command 'fs snap-schedule add'): use `ceph mgr module enable snap_schedule` to enable it

[root@magna021 ~]# ceph mgr module enable snap_schedule

[root@magna021 ~]# ceph fs snap-schedule add --path=/ 1M --fs=cephfs
Schedule set for path /

[root@magna021 ~]# ceph fs snap-schedule add --path=/volumes/_nogroup/subv1/ 1M --fs=cephfs
Schedule set for path /volumes/_nogroup/subv1/

[root@magna049 .snap]# ceph fs snap-schedule list /
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1758, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/snap_schedule/module.py", line 104, in snap_schedule_list
    abs_path = self.resolve_subvolume_path(use_fs, subvol, path)
NameError: name 'subvol' is not defined

[root@magna049 .snap]# ceph fs snap-schedule list / --recursive
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1758, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/snap_schedule/module.py", line 104, in snap_schedule_list
    abs_path = self.resolve_subvolume_path(use_fs, subvol, path)
NameError: name 'subvol' is not defined

[root@magna049 .snap]# ceph fs snap-schedule status /
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1758, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/snap_schedule/module.py", line 83, in snap_schedule_get
    abs_path = self.resolve_subvolume_path(use_fs, subvol, path)
NameError: name 'subvol' is not defined

Comment 52 Venky Shankar 2023-04-18 12:56:55 UTC
(In reply to Hemanth Kumar from comment #51)
> All the snapshots are listed under an NFS mount inside a snap dir.

In that case, can this BZ be moved to the VERIFIED state?

Comment 53 Hemanth Kumar 2023-04-18 12:59:53 UTC
(In reply to Venky Shankar from comment #52)
> (In reply to Hemanth Kumar from comment #51)
> > All the snapshots are listed under an NFS mount inside a snap dir.
> 
> In that case, can this BZ be moved to the VERIFIED state?

I'll let the schedules run for a few more hours, monitor them, and then close the BZ.

Comment 57 errata-xmlrpc 2023-06-15 09:15:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3623