Bug 1429347 - cannot list contents of a snapshot without ganesha restart
Summary: cannot list contents of a snapshot without ganesha restart
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jeff Layton
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1561457
 
Reported: 2017-03-06 07:36 UTC by Ram Raja
Modified: 2022-08-11 08:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1561457
Environment:
Last Closed: 2020-06-24 11:08:25 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-5063 0 None None None 2022-08-11 08:31:00 UTC

Description Ram Raja 2017-03-06 07:36:25 UTC
Description of problem:
From an NFSv4 client, after creating a snapshot [1] of a share served by the CephFS/Ganesha backend, the contents of the snapshot cannot be listed. The ganesha server must be restarted and the share remounted (umount + mount) before the snapshot contents can be listed.

[1] http://docs.ceph.com/docs/master/dev/cephfs-snapshots/
 
Version-Release number of selected component (if applicable):
NFS-Ganesha (source-install using next branch)
V2.5-dev-16, commit SHA1 17808ba522fdd9e42747e6

Ceph, libcephfs (master branch)
12.0.0-957-gf1ae21b-1xenial, commit SHA1 f1ae21b

How reproducible:
Always

Steps to Reproduce:

1. Mount a share served by Ganesha with Ceph FSAL using NFSv4 client.
   Turned off ganesha's directory caching and attribute caching, and NFS client caching (actimeo=0).

$ sudo cat /etc/ganesha/ganesha.conf
 CACHE_INODE {
     Attr_Expiration_Time = 0;
     Dir_Max = 1;
}

EXPORT {
    
    CLIENT {
        Clients = 10.0.2.15;
        Access_Type = rw;
    }

    SecType = sys;
    Pseudo = /volumes/_nogroup/543c7074-1210-4c12-945d-3fcb78935fc5;
    Squash = None;
    
    FSAL {
        Name = CEPH;
        User_Id = "ganesha";
        Secret_Access_Key = "AQCMarhYXl//GRAAgSWMDPnN4sMgevCHZP/Xfw==";
    }
    
    Tag = share-543c7074-1210-4c12-945d-3fcb78935fc5;
    Path = /volumes/_nogroup/543c7074-1210-4c12-945d-3fcb78935fc5;
    Export_Id = 102;
    

}

$  sudo mount.nfs -o vers=4,rw,actimeo=0 10.0.2.15:/volumes/_nogroup/543c7074-1210-4c12-945d-3fcb78935fc5 /mnt/nfs4/
   
2. Create files in the share.
$ cd /mnt/nfs4/
$ touch file{0..10}

3. Create a snapshot by creating a dir in the hidden .snap folder within the share.
$ mkdir .snap/snap42

4. List contents of the snapshot. Unable to see the snapshot contents.
$ ls .snap/snap42

5. Restart the ganesha server, and remount (umount + mount) the share.

6. Now list the snapshot folder. You should be able to see the snapshot contents.
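
Steps 2 to 4 can be scripted so that the failure is easy to detect. This is a minimal sketch, assuming the share is mounted as in step 1 and that the hidden .snap directory already exists (as it does on CephFS). The helper name `missing_from_snapshot` is hypothetical and not part of any Ceph or Ganesha tooling.

```python
import os


def missing_from_snapshot(share_root, snap_name, file_names):
    """Create files and a snapshot, then return the names the snapshot listing omits.

    Assumption: share_root is a mounted share whose .snap directory exists.
    """
    # Step 2: create empty files in the share.
    for name in file_names:
        with open(os.path.join(share_root, name), "w"):
            pass

    # Step 3: a snapshot is taken by creating a directory under .snap.
    snap_dir = os.path.join(share_root, ".snap", snap_name)
    os.mkdir(snap_dir)

    # Step 4: list the snapshot and compare with the files created above.
    listed = set(os.listdir(snap_dir))
    return sorted(set(file_names) - listed)
```

Calling `missing_from_snapshot("/mnt/nfs4", "snap42", ["file%d" % i for i in range(11)])` mirrors the commands above. An empty result means the listing is complete. On an affected setup the result would be expected to contain every file name.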


Actual results:
Cannot list contents of the snapshot without restarting the ganesha server,
and remounting share.

Expected results:
Snapshot contents should be listable without restarting the ganesha server or remounting the share.

Additional info:

Comment 1 Jeff Layton 2017-03-06 11:58:05 UTC
(In reply to Ram Raja from comment #0)

> 4. List contents of the snapshot. Unable to see the snapshot contents.
> $ ls .snap/snap42
> 

What actually happens here? Do you get an error back or does it just look empty?

Comment 2 Ram Raja 2017-03-06 12:15:19 UTC
Sorry! I wasn't clear. Without the ganesha server restart, the snapshot looks empty even though it isn't.

In Steps to Reproduce:

>  4. List contents of the snapshot. Unable to see the snapshot contents.
>  $ ls .snap/snap42

It's just empty.

Listing a particular file/directory within a snapshot works.
$ ls .snap/snap42/file0
.snap/snap42/file0

And now, if I list the contents of the snapshot again, I can see the file/directory that I had explicitly listed earlier, but the other files/directories in the snapshot are still missing.
$ ls .snap/snap42
file0

Comment 3 Jeff Layton 2017-03-06 17:28:38 UTC
Ok, another question too -- you need to restart ganesha _and_ unmount/remount the nfs mount? Just remounting isn't sufficient?

Comment 4 Ram Raja 2017-03-07 04:56:28 UTC
(In reply to Jeff Layton from comment #3)
> Ok, another question too -- you need to restart ganesha _and_
> unmount/remount the nfs mount? Just remounting isn't sufficient?

Just remounting isn't sufficient. I initially tried remounting, with no success. I then restarted the ganesha server, after which the client received ESTALE while listing the snapshot folder. So I unmounted and remounted the share on the client and could immediately list the snapshot contents.

Comment 5 Jeff Layton 2017-03-07 11:15:11 UTC
Ok, thanks. That helps narrow it down to a real server-side problem, as it doesn't sound like it's due to missing cache invalidations on the NFS client or anything like that.

Comment 6 Frank Filz 2017-03-07 20:52:57 UTC
Hmm, Ganesha mkdir assumes the directory will be empty on creation, so it marks the directory as "populated".

Dir_Max = 1 doesn't actually disable caching, it just means a directory with more than 1 entry won't be cached....

Will need to think on this one...

Comment 7 Frank Filz 2017-03-08 00:38:58 UTC
Hmm, tried to re-create, but I get an EPERM when trying to create the directory in .snap. Am I missing something to actually enable snapshots?

The .snap directory does exist, though it never shows up in ls.

Comment 8 Ram Raja 2017-03-08 05:20:35 UTC
(In reply to Frank Filz from comment #7)
> Hmm, tried to re-create, but I get an EPERM when trying to create the
> directory in .snap. Am I missing something to actually enable snapshots?
> 
> The .snap directory does exist, though it never shows up in ls.

Yeah, CephFS's snapshot feature is experimental and is disabled by default.
Can you try enabling it by,
# ceph mds set allow_new_snaps true --yes-i-really-mean-it

Comment 9 Frank Filz 2017-03-08 18:45:04 UTC
Just tested with dirent chunking, and the files show up immediately!

Comment 10 Ram Raja 2017-03-09 06:51:21 UTC
(In reply to Frank Filz from comment #9)
> Just tested with dirent chunking, and the files show up immediately!

Nice!

How does dirent chunking solve the issue that you explained in Comment 6? I see quite a few patches in review related to dirent chunking, but I'm unable to connect the dots. Thanks!

Comment 11 Frank Filz 2017-03-09 19:55:09 UTC
Ganesha currently has a flag for directories, MDCACHE_DIR_POPULATED that indicates that the dirent cache has been loaded with all entries.

This flag is set on directory creation (under the assumption a new directory is empty - so actually the fix for pre-dirent chunking is to just remove this flag setting from creation of a new directory).

Chunking readdir actually never sets that flag (since it doesn't really track that it has read the entire directory).

The chunking readdir does CHECK that flag (which is actually a problem: because the flag is never set, the directory will always be invalidated - oops.. fix to patch coming very soon...), but chunking readdir will ALWAYS read the directory if there are no chunks. No entries in a newly created directory means no chunks, so even though chunking readdir considers the dirent cache valid, an empty cache is effectively always considered invalid.

That flag will actually disappear completely once dirent chunking replaces the old scheme completely.
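
The difference between the two readdir schemes described above can be illustrated with a toy model. This is only a sketch of the logic as explained in this comment, not Ganesha's actual code: the `Directory` class and all function names are hypothetical, and `populated` merely stands in for MDCACHE_DIR_POPULATED.

```python
class Directory:
    """Toy stand-in for a cached directory; not Ganesha's real data structure."""

    def __init__(self, backend_entries, populated):
        self.backend_entries = list(backend_entries)  # what the FSAL would return
        self.cached = []             # dirent cache used by the legacy scheme
        self.chunks = []             # dirent chunks used by the chunking scheme
        self.populated = populated   # models MDCACHE_DIR_POPULATED


def mkdir_snapshot(parent_files, set_populated_on_create=True):
    # A snapshot directory already contains the parent's files when it is created.
    return Directory(parent_files, populated=set_populated_on_create)


def legacy_readdir(d):
    # Legacy scheme: the cache is trusted whenever the populated flag is set.
    if not d.populated:
        d.cached = list(d.backend_entries)
        d.populated = True
    return list(d.cached)


def chunked_readdir(d):
    # Chunking scheme: with no chunks cached, the backend is always read.
    if not d.chunks and d.backend_entries:
        d.chunks = [list(d.backend_entries)]
    return [name for chunk in d.chunks for name in chunk]
```

In this model, `legacy_readdir(mkdir_snapshot(files))` returns an empty listing because the flag was set at creation, while passing `set_populated_on_create=False` or using `chunked_readdir` returns the snapshot's files.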

Comment 12 Ram Raja 2017-09-27 07:18:50 UTC
Dirent chunking feature in NFS-Ganesha in stable v2.5 branch solves this issue as mentioned in Comment 11.

Comment 13 Ram Raja 2018-03-28 11:14:38 UTC
It's recommended to turn off dirent chunking for FSAL_CEPH
https://github.com/nfs-ganesha/nfs-ganesha/commit/720b1466a7c982604c24c439e22a4c4d461eed4c#diff-d750277ebcac38a79b101cc0d2ed3f00R60

If we do this, we again run into the issue of not being able to list CephFS snapshot contents. Ramakrishnan Periysamy hit this issue after setting,
```
CACHEINODE {
    Dir_Chunk = 0;
    Dir_Max = 1;
}
```
in the ganesha.conf. He later set,
```
CACHEINODE {
    Dir_Chunk = 1;
    Dir_Max = 1;
}
```
and he could see the snapshot contents.

Is there a way to list CephFS snapshot contents with dirent chunking turned off, or should we recommend setting Dir_Chunk=1 to work around the issue?

Comment 14 Ram Raja 2018-03-28 11:18:50 UTC
Ramakrishnan hit the issue with NFS-Ganesha v2.5.5 and libcephfs, Ceph v12.2.4

Comment 15 Daniel Gryniewicz 2018-04-06 14:45:24 UTC
So, this is not a problem with 2.6, as the old dirent caching has been removed, so disabling dir_chunk just disables caching entirely, fixing this problem.

Unfortunately, the only way to disable the old dirent caching is to either enable dirent chunking, or (per directory) have a directory with more entries than Dir_Max.  Dir_Max cannot be 0, so it will only trigger in directories with at least 2 dirents (not counting "." and "..").

The "proper" fix for this kind of issue is to have the FSAL send an upcall to invalidate the dirents when a new snapshot is created (as must be done whenever a dirent is created behind Ganesha's back).  Maybe a workaround would be to not mark a newly created directory as populated?
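
The Dir_Chunk and Dir_Max conditions described in this comment for 2.5.x can be written down as a small predicate. This is a sketch of the rule as stated here, not Ganesha code, and the function name `legacy_dirent_cache_used` is hypothetical.

```python
def legacy_dirent_cache_used(dir_chunk, dir_max, num_dirents):
    """Return True if the old dirent caching would apply to a directory.

    num_dirents excludes "." and "..". Models only the rule stated in
    this comment for 2.5.x.
    """
    if dir_max < 1:
        raise ValueError("Dir_Max cannot be 0")
    if dir_chunk > 0:
        # Enabling dirent chunking disables the old dirent caching.
        return False
    # Otherwise the old caching is bypassed only for directories
    # holding more entries than Dir_Max.
    return num_dirents <= dir_max
```

With Dir_Chunk = 0 and Dir_Max = 1, a freshly created snapshot directory that Ganesha believes to be empty (`num_dirents` of 0) still falls under the old caching, which matches the failure reported in comment 13. Setting Dir_Chunk = 1 makes the predicate false for every directory.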

Comment 16 Ram Raja 2018-06-05 14:21:48 UTC
Dan provided a workaround in comment #15 for setups using NFS-Ganesha 2.5.5. The issue should go away with NFS-Ganesha 2.6.

