Bug 1566467
| Field | Value |
|---|---|
| Summary | [nfs-ganesha] NFS SETATTR requests are not durable with NFS-Ganesha backed by CephFS |
| Product | [Red Hat Storage] Red Hat Ceph Storage |
| Component | CephFS |
| Version | 3.0 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | z3 |
| Target Release | 3.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | RHEL: ceph-12.2.4-10.el7cp; Ubuntu: ceph_12.2.4-14redhat1xenial |
| Reporter | Ram Raja <rraja> |
| Assignee | Matt Benjamin (redhat) <mbenjamin> |
| QA Contact | Ramakrishnan Periyasamy <rperiyas> |
| Docs Contact | Aron Gunn <agunn> |
| CC | agunn, ceph-eng-bugs, ceph-qe-bugs, jlayton, john.spray, mbenjamin, pdonnell, rperiyas, sostapov, vakulkar |
| Type | Bug |
| Last Closed | 2018-05-15 18:20:31 UTC |
Description (Ram Raja, 2018-04-12 11:32:47 UTC)
Patches to fix this have been merged into the upstream Ceph repo's master branch and the NFS-Ganesha repo's next branch. We'd need to backport these patches.

Patch in Ceph Master
--------------------

    commit 78628a5fde277927126d4d6e5d2957e7e6d42fb1
    Author: Jeff Layton <jlayton>
    Date:   Wed Mar 14 12:25:36 2018 -0400

        client: add ceph_ll_sync_inode

        Ganesha's NFS SETATTR requests are expected to be synchronous, but
        libcephfs can cache attribute changes if the client has the right
        caps. An fsync after changing the attributes would flush the caps
        back to the MDS, but we may not have an Fh in the context of a
        SETATTR request.

        Add a way to sync a specific Inode, even when we don't have easy
        access to a Fh.

        Tracker: http://tracker.ceph.com/issues/23291
        Signed-off-by: Jeff Layton <jlayton>

Patches in NFS-Ganesha 'next' branch
------------------------------------

    commit fdd020fab3034ef96571646450c28c2ec9329b70
    Author: Jeff Layton <jlayton>
    Date:   Mon Apr 2 14:42:18 2018 -0400

        cmake: add USE_FSAL_CEPH_LL_SYNC_INODE to config-h.in.cmake

        As Dan points out, I left this (rather important) bit out.

        Change-Id: I69aa582a203dc099577c591f474367381a535d17
        Reported-by: Daniel Gryniewicz <dang>
        Signed-off-by: Jeff Layton <jlayton>

    commit d9fe5c92a1543a2c8b1c1a91c4d610f4adb92e07
    Author: Jeff Layton <jlayton>
    Date:   Wed Mar 14 13:49:48 2018 -0400

        FSAL_CEPH: add check for and call to ceph_ll_sync_inode

        libcephfs can cache the results of a setattr request after replying
        to it. This violates the semantics that NFS expects. Add a call to
        ceph_ll_sync_inode after the setattr, if the function exists.

        Change-Id: Ibbe86fe547bc571dd71e7a5d93cc13111c03910a
        Signed-off-by: Jeff Layton <jlayton>
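The FSAL_CEPH commit above boils down to the pattern sketched below. This is a hedged illustration, not the verbatim Ganesha code: the helper name setattr_durable and its argument list are invented for this sketch, while ceph_ll_setattr, ceph_ll_sync_inode, and the USE_FSAL_CEPH_LL_SYNC_INODE define come from the commits quoted above.

```c
#include <cephfs/libcephfs.h>

/* Sketch of the FSAL_CEPH change: after a successful setattr, push the
 * cached attribute change to the MDS so a positive NFS SETATTR reply
 * implies durability.  USE_FSAL_CEPH_LL_SYNC_INODE is defined by the
 * cmake probe (second Ganesha commit above) only when the linked
 * libcephfs actually exports ceph_ll_sync_inode(). */
static int setattr_durable(struct ceph_mount_info *cmount,
			   struct Inode *in, struct ceph_statx *stx,
			   int mask, const UserPerm *perms)
{
	int rc = ceph_ll_setattr(cmount, in, stx, mask, perms);

	if (rc != 0)
		return rc;
#ifdef USE_FSAL_CEPH_LL_SYNC_INODE
	/* syncdataonly = 0: flush dirty metadata caps, not just file data */
	rc = ceph_ll_sync_inode(cmount, in, 0);
#endif
	return rc;
}
```

The point of the ordering is that the sync runs after the setattr succeeds but before Ganesha replies to the client, so the MDS has the new attributes by the time the NFS client sees success.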
Created attachment 1421030 [details]
patch to add ceph testcase to demonstrate problem
This patch demonstrates the problem with a cephfs testcase. It fails for me, but if you uncomment the ceph_ll_sync_inode call, it works. If you use Zheng's reclaim branch, you can uncomment the other calls and it then runs quicker too.
So, I believe this problem is theoretically possible with ganesha, but I'm not sure exactly how to make it occur on demand.
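To see the moving parts outside Ganesha, here is a rough standalone sketch in the spirit of the attached testcase. It is not the attachment itself: the file name "testfile" and the reliance on the default ceph.conf search path are assumptions for illustration. Build with something like `gcc -o synctest synctest.c -lcephfs`.

```c
#include <stdio.h>
#include <cephfs/libcephfs.h>

int main(void)
{
	struct ceph_mount_info *cmount;
	struct Inode *root, *in;
	struct ceph_statx stx = { 0 };
	UserPerm *perms;

	/* Mount CephFS using defaults from the local ceph.conf */
	if (ceph_create(&cmount, NULL) != 0 ||
	    ceph_conf_read_file(cmount, NULL) != 0 ||
	    ceph_mount(cmount, "/") != 0) {
		fprintf(stderr, "cephfs mount failed\n");
		return 1;
	}
	perms = ceph_mount_perms(cmount);

	/* "testfile" is assumed to already exist in the cephfs root */
	if (ceph_ll_lookup_root(cmount, &root) != 0 ||
	    ceph_ll_lookup(cmount, root, "testfile", &in, &stx,
			   CEPH_STATX_MODE, 0, perms) != 0) {
		fprintf(stderr, "lookup failed\n");
		return 1;
	}

	/* The SETATTR analogue: chmod via the low-level API.  With the
	 * right caps, the client can satisfy this from its cache. */
	stx.stx_mode = 0600;
	if (ceph_ll_setattr(cmount, in, &stx, CEPH_SETATTR_MODE, perms) != 0) {
		fprintf(stderr, "setattr failed\n");
		return 1;
	}

	/* The call the testcase un-comments: without it, the mode change
	 * can sit in the client's dirty caps and never reach the MDS if
	 * the client dies first. */
	if (ceph_ll_sync_inode(cmount, in, 0) != 0) {
		fprintf(stderr, "sync_inode failed\n");
		return 1;
	}

	ceph_ll_put(cmount, in);
	ceph_ll_put(cmount, root);
	ceph_unmount(cmount);
	ceph_release(cmount);
	return 0;
}
```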
This may be uncovering a different issue as well. With a ceph pull from earlier this week, I'm seeing ~5s stalls in the ll_sync_inode calls during a ganesha SETATTR op. It looks like we're issuing the fsync to the MDS, but it's taking a full tick to reply. That said, I need to confirm that and turn up the debugging.

Filed http://tracker.ceph.com/issues/23714 for the slow ll_sync_inode problem.

PR for the slow ll_sync_inode calls is here: https://github.com/ceph/ceph/pull/21498

We'll definitely want to pull that patch in as part of this backport (assuming it passes testing and gets merged, of course).

> No reliable reproducer is known so QE should just run basic regression
> testing on this.
Since there is no reliable reproducer, and basic regression testing is enough to move this bz to verified, we ran basic I/O tests for CephFS-NFS as described below.
1. Basic I/O workloads using tools such as smallfile, Crefi, and fio, with "rm -rf" of files and folders running in a separate thread. All of these workloads ran in parallel on the same NFS mount, each in its own directory.
2. I/O workloads used during testing (each workload below was repeated in another directory after its I/O completed):
fio: sudo ./fio --name=global --rw=readwrite --size=50G --name=job1 --name=job2 --directory=/mnt/nfs/_dir4_1/ --verify=meta --runtime=2h --time_based
crefi: for i in create rename chmod chown chgrp symlink hardlink setxattr truncate; do sudo ./crefi.py --multi -b 10 -d 10 -n 1000 -t sparse --random --min=1K --max=512K /mnt/nfs/@#dir8/ ; done
smallfiles: for i in create append overwrite delete create mkdir read rename readdir symlink rmdir stat chmod setxattr rename getxattr ls-l ; do sudo python ./smallfile_cli.py --operation $i --threads 8 --file-size 1024 --files 2048 --top /mnt/nfs/dir10/ ; done
3. Restart of the ganesha service while I/O was in progress: no failures were seen, and I/O kept running without issues (a delay in I/O was observed, but no failure).
4. With 2 NFS clients (mounts only, no NFS HA), I/O was run from both clients in different directories, using the same workloads as in 1 and 2.
5. Restart of the ganesha service while I/O was in progress from both clients.
6. Basic command tests, such as creating files and directories with special characters, scp, editing with vi, files with very long names, etc.
7. MDS failover during client I/O.
Moving this bz to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1563