Bug 2216442
| Summary: | [NFS-Ganesha] File locking test from multiple nfs client on same file is failing with ganesha service crash | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Manisha Saini <msaini> |
| Component: | NFS-Ganesha | Assignee: | Frank Filz <ffilz> |
| Status: | ASSIGNED --- | QA Contact: | Manisha Saini <msaini> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.1 | CC: | cephqe-warriors, ffilz, kkeithle, vdas |
| Target Milestone: | --- | | |
| Target Release: | 7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Can we get a tcpdump trace of the two clients? Also, do you have a stack backtrace from the coredump?

Tested this on:

# rpm -qa | grep ganesha
nfs-ganesha-selinux-5.3-1.el9cp.noarch
nfs-ganesha-5.3-1.el9cp.x86_64
nfs-ganesha-ceph-5.3-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.3-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.3-1.el9cp.x86_64
nfs-ganesha-rgw-5.3-1.el9cp.x86_64

Not hitting the crash anymore. Locking works as expected.

Client 1
======
# ./a.out /mnt/ganesha/file1
opening /mnt/ganesha/file1
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking

# ./a.out /mnt/ganesha/file1
opening /mnt/ganesha/file1
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking

Client 2:
=======
# ./a.out /mnt/ganesha/file1
opening /mnt/ganesha/file1
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking

# ./a.out /mnt/ganesha/file1
opening /mnt/ganesha/file1
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking

Can we close this one then?

Hi Frank, since the issue is fixed in nfs-ganesha 5.3, we can move this BZ to ON_QA for verification once the official RHCS 7.0 builds with nfs-ganesha 5.3 are available. QE will validate this issue with the RHCS 7.0 builds.

Sounds good.
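For reference, the locking test above (and in the description below) drives a small interactive program (a.out) that takes a POSIX advisory lock on the file via fcntl(), writes to it, and then releases the lock. That program is not attached to this BZ; the sketch below is a minimal, hypothetical reconstruction, assuming a whole-file write lock (F_WRLCK) requested non-blockingly with F_SETLK and a made-up wait_enter() helper for the interactive prompts. The actual QE test program may differ.

```c
/* Hypothetical reconstruction of the a.out lock test used in this BZ.
 * Assumption: a whole-file POSIX write lock taken non-blockingly via
 * fcntl(F_SETLK); the real QE program may differ.
 *
 * Usage: ./a.out <file-on-nfs-mount>
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Pause so the same step can be coordinated across two clients. */
static void wait_enter(const char *state)
{
    printf("%s; hit Enter to continue...\n", state);
    getchar();
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    printf("opening %s\n", argv[1]);
    int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    wait_enter("opened");

    /* Whole-file exclusive lock; l_len == 0 means "to end of file". */
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    printf("locking\n");
    /* F_SETLK is non-blocking: it fails (normally EAGAIN or EACCES) if
     * another client already holds a conflicting lock. */
    if (fcntl(fd, F_SETLK, &fl) == -1)
        fprintf(stderr, "fcntl failed (%s)\n", strerror(errno));
    wait_enter("locked");

    if (write(fd, "x", 1) == 1)
        printf("Write succeeded\n");
    else
        printf("Write failed\n");
    wait_enter("wrote");

    printf("unlocking\n");
    fl.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &fl) == -1)
        fprintf(stderr, "fcntl UNLOCK failed (%s)\n", strerror(errno));

    close(fd);
    return 0;
}
```

Run from two clients against the same exported file, this mirrors the reproduction steps below: while client 1 holds the lock, client 2's F_SETLK would normally fail with EAGAIN or EACCES and then succeed after client 1 unlocks; the EBADF ("Bad file descriptor") errors in the description are not the usual lock-conflict errno.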
Description of problem:
===========
Taking a lock on the same file from 2 different NFS clients is failing on RHCS 6.1. A core is generated, and the ganesha process crashes and restarts.

Version-Release number of selected component (if applicable):
========
[ceph: root@ceph-mani-vt0a3y-node1-installer /]# rpm -qa | grep ganesha
nfs-ganesha-selinux-5.1-1.el9cp.noarch
nfs-ganesha-5.1-1.el9cp.x86_64
nfs-ganesha-ceph-5.1-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.1-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.1-1.el9cp.x86_64
nfs-ganesha-rgw-5.1-1.el9cp.x86_64

How reproducible:
============
3/3

Steps to Reproduce:
===============
1. Deploy an RHCS 6.1 cluster.
2. Deploy an NFS-Ganesha cluster on 2 of the nodes.
3. Create a subvolume and export it via ganesha.
4. Mount the volume on 2 clients:
   # sudo mount -t nfs 10.0.210.192:/ceph /mnt/ganesha/
5. Create a file.
6. Take a lock on file 1 from client 1 --> Success
   Take a lock on file 1 from client 2 --> Failed, which is expected
   Release the lock on file 1 from client 1 --> Success
   Try to take a lock on file 1 from client 2 --> It fails again, but ideally client 2 should be able to acquire the lock now that client 1 has released it.

Client 1
---
# ./a.out /mnt/ganesha/5G
opening /mnt/ganesha/5G
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking

Client 2
----
$ ./a.out /mnt/ganesha/4G
opening /mnt/ganesha/4G
opened; hit Enter to lock...
locking
fcntl failed (Bad file descriptor)
locked; hit Enter to write...
Write failed
locked; hit Enter to unlock...
unlocking
fcntl UNLOCK failed (Bad file descriptor)

-----
Even after the lock is released from client 1, client 2 is unable to take the lock:
-----
$ ./a.out /mnt/ganesha/4G
opening /mnt/ganesha/4G
opened; hit Enter to lock...
locking
fcntl failed (Bad file descriptor)
locked; hit Enter to write...
Write failed
locked; hit Enter to unlock...
unlocking
fcntl UNLOCK failed (Bad file descriptor)

Actual results:
======
[ceph: root@ceph-mani-vt0a3y-node1-installer /]# ceph nfs cluster info nfsganesha
{
    "nfsganesha": {
        "virtual_ip": null,
        "backend": [
            {
                "hostname": "ceph-mani-vt0a3y-node4",
                "ip": "10.0.210.192",
                "port": 2049
            },
            {
                "hostname": "ceph-mani-vt0a3y-node6",
                "ip": "10.0.209.80",
                "port": 2049
            }
        ]
    }
}

---------
]# ceph orch ps --daemon_type=nfs
NAME                                              HOST                    PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
nfs.nfsganesha.0.0.ceph-mani-vt0a3y-node4.xwjsss  ceph-mani-vt0a3y-node4  *:2049  running (97s)  94s ago    97s  17.0M    -        5.1      b765ab0f02a7  e47becaf8f35
nfs.nfsganesha.1.0.ceph-mani-vt0a3y-node6.fhhzii  ceph-mani-vt0a3y-node6  *:2049  running (96s)  94s ago    95s  19.0M    -        5.1      b765ab0f02a7  4704f64c9c5e

-------
[ceph: root@ceph-mani-vt0a3y-node1-installer /]# ceph fs subvolume ls cephfs --group_name nfs
[
    {
        "name": "ganesha1"
    },
    {
        "name": "ganesha2"
    }
]

-----
# ceph fs subvolume getpath cephfs ganesha2 --group_name nfs
/volumes/nfs/ganesha2/fd964897-7046-4062-af4b-f3d0eb191de0

# ceph nfs export create cephfs nfsganesha /ceph cephfs --path=/volumes/nfs/ganesha2/fd964897-7046-4062-af4b-f3d0eb191de0
{
    "bind": "/ceph",
    "fs": "cephfs",
    "path": "/volumes/nfs/ganesha2/fd964897-7046-4062-af4b-f3d0eb191de0",
    "cluster": "nfsganesha",
    "mode": "RW"
}

# ceph nfs export get nfsganesha /ceph
{
    "export_id": 1,
    "path": "/volumes/nfs/ganesha2/fd964897-7046-4062-af4b-f3d0eb191de0",
    "cluster_id": "nfsganesha",
    "pseudo": "/ceph",
    "access_type": "RW",
    "squash": "none",
    "security_label": true,
    "protocols": [
        4
    ],
    "transports": [
        "TCP"
    ],
    "fsal": {
        "name": "CEPH",
        "user_id": "nfs.nfsganesha.1",
        "fs_name": "cephfs"
    },
    "clients": []
}

Expected results:
==============
The ganesha service should not crash, and client 2 should be able to take the lock once it has been released by client 1.

Additional info:
==============
ganesha.log
---------
P :EVENT :-------------------------------------------------
Jun 20 06:30:41 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 10:30:41 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
Jun 20 06:30:41 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 10:30:41 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Jun 20 06:30:41 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 10:30:41 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[reaper] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
Jun 20 06:30:41 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 10:30:41 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[reaper] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
Jun 20 06:30:41 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 10:30:41 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
Jun 20 06:30:41 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 10:30:41 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
Jun 20 07:02:56 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 11:02:56 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[sigmgr] sigmgr_thread :MAIN :EVENT :SIGHUP_HANDLER: Received SIGHUP.... initiating export list reload
Jun 20 07:02:56 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 11:02:56 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[sigmgr] reclaim_reset :FSAL :EVENT :start_reclaim failed: -2
Jun 20 07:02:57 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 11:02:57 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[sigmgr] reread_exports :CONFIG :EVENT :Reread exports complete
Jun 20 14:41:06 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:06 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_27] nfs4_op_create_session :CLIENT ID :WARN :Invalid channel attributes for st_create_session.py:testTooSmallMaxReq
Jun 20 14:41:06 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:06 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_27] nfs4_op_create_session :CLIENT ID :WARN :Invalid channel attributes for st_create_session.py:testTooSmallMaxRS
Jun 20 14:41:07 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:07 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_24] rpc :TIRPC :EVENT :xdr_array_decode:783 ERROR size 2 > max 1
Jun 20 14:41:23 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:23 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_24] nfs4_Compound :NFS4 :MAJ :An empty COMPOUND (no operation in it) was received
Jun 20 14:41:23 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:23 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_34] rpc :TIRPC :EVENT :xdr_opaque_decode:441 ERROR opaque
Jun 20 14:41:23 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:23 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_24] nfs4_Compound :NFS4 :CRIT :Bad Minor Version 50
Jun 20 14:41:23 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:23 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_24] nfs4_Compound :NFS4 :CRIT :Bad Minor Version 50
Jun 20 14:41:24 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 18:41:24 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_24] rpc :TIRPC :EVENT :xdr_array_decode:783 ERROR size 2 > max 1
Jun 20 15:21:36 ceph-mani-vt0a3y-node4 ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss[24815]: 20/06/2023 19:21:36 : epoch 64917fce : ceph-mani-vt0a3y-node4 : ganesha.nfsd-2[svc_170] destroy_fsal_fd :RW LOCK :CRIT :Error 16, Destroy mutex 0x7efec8084fd0 (&fsal_fd->work_mutex) at /builddir/build/BUILD/nfs-ganesha-5.1/src/include/fsal_types.h:1029
Jun 20 15:21:36 ceph-mani-vt0a3y-node4 systemd-coredump[140045]: Process 24819 (ganesha.nfsd) of user 0 dumped core.
Jun 20 15:21:36 ceph-mani-vt0a3y-node4 podman[140051]: 2023-06-20 15:21:36.768359053 -0400 EDT m=+0.043847334 container died e47becaf8f35726d808d19c6d0eee431e81c034301c4f3ea8e454e50a9dd681d (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:406b0370e6931584eefa37039eedd2255168b789f31e2d21f13bec9936b9d3ae, name=ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss, io.openshift.tags=rhceph ceph, io.buildah.version=1.27.3, io.openshift.expose-services=, CEPH_POINT_RELEASE=, com.redhat.license_terms=https://www.redhat.com/agreements, distribution-scope=public, name=rhceph, com.redhat.component=rhceph-container, maintainer=Guillaume Abrioux <gabrioux>, GIT_BRANCH=main, release=176, vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, RELEASE=main, io.k8s.description=Red Hat Ceph Storage 6, vcs-type=git, version=6, build-date=2023-05-30T00:53:52, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-176, GIT_CLEAN=True, description=Red Hat Ceph Storage 6, ceph=True, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., architecture=x86_64, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, GIT_REPO=https://github.com/ceph/ceph-container.git, vendor=Red Hat, Inc.)
Jun 20 15:21:36 ceph-mani-vt0a3y-node4 podman[140051]: 2023-06-20 15:21:36.800127174 -0400 EDT m=+0.075615456 container remove e47becaf8f35726d808d19c6d0eee431e81c034301c4f3ea8e454e50a9dd681d (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:406b0370e6931584eefa37039eedd2255168b789f31e2d21f13bec9936b9d3ae, name=ceph-20c36492-0f49-11ee-96cd-fa163e5538cc-nfs-nfsganesha-0-0-ceph-mani-vt0a3y-node4-xwjsss, io.openshift.tags=rhceph ceph, com.redhat.license_terms=https://www.redhat.com/agreements, maintainer=Guillaume Abrioux <gabrioux>, version=6, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-176, architecture=x86_64, release=176, vendor=Red Hat, Inc., GIT_REPO=https://github.com/ceph/ceph-container.git, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, distribution-scope=public, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, ceph=True, CEPH_POINT_RELEASE=, GIT_BRANCH=main, io.openshift.expose-services=, GIT_CLEAN=True, description=Red Hat Ceph Storage 6, io.k8s.description=Red Hat Ceph Storage 6, io.buildah.version=1.27.3, com.redhat.component=rhceph-container, name=rhceph, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., RELEASE=main, vcs-type=git, build-date=2023-05-30T00:53:52)
Jun 20 15:21:36 ceph-mani-vt0a3y-node4 systemd[1]: ceph-20c36492-0f49-11ee-96cd-fa163e5538cc.0.0.ceph-mani-vt0a3y-node4.xwjsss.service: Main process exited, code=exited, status=134/n/a
Jun 20 15:21:37 ceph-mani-vt0a3y-node4 systemd[1]: ceph-20c36492-0f49-11ee-96cd-fa163e5538cc.0.0.ceph-mani-vt0a3y-node4.xwjsss.service: Failed with result 'exit-code'.
Jun 20 15:21:37 ceph-mani-vt0a3y-node4 systemd[1]: ceph-20c36492-0f49-11ee-96cd-fa163e5538cc.0.0.ceph-mani-vt0a3y-node4.xwjsss.service: Consumed 27.818s CPU time.
Jun 20 15:21:47 ceph-mani-vt0a3y-node4 systemd[1]: ceph-20c36492-0f49-11ee-96cd-fa163e5538cc.0.0.ceph-mani-vt0a3y-node4.xwjsss.service: Scheduled restart job, restart counter is at 1.
Jun 20 15:21:47 ceph-mani-vt0a3y-node4 systemd[1]: Stopped Ceph nfs.nfsganesha.0.0.ceph-mani-vt0a3y-node4.xwjsss for 20c36492-0f49-11ee-96cd-fa163e5538cc.
Jun 20 15:21:47 ceph-mani-vt0a3y-node4 systemd[1]: ceph-20c36492-0f49-11ee-96cd-fa163e5538cc.0.0.ceph-mani-vt0a3y-node4.xwjsss.service: Consumed 27.818s CPU time.
Jun 20 15:21:47 ceph-mani-vt0a3y-node4 systemd[1]: Starting Ceph nfs.nfsganesha.0.0.ceph-mani-vt0a3y-node4.xwjsss for 20c36492-0f49-11ee-96cd-fa163e5538cc...
Jun 20 15:21:47 ceph-mani-vt0a3y-node4 podman[140138]: