Description of problem:
===========
Export 5 subvolumes via CephFS (ganesha1, ganesha2, ganesha3, ganesha4, ganesha5). Delete 1 export (ganesha4). Mount the export ganesha1 on 2 clients via NFS 4.2, say Client 1 and Client 2. Copy a tar file onto the mount point from Client 1 and perform a lookup on the mount point from Client 2 while the copy operation is still in progress.

Version-Release number of selected component (if applicable):
==========
[ceph: root@ceph-mani-oo0maz-node1-installer /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-18.el9.x86_64
nfs-utils-2.5.4-18.el9.x86_64
nfs-ganesha-selinux-5.1-1.el9cp.noarch
nfs-ganesha-5.1-1.el9cp.x86_64
nfs-ganesha-ceph-5.1-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.1-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.1-1.el9cp.x86_64
nfs-ganesha-rgw-5.1-1.el9cp.x86_64

How reproducible:
=========
3/3

Steps to Reproduce:
1. Create a Ganesha cluster on 2 RHCS nodes:

# ceph nfs cluster info nfsganesha
{
    "nfsganesha": {
        "virtual_ip": null,
        "backend": [
            {
                "hostname": "ceph-mani-oo0maz-node5",
                "ip": "10.0.208.192",
                "port": 2049
            },
            {
                "hostname": "ceph-mani-oo0maz-node6",
                "ip": "10.0.210.195",
                "port": 2049
            }
        ]
    }
}

2. Create a CephFS filesystem, then create and export 5 subvolumes via Ganesha:

# ceph fs volume ls
[
    {
        "name": "cephfs"
    }
]

# ceph fs subvolumegroup ls cephfs
[
    {
        "name": "ganesha4"
    },
    {
        "name": "ganesha1"
    },
    {
        "name": "ganesha2"
    },
    {
        "name": "ganesha5"
    },
    {
        "name": "ganesha3"
    }
]

# ceph nfs export ls nfsganesha
[
    "/ganesha1",
    "/ganesha2",
    "/ganesha3",
    "/ganesha4",
    "/ganesha5"
]

3. Delete export /ganesha4.
4. Mount the export on 2 clients (Client 1 and Client 2) via NFS v4.2.
5. From Client 1, copy a file onto the mount point and, at the same time, perform a lookup from Client 2 (a sketch of the commands for steps 3-5 is given after the Additional info below).

Client 1:
[root@ceph-mani-oo0maz-node11 ganesha1]# cp /root/linux-6.4.tar.xz /mnt/ganesha1/
[root@ceph-mani-oo0maz-node11 ganesha1]#

Client 2:
[root@ceph-mani-oo0maz-node10 ganesha1]# ls
f2  linux-6.4.tar.xz

Actual results:
=====
The NFS-Ganesha process crashes and dumps core while the lookup is performed from Client 2.

Expected results:
====
The NFS-Ganesha process should not crash.

Additional info:
=====
[root@ceph-mani-oo0maz-node5 coredump]# lldb -c core.ganesha\\x2enfsd.0.76643ba0b43d472c8c2e29f59a62ae7e.60786.1687792030000000
(lldb) target create --core "core.ganesha\\x2enfsd.0.76643ba0b43d472c8c2e29f59a62ae7e.60786.1687792030000000"
Core file '/var/lib/systemd/coredump/core.ganesha\x2enfsd.0.76643ba0b43d472c8c2e29f59a62ae7e.60786.1687792030000000' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'ganesha.nfsd', stop reason = signal SIGABRT
  * frame #0: 0x00007fd34677f54c

------ ganesha.log -----
P :EVENT :-------------------------------------------------
Jun 26 10:36:35 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:35 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
Jun 26 10:36:35 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:35 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Jun 26 10:36:45 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:45 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(2) clid count(2)
Jun 26 10:36:45 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:45 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
Jun 26 10:58:44 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:58:44 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[svc_69] destroy_fsal_fd :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fa78c019f60 (&fsal_fd->work_mutex) at /builddir/build/BUILD/nfs-ganesha-5.1/src/include/fsal_types.h:1029
Jun 26 10:58:44 ceph-mani-oo0maz-node5 systemd-coredump[60575]: Process 52153 (ganesha.nfsd) of user 0 dumped core.
Jun 26 10:58:44 ceph-mani-oo0maz-node5 podman[60580]: 2023-06-26 10:58:44.661759627 -0400 EDT m=+0.042498076 container died 392faf4f13539082c382554bbb70d8764e6ed4144499eb42f85df4fec95c1a00 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:0eb98763b77938a11cb62e8414119ed35a739f02077f2b6b0489f76d80a63e67, name=ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm, GIT_REPO=https://github.com/ceph/ceph-container.git, description=Red Hat Ceph Storage 6, ceph=True, io.buildah.version=1.29.0, vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, RELEASE=main, build-date=2023-06-23T19:14:24, com.redhat.component=rhceph-container, distribution-scope=public, release=179, architecture=x86_64, io.k8s.description=Red Hat Ceph Storage 6, CEPH_POINT_RELEASE=, com.redhat.license_terms=https://www.redhat.com/agreements, maintainer=Guillaume Abrioux <gabrioux>, name=rhceph, vcs-type=git, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-179, io.openshift.expose-services=, GIT_BRANCH=main, vendor=Red Hat, Inc., GIT_CLEAN=True, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, io.openshift.tags=rhceph ceph, version=6)
Jun 26 10:58:44 ceph-mani-oo0maz-node5 podman[60580]: 2023-06-26 10:58:44.679685425 -0400 EDT m=+0.060423839 container remove 392faf4f13539082c382554bbb70d8764e6ed4144499eb42f85df4fec95c1a00 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:0eb98763b77938a11cb62e8414119ed35a739f02077f2b6b0489f76d80a63e67, name=ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm, GIT_CLEAN=True, io.openshift.tags=rhceph ceph, ceph=True, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, GIT_BRANCH=main, io.openshift.expose-services=, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-179, architecture=x86_64, com.redhat.component=rhceph-container, build-date=2023-06-23T19:14:24, vendor=Red Hat, Inc., distribution-scope=public, release=179, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, io.k8s.description=Red Hat Ceph Storage 6, com.redhat.license_terms=https://www.redhat.com/agreements, io.buildah.version=1.29.0, RELEASE=main, description=Red Hat Ceph Storage 6, maintainer=Guillaume Abrioux <gabrioux>, name=rhceph, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, CEPH_POINT_RELEASE=, vcs-type=git, GIT_REPO=https://github.com/ceph/ceph-container.git, version=6)
Jun 26 10:58:44 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Main process exited, code=exited, status=134/n/a
Jun 26 10:58:45 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Failed with result 'exit-code'.
Jun 26 10:58:45 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Consumed 3.488s CPU time.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Scheduled restart job, restart counter is at 3.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: Stopped Ceph nfs.nfsganesha.0.0.ceph-mani-oo0maz-node5.qwtotm for 7f3277c8-1419-11ee-96b4-fa163eb1880b.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Consumed 3.488s CPU time.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: Starting Ceph nfs.nfsganesha.0.0.ceph-mani-oo0maz-node5.qwtotm for 7f3277c8-1419-11ee-96b4-fa163eb1880b...
Jun 26 10:58:55 ceph-mani-oo0maz-node5 podman[60773]:
Jun 26 10:58:55 ceph-mani-oo0maz-node5 podman[60773]: 2023-06-26 10:58:55.334297003 -0400 EDT m=+0.046869916 container create 901730a39b6c781fe3071ac5148b315d7d5d62ed695c9341990dc3cac0649f61 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:0eb98763b77938a11cb62e8414119ed35a739f02077f2b6b0489f76d80a63e67, name=ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., com.redhat.license_terms=https://www.redhat.com/agreements, ceph=True, vendor=Red Hat, Inc., io.buildah.version=1.29.0, architecture=x86_64, io.k8s.description=Red Hat Ceph Storage 6, maintainer=Guillaume Abrioux <gabrioux>, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, version=6, release=179, name=rhceph, GIT_CLEAN=True, io.openshift.tags=rhceph ceph, RELEASE=main, io.openshift.expose-services=, CEPH_POINT_RELEASE=, description=Red Hat Ceph Storage 6, vcs-type=git, distribution-scope=public, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, build-date=2023-06-23T19:14:24, GIT_BRANCH=main, com.redhat.component=rhceph-container, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-179, vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, GIT_REPO=https://github.com/ceph/ceph-container.git)
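
For reference, Error 16 in the destroy_fsal_fd message above is EBUSY, i.e. the fsal_fd work_mutex was still in use when it was destroyed.

A rough sketch of the commands behind steps 3-5 (the pseudo path, server hostname, and mount point are taken from the outputs above; the export removal command and any mount options other than vers=4.2 are assumptions, and the clients may equally have mounted from the other backend node):

Ceph admin node (step 3, delete one export):
# ceph nfs export rm nfsganesha /ganesha4

Client 1 and Client 2 (step 4, mount /ganesha1 over NFS v4.2):
# mkdir -p /mnt/ganesha1
# mount -t nfs -o vers=4.2 ceph-mani-oo0maz-node5:/ganesha1 /mnt/ganesha1

Client 1 (step 5, copy a large tarball onto the mount point):
# cp /root/linux-6.4.tar.xz /mnt/ganesha1/

Client 2 (step 5, lookup on the mount point while the copy is still running):
# ls /mnt/ganesha1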
Not observing the issue with the RHCS 7.0 build. Do we have an RCA for the same? Can we move this to ON_QA?
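
One way to re-confirm on the RHCS 7.0 build (a sketch, assuming the same cluster and export names) is to repeat steps 3-5 and then verify that no new ganesha.nfsd core is recorded and that the backend stays up:

# coredumpctl list ganesha.nfsd
# ceph nfs cluster info nfsganesha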
This is almost certainly the same root cause as Bug 2216442.