Description of problem:
===========
Export 5 subvolumes via CephFS (ganesha1, ganesha2, ganesha3, ganesha4, ganesha5). Delete 1 export (ganesha4). Mount the export /ganesha1 on 2 clients via NFS 4.2, say Client 1 and Client 2. Copy a tar file onto the mount point from Client 1 and perform a lookup on the mount point from Client 2 while the copy is in progress.

Version-Release number of selected component (if applicable):
==========
[ceph: root@ceph-mani-oo0maz-node1-installer /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-18.el9.x86_64
nfs-utils-2.5.4-18.el9.x86_64
nfs-ganesha-selinux-5.1-1.el9cp.noarch
nfs-ganesha-5.1-1.el9cp.x86_64
nfs-ganesha-ceph-5.1-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.1-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.1-1.el9cp.x86_64
nfs-ganesha-rgw-5.1-1.el9cp.x86_64

How reproducible:
=========
3/3

Steps to Reproduce:
1. Create a Ganesha cluster on 2 RHCS nodes:

# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "virtual_ip": null,
    "backend": [
      {
        "hostname": "ceph-mani-oo0maz-node5",
        "ip": "10.0.208.192",
        "port": 2049
      },
      {
        "hostname": "ceph-mani-oo0maz-node6",
        "ip": "10.0.210.195",
        "port": 2049
      }
    ]
  }
}

2. Create a CephFS filesystem, then create and export 5 subvolumes via Ganesha:

# ceph fs volume ls
[
  {
    "name": "cephfs"
  }
]

# ceph fs subvolumegroup ls cephfs
[
  {
    "name": "ganesha4"
  },
  {
    "name": "ganesha1"
  },
  {
    "name": "ganesha2"
  },
  {
    "name": "ganesha5"
  },
  {
    "name": "ganesha3"
  }
]

# ceph nfs export ls nfsganesha
[
  "/ganesha1",
  "/ganesha2",
  "/ganesha3",
  "/ganesha4",
  "/ganesha5"
]

3. Delete export /ganesha4.
4. Mount the export on 2 clients (Client 1 and Client 2) via NFS v4.2.
5. Copy a file onto the mount point from Client 1 and, at the same time, perform a lookup from Client 2 (a client-side command sketch follows the log excerpts below).

Client 1:
[root@ceph-mani-oo0maz-node11 ganesha1]# cp /root/linux-6.4.tar.xz /mnt/ganesha1/
[root@ceph-mani-oo0maz-node11 ganesha1]#

Client 2:
[root@ceph-mani-oo0maz-node10 ganesha1]# ls
f2  linux-6.4.tar.xz

Actual results:
=====
The NFS-Ganesha process crashes and dumps core while the lookup is performed from Client 2.

Expected results:
====
The NFS-Ganesha process should not crash.

Additional info:
=====
[root@ceph-mani-oo0maz-node5 coredump]# lldb -c core.ganesha\\x2enfsd.0.76643ba0b43d472c8c2e29f59a62ae7e.60786.1687792030000000
(lldb) target create --core "core.ganesha\\x2enfsd.0.76643ba0b43d472c8c2e29f59a62ae7e.60786.1687792030000000"
Core file '/var/lib/systemd/coredump/core.ganesha\x2enfsd.0.76643ba0b43d472c8c2e29f59a62ae7e.60786.1687792030000000' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'ganesha.nfsd', stop reason = signal SIGABRT
  * frame #0: 0x00007fd34677f54c

------ ganesha.log -----
P :EVENT :-------------------------------------------------
Jun 26 10:36:35 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:35 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
Jun 26 10:36:35 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:35 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Jun 26 10:36:45 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:45 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(2) clid count(2)
Jun 26 10:36:45 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:36:45 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
Jun 26 10:58:44 ceph-mani-oo0maz-node5 ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm[52149]: 26/06/2023 14:58:44 : epoch 6499a269 : ceph-mani-oo0maz-node5 : ganesha.nfsd-2[svc_69] destroy_fsal_fd :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fa78c019f60 (&fsal_fd->work_mutex) at /builddir/build/BUILD/nfs-ganesha-5.1/src/include/fsal_types.h:1029
Jun 26 10:58:44 ceph-mani-oo0maz-node5 systemd-coredump[60575]: Process 52153 (ganesha.nfsd) of user 0 dumped core.
Jun 26 10:58:44 ceph-mani-oo0maz-node5 podman[60580]: 2023-06-26 10:58:44.661759627 -0400 EDT m=+0.042498076 container died 392faf4f13539082c382554bbb70d8764e6ed4144499eb42f85df4fec95c1a00 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:0eb98763b77938a11cb62e8414119ed35a739f02077f2b6b0489f76d80a63e67, name=ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm, GIT_REPO=https://github.com/ceph/ceph-container.git, description=Red Hat Ceph Storage 6, ceph=True, io.buildah.version=1.29.0, vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, RELEASE=main, build-date=2023-06-23T19:14:24, com.redhat.component=rhceph-container, distribution-scope=public, release=179, architecture=x86_64, io.k8s.description=Red Hat Ceph Storage 6, CEPH_POINT_RELEASE=, com.redhat.license_terms=https://www.redhat.com/agreements, maintainer=Guillaume Abrioux <gabrioux>, name=rhceph, vcs-type=git, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-179, io.openshift.expose-services=, GIT_BRANCH=main, vendor=Red Hat, Inc., GIT_CLEAN=True, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, io.openshift.tags=rhceph ceph, version=6)
Jun 26 10:58:44 ceph-mani-oo0maz-node5 podman[60580]: 2023-06-26 10:58:44.679685425 -0400 EDT m=+0.060423839 container remove 392faf4f13539082c382554bbb70d8764e6ed4144499eb42f85df4fec95c1a00 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:0eb98763b77938a11cb62e8414119ed35a739f02077f2b6b0489f76d80a63e67, name=ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm, GIT_CLEAN=True, io.openshift.tags=rhceph ceph, ceph=True, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, GIT_BRANCH=main, io.openshift.expose-services=, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-179, architecture=x86_64, com.redhat.component=rhceph-container, build-date=2023-06-23T19:14:24, vendor=Red Hat, Inc., distribution-scope=public, release=179, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, io.k8s.description=Red Hat Ceph Storage 6, com.redhat.license_terms=https://www.redhat.com/agreements, io.buildah.version=1.29.0, RELEASE=main, description=Red Hat Ceph Storage 6, maintainer=Guillaume Abrioux <gabrioux>, name=rhceph, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, CEPH_POINT_RELEASE=, vcs-type=git, GIT_REPO=https://github.com/ceph/ceph-container.git, version=6)
Jun 26 10:58:44 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Main process exited, code=exited, status=134/n/a
Jun 26 10:58:45 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Failed with result 'exit-code'.
Jun 26 10:58:45 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Consumed 3.488s CPU time.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Scheduled restart job, restart counter is at 3.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: Stopped Ceph nfs.nfsganesha.0.0.ceph-mani-oo0maz-node5.qwtotm for 7f3277c8-1419-11ee-96b4-fa163eb1880b.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b.0.0.ceph-mani-oo0maz-node5.qwtotm.service: Consumed 3.488s CPU time.
Jun 26 10:58:55 ceph-mani-oo0maz-node5 systemd[1]: Starting Ceph nfs.nfsganesha.0.0.ceph-mani-oo0maz-node5.qwtotm for 7f3277c8-1419-11ee-96b4-fa163eb1880b...
Jun 26 10:58:55 ceph-mani-oo0maz-node5 podman[60773]:
Jun 26 10:58:55 ceph-mani-oo0maz-node5 podman[60773]: 2023-06-26 10:58:55.334297003 -0400 EDT m=+0.046869916 container create 901730a39b6c781fe3071ac5148b315d7d5d62ed695c9341990dc3cac0649f61 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:0eb98763b77938a11cb62e8414119ed35a739f02077f2b6b0489f76d80a63e67, name=ceph-7f3277c8-1419-11ee-96b4-fa163eb1880b-nfs-nfsganesha-0-0-ceph-mani-oo0maz-node5-qwtotm, summary=Provides the latest Red Hat Ceph Storage 6 on RHEL 9 in a fully featured and supported base image., com.redhat.license_terms=https://www.redhat.com/agreements, ceph=True, vendor=Red Hat, Inc., io.buildah.version=1.29.0, architecture=x86_64, io.k8s.description=Red Hat Ceph Storage 6, maintainer=Guillaume Abrioux <gabrioux>, io.k8s.display-name=Red Hat Ceph Storage 6 on RHEL 9, version=6, release=179, name=rhceph, GIT_CLEAN=True, io.openshift.tags=rhceph ceph, RELEASE=main, io.openshift.expose-services=, CEPH_POINT_RELEASE=, description=Red Hat Ceph Storage 6, vcs-type=git, distribution-scope=public, GIT_COMMIT=0727c855af939c6f3709e73be703026388413744, build-date=2023-06-23T19:14:24, GIT_BRANCH=main, com.redhat.component=rhceph-container, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/6-179, vcs-ref=dec93361f6f7a22d929d690d9002f0df9a8f6805, GIT_REPO=https://github.com/ceph/ceph-container.git)
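For reference, the client-side part of the reproduction can be sketched as follows. This is illustrative only: the backend IP 10.0.208.192, the mount point, and the tar file name are taken from the output above and should be adjusted to the actual environment.

Client 1 and Client 2: mount the /ganesha1 export over NFS v4.2
# mkdir -p /mnt/ganesha1
# mount -t nfs -o vers=4.2,port=2049 10.0.208.192:/ganesha1 /mnt/ganesha1

Client 1: copy a large tar file onto the mount point
# cp /root/linux-6.4.tar.xz /mnt/ganesha1/

Client 2: perform lookups on the same mount point while the copy is still running
# ls -l /mnt/ganesha1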
Not observing the issue with the RHCS 7.0 build. Do we have an RCA for this? Can we move this to ON_QA?
This is almost certainly the same root cause as Bug 2216442.
Verified this BZ with:

[ceph: root@argo016 /]# rpm -qa | grep ganesha
nfs-ganesha-selinux-5.5-1.el9cp.noarch
nfs-ganesha-5.5-1.el9cp.x86_64
nfs-ganesha-rgw-5.5-1.el9cp.x86_64
nfs-ganesha-ceph-5.5-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.5-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.5-1.el9cp.x86_64

Steps performed:
1. Configure containerized Ganesha.
2. Export 5 subvolumes via NFS (ganesha1, ganesha2, ganesha3, ganesha4, ganesha5). Delete 1 export (ganesha4). (A sketch of the export create/delete commands follows below.)
3. Mount the export /ganesha1 on 2 clients, say Client 1 and Client 2.
4. Copy a tar file onto the mount point from Client 1 and perform a lookup on the mount point from Client 2 while the copy is in progress.

No crashes were observed. Moving this BZ to the verified state.
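For completeness, the export create/delete part of step 2 would look roughly like the following. This is a sketch only: the exact argument form of "ceph nfs export create cephfs" varies between Ceph releases, and the --path value (normally obtained from "ceph fs subvolume getpath") is a placeholder, not taken from this report.

Create one export per subvolume, with pseudo paths /ganesha1 .. /ganesha5
# ceph nfs export create cephfs --cluster-id nfsganesha --pseudo-path /ganesha1 --fsname cephfs --path <subvolume path from "ceph fs subvolume getpath">

Delete the /ganesha4 export before mounting the remaining ones
# ceph nfs export rm nfsganesha /ganesha4

Confirm the remaining exports
# ceph nfs export ls nfsganesha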
Since this bug was introduced by the async/non-blocking work, I don't think it requires doc text. Please advise on how to proceed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:7780