Bug 2359508 - [NFS-Ganesha] The NFS Ganesha daemon crashed at lookup_path while updating the ops limit, with a lookup operation running in parallel.
Summary: [NFS-Ganesha] The NFS Ganesha daemon crashed at lookup_path while updating th...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.1
Assignee: Deeraj Patil
QA Contact: Manish Singh
URL:
Whiteboard:
Depends On: 2362289
Blocks: 2367464
 
Reported: 2025-04-14 13:40 UTC by Manisha Saini
Modified: 2025-06-26 12:24 UTC (History)

Fixed In Version: nfs-ganesha-6.5-10.el9cp; rhceph-container-8-403
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2367464 (view as bug list)
Environment:
Last Closed: 2025-06-26 12:24:01 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-11162 0 None None None 2025-04-14 13:50:14 UTC
Red Hat Product Errata RHSA-2025:9775 0 None None None 2025-06-26 12:24:09 UTC

Description Manisha Saini 2025-04-14 13:40:35 UTC
Description of problem:
=========

Set the PerShare ops limit to the maximum supported value, i.e. 409600. Create a file using the dd command, then update the ops configuration to PerClient with the same 409600 limit.
Perform a lookup on the mount point while the update is in progress.
At this point, the Ganesha server crashed and dumped multiple core files.


# ceph orch ps | grep nfs
haproxy.nfs.nfsganesha.cali015.dcrxwa     cali015  *:2049,9049       running (4h)     5m ago   4h     101M        -  2.4.22-f8e3218                       6c223bddea69  77ce1b2fbfe1
keepalived.nfs.nfsganesha.cali015.woknus  cali015                    running (4h)     5m ago   4h    1555k        -  2.2.8                                09859a486cb9  9d848e7ac5aa
nfs.nfsganesha.0.0.cali015.lspczp         cali015  *:12049           error            5m ago   4h        -        -  <unknown>                            <unknown>     <unknown>


ganesha.log
----
Apr 14 13:01:24 cali015 ceph-288c1062-18fb-11f0-a987-b49691cee574-nfs-nfsganesha-0-0-cali015-lspczp[452515]: 14/04/2025 13:01:24 : epoch 67fd071e : cali015 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Apr 14 13:01:24 cali015 ceph-288c1062-18fb-11f0-a987-b49691cee574-nfs-nfsganesha-0-0-cali015-lspczp[452515]: 14/04/2025 13:01:24 : epoch 67fd071e : cali015 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(3)
Apr 14 13:01:29 cali015 systemd-coredump[452627]: Process 452525 (ganesha.nfsd) of user 0 dumped core.

                                                  Stack trace of thread 70:
                                                  #0  0x00007f8f6c5cda30 n/a (/usr/lib64/libganesha_nfsd.so.6.5 + 0xe8a30)
                                                  ELF object binary architecture: AMD x86-64
Apr 14 13:01:29 cali015 podman[452633]: 2025-04-14 13:01:29.551319814 +0000 UTC m=+0.025186287 container died 8c79bb7bfd6d70a2f9c5d9b6bdd714c05fb84a68481d3f5f547fad2b0c92cecc (image=cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9@sha256:ca65e6bfabd1652fec495211ae72d8a4af3271bcd88ea948b623089381b982f3, name=ceph-288c1062-18fb-11f0-a987-b49691cee574-nfs-nfsganesha-0-0-cali015-lspczp, io.openshift.tags=ibm ceph, GIT_BRANCH=main, io.k8s.description=IBM Storage Ceph 8, maintainer=Guillaume Abrioux <gabrioux>, release=105.0.hotfix.bz2357486, description=IBM Storage Ceph 8, io.buildah.version=1.33.12, ceph=True, com.redhat.component=ibm-ceph-container, name=ibm-ceph, RELEASE=main, distribution-scope=public, GIT_CLEAN=True, GIT_REPO=https://github.com/ceph/ceph-container.git, GIT_COMMIT=eadbe5f6c4471e17c1721f9f08dde7964a4f491b, CEPH_POINT_RELEASE=, io.openshift.expose-services=, com.redhat.license_terms=https://www.redhat.com/agreements, build-date=2025-04-11T20:16:13, version=8, vcs-ref=8dc014514b5df6095811d1ad01a9d2c98e222a0e, io.k8s.display-name=IBM Storage Ceph 8, vendor=Red Hat, Inc., architecture=x86_64, summary=Provides the latest IBM Storage Ceph 8 in a fully featured and supported base image., url=https://access.redhat.com/containers/#/registry.access.redhat.com/ibm-ceph/images/8-105.0.hotfix.bz2357486, vcs-type=git)
Apr 14 13:01:29 cali015 podman[452633]: 2025-04-14 13:01:29.55981934 +0000 UTC m=+0.033685806 container remove 8c79bb7bfd6d70a2f9c5d9b6bdd714c05fb84a68481d3f5f547fad2b0c92cecc (image=cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9@sha256:ca65e6bfabd1652fec495211ae72d8a4af3271bcd88ea948b623089381b982f3, name=ceph-288c1062-18fb-11f0-a987-b49691cee574-nfs-nfsganesha-0-0-cali015-lspczp, GIT_COMMIT=eadbe5f6c4471e17c1721f9f08dde7964a4f491b, com.redhat.license_terms=https://www.redhat.com/agreements, distribution-scope=public, summary=Provides the latest IBM Storage Ceph 8 in a fully featured and supported base image., url=https://access.redhat.com/containers/#/registry.access.redhat.com/ibm-ceph/images/8-105.0.hotfix.bz2357486, io.openshift.tags=ibm ceph, io.openshift.expose-services=, version=8, maintainer=Guillaume Abrioux <gabrioux>, name=ibm-ceph, io.k8s.display-name=IBM Storage Ceph 8, architecture=x86_64, GIT_REPO=https://github.com/ceph/ceph-container.git, com.redhat.component=ibm-ceph-container, vcs-ref=8dc014514b5df6095811d1ad01a9d2c98e222a0e, vcs-type=git, io.buildah.version=1.33.12, io.k8s.description=IBM Storage Ceph 8, GIT_CLEAN=True, release=105.0.hotfix.bz2357486, GIT_BRANCH=main, RELEASE=main, build-date=2025-04-11T20:16:13, CEPH_POINT_RELEASE=, vendor=Red Hat, Inc., ceph=True, description=IBM Storage Ceph 8)
Apr 14 13:01:29 cali015 systemd[1]: ceph-288c1062-18fb-11f0-a987-b49691cee574.0.0.cali015.lspczp.service: Main process exited, code=exited, status=139/n/a
Apr 14 13:01:29 cali015 systemd[1]: ceph-288c1062-18fb-11f0-a987-b49691cee574.0.0.cali015.lspczp.service: Failed with result 'exit-code'.
Apr 14 13:01:29 cali015 systemd[1]: ceph-288c1062-18fb-11f0-a987-b49691cee574.0.0.cali015.lspczp.service: Consumed 1.488s CPU time.
Apr 14 13:01:39 cali015 systemd[1]: ceph-288c1062-18fb-11f0-a987-b49691cee574.0.0.cali015.lspczp.service: Scheduled restart job, restart counter is at 5.
Apr 14 13:01:39 cali015 systemd[1]: Stopped Ceph nfs.nfsganesha.0.0.cali015.lspczp for 288c1062-18fb-11f0-a987-b49691cee574.
Apr 14 13:01:39 cali015 systemd[1]: ceph-288c1062-18fb-11f0-a987-b49691cee574.0.0.cali015.lspczp.service: Consumed 1.488s CPU time.
Apr 14 13:01:39 cali015 systemd[1]: ceph-288c1062-18fb-11f0-a987-b49691cee574.0.0.cali015.lspczp.service: Start request repeated too quickly.
Apr 14 13:01:39 cali015 systemd[1]: ceph-288c1062-18fb-11f0-a987-b49691cee574.0.0.cali015.lspczp.service: Failed with result 'exit-code'.
Apr 14 13:01:39 cali015 systemd[1]: Failed to start Ceph nfs.nfsganesha.0.0.cali015.lspczp for 288c1062-18fb-11f0-a987-b49691cee574.

-------



Version-Release number of selected component (if applicable):
==========
# ceph --version
ceph version 19.2.0-124.1.hotfix.bz2357486.el9cp (1d6c39dcc35466271ef633ccfd91e51cc792656b) squid (stable)


How reproducible:
========
1/1


Steps to Reproduce:
========
1. Set the PerShare ops limit to the maximum supported value, i.e. 409600.

[ceph: root@cali013 /]# ceph nfs cluster qos enable ops_control nfsganesha PerShare --max_export_iops 409600
[ceph: root@cali013 /]# ceph nfs cluster qos get nfsganesha
{
  "combined_rw_bw_control": false,
  "enable_bw_control": false,
  "enable_iops_control": true,
  "enable_qos": true,
  "max_export_iops": 409600,
  "qos_type": "PerShare"
}


2. Create a file using the dd command.

[root@cali020 ganesha]# dd if=/dev/urandom of=/mnt/ganesha/file4 bs=1G count=10 oflag=direct
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 48.0426 s, 223 MB/s


3. Update the ops configuration to PerClient with a limit of 409600, and perform a lookup on the mount point while the update is running.

[ceph: root@cali013 /]# ceph nfs cluster qos enable ops_control nfsganesha PerClient --max_client_iops 409600
[ceph: root@cali013 /]# ceph nfs cluster qos get nfsganesha
{
  "combined_rw_bw_control": false,
  "enable_bw_control": false,
  "enable_iops_control": true,
  "enable_qos": true,
  "max_client_iops": 409600,
  "qos_type": "PerClient"
}




Actual results:
======
At this stage, the Ganesha server crashed and dumped multiple core files.


Expected results:
======
Ganesha should not crash


Additional info:

==========
Missing separate debuginfos, use: dnf debuginfo-install abseil-cpp-20211102.0-4.el9.x86_64 c-ares-1.19.1-2.el9_4.x86_64 dbus-libs-1.12.20-8.el9.x86_64 glibc-2.34-125.el9_5.1.x86_64 grpc-1.46.7-10.el9.x86_64 grpc-cpp-1.46.7-10.el9.x86_64 gssproxy-0.8.4-7.el9.x86_64 keyutils-libs-1.6.3-1.el9.x86_64 krb5-libs-1.21.1-4.el9_5.x86_64 libacl-2.3.1-4.el9.x86_64 libattr-2.5.1-3.el9.x86_64 libblkid-2.37.4-20.el9.x86_64 libcom_err-1.46.5-5.el9.x86_64 libcurl-7.76.1-31.el9.x86_64 libgcc-11.5.0-5.el9_5.x86_64 libgpg-error-1.42-5.el9.x86_64 libibverbs-51.0-1.el9.x86_64 libnfsidmap-2.5.4-27.el9.x86_64 libnghttp2-1.43.0-6.el9.x86_64 libnl3-3.9.0-1.el9.x86_64 librdmacm-51.0-1.el9.x86_64 libselinux-3.6-1.el9.x86_64 libstdc++-11.5.0-5.el9_5.x86_64 libuuid-2.37.4-20.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 lttng-ust-2.12.0-6.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 numactl-libs-2.0.18-2.el9.x86_64 openssl-libs-3.2.2-6.el9_5.1.x86_64 pcre2-10.40-6.el9.x86_64 protobuf-3.14.0-13.el9.x86_64 sssd-client-2.9.5-4.el9_5.4.x86_64 userspace-rcu-0.12.1-6.el9.x86_64 xz-libs-5.2.5-8.el9_0.x86_64 zlib-1.2.11-40.el9.x86_64
#0  0x00007f1483db6836 in lookup_path (export_pub=0x7f1483dc0fa0, path=0x0, pub_handle=0x562ee08902b0, attrs_out=0x7f1486d44000 <default_log_levels>) at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/FSAL_CEPH/export.c:142
142	{
[Current thread is 1 (LWP 76)]
(gdb) bt
#0  0x00007f1483db6836 in lookup_path (export_pub=0x7f1483dc0fa0, path=0x0, pub_handle=0x562ee08902b0, attrs_out=0x7f1486d44000 <default_log_levels>) at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/FSAL_CEPH/export.c:142
#1  0x00007f1486b8cdaf in shutdown_handles (fsal=0x7f1486cce3a0 <__func__.3.lto_priv.3>) at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/fsal_destroyer.c:67
#2  destroy_fsals () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/fsal_destroyer.c:151
#3  0x0000562edc81a720 in ?? ()
#4  0x00007f1483dc1030 in CephFSM () from /usr/lib64/ganesha/libfsalceph.so
#5  0x00007f1483dc1030 in CephFSM () from /usr/lib64/ganesha/libfsalceph.so
#6  0x00007f143c008540 in ?? ()
#7  0x0000562edc6762a0 in ?? ()
#8  0x0000562edc66bf10 in ?? ()
#9  0x00007f1486b84130 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/access_check.c:879 from /lib64/libganesha_nfsd.so.6.5
#10 0x00007f1483db1940 in init_config (module_in=0x7f1368110480, config_struct=0x7f1486d44360 <fsal_list>, err_type=0x7f1486d54df0 <general_fridge>) at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/FSAL_CEPH/main.c:165
#11 0x00007f1486b846c0 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:163 from /lib64/libganesha_nfsd.so.6.5
#12 0x00007f1486b7b090 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:150 from /lib64/libganesha_nfsd.so.6.5
#13 0x00007f1486b7b0a0 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:202 from /lib64/libganesha_nfsd.so.6.5
#14 0x00007f1486b7b0b0 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:212 from /lib64/libganesha_nfsd.so.6.5
#15 0x00007f1486b847c0 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:187 from /lib64/libganesha_nfsd.so.6.5
#16 0x00007f1486b7b390 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:585 from /lib64/libganesha_nfsd.so.6.5
#17 0x00007f1486b84870 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:240 from /lib64/libganesha_nfsd.so.6.5
#18 0x00007f1486b848c0 in ?? () at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/default_methods.c:267 from /lib64/libganesha_nfsd.so.6.5
#19 0x00007f1483db5cd0 in export_ops_init (ops=0x7f1483dc0fa0) at /usr/src/debug/nfs-ganesha-6.5-9.el9cp.x86_64/src/FSAL/FSAL_CEPH/export.c:495
#20 0x0000000000000001 in ?? ()
#21 0x0000000000000001 in ?? ()
#22 0x0000000000000001 in ?? ()
#23 0x0000000000000000 in ?? ()
(gdb) quit

Comment 14 errata-xmlrpc 2025-06-26 12:24:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775

