Bug 2313037

Summary: [NFS-Ganesha] [Scale Test] Ganesha crashed while running IO's on 2000 exports from 100 clients
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Manisha Saini <msaini>
Component: CephFS
Assignee: Kotresh HR <khiremat>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 8.0
CC: akraj, bhubbard, ceph-eng-bugs, cephqe-warriors, gfarnum, gouthamr, khiremat, kkeithle, ngangadh, rpollack, spunadik, tserlin, vdas, vshankar
Target Milestone: ---
Keywords: TestBlocker
Target Release: 8.0
Flags: khiremat: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-19.2.0-13.el9cp
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-11-25 09:11:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2317218

Description Manisha Saini 2024-09-18 06:04:55 UTC
Description of problem:
======================

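2000 NFS exports were mounted on 100 clients over the NFS v4.2 protocol.
IO: FIO running in parallel from all 100 clients
HA enabled

The Ganesha service crashed with a failed assertion in the Ceph client code (ceph_assert(cap_refs[c] > 0) in Inode::put_cap_ref); see the Ganesha log below.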


# ceph nfs cluster info cephfs-nfs
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}
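
For readers who want to check the topology quickly, the cluster info above can be summarized with a few lines of Python. This is only a sketch: the JSON is copied from the output above, and the helper name summarize is made up for illustration.

```python
import json

# Output of `ceph nfs cluster info cephfs-nfs`, copied from this report.
CLUSTER_INFO = """
{
  "cephfs-nfs": {
    "backend": [
      {"hostname": "cali015", "ip": "10.8.130.15", "port": 12049},
      {"hostname": "cali016", "ip": "10.8.130.16", "port": 12049}
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}
"""


def summarize(info_text):
    """Return one line for the client-facing endpoint and one per backend."""
    info = json.loads(info_text)
    lines = []
    for cluster_id, cluster in info.items():
        lines.append(
            f"{cluster_id}: clients use {cluster['virtual_ip']}:{cluster['port']}"
        )
        for backend in cluster["backend"]:
            lines.append(
                f"  backend {backend['hostname']} -> {backend['ip']}:{backend['port']}"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(CLUSTER_INFO)))
```

Clients mount through the virtual IP on port 2049, while the two Ganesha daemons listen on port 12049 on cali015 and cali016.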


Ganesha.log
=====
Sep 18 05:55:13 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]: 18/09/2024 05:55:13 : epoch 66ea6a3b : cali015 : ganesha.nfsd-2[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]: /builddir/build/BUILD/ceph-19.1.1/src/client/Inode.cc: In function 'int Inode::put_cap_ref(int)' thread 7fe77affd640 time 2024-09-18T05:55:32.001602+0000
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]: /builddir/build/BUILD/ceph-19.1.1/src/client/Inode.cc: 207: FAILED ceph_assert(cap_refs[c] > 0)
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  ceph version 19.1.1-44.el9cp (1c2feb8f45d9a34109a907465648980102a4a7e0) squid (rc)
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7fe7c6f03de4]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  2: /usr/lib64/ceph/libceph-common.so.2(+0x18afa2) [0x7fe7c6f03fa2]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  3: /lib64/libcephfs.so.2(+0xbc173) [0x7fe7c5e86173]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  4: /lib64/libcephfs.so.2(+0xde2d5) [0x7fe7c5ea82d5]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  5: /lib64/libcephfs.so.2(+0x4a346) [0x7fe7c5e14346]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  6: /lib64/libcephfs.so.2(+0x48ddd) [0x7fe7c5e12ddd]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  7: /lib64/libcephfs.so.2(+0x7c996) [0x7fe7c5e46996]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  8: /lib64/libcephfs.so.2(+0x7ccc6) [0x7fe7c5e46cc6]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  9: /lib64/libcephfs.so.2(+0x71f4d) [0x7fe7c5e3bf4d]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  10: /lib64/libcephfs.so.2(+0x18d685) [0x7fe7c5f57685]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  11: /lib64/libcephfs.so.2(+0x14d6d7) [0x7fe7c5f176d7]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  12: /lib64/libcephfs.so.2(+0x144d0f) [0x7fe7c5f0ed0f]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  13: /lib64/libcephfs.so.2(+0x7467c) [0x7fe7c5e3e67c]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  14: (DispatchQueue::fast_dispatch(boost::intrusive_ptr<Message> const&)+0x190) [0x7fe7c7114500]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  15: /usr/lib64/ceph/libceph-common.so.2(+0x431895) [0x7fe7c71aa895]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  16: (ProtocolV2::handle_message()+0xd02) [0x7fe7c71de812]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  17: (ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x39) [0x7fe7c71d6cf9]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  18: (AsyncConnection::process()+0x676) [0x7fe7c71ae626]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  19: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x1cd) [0x7fe7c71fa27d]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  20: /usr/lib64/ceph/libceph-common.so.2(+0x481be6) [0x7fe7c71fabe6]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  21: /lib64/libstdc++.so.6(+0xdbad4) [0x7fe7c83d4ad4]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  22: /lib64/libc.so.6(+0x89c02) [0x7fe7c8801c02]
Sep 18 05:55:32 cali015 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx[1259888]:  23: /lib64/libc.so.6(+0x10ec40) [0x7fe7c8886c40]
Sep 18 05:55:34 cali015 podman[1263368]: 2024-09-18 05:55:34.081808092 +0000 UTC m=+0.028625811 container died b72ddb3e58ad3b6d1896d908b5aaac4dcee8fae4d616a276a98813101e4cc1c8 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:ca7ff00c2f687c7a750fe25dc2b69e6da6ccf9b261635d5f1d84a6500b4bb972, name=ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx, io.openshift.expose-services=, build-date=2024-09-16T19:56:18, com.redhat.component=rhceph-container, GIT_BRANCH=main, vcs-type=git, vendor=Red Hat, Inc., io.k8s.description=Red Hat Ceph Storage 8, distribution-scope=public, vcs-ref=f0f2707c29c8affe98c484af48cf2d3b5459146f, architecture=x86_64, maintainer=Guillaume Abrioux <gabrioux>, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-91, description=Red Hat Ceph Storage 8, GIT_REPO=https://github.com/ceph/ceph-container.git, com.redhat.license_terms=https://www.redhat.com/agreements, ceph=True, name=rhceph, GIT_CLEAN=True, io.buildah.version=1.29.0, RELEASE=main, CEPH_POINT_RELEASE=, io.openshift.tags=rhceph ceph, version=8, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, release=91)
Sep 18 05:55:34 cali015 podman[1263368]: 2024-09-18 05:55:34.094477639 +0000 UTC m=+0.041295350 container remove b72ddb3e58ad3b6d1896d908b5aaac4dcee8fae4d616a276a98813101e4cc1c8 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:ca7ff00c2f687c7a750fe25dc2b69e6da6ccf9b261635d5f1d84a6500b4bb972, name=ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx, CEPH_POINT_RELEASE=, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., vcs-type=git, version=8, release=91, architecture=x86_64, GIT_BRANCH=main, ceph=True, description=Red Hat Ceph Storage 8, com.redhat.license_terms=https://www.redhat.com/agreements, vendor=Red Hat, Inc., vcs-ref=f0f2707c29c8affe98c484af48cf2d3b5459146f, io.openshift.expose-services=, build-date=2024-09-16T19:56:18, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, name=rhceph, com.redhat.component=rhceph-container, io.openshift.tags=rhceph ceph, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-91, RELEASE=main, maintainer=Guillaume Abrioux <gabrioux>, io.k8s.description=Red Hat Ceph Storage 8, GIT_CLEAN=True, GIT_REPO=https://github.com/ceph/ceph-container.git, distribution-scope=public, io.buildah.version=1.29.0)
Sep 18 05:55:34 cali015 systemd[1]: ceph-4e687a60-638e-11ee-8772-b49691cee574.0.0.cali015.koxmhx.service: Main process exited, code=exited, status=134/n/a
Sep 18 05:55:34 cali015 systemd[1]: ceph-4e687a60-638e-11ee-8772-b49691cee574.0.0.cali015.koxmhx.service: Failed with result 'exit-code'.
Sep 18 05:55:34 cali015 systemd[1]: ceph-4e687a60-638e-11ee-8772-b49691cee574.0.0.cali015.koxmhx.service: Consumed 18.126s CPU time.
Sep 18 05:55:44 cali015 systemd[1]: ceph-4e687a60-638e-11ee-8772-b49691cee574.0.0.cali015.koxmhx.service: Scheduled restart job, restart counter is at 2.
Sep 18 05:55:44 cali015 systemd[1]: Stopped Ceph nfs.cephfs-nfs.0.0.cali015.koxmhx for 4e687a60-638e-11ee-8772-b49691cee574.
Sep 18 05:55:44 cali015 systemd[1]: ceph-4e687a60-638e-11ee-8772-b49691cee574.0.0.cali015.koxmhx.service: Consumed 18.126s CPU time.
Sep 18 05:55:44 cali015 systemd[1]: Starting Ceph nfs.cephfs-nfs.0.0.cali015.koxmhx for 4e687a60-638e-11ee-8772-b49691cee574...
Sep 18 05:55:44 cali015 podman[1263559]:
Sep 18 05:55:44 cali015 podman[1263559]: 2024-09-18 05:55:44.633445796 +0000 UTC m=+0.035470437 container create b00a2448e81c5904a8d9f6525c004274559ac86dde80998c1ee33588bc099a31 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:ca7ff00c2f687c7a750fe25dc2b69e6da6ccf9b261635d5f1d84a6500b4bb972, name=ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., description=Red Hat Ceph Storage 8, vendor=Red Hat, Inc., release=91, ceph=True, distribution-scope=public, name=rhceph, maintainer=Guillaume Abrioux <gabrioux>, io.k8s.description=Red Hat Ceph Storage 8, io.openshift.tags=rhceph ceph, CEPH_POINT_RELEASE=, GIT_CLEAN=True, vcs-type=git, architecture=x86_64, GIT_REPO=https://github.com/ceph/ceph-container.git, com.redhat.component=rhceph-container, io.buildah.version=1.29.0, build-date=2024-09-16T19:56:18, version=8, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-91, io.openshift.expose-services=, vcs-ref=f0f2707c29c8affe98c484af48cf2d3b5459146f, GIT_BRANCH=main, com.redhat.license_terms=https://www.redhat.com/agreements, RELEASE=main)
Sep 18 05:55:44 cali015 podman[1263559]: 2024-09-18 05:55:44.673649766 +0000 UTC m=+0.075674415 container init b00a2448e81c5904a8d9f6525c004274559ac86dde80998c1ee33588bc099a31 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:ca7ff00c2f687c7a750fe25dc2b69e6da6ccf9b261635d5f1d84a6500b4bb972, name=ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx, vendor=Red Hat, Inc., io.openshift.tags=rhceph ceph, architecture=x86_64, vcs-ref=f0f2707c29c8affe98c484af48cf2d3b5459146f, GIT_CLEAN=True, CEPH_POINT_RELEASE=, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-91, RELEASE=main, release=91, GIT_REPO=https://github.com/ceph/ceph-container.git, ceph=True, io.openshift.expose-services=, GIT_BRANCH=main, build-date=2024-09-16T19:56:18, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, vcs-type=git, com.redhat.component=rhceph-container, maintainer=Guillaume Abrioux <gabrioux>, io.buildah.version=1.29.0, description=Red Hat Ceph Storage 8, io.k8s.description=Red Hat Ceph Storage 8, distribution-scope=public, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., com.redhat.license_terms=https://www.redhat.com/agreements, name=rhceph, version=8)
Sep 18 05:55:44 cali015 podman[1263559]: 2024-09-18 05:55:44.677558982 +0000 UTC m=+0.079583623 container start b00a2448e81c5904a8d9f6525c004274559ac86dde80998c1ee33588bc099a31 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:ca7ff00c2f687c7a750fe25dc2b69e6da6ccf9b261635d5f1d84a6500b4bb972, name=ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-0-0-cali015-koxmhx, maintainer=Guillaume Abrioux <gabrioux>, vcs-ref=f0f2707c29c8affe98c484af48cf2d3b5459146f, ceph=True, architecture=x86_64, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-91, io.buildah.version=1.29.0, RELEASE=main, release=91, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, GIT_REPO=https://github.com/ceph/ceph-container.git, version=8, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., io.k8s.description=Red Hat Ceph Storage 8, com.redhat.license_terms=https://www.redhat.com/agreements, CEPH_POINT_RELEASE=, io.openshift.expose-services=, GIT_CLEAN=True, name=rhceph, description=Red Hat Ceph Storage 8, vcs-type=git, io.openshift.tags=rhceph ceph, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, distribution-scope=public, com.redhat.component=rhceph-container, vendor=Red Hat, Inc., GIT_BRANCH=main, build-date=2024-09-16T19:56:18)
Sep 18 05:55:44 cali015 podman[1263559]: 2024-09-18 05:55:44.619591912 +0000 UTC m=+0.021616554 image pull  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:ca7ff00c2f687c7a750fe25dc2b69e6da6ccf9b261635d5f1d84a6500b4bb972
===============
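
The key facts in the log above are the failed assertion and the exit status reported by systemd. The following is a rough triage sketch, not a tool used in this bug: the three message lines are copied from the log with the journal prefix dropped, and the helper names triage and signal_from_status are made up for illustration. It assumes the common convention that an exit status above 128 means "terminated by signal (status - 128)", which would make 134 consistent with SIGABRT raised by the failed ceph_assert.

```python
import re
import signal

# Message text copied from the log in this report (journal prefix dropped).
LOG_EXCERPT = """\
/builddir/build/BUILD/ceph-19.1.1/src/client/Inode.cc: In function 'int Inode::put_cap_ref(int)' thread 7fe77affd640 time 2024-09-18T05:55:32.001602+0000
/builddir/build/BUILD/ceph-19.1.1/src/client/Inode.cc: 207: FAILED ceph_assert(cap_refs[c] > 0)
ceph-4e687a60-638e-11ee-8772-b49691cee574.0.0.cali015.koxmhx.service: Main process exited, code=exited, status=134/n/a
"""

FUNC_RE = re.compile(r"In function '(?P<func>[^']+)'")
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED (?P<expr>ceph_assert\(.*\))"
)
STATUS_RE = re.compile(r"code=exited, status=(?P<status>\d+)")


def signal_from_status(status):
    """Map an exit status above 128 to a signal (assumed 128 + N convention)."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128)
    except ValueError:
        return None


def triage(text):
    """Pull the asserting function, source location and exit status from a log."""
    func = FUNC_RE.search(text)
    failed = ASSERT_RE.search(text)
    status = STATUS_RE.search(text)
    exit_status = int(status.group("status")) if status else None
    return {
        "function": func.group("func") if func else None,
        "file": failed.group("file") if failed else None,
        "line": int(failed.group("line")) if failed else None,
        "assertion": failed.group("expr") if failed else None,
        "exit_status": exit_status,
        "signal": signal_from_status(exit_status) if exit_status is not None else None,
    }


if __name__ == "__main__":
    for key, value in triage(LOG_EXCERPT).items():
        print(f"{key}: {value}")
```

The assertion text says that put_cap_ref was asked to drop a capability reference whose count was already zero; the log alone does not show which code path dropped it.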


Version-Release number of selected component (if applicable):
======================
[ceph: root@cali013 /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-26.el9_4.x86_64
nfs-utils-2.5.4-26.el9_4.x86_64
nfs-ganesha-selinux-6.0-4.el9cp.noarch
nfs-ganesha-6.0-4.el9cp.x86_64
nfs-ganesha-rgw-6.0-4.el9cp.x86_64
nfs-ganesha-ceph-6.0-4.el9cp.x86_64
nfs-ganesha-rados-grace-6.0-4.el9cp.x86_64
nfs-ganesha-rados-urls-6.0-4.el9cp.x86_64


[ceph: root@cali013 /]# ceph --version
ceph version 19.1.1-44.el9cp (1c2feb8f45d9a34109a907465648980102a4a7e0) squid (rc)


How reproducible:
==============
1/1


Steps to Reproduce:
==================
1. Deploy NFS on 2 nodes with HA enabled
2. Create 1 CephFS file system
3. Create 1 subvolume group
4. Create 2000 subvolumes in the subvolume group
5. Create 2000 NFS exports, one per subvolume
6. Mount the 2000 exports on 100 clients via the NFS v4.2 protocol
7. Run FIO in parallel from all 100 clients on the 2000 exports
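
Steps 2 to 6 above can be scripted. The sketch below only generates the command lines as text and does not run them. The file system name, subvolume group name, subvolume names, pseudo paths and mount points are hypothetical (the report does not give them); the cluster id and virtual IP are taken from the cluster info earlier in this report. Step 1 (deployment with HA) and step 7 (the FIO job) are left out because their exact parameters are not recorded here.

```python
def repro_commands(count=2000, volume="cephfs", group="scale_group",
                   cluster_id="cephfs-nfs"):
    """Build the ceph commands for steps 2-5 (names are hypothetical)."""
    cmds = [
        f"ceph fs volume create {volume}",                  # step 2
        f"ceph fs subvolumegroup create {volume} {group}",  # step 3
    ]
    for i in range(1, count + 1):
        subvol = f"subvol_{i}"
        # Step 4: one subvolume per export.
        cmds.append(
            f"ceph fs subvolume create {volume} {subvol} --group_name {group}"
        )
        # Step 5: export the subvolume; its path is resolved by the shell
        # at run time through `ceph fs subvolume getpath`.
        cmds.append(
            f"ceph nfs export create cephfs --cluster-id {cluster_id} "
            f"--pseudo-path /export_{i} --fsname {volume} "
            f'--path "$(ceph fs subvolume getpath {volume} {subvol} '
            f'--group_name {group})"'
        )
    return cmds


def mount_command(index, virtual_ip="10.8.130.236"):
    """Step 6: NFS v4.2 mount of one export through the virtual IP."""
    return (
        f"mount -t nfs -o vers=4.2 {virtual_ip}:/export_{index} "
        f"/mnt/export_{index}"
    )


if __name__ == "__main__":
    for cmd in repro_commands(count=2):
        print(cmd)
    print(mount_command(1))
```

With 2000 exports across 100 clients, how the mounts are distributed (every client mounting all exports, or 20 exports per client) is not stated in the report, so the sketch leaves that choice to the caller.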


Actual results:
===============
Ganesha crashed


Expected results:
=================
Ganesha should not crash


Additional info:

Comment 49 errata-xmlrpc 2024-11-25 09:11:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216