Bug 2362289

Summary: [NFS-Ganesha] Ganesha process crashed at __memcpy_evex_unaligned_erms while running the pynfs test suite vers=4.1
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Manisha Saini <msaini>
Component: CephFS Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 8.1 CC: ceph-eng-bugs, cephqe-warriors, ffilz, hacharya, hyelloji, khiremat, kkeithle, manising, mbenjamin, spunadik, tserlin, vdas, vshankar
Target Milestone: --- Keywords: Automation, Regression
Target Release: 8.1 Flags: khiremat: needinfo-
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nfs-ganesha-6.5-13.el9cp; rhceph-container-8-424 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2025-06-26 12:31:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2363635    
Bug Blocks: 2367464, 2359508    

Description Manisha Saini 2025-04-25 11:22:57 UTC
Description of problem:
======

The NFS-Ganesha server crashed and dumped core while running the pynfs sanity test suite against an NFSv4.1 mount.

--------------------------------
Apr 25 11:09:10 ceph-nfsclusterlive-qrsq9e-node2 ceph-d55360cc-20dd-11f0-9cff-fa163eb24a5f-nfs-cephfs-nfs-0-0-ceph-nfsclusterlive-qrsq9e-node2-xqulsc[654968]: 25/04/2025 11:09:10 : epoch 680b6d4f : ceph-nfsclusterlive-qrsq9e-node2 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
Apr 25 11:09:10 ceph-nfsclusterlive-qrsq9e-node2 ceph-d55360cc-20dd-11f0-9cff-fa163eb24a5f-nfs-cephfs-nfs-0-0-ceph-nfsclusterlive-qrsq9e-node2-xqulsc[654968]: 25/04/2025 11:09:10 : epoch 680b6d4f : ceph-nfsclusterlive-qrsq9e-node2 : ganesha.nfsd-2[dbus] gsh_dbus_thread :DBUS :CRIT :DBUS not initialized, service thread exiting
Apr 25 11:09:10 ceph-nfsclusterlive-qrsq9e-node2 ceph-d55360cc-20dd-11f0-9cff-fa163eb24a5f-nfs-cephfs-nfs-0-0-ceph-nfsclusterlive-qrsq9e-node2-xqulsc[654968]: 25/04/2025 11:09:10 : epoch 680b6d4f : ceph-nfsclusterlive-qrsq9e-node2 : ganesha.nfsd-2[dbus] gsh_dbus_thread :DBUS :EVENT :shutdown
Apr 25 11:09:11 ceph-nfsclusterlive-qrsq9e-node2 systemd-coredump[655049]: Process 654972 (ganesha.nfsd) of user 0 dumped core.

                                                                           Stack trace of thread 68:
                                                                           #0  0x00007fbafd59c225 __memcpy_evex_unaligned_erms (libc.so.6 + 0x16a225)
                                                                           #1  0x00007fbafb028e84 n/a (/usr/lib64/ceph/libceph-common.so.2 + 0x49de84)
                                                                           ELF object binary architecture: AMD x86-64
Apr 25 11:09:12 ceph-nfsclusterlive-qrsq9e-node2 podman[655054]: 2025-04-25 11:09:12.038942001 +0000 UTC m=+0.023106154 container died d7cb2166ab33e85929416687165d75c8fed4bd0e61b3fcf442d50d9bfbfc3c31 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf0181a0987e49ad38becd1062baf79417c00825becae810e78806e23ee825a2, name=ceph-d55360cc-20dd-11f0-9cff-fa163eb24a5f-nfs-cephfs-nfs-0-0-ceph-nfsclusterlive-qrsq9e-node2-xqulsc, CEPH_POINT_RELEASE=, vcs-type=git, GIT_REPO=https://github.com/ceph/ceph-container.git, io.k8s.description=Red Hat Ceph Storage 8, GIT_BRANCH=main, ceph=True, io.openshift.tags=rhceph ceph, distribution-scope=public, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-409, release=409, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., version=8, GIT_CLEAN=True, description=Red Hat Ceph Storage 8, io.buildah.version=1.33.12, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, vendor=Red Hat, Inc., architecture=x86_64, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, io.openshift.expose-services=, vcs-ref=2e5d3c0e666903d2aac3aa37c8f72bef249f276f, RELEASE=main, build-date=2025-04-24T03:39:50, name=rhceph, com.redhat.component=rhceph-container, com.redhat.license_terms=https://www.redhat.com/agreements, maintainer=Guillaume Abrioux <gabrioux>)
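
Frame #1 in the trace above has no resolved symbol (n/a), so only the module path and offset identify the call site inside libceph-common. As a rough sketch for triage, the frames can be parsed into (module, offset) pairs that could then be handed to a symbolizer such as addr2line, assuming matching debuginfo is installed. The names Frame, FRAME_RE and parse_frames are hypothetical helpers and are not part of any Ceph or Ganesha tooling.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    address: int
    symbol: Optional[str]  # None when systemd-coredump printed "n/a"
    module: str
    offset: int


# Matches lines such as:
#   #0  0x00007fbafd59c225 __memcpy_evex_unaligned_erms (libc.so.6 + 0x16a225)
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x(?P<addr>[0-9a-fA-F]+)\s+(?P<sym>\S+)\s+"
    r"\((?P<module>\S+)\s+\+\s+0x(?P<off>[0-9a-fA-F]+)\)"
)


def parse_frames(text: str) -> List[Frame]:
    """Extract stack frames from systemd-coredump journal output."""
    frames = []
    for m in FRAME_RE.finditer(text):
        sym = m.group("sym")
        frames.append(
            Frame(
                index=int(m.group("index")),
                address=int(m.group("addr"), 16),
                symbol=None if sym == "n/a" else sym,
                module=m.group("module"),
                offset=int(m.group("off"), 16),
            )
        )
    return frames


# The two frames reported in this bug.
TRACE = """
#0  0x00007fbafd59c225 __memcpy_evex_unaligned_erms (libc.so.6 + 0x16a225)
#1  0x00007fbafb028e84 n/a (/usr/lib64/ceph/libceph-common.so.2 + 0x49de84)
"""

for frame in parse_frames(TRACE):
    print(frame.index, frame.symbol or "<unresolved>", frame.module, hex(frame.offset))
```

The offset is relative to the module's load address, so it stays meaningful across runs even though the absolute addresses change.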


Version-Release number of selected component (if applicable):
==================

# rpm -qa | grep nfs
libnfsidmap-2.5.4-27.el9.x86_64
nfs-utils-2.5.4-27.el9.x86_64
nfs-ganesha-selinux-6.5-10.el9cp.noarch
nfs-ganesha-6.5-10.el9cp.x86_64
nfs-ganesha-rgw-6.5-10.el9cp.x86_64
nfs-ganesha-ceph-6.5-10.el9cp.x86_64
nfs-ganesha-rados-grace-6.5-10.el9cp.x86_64
nfs-ganesha-rados-urls-6.5-10.el9cp.x86_64
nfs-ganesha-utils-6.5-10.el9cp.x86_64

# ceph --version
ceph version 19.2.1-159.el9cp (99c759851ebe12f2e6a118b424029bb14a6efc5b) squid (stable)
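
The build above (nfs-ganesha-6.5-10.el9cp) predates the "Fixed In Version" recorded in the header (nfs-ganesha-6.5-13.el9cp). A minimal sketch for checking whether a host already carries the fix is shown below. It is a simplified comparison that assumes a purely numeric version and a release beginning with a number; it is not the full RPM version-comparison algorithm, and parse_nvr and has_fix are hypothetical helper names.

```python
import re
from typing import Tuple


def parse_nvr(nvr: str) -> Tuple[str, Tuple[int, ...], int]:
    """Split 'name-version-release[.arch]' into (name, version tuple, leading release number)."""
    name, version, release = nvr.rsplit("-", 2)
    version_parts = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+", release)
    if match is None:
        raise ValueError(f"release does not start with a number: {release!r}")
    return name, version_parts, int(match.group())


def has_fix(installed: str, fixed_in: str) -> bool:
    """Return True if the installed package is at or beyond the fixed build."""
    i_name, i_ver, i_rel = parse_nvr(installed)
    f_name, f_ver, f_rel = parse_nvr(fixed_in)
    if i_name != f_name:
        raise ValueError(f"package names differ: {i_name!r} vs {f_name!r}")
    return (i_ver, i_rel) >= (f_ver, f_rel)


FIXED_IN = "nfs-ganesha-6.5-13.el9cp"  # from the bug header

# Package string as printed by `rpm -qa` in this report.
print(has_fix("nfs-ganesha-6.5-10.el9cp.x86_64", FIXED_IN))  # False for the build in this report
```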


How reproducible:
===============
2/2


Steps to Reproduce:
=================
1. Deploy the Ceph cluster
2. Create an NFS-Ganesha cluster

# ceph nfs cluster info cephfs-nfs
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "ceph-nfsclusterlive-qrsq9e-node2",
        "ip": "10.0.66.199",
        "port": 2049
      }
    ],
    "virtual_ip": null
  }
}

3. Create an NFS export and mount it on the client

 Execution of mount -t nfs -o vers=4.1,port=2049 ceph-nfsclusterlive-qrsq9e-node2:/export_1 /mnt/nfs on 10.0.67.59 took 1.003478 seconds.

4. Run the pynfs test suite

2025-04-25 07:06:46,018 - cephci - ceph:1570 - INFO - Execute python3 -m pip install ply;cd /mnt/nfs;git clone git://git.linux-nfs.org/projects/bfields/pynfs.git;cd pynfs;-- yes |python setup.py build;cd nfs4.1;./testserver.py ceph-nfsclusterlive-qrsq9e-node2:/export_0 -v --outfile ~/pynfs.run --maketree --showomit --rundep all on 10.0.66.72
2025-04-25 07:17:35,039 - cephci - ceph:1621 - ERROR - python3 -m pip install ply;cd /mnt/nfs;git clone git://git.linux-nfs.org/projects/bfields/pynfs.git;cd pynfs;-- yes |python setup.py build;cd nfs4.1;./testserver.py ceph-nfsclusterlive-qrsq9e-node2:/export_0 -v --outfile ~/pynfs.run --maketree --showomit --rundep all failed to execute within 600 seconds.
2025-04-25 07:17:35,275 - cephci - nfs_verify_pynfs:64 - ERROR - Failed to run pynfs on ceph-nfsclusterlive-qrsq9e-node4, Error:
2025-04-25 07:17:35,277 - cephci - ceph:1570 - INFO - Execute ls -Art /var/lib/systemd/coredump | tail -n 1 on 10.0.66.199
2025-04-25 07:17:36,281 - cephci - ceph:1600 - INFO - Execution of ls -Art /var/lib/systemd/coredump | tail -n 1 on 10.0.66.199 took 1.003007 seconds.
2025-04-25 07:17:36,282 - cephci - ceph:1570 - INFO - Execute stat -c '%w' /var/lib/systemd/coredump/core.ganesha\x2enfsd.0.1b1df88562f14760aabcef6c94857406.654972.1745579350000000.zst
 on 10.0.66.199
2025-04-25 07:17:37,286 - cephci - ceph:1600 - INFO - Execution of stat -c '%w' /var/lib/systemd/coredump/core.ganesha\x2enfsd.0.1b1df88562f14760aabcef6c94857406.654972.1745579350000000.zst
 on 10.0.66.199 took 1.003291 seconds.
2025-04-25 07:17:37,287 - cephci - run:854 - ERROR - stat -c '%w' /var/lib/systemd/coredump/core.ganesha\x2enfsd.0.1b1df88562f14760aabcef6c94857406.654972.1745579350000000.zst
 returned stat: cannot statx '/var/lib/systemd/coredump/core.ganeshax2enfsd.0.1b1df88562f14760aabcef6c94857406.654972.1745579350000000.zst': No such file or directory
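
Note on the final stat error: the path in the error message reads "ganeshax2enfsd", while the core file name listed earlier contains a literal backslash ("ganesha\x2enfsd"). This suggests the remote shell consumed the unquoted backslash, so the "No such file or directory" result may reflect a quoting problem in the harness rather than a missing core file. The file name also appears to carry the trailing newline from the `ls | tail` output. A minimal sketch of a safer way to build the command, using Python's standard shlex.quote, follows. build_stat_command is a hypothetical helper and not the actual cephci code.

```python
import shlex


def build_stat_command(coredump_dir: str, listed_name: str) -> str:
    """Build a shell-safe `stat` command for a core file name taken from `ls` output."""
    # Drop the trailing newline that comes with captured command output.
    core_name = listed_name.strip()
    path = coredump_dir.rstrip("/") + "/" + core_name
    # shlex.quote wraps the path in single quotes, so the shell keeps the backslash literal.
    return "stat -c '%w' " + shlex.quote(path)


# Core file name as listed in this report, including the trailing newline.
LISTED = (
    r"core.ganesha\x2enfsd.0.1b1df88562f14760aabcef6c94857406"
    r".654972.1745579350000000.zst" + "\n"
)

print(build_stat_command("/var/lib/systemd/coredump", LISTED))
```

Quoting the whole path once, at the point where the command string is assembled, avoids having to escape individual characters in file names that systemd-coredump generates.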

Actual results:
=======
The NFS-Ganesha server crashed and dumped core.


Expected results:
======
The pynfs suite should pass, and no crashes should be observed.


Additional info:

Comment 25 errata-xmlrpc 2025-06-26 12:31:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775