Description of problem:
=======================
Ganesha crashed and dumped core files while performing failover operations with the ingress mode set to "haproxy-protocol" on a stretched cluster deployed with NFS. IOs failed with Input/output error on the clients. (A sketch for pulling a symbolized backtrace from the cores listed below is included under Additional info.)

[root@cali024 ~]# ls -lart /var/lib/systemd/coredump
total 21720
drwxr-xr-x. 7 root root    4096 Sep 10 12:05 ..
-rw-r-----. 1 root root 4299741 Jan 15 19:21 'core.ganesha\x2enfsd.0.1b34fcef4c464a569a178fc8e49c5980.113640.1736968862000000.zst'
-rw-r-----. 1 root root 2582534 Jan 15 19:21 'core.ganesha\x2enfsd.0.1b34fcef4c464a569a178fc8e49c5980.127457.1736968878000000.zst'
-rw-r-----. 1 root root 2575077 Jan 15 19:21 'core.ganesha\x2enfsd.0.1b34fcef4c464a569a178fc8e49c5980.127721.1736968894000000.zst'
-rw-r-----. 1 root root 4290951 Jan 15 19:28 'core.ganesha\x2enfsd.0.1b34fcef4c464a569a178fc8e49c5980.127975.1736969305000000.zst'
-rw-r-----. 1 root root 4197354 Jan 15 19:32 'core.ganesha\x2enfsd.0.1b34fcef4c464a569a178fc8e49c5980.129280.1736969557000000.zst'
-rw-r-----. 1 root root 4251991 Jan 15 19:34 'core.ganesha\x2enfsd.0.1b34fcef4c464a569a178fc8e49c5980.130664.1736969675000000.zst'
drwxr-xr-x. 2 root root    4096 Jan 15 19:34 .

ganesha.log
-----------
2[svc_16] nfs_rpc_process_request :DISP :WARN :HAProxy connection from ::ffff:10.8.130.31:42422 rejected
Jan 15 19:34:34 cali024 ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842-nfs-nfsganesha-0-1-cali024-ldwpmh[130659]: 15/01/2025 19:34:34 : epoch 67880d60 : cali024 : ganesha.nfsd-2[svc_16] nfs_rpc_process_request :DISP :WARN :HAProxy connection from ::ffff:10.8.130.31:51286 rejected
Jan 15 19:34:34 cali024 ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842-nfs-nfsganesha-0-1-cali024-ldwpmh[130659]: 15/01/2025 19:34:34 : epoch 67880d60 : cali024 : ganesha.nfsd-2[svc_16] nfs_rpc_process_request :DISP :WARN :HAProxy connection from ::ffff:10.8.130.31:51298 rejected
Jan 15 19:34:34 cali024 ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842-nfs-nfsganesha-0-1-cali024-ldwpmh[130659]: 15/01/2025 19:34:34 : epoch 67880d60 : cali024 : ganesha.nfsd-2[svc_16] rpc :TIRPC :EVENT :handle_haproxy_header: 0x7f6538002ef0 fd 38 proxy header rest len failed header rlen = % (will set dead)
Jan 15 19:34:36 cali024 systemd-coredump[130930]: Process 130664 (ganesha.nfsd) of user 0 dumped core.
Stack trace of thread 57:
#0  0x00007f6599de8536 n/a (/usr/lib64/libntirpc.so.6.0.1 + 0x22536)
#1  0x0000000000000000 n/a (n/a + 0x0)
#2  0x00007f6599df2c90 n/a (/usr/lib64/libntirpc.so.6.0.1 + 0x2cc90)
ELF object binary architecture: AMD x86-64

Jan 15 19:34:36 cali024 podman[130992]: 2025-01-15 19:34:36.219059684 +0000 UTC m=+0.021312121 container died 3b3476705cdfbbb41fef3375d4d2b49adb2993e7c9b0e94581374576e6d31de1 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:11b3c0122e92da541a439a2628837c08de5693a964484c3c3c864da54e040bce, name=ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842-nfs-nfsganesha-0-1-cali024-ldwpmh, vendor=Red Hat, Inc., io.k8s.description=Red Hat Ceph Storage 8, io.openshift.tags=rhceph ceph, architecture=x86_64, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, io.buildah.version=1.33.8, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-215, maintainer=Guillaume Abrioux <gabrioux>, vcs-type=git, GIT_REPO=https://github.com/ceph/ceph-container.git, name=rhceph, description=Red Hat Ceph Storage 8, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, vcs-ref=6bc17c430374b15a8dee08107281b6a4fa5b8ce9, version=8, release=215, GIT_CLEAN=True, com.redhat.component=rhceph-container, build-date=2025-01-09T21:26:54, CEPH_POINT_RELEASE=, distribution-scope=public, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., GIT_BRANCH=main, io.openshift.expose-services=, ceph=True, com.redhat.license_terms=https://www.redhat.com/agreements, RELEASE=main)
Jan 15 19:34:36 cali024 podman[130992]: 2025-01-15 19:34:36.228607841 +0000 UTC m=+0.030860270 container remove 3b3476705cdfbbb41fef3375d4d2b49adb2993e7c9b0e94581374576e6d31de1 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:11b3c0122e92da541a439a2628837c08de5693a964484c3c3c864da54e040bce, name=ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842-nfs-nfsganesha-0-1-cali024-ldwpmh, GIT_BRANCH=main, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-215, GIT_CLEAN=True, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, name=rhceph, GIT_REPO=https://github.com/ceph/ceph-container.git, version=8, vcs-type=git, io.openshift.tags=rhceph ceph, vcs-ref=6bc17c430374b15a8dee08107281b6a4fa5b8ce9, architecture=x86_64, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, io.k8s.description=Red Hat Ceph Storage 8, build-date=2025-01-09T21:26:54, release=215, io.buildah.version=1.33.8, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., com.redhat.component=rhceph-container, io.openshift.expose-services=, RELEASE=main, description=Red Hat Ceph Storage 8, CEPH_POINT_RELEASE=, distribution-scope=public, ceph=True, com.redhat.license_terms=https://www.redhat.com/agreements, vendor=Red Hat, Inc., maintainer=Guillaume Abrioux <gabrioux>)
Jan 15 19:34:36 cali024 systemd[1]: ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.0.1.cali024.ldwpmh.service: Main process exited, code=exited, status=139/n/a
Jan 15 19:34:36 cali024 systemd[1]: ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.0.1.cali024.ldwpmh.service: Failed with result 'exit-code'.
Jan 15 19:34:36 cali024 systemd[1]: ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.0.1.cali024.ldwpmh.service: Consumed 1.511s CPU time.
Jan 15 19:34:46 cali024 systemd[1]: ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.0.1.cali024.ldwpmh.service: Scheduled restart job, restart counter is at 6.
Jan 15 19:34:46 cali024 systemd[1]: Stopped Ceph nfs.nfsganesha.0.1.cali024.ldwpmh for 2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.
Jan 15 19:34:46 cali024 systemd[1]: ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.0.1.cali024.ldwpmh.service: Consumed 1.511s CPU time.
Jan 15 19:34:46 cali024 systemd[1]: ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.0.1.cali024.ldwpmh.service: Start request repeated too quickly.
Jan 15 19:34:46 cali024 systemd[1]: ceph-2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.0.1.cali024.ldwpmh.service: Failed with result 'exit-code'.
Jan 15 19:34:46 cali024 systemd[1]: Failed to start Ceph nfs.nfsganesha.0.1.cali024.ldwpmh for 2edf8cd4-d28b-11ef-b7c0-ac1f6b0a1842.

[ceph: root@argo022 /]# ceph orch ps | grep nfs
haproxy.nfs.nfsganesha.cali021.yjbcwr     cali021  *:2049,9049  running (12h)  6m ago  12h   106M  -  2.4.22-f8e3218  dcd07853693c  4632c83b7d71
keepalived.nfs.nfsganesha.cali021.cvvyox  cali021               running (12h)  6m ago  12h  1640k  -  2.2.8           8d8e2a795e75  c3b2c6263bdc
nfs.nfsganesha.0.1.cali024.ldwpmh         cali024  *:12049      error          6m ago  12h      -  -  <unknown>       <unknown>     <unknown>
[ceph: root@argo022 /]#

IOs failed on client
====================
tar: linux-6.4/Documentation/devicetree: Cannot change ownership to uid 0, gid 0: Input/output error
tar: linux-6.4/Documentation/devicetree: Cannot change mode to rwxrwxr-x: Input/output error
tar: linux-6.4/Documentation: Cannot utime: Input/output error
tar: linux-6.4/Documentation: Cannot change ownership to uid 0, gid 0: Input/output error
tar: linux-6.4/Documentation: Cannot change mode to rwxrwxr-x: Input/output error
tar: linux-6.4: Cannot utime: Input/output error
tar: linux-6.4: Cannot change ownership to uid 0, gid 0: Input/output error
tar: linux-6.4: Cannot change mode to rwxrwxr-x: Input/output error
tar: Error is not recoverable: exiting now

Version-Release number of selected component (if applicable):
=============================================================
# ceph --version
ceph version 19.2.0-58.el9cp (d4780295dda9ca9810286810bd5254843dbd55e5) squid (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-27.el9.x86_64
nfs-utils-2.5.4-27.el9.x86_64
nfs-ganesha-selinux-6.0-8.el9cp.noarch
nfs-ganesha-6.0-8.el9cp.x86_64
nfs-ganesha-rgw-6.0-8.el9cp.x86_64
nfs-ganesha-ceph-6.0-8.el9cp.x86_64
nfs-ganesha-rados-grace-6.0-8.el9cp.x86_64
nfs-ganesha-rados-urls-6.0-8.el9cp.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Deployed a 2-site stretched cluster (DC1 and DC2) with 1 tiebreaker mon in DC3.
2. Deployed the NFS cluster with HA (cali021 in DC1, cali024 in DC2) and ingress mode "haproxy-protocol" (a sketch for dumping the generated service specs is included under Additional info below):

[ceph: root@argo022 /]# ceph nfs cluster create nfsganesha "1 cali021 cali024" --ingress-mode haproxy-protocol --ingress --virtual-ip 10.8.130.31/22
[ceph: root@argo022 /]# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "backend": [
      {
        "hostname": "cali021",
        "ip": "10.8.130.21",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.31"
  }
}
[ceph: root@argo022 /]# ceph orch ps | grep nfs
haproxy.nfs.nfsganesha.cali021.yjbcwr     cali021  *:2049,9049  running (13s)  6s ago  13s  37.5M  -  2.4.22-f8e3218  dcd07853693c  78e01f109e66
keepalived.nfs.nfsganesha.cali021.cvvyox  cali021               running (12s)  6s ago  12s  1631k  -  2.2.8           8d8e2a795e75  e42f68b4c225
nfs.nfsganesha.0.0.cali021.aolabd         cali021  *:12049      running (15s)  6s ago  15s  60.5M  -  6.0             51e7488dd867  3d6f154c39f6
3. Created 1 subvolume group, 2 subvolumes, and 2 NFS exports:

[ceph: root@argo022 /]# ceph nfs export create cephfs nfsganesha /ganesha2 cephfs --path=/volumes/ganeshagroup/vol2/6c075800-d6e1-4bf6-8236-82e11c8f8928
{
  "bind": "/ganesha2",
  "cluster": "nfsganesha",
  "fs": "cephfs",
  "mode": "RW",
  "path": "/volumes/ganeshagroup/vol2/6c075800-d6e1-4bf6-8236-82e11c8f8928"
}
[ceph: root@argo022 /]# ceph nfs export create cephfs nfsganesha /ganesha1 cephfs --path=/volumes/ganeshagroup/vol1/6b2b0883-d1a3-42d2-af7b-638961c3a8ae
{
  "bind": "/ganesha1",
  "cluster": "nfsganesha",
  "fs": "cephfs",
  "mode": "RW",
  "path": "/volumes/ganeshagroup/vol1/6b2b0883-d1a3-42d2-af7b-638961c3a8ae"
}

4. Mounted the 2 exports on 2 clients and ran Linux kernel untars on the NFS v4.2 mounts in parallel (a client-side sketch is included under Additional info below).
5. Performed failover while IOs were running on the clients (rebooted node cali021).

Actual results:
===============
Ganesha crashed on cali024 and IOs failed with Input/output error.

Expected results:
=================
Ganesha should not crash.

Additional info:
================
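The stack trace captured by systemd-coredump only resolves to raw offsets inside /usr/lib64/libntirpc.so.6.0.1. A minimal sketch for pulling a symbolized backtrace out of the newest core (PID 130664 from the coredump listing above), assuming the matching debuginfo packages can be installed on cali024:

# List the ganesha cores captured by systemd-coredump
coredumpctl list ganesha.nfsd

# Show metadata for the last crash (PID 130664 in the listing above)
coredumpctl info 130664

# Assumption: debuginfo for nfs-ganesha and libntirpc is available for this build
dnf debuginfo-install -y nfs-ganesha libntirpc

# Open the core in gdb and dump all thread backtraces
coredumpctl gdb 130664
# at the (gdb) prompt: thread apply all bt full

With symbols resolved, the frames at libntirpc + 0x22536 / + 0x2cc90 should map back to the code path around handle_haproxy_header seen in ganesha.log.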
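The exact client-side commands for steps 4-5 are not recorded above; a minimal sketch of the workload, assuming the kernel NFS client, the virtual IP and pseudo paths from the cluster info above, and a locally staged linux-6.4 tarball (the mount point and tarball names are hypothetical):

# On client 1 (client 2 does the same with /ganesha2)
mkdir -p /mnt/ganesha1
mount -t nfs -o vers=4.2 10.8.130.31:/ganesha1 /mnt/ganesha1

# Untar the kernel source on the NFS mount (this is the tar run that later hit Input/output errors)
cd /mnt/ganesha1
tar xf /root/linux-6.4.tar.gz

Failover in step 5 was triggered by rebooting cali021 while both untars were still running.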
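To confirm that the haproxy-protocol ingress mode actually landed in the orchestrator configuration, the generated service specs can be dumped. A sketch, assuming the default cephadm service names nfs.nfsganesha and ingress.nfs.nfsganesha (consistent with the daemon names in the "ceph orch ps" output above); the exact spec field shown in the comment is an assumption:

# Dump the generated specs for the NFS backend and its ingress service
ceph orch ls --service-name nfs.nfsganesha --export
ceph orch ls --service-name ingress.nfs.nfsganesha --export

# Both specs are expected to carry the haproxy-protocol setting, e.g.:
#   enable_haproxy_protocol: true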
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775