Bug 2346115

Summary: ceph-nfs fails to start with Ceph Ingress and proxy-protocol
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Francesco Pantano <fpantano>
Component: NFS-Ganesha
Assignee: Sachin Punadikar <spunadik>
Status: CLOSED DUPLICATE
QA Contact: Manisha Saini <msaini>
Severity: high
Docs Contact:
Priority: high
Version: 8.0
CC: ashrodri, cephqe-warriors, ffilz, gouthamr, johfulto, kkeithle, ltoscano, mobisht, spunadik
Target Milestone: ---   
Target Release: 8.0z3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2025-02-27 16:49:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---

Description Francesco Pantano 2025-02-17 15:01:47 UTC
Created attachment 2076847: Ganesha unit journal logs

Description of problem:

During RHCS 8 validation, the ceph-nfs cluster fails to start when it is created with the haproxy-protocol ingress mode, the option introduced to support Manila behind a Ceph ingress daemon.
The Ganesha unit journal shows repeated handle_haproxy_header errors followed by a core dump of ganesha.nfsd:

```
Feb 17 04:07:21 ceph-e36ghuwn-0 ceph-3b6a1b36-5e55-5bfd-80a8-5d723433981e-nfs-cephfs-2-0-ceph-e36ghuwn-0-qppsbw[403604]: 17/02/2025 09:07:21 : epoch 67b2fc2f : ceph-e36ghuwn-0 : ganesha.nfsd-2[svc_6] rpc :TIRPC :EVENT :handle_haproxy_header: 0x7f41540095a0 fd 26 proxy header rest len failed header rlen = % (will set dead)
Feb 17 04:07:22 ceph-e36ghuwn-0 ceph-3b6a1b36-5e55-5bfd-80a8-5d723433981e-nfs-cephfs-2-0-ceph-e36ghuwn-0-qppsbw[403604]: 17/02/2025 09:07:22 : epoch 67b2fc2f : ceph-e36ghuwn-0 : ganesha.nfsd-2[svc_11] rpc :TIRPC :EVENT :handle_haproxy_header: 0x7f41540095a0 fd 26 proxy header rest len failed header rlen = % (will set dead)
Feb 17 04:07:23 ceph-e36ghuwn-0 ceph-3b6a1b36-5e55-5bfd-80a8-5d723433981e-nfs-cephfs-2-0-ceph-e36ghuwn-0-qppsbw[403604]: 17/02/2025 09:07:23 : epoch 67b2fc2f : ceph-e36ghuwn-0 : ganesha.nfsd-2[svc_2] rpc :TIRPC :EVENT :handle_haproxy_header: 0x7f4148002760 fd 26 proxy header rest len failed header rlen = % (will set dead)
Feb 17 04:07:23 ceph-e36ghuwn-0 ceph-3b6a1b36-5e55-5bfd-80a8-5d723433981e-nfs-cephfs-2-0-ceph-e36ghuwn-0-qppsbw[403604]: 17/02/2025 09:07:23 : epoch 67b2fc2f : ceph-e36ghuwn-0 : ganesha.nfsd-2[svc_6] rpc :TIRPC :EVENT :handle_haproxy_header: 0x7f414c0018f0 fd 26 proxy header rest len failed header rlen = % (will set dead)
Feb 17 04:07:24 ceph-e36ghuwn-0 ceph-3b6a1b36-5e55-5bfd-80a8-5d723433981e-nfs-cephfs-2-0-ceph-e36ghuwn-0-qppsbw[403604]: 17/02/2025 09:07:24 : epoch 67b2fc2f : ceph-e36ghuwn-0 : ganesha.nfsd-2[svc_4] rpc :TIRPC :EVENT :handle_haproxy_header: 0x7f4134003d10 fd 26 proxy ignored for local
Feb 17 04:07:25 ceph-e36ghuwn-0 systemd-coredump[403711]: [🡕] Process 403608 (ganesha.nfsd) of user 0 dumped core.

                                                          Stack trace of thread 38:
                                                          #0  0x00007f419e488536 n/a (/usr/lib64/libntirpc.so.6.0.1 + 0x22536)
                                                          #1  0x0000000000000000 n/a (n/a + 0x0)
                                                          #2  0x00007f419e492c90 n/a (/usr/lib64/libntirpc.so.6.0.1 + 0x2cc90)
                                                          ELF object binary architecture: AMD x86-64
Subject: Process 403608 (ganesha.nfsd) dumped core
Defined-By: systemd
Support: https://access.redhat.com/support
Documentation: man:core(5)
```
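
For readers unfamiliar with the header that `handle_haproxy_header` is processing, here is a minimal Python sketch of a PROXY protocol v2 header parser. It follows the layout in HAProxy's public proxy-protocol specification, not the ntirpc code. It also assumes the ingress sends version 2 headers, which this report does not confirm. The names `parse_proxy_v2`, `PP2_SIGNATURE` and `PP2_FIXED_LEN` are hypothetical and chosen for illustration only.

```python
import socket
import struct

# 12-byte PROXY protocol v2 signature from HAProxy's proxy-protocol spec.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"
# Fixed part: signature (12) + version/command (1) + family/transport (1) + length (2).
PP2_FIXED_LEN = 16


def parse_proxy_v2(data: bytes) -> dict:
    """Parse a PROXY v2 header at the start of `data` (illustrative sketch)."""
    if len(data) < PP2_FIXED_LEN or not data.startswith(PP2_SIGNATURE):
        raise ValueError("not a PROXY v2 header")

    # Length is a 16-bit big-endian count of the bytes that follow the fixed part.
    ver_cmd, fam_proto, rest_len = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported PROXY protocol version")

    rest = data[PP2_FIXED_LEN:PP2_FIXED_LEN + rest_len]
    if len(rest) < rest_len:
        # The remainder of the header (addresses and TLVs) is incomplete.
        raise ValueError(f"short header: need {rest_len} bytes, have {len(rest)}")

    command = ver_cmd & 0x0F
    if command not in (0x0, 0x1):
        raise ValueError("unknown PROXY v2 command")

    result = {
        "command": "LOCAL" if command == 0x0 else "PROXY",
        "consumed": PP2_FIXED_LEN + rest_len,
    }

    # LOCAL means the proxy itself opened the connection (for example a health
    # check), so the receiver ignores any address block.
    if command == 0x1 and fam_proto == 0x11:  # TCP over IPv4
        if rest_len < 12:
            raise ValueError("address block too short for TCP/IPv4")
        src, dst, sport, dport = struct.unpack("!4s4sHH", rest[:12])
        result["src"] = (socket.inet_ntoa(src), sport)
        result["dst"] = (socket.inet_ntoa(dst), dport)
    return result
```

Read against this layout, the "proxy header rest len failed" messages suggest that reading the variable-length remainder of the header did not succeed. The "proxy ignored for local" message would match a LOCAL command. This is an interpretation of the log text, not a confirmed root cause.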

Version-Release number of selected component (if applicable):

RHCS 8 container with the following ganesha packages:

nfs-ganesha-selinux-6.0-8.1.el9cp.noarch
nfs-ganesha-6.0-8.1.el9cp.x86_64
nfs-ganesha-rgw-6.0-8.1.el9cp.x86_64
nfs-ganesha-ceph-6.0-8.1.el9cp.x86_64
nfs-ganesha-rados-grace-6.0-8.1.el9cp.x86_64
nfs-ganesha-rados-urls-6.0-8.1.el9cp.x86_64
nfs-ganesha-utils-6.0-8.1.el9cp.x86_64


How reproducible:

Create the NFS cluster with ingress enabled in haproxy-protocol mode:

```
ceph nfs cluster create cephfs '--placement=ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2' --ingress --virtual-ip=192.168.122.2 --ingress-mode=haproxy-protocol
```


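The nfs.cephfs service then reports 0/3 daemons running, and all three nfs daemons are in the error state:

```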
$ ceph orch ls

NAME                     PORTS                    RUNNING  REFRESHED  AGE  PLACEMENT
crash                                                 3/3  5m ago     2d   *
ingress.nfs.cephfs       192.168.122.2:2049,9049      6/6  5m ago     5h   ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2
ingress.rgw.default      192.168.122.2:8080,8999      2/2  5m ago     2d   count:1
mds.cephfs                                            3/3  5m ago     2d   ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2
mgr                                                   3/3  5m ago     2d   ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2
mon                                                   3/3  5m ago     2d   ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2
nfs.cephfs               ?:12049                      0/3  5m ago     5h   ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2
osd.default_drive_group                                 9  5m ago     2d   ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2
rgw.rgw                  ?:8082                       3/3  5m ago     2d   ceph-e36ghuwn-0;ceph-e36ghuwn-1;ceph-e36ghuwn-2
```

```
[ceph: root@ceph-e36ghuwn-0 /]# ceph orch ps | grep -i nfs
haproxy.nfs.cephfs.ceph-e36ghuwn-0.tijlzs      ceph-e36ghuwn-0  *:2049,9049           running (5h)     6m ago   2d    10.6M        -  2.4.22-f8e3218   73c7c53e4888  8119448a5c19
haproxy.nfs.cephfs.ceph-e36ghuwn-1.ttayqn      ceph-e36ghuwn-1  *:2049,9049           running (5h)     6m ago   2d    13.6M        -  2.4.22-f8e3218   73c7c53e4888  7ca849babbd3
haproxy.nfs.cephfs.ceph-e36ghuwn-2.befmsu      ceph-e36ghuwn-2  *:2049,9049           running (5h)     6m ago   2d    9768k        -  2.4.22-f8e3218   73c7c53e4888  097638a4c20e
keepalived.nfs.cephfs.ceph-e36ghuwn-0.gzlbkw   ceph-e36ghuwn-0                        running (2d)     6m ago   2d    1644k        -  2.2.8            c63687d7cfa0  39ccb21f414a
keepalived.nfs.cephfs.ceph-e36ghuwn-1.hwnynw   ceph-e36ghuwn-1                        running (2d)     6m ago   2d    1640k        -  2.2.8            c63687d7cfa0  55c0353ffb10
keepalived.nfs.cephfs.ceph-e36ghuwn-2.dozadz   ceph-e36ghuwn-2                        running (2d)     6m ago   2d    1640k        -  2.2.8            c63687d7cfa0  f643614c1380
nfs.cephfs.0.0.ceph-e36ghuwn-1.awvbjw          ceph-e36ghuwn-1  *:12049               error            6m ago   5h        -        -  <unknown>        <unknown>     <unknown>
nfs.cephfs.1.0.ceph-e36ghuwn-2.gohgiy          ceph-e36ghuwn-2  *:12049               error            6m ago   5h        -        -  <unknown>        <unknown>     <unknown>
nfs.cephfs.2.0.ceph-e36ghuwn-0.qppsbw          ceph-e36ghuwn-0  *:12049               error            6m ago   5h        -        -  <unknown>        <unknown>     <unknown>
```
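
When triaging output like the listing above, a small helper can pick out the failed daemons. This is a minimal sketch that assumes the plain-text `ceph orch ps` column layout shown in this report (NAME, HOST, optional PORTS, STATUS); the layout may differ in other releases. The function name `daemons_in_error` is hypothetical.

```python
def daemons_in_error(orch_ps_output: str) -> list:
    """Return (daemon_name, host) pairs for rows whose status is 'error'."""
    failed = []
    for line in orch_ps_output.splitlines():
        fields = line.split()
        # Rows look like: NAME HOST [PORTS] STATUS ...
        # PORTS may be absent, so the status is the third or fourth field.
        if len(fields) >= 3 and "error" in fields[2:4]:
            failed.append((fields[0], fields[1]))
    return failed
```

Applied to the listing in this report, it would return the three `nfs.cephfs.*` daemons and skip the running haproxy and keepalived daemons.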

Comment 1 Storage PM bot 2025-02-17 15:01:57 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 3 Goutham Pacha Ravi 2025-02-17 16:52:59 UTC
cc @ffilz:

Hi Frank, this looks like it was reported upstream as https://github.com/nfs-ganesha/ntirpc/pull/322 - could it be the same issue?