Bug 2400121
| Summary: | [NFS-Ganesha][Active-Active HA] Post node reboot, I/O and basic commands on mount point remain stuck | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Manisha Saini <msaini> |
| Component: | Cephadm | Assignee: | Shweta Bhosale <shbhosal> |
| Status: | CLOSED ERRATA | QA Contact: | Manisha Saini <msaini> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 9.0 | CC: | cephqe-warriors, gouthamr, jcaratza, ngangadh, shbhosal, spunadik |
| Target Milestone: | --- | | |
| Target Release: | 9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-20.1.0-80 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2026-01-29 07:00:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536
Description of problem:
=======================

Tested with the private build quay.io/rh-ee-shbhosal/ceph:haproxy_chnages_for_nfs.

During the node-down test scenario on an active-active deployed cluster, after the rebooted node came back online, I/O operations remained hung indefinitely. In addition, basic commands such as ls, df, and cd on the mount point became unresponsive.

Note: this happens when the haproxy and NFS containers are running on the same nodes.

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:
=================

1/1

Steps to Reproduce:
===================

1. Deploy the NFS Ganesha cluster:

    # ceph nfs cluster create nfsganesha '2 cali019 cali020 cali016' --ingress --virtual_ip 10.8.130.191/22 --ingress_mode haproxy-protocol

    # ceph nfs cluster info nfsganesha
    {
      "nfsganesha": {
        "backend": [
          {
            "hostname": "cali016",
            "ip": "10.8.130.16",
            "port": 12049
          },
          {
            "hostname": "cali019",
            "ip": "10.8.130.19",
            "port": 12049
          }
        ],
        "ingress_mode": "haproxy-protocol",
        "monitor_port": 9049,
        "port": 2049,
        "virtual_ip": "10.8.130.191"
      }
    }

    # ceph orch ps | grep nfs.nfs
    haproxy.nfs.nfsganesha.cali016.lcuiws     cali016  *:2049,9049  running (58m)  5m ago  58m  40.7M  -  2.4.22-f8e3218  4aa9f9e449aa  fa299894f9d3
    haproxy.nfs.nfsganesha.cali019.fwdjkt     cali019  *:2049,9049  running (58m)  5m ago  58m  43.1M  -  2.4.22-f8e3218  4aa9f9e449aa  dee1cfd3771f
    keepalived.nfs.nfsganesha.cali016.dpymzm  cali016               running (58m)  5m ago  58m  1555k  -  2.2.8           38911a18f8ae  14a35d2a9124
    keepalived.nfs.nfsganesha.cali019.kniuot  cali019               running (58m)  5m ago  58m  1555k  -  2.2.8           38911a18f8ae  d41fba529246
    nfs.nfsganesha.0.0.cali016.juebkv         cali016  *:12049      running (58m)  5m ago  58m  111M   -  6.5             3f878c026ee3  fbd843b02702
    nfs.nfsganesha.1.0.cali019.giupuc         cali019  *:12049      running (58m)  5m ago  58m  112M   -  6.5             3f878c026ee3  444b68241efc

The VIP is assigned to cali016:

    [root@cali016 ~]# ip addr | grep eno12399
    4: eno12399: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    group default qlen 1000
        inet 10.8.130.16/21 brd 10.8.135.255 scope global dynamic noprefixroute eno12399
        inet 10.8.130.191/22 scope global eno12399

2. Create an NFS export and mount it on 4 clients:

    [ceph: root@cali013 /]# ceph fs subvolume getpath cephfs ganesha1 --group_name ganeshagroup
    /volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107

    [ceph: root@cali013 /]# ceph nfs export create cephfs nfsganesha /ganesha1 cephfs --path=/volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107
    {
      "bind": "/ganesha1",
      "cluster": "nfsganesha",
      "fs": "cephfs",
      "mode": "RW",
      "path": "/volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107"
    }

3. Run I/Os from the 4 clients.

4. Power off cali016, where the VIP is assigned:

    [ceph: root@cali013 /]# ceph orch ps | grep nfs.nfs
    haproxy.nfs.nfsganesha.cali016.lcuiws     cali016  *:2049,9049  host is offline  6m ago  69m  40.8M  -  2.4.22-f8e3218  4aa9f9e449aa  fa299894f9d3
    haproxy.nfs.nfsganesha.cali019.fwdjkt     cali019  *:2049,9049  running (15s)    8s ago  69m  38.1M  -  2.4.22-f8e3218  4aa9f9e449aa  9ba3ab59d678
    haproxy.nfs.nfsganesha.cali020.ojbyet     cali020  *:2049,9049  running (13s)    8s ago  23s  37.9M  -  2.4.22-f8e3218  4aa9f9e449aa  3e107b279a27
    keepalived.nfs.nfsganesha.cali016.dpymzm  cali016               host is offline  6m ago  69m  1555k  -  2.2.8           38911a18f8ae  14a35d2a9124
    keepalived.nfs.nfsganesha.cali019.kniuot  cali019               running (16s)    8s ago  69m  1547k  -  2.2.8           38911a18f8ae  16e7ddcc3238
    keepalived.nfs.nfsganesha.cali020.ukbzxw  cali020               running (11s)    8s ago  21s  1551k  -  2.2.8           38911a18f8ae  56b8365399f6
    nfs.nfsganesha.0.0.cali016.juebkv         cali016  *:12049      host is offline  6m ago  69m  148M   -  6.5             3f878c026ee3  fbd843b02702
    nfs.nfsganesha.0.1.cali020.fuqpfb         cali020  *:12049      running (23s)    8s ago  23s  18.3M  -  6.5             3f878c026ee3  377fbb1d5bee
    nfs.nfsganesha.1.0.cali019.giupuc         cali019  *:12049      running (69m)    8s ago  69m  290M   -  6.5             3f878c026ee3  444b68241efc

Observations:

A. The VIP fails over to node cali019.
B. A Ganesha service starts on node cali020.
C. I/Os resume on the clients.

5. Bring the cali016 node back up.

--> I/Os on the existing clients hang when the node comes back up. Even "df" operations on the mount point get stuck.

Actual results:
===============

I/Os hang forever once the rebooted node comes back up.

Expected results:
=================

I/Os should resume and work as expected once the node is back up.

Additional info:
================
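As a client-side probe for this failure mode, each check can be wrapped in a timeout so a hung mount is reported instead of blocking the shell. This is a sketch, not part of the original report: the mount point `/mnt/ganesha1` and the 5-second timeout are illustrative assumptions; the VIP and export path are taken from the steps above.

```shell
#!/bin/sh
# Sketch: detect a hung NFS mount point after the rebooted node returns.
# Assumed (hypothetical) client mount, using the VIP and export from the report:
#   mount -t nfs -o vers=4.1 10.8.130.191:/ganesha1 /mnt/ganesha1

# Run a command under a timeout; print "hung" if it does not return in time.
check_responsive() {
    # $1 = timeout in seconds; remaining args = command to probe
    t="$1"; shift
    if timeout "$t" "$@" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "hung"
    fi
}

# After cali016 comes back up, probe the mount point from a client, e.g.:
#   check_responsive 5 df /mnt/ganesha1
#   check_responsive 5 ls /mnt/ganesha1
# In the reported failure, these probes keep printing "hung" indefinitely.
```

The `timeout` wrapper (GNU coreutils) is what keeps the probe itself from hanging, which is the same symptom the bare `df`/`ls` commands exhibit in this bug.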