Bug 2400121
| Summary: | [NFS-Ganesha][Active-Active HA] Post node reboot, I/O and basic commands on mount point remain stuck | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Manisha Saini <msaini> |
| Component: | Cephadm | Assignee: | Shweta Bhosale <shbhosal> |
| Status: | CLOSED ERRATA | QA Contact: | Manisha Saini <msaini> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 9.0 | CC: | cephqe-warriors, gouthamr, jcaratza, ngangadh, shbhosal, spunadik |
| Target Milestone: | --- | | |
| Target Release: | 9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-20.1.0-80 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2026-01-29 07:00:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536
Description of problem:
=======================

Tested with the private build quay.io/rh-ee-shbhosal/ceph:haproxy_chnages_for_nfs.

During the node-down test scenario on an active-active deployed cluster, after the rebooted node came back online, I/O operations remained hung indefinitely. In addition, basic commands such as ls, df, and cd on the mount point became unresponsive.

Note: this happens when the haproxy and NFS containers are running on the same nodes.

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:
=================

1/1

Steps to Reproduce:
===================

1. Deploy the NFS Ganesha cluster:

    # ceph nfs cluster create nfsganesha '2 cali019 cali020 cali016' --ingress --virtual_ip 10.8.130.191/22 --ingress_mode haproxy-protocol

    # ceph nfs cluster info nfsganesha
    {
      "nfsganesha": {
        "backend": [
          {
            "hostname": "cali016",
            "ip": "10.8.130.16",
            "port": 12049
          },
          {
            "hostname": "cali019",
            "ip": "10.8.130.19",
            "port": 12049
          }
        ],
        "ingress_mode": "haproxy-protocol",
        "monitor_port": 9049,
        "port": 2049,
        "virtual_ip": "10.8.130.191"
      }
    }

    # ceph orch ps | grep nfs.nfs
    haproxy.nfs.nfsganesha.cali016.lcuiws     cali016  *:2049,9049  running (58m)  5m ago  58m  40.7M  -  2.4.22-f8e3218  4aa9f9e449aa  fa299894f9d3
    haproxy.nfs.nfsganesha.cali019.fwdjkt     cali019  *:2049,9049  running (58m)  5m ago  58m  43.1M  -  2.4.22-f8e3218  4aa9f9e449aa  dee1cfd3771f
    keepalived.nfs.nfsganesha.cali016.dpymzm  cali016               running (58m)  5m ago  58m  1555k  -  2.2.8           38911a18f8ae  14a35d2a9124
    keepalived.nfs.nfsganesha.cali019.kniuot  cali019               running (58m)  5m ago  58m  1555k  -  2.2.8           38911a18f8ae  d41fba529246
    nfs.nfsganesha.0.0.cali016.juebkv         cali016  *:12049      running (58m)  5m ago  58m  111M   -  6.5             3f878c026ee3  fbd843b02702
    nfs.nfsganesha.1.0.cali019.giupuc         cali019  *:12049      running (58m)  5m ago  58m  112M   -  6.5             3f878c026ee3  444b68241efc

The VIP is assigned to cali016:

    [root@cali016 ~]# ip addr | grep eno12399
    4: eno12399: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    group default qlen 1000
        inet 10.8.130.16/21 brd 10.8.135.255 scope global dynamic noprefixroute eno12399
        inet 10.8.130.191/22 scope global eno12399

2. Create an NFS export and mount it on 4 clients:

    [ceph: root@cali013 /]# ceph fs subvolume getpath cephfs ganesha1 --group_name ganeshagroup
    /volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107

    [ceph: root@cali013 /]# ceph nfs export create cephfs nfsganesha /ganesha1 cephfs --path=/volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107
    {
      "bind": "/ganesha1",
      "cluster": "nfsganesha",
      "fs": "cephfs",
      "mode": "RW",
      "path": "/volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107"
    }

3. Run I/Os from the 4 clients.

4. Power off cali016, where the VIP is assigned:

    [ceph: root@cali013 /]# ceph orch ps | grep nfs.nfs
    haproxy.nfs.nfsganesha.cali016.lcuiws     cali016  *:2049,9049  host is offline  6m ago  69m  40.8M  -  2.4.22-f8e3218  4aa9f9e449aa  fa299894f9d3
    haproxy.nfs.nfsganesha.cali019.fwdjkt     cali019  *:2049,9049  running (15s)    8s ago  69m  38.1M  -  2.4.22-f8e3218  4aa9f9e449aa  9ba3ab59d678
    haproxy.nfs.nfsganesha.cali020.ojbyet     cali020  *:2049,9049  running (13s)    8s ago  23s  37.9M  -  2.4.22-f8e3218  4aa9f9e449aa  3e107b279a27
    keepalived.nfs.nfsganesha.cali016.dpymzm  cali016               host is offline  6m ago  69m  1555k  -  2.2.8           38911a18f8ae  14a35d2a9124
    keepalived.nfs.nfsganesha.cali019.kniuot  cali019               running (16s)    8s ago  69m  1547k  -  2.2.8           38911a18f8ae  16e7ddcc3238
    keepalived.nfs.nfsganesha.cali020.ukbzxw  cali020               running (11s)    8s ago  21s  1551k  -  2.2.8           38911a18f8ae  56b8365399f6
    nfs.nfsganesha.0.0.cali016.juebkv         cali016  *:12049      host is offline  6m ago  69m  148M   -  6.5             3f878c026ee3  fbd843b02702
    nfs.nfsganesha.0.1.cali020.fuqpfb         cali020  *:12049      running (23s)    8s ago  23s  18.3M  -  6.5             3f878c026ee3  377fbb1d5bee
    nfs.nfsganesha.1.0.cali019.giupuc         cali019  *:12049      running (69m)    8s ago  69m  290M   -  6.5             3f878c026ee3  444b68241efc

Observations:

A. The VIP fails over to node cali019.
B. A Ganesha service starts on node cali020.
C. I/Os resume on the clients.

5. Bring the cali016 node back up.

--> I/Os on the existing clients hang when the node comes back up. Even "df" operations on the mount point get stuck.

Actual results:
===============

I/Os hang forever once the rebooted node comes back up.

Expected results:
=================

I/Os should resume and work as expected once the node is back up.

Additional info:
================
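As a client-side probe for this failure mode, each check can be wrapped in a timeout so a hung mount is reported instead of blocking the shell. This is a sketch, not part of the original report: the mount point `/mnt/ganesha1` and the 5-second timeout are illustrative assumptions; the VIP and export path are taken from the steps above.

```shell
#!/bin/sh
# Sketch: detect a hung NFS mount point after the rebooted node returns.
# Assumed (hypothetical) client mount, using the VIP and export from the report:
#   mount -t nfs -o vers=4.1 10.8.130.191:/ganesha1 /mnt/ganesha1

# Run a command under a timeout; print "hung" if it does not return in time.
check_responsive() {
    # $1 = timeout in seconds; remaining args = command to probe
    t="$1"; shift
    if timeout "$t" "$@" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "hung"
    fi
}

# After cali016 comes back up, probe the mount point from a client, e.g.:
#   check_responsive 5 df /mnt/ganesha1
#   check_responsive 5 ls /mnt/ganesha1
# In the reported failure, these probes keep printing "hung" indefinitely.
```

The `timeout` wrapper (GNU coreutils) is what keeps the probe itself from hanging, which is the same symptom the bare `df`/`ls` commands exhibit in this bug.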