
Bug 2397780

Summary: [NFS-Ganesha] Unexpected restart of Ganesha services on active nodes during VIP failback after node reboot
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Manisha Saini <msaini>
Component: Cephadm
Assignee: Shweta Bhosale <shbhosal>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: high
Docs Contact:
Priority: unspecified
Version: 9.0
CC: cephqe-warriors
Target Milestone: ---
Target Release: 9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-20.1.0-41
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2026-01-29 06:59:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Manisha Saini 2025-09-24 11:01:53 UTC
Description of problem:
======================

During node reboot with VIP failover/failback in an Active-Active cluster, Ganesha services restart unexpectedly on non-rebooted nodes (cali019 and cali020).
The restart on cali020 is a known issue (BZ 2375725), but the restart on cali019 is unexpected and should not occur.

The tests were performed on the test builds provided for early feature testing: https://bugzilla.redhat.com/show_bug.cgi?id=2388477



Version-Release number of selected component (if applicable):
============
# ceph --version
ceph version 19.2.1-267.el9cp (54365d89c054519ec9203f55f8782b46b78e45aa) squid (stable)



How reproducible:
================
Every time


Steps to Reproduce:
==================

1. Configure the NFS-Ganesha cluster

# ceph nfs cluster create nfsganesha '2 cali019 cali020 cali016' --ingress --virtual_ip 10.8.130.191/22 --ingress_mode haproxy-protocol

# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "backend": [
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      },
      {
        "hostname": "cali019",
        "ip": "10.8.130.19",
        "port": 12049
      }
    ],
    "ingress_mode": "haproxy-protocol",
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.191"
  }
}

# ceph orch ps | grep nfs.nfs
haproxy.nfs.nfsganesha.cali016.fccrgn     cali016  *:2049,9049       running (19s)    15s ago  29s    38.7M        -  2.4.22-f8e3218    4aa9f9e449aa  969d636433ca
haproxy.nfs.nfsganesha.cali019.ppmopf     cali019  *:2049,9049       running (18s)    15s ago  29s    37.8M        -  2.4.22-f8e3218    4aa9f9e449aa  dfee3bcc96ed
keepalived.nfs.nfsganesha.cali016.dfyqqs  cali016                    running (25s)    15s ago  25s    1551k        -  2.2.8             38911a18f8ae  b916a5ebda10
keepalived.nfs.nfsganesha.cali019.cveshz  cali019                    running (25s)    15s ago  25s    1551k        -  2.2.8             38911a18f8ae  5da097d0c2d8
nfs.nfsganesha.0.0.cali016.qbqwfx         cali016  *:12049           running (29s)    15s ago  29s    67.5M        -  6.5               29bb7a0386df  17420d8d8574
nfs.nfsganesha.1.0.cali019.jhcjcb         cali019  *:12049           running (29s)    15s ago  29s    69.0M        -  6.5               29bb7a0386df  a01121a7fd57

# ceph orch ls | grep nfs
ingress.nfs.nfsganesha     10.8.130.191:2049,9049      4/4  59s ago    75s  cali019;cali020;cali016;count:2
nfs.nfsganesha             ?:12049                     2/2  59s ago    75s  cali019;cali020;cali016;count:2

Note: The VIP is assigned to cali016.
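
The placement above lists three hosts with count 2, so one host is left as a standby. As a minimal sketch (not part of the original report), the JSON printed by `ceph nfs cluster info` in this step can be parsed to tell active backends from the standby host. The helper names backend_hosts and standby_hosts are hypothetical, and the JSON is copied from the output above.

```python
import json

# Output of `ceph nfs cluster info nfsganesha`, copied from step 1 of this report.
CLUSTER_INFO = """
{
  "nfsganesha": {
    "backend": [
      {"hostname": "cali016", "ip": "10.8.130.16", "port": 12049},
      {"hostname": "cali019", "ip": "10.8.130.19", "port": 12049}
    ],
    "ingress_mode": "haproxy-protocol",
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.191"
  }
}
"""


def backend_hosts(info_json, cluster):
    """Return the sorted hostnames that currently run an NFS backend."""
    info = json.loads(info_json)[cluster]
    return sorted(b["hostname"] for b in info["backend"])


def standby_hosts(placement_hosts, info_json, cluster):
    """Return placement hosts that run no backend (hypothetical helper)."""
    active = set(backend_hosts(info_json, cluster))
    return [h for h in placement_hosts if h not in active]


# Placement hosts as given to `ceph nfs cluster create` in step 1.
placement = ["cali019", "cali020", "cali016"]
active = backend_hosts(CLUSTER_INFO, "nfsganesha")
standby = standby_hosts(placement, CLUSTER_INFO, "nfsganesha")
print("active:", active, "standby:", standby)
```

With the output from this step, cali020 is the standby host, which matches the Ganesha daemon later starting on cali020 after cali016 is powered off.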

2. Create an NFS export and mount it on 4 clients

# ceph fs subvolume getpath cephfs ganesha1 --group_name ganeshagroup
/volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107

# ceph nfs export create cephfs nfsganesha /ganesha1 cephfs --path=/volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107
{
  "bind": "/ganesha1",
  "cluster": "nfsganesha",
  "fs": "cephfs",
  "mode": "RW",
  "path": "/volumes/ganeshagroup/ganesha1/9d6705f0-8aa3-483f-bd4b-26aedafe3107"
}

3. Power off cali016, the node to which the VIP is assigned

# ceph orch ps | grep nfs.nfs
haproxy.nfs.nfsganesha.cali016.fccrgn     cali016  *:2049,9049       host is offline    12m ago  13m    39.7M        -  2.4.22-f8e3218    4aa9f9e449aa  969d636433ca
haproxy.nfs.nfsganesha.cali019.ppmopf     cali019  *:2049,9049       running (4m)        3m ago  13m    39.6M        -  2.4.22-f8e3218    4aa9f9e449aa  c14161c97dd1
haproxy.nfs.nfsganesha.cali020.qsvmjp     cali020  *:2049,9049       running (4m)       70s ago   4m    40.2M        -  2.4.22-f8e3218    4aa9f9e449aa  ab0838304a55
keepalived.nfs.nfsganesha.cali016.dfyqqs  cali016                    host is offline    12m ago  13m    1551k        -  2.2.8             38911a18f8ae  b916a5ebda10
keepalived.nfs.nfsganesha.cali019.cveshz  cali019                    running (4m)        3m ago  13m    1551k        -  2.2.8             38911a18f8ae  159384dfd548
keepalived.nfs.nfsganesha.cali020.fngcor  cali020                    running (4m)       70s ago   4m    1551k        -  2.2.8             38911a18f8ae  67d7eefd3cee
nfs.nfsganesha.0.0.cali016.qbqwfx         cali016  *:12049           host is offline    12m ago  13m     110M        -  6.5               29bb7a0386df  17420d8d8574
nfs.nfsganesha.0.1.cali020.ekrkgz         cali020  *:12049           running (4m)       70s ago   4m     182M        -  6.5               29bb7a0386df  0fe3b83dd0cc
nfs.nfsganesha.1.0.cali019.jhcjcb         cali019  *:12049           running (13m)       3m ago  13m     229M        -  6.5               29bb7a0386df  a01121a7fd57


Observations:
-------------
1. The VIP fails over to node cali019 --> Expected
2. The Ganesha service starts on node cali020 --> Expected
3. I/O resumes on the clients
4. Bring the cali016 node back up
5. Only the VIP fails back to cali016
6. The Ganesha service gets restarted on both cali019 and cali020
7. Ongoing I/O failed with “Remote I/O error” on 1 out of 2 clients. Restarting the Linux untar afterwards works.

tar: linux-6.4/arch/arm64/boot/dts: Cannot stat: Remote I/O error
tar: linux-6.4/arch/arm64/boot: Cannot stat: Remote I/O error
tar: linux-6.4/arch/arm64: Cannot stat: Remote I/O error
tar: linux-6.4/arch: Cannot stat: Remote I/O error
tar: linux-6.4: Cannot stat: Remote I/O error
tar: Error is not recoverable: exiting now


Actual results:
===============
The Ganesha service gets restarted on the active nodes (cali019 and cali020) when the rebooted node (cali016) is brought back online.


Expected results:
=================
The Ganesha service should not restart on active nodes.
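
One way to check this mechanically is to compare two `ceph orch ps` snapshots and flag daemons whose container ID changed. The sketch below is not from the original report and rests on an assumption: that a daemon restart under cephadm produces a new container ID. The helper names container_ids and restarted are hypothetical. The report does not include a snapshot taken after failback, so the example compares the two snapshots that are included (step 1 and step 3); the same comparison would apply to before/after-failback output.

```python
# `ceph orch ps | grep nfs.nfs` output from step 1 (before cali016 was powered off).
BEFORE = """
haproxy.nfs.nfsganesha.cali016.fccrgn     cali016  *:2049,9049       running (19s)    15s ago  29s    38.7M        -  2.4.22-f8e3218    4aa9f9e449aa  969d636433ca
haproxy.nfs.nfsganesha.cali019.ppmopf     cali019  *:2049,9049       running (18s)    15s ago  29s    37.8M        -  2.4.22-f8e3218    4aa9f9e449aa  dfee3bcc96ed
keepalived.nfs.nfsganesha.cali016.dfyqqs  cali016                    running (25s)    15s ago  25s    1551k        -  2.2.8             38911a18f8ae  b916a5ebda10
keepalived.nfs.nfsganesha.cali019.cveshz  cali019                    running (25s)    15s ago  25s    1551k        -  2.2.8             38911a18f8ae  5da097d0c2d8
nfs.nfsganesha.0.0.cali016.qbqwfx         cali016  *:12049           running (29s)    15s ago  29s    67.5M        -  6.5               29bb7a0386df  17420d8d8574
nfs.nfsganesha.1.0.cali019.jhcjcb         cali019  *:12049           running (29s)    15s ago  29s    69.0M        -  6.5               29bb7a0386df  a01121a7fd57
"""

# Same command from step 3 (after cali016 was powered off).
AFTER = """
haproxy.nfs.nfsganesha.cali016.fccrgn     cali016  *:2049,9049       host is offline    12m ago  13m    39.7M        -  2.4.22-f8e3218    4aa9f9e449aa  969d636433ca
haproxy.nfs.nfsganesha.cali019.ppmopf     cali019  *:2049,9049       running (4m)        3m ago  13m    39.6M        -  2.4.22-f8e3218    4aa9f9e449aa  c14161c97dd1
haproxy.nfs.nfsganesha.cali020.qsvmjp     cali020  *:2049,9049       running (4m)       70s ago   4m    40.2M        -  2.4.22-f8e3218    4aa9f9e449aa  ab0838304a55
keepalived.nfs.nfsganesha.cali016.dfyqqs  cali016                    host is offline    12m ago  13m    1551k        -  2.2.8             38911a18f8ae  b916a5ebda10
keepalived.nfs.nfsganesha.cali019.cveshz  cali019                    running (4m)        3m ago  13m    1551k        -  2.2.8             38911a18f8ae  159384dfd548
keepalived.nfs.nfsganesha.cali020.fngcor  cali020                    running (4m)       70s ago   4m    1551k        -  2.2.8             38911a18f8ae  67d7eefd3cee
nfs.nfsganesha.0.0.cali016.qbqwfx         cali016  *:12049           host is offline    12m ago  13m     110M        -  6.5               29bb7a0386df  17420d8d8574
nfs.nfsganesha.0.1.cali020.ekrkgz         cali020  *:12049           running (4m)       70s ago   4m     182M        -  6.5               29bb7a0386df  0fe3b83dd0cc
nfs.nfsganesha.1.0.cali019.jhcjcb         cali019  *:12049           running (13m)       3m ago  13m     229M        -  6.5               29bb7a0386df  a01121a7fd57
"""


def container_ids(orch_ps_output):
    """Map daemon name -> (host, container ID) for each `ceph orch ps` line.

    Only the first, second, and last whitespace-separated fields are used,
    because the STATUS column itself contains spaces.
    """
    daemons = {}
    for line in orch_ps_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        daemons[fields[0]] = (fields[1], fields[-1])
    return daemons


def restarted(before, after, skip_hosts=()):
    """Return daemons present in both snapshots whose container ID changed.

    Assumption: a changed container ID means the daemon was restarted.
    """
    old, new = container_ids(before), container_ids(after)
    return sorted(
        name
        for name, (host, cid) in new.items()
        if name in old and host not in skip_hosts and old[name][1] != cid
    )


# cali016 is the rebooted node, so it is excluded from the check.
changed = restarted(BEFORE, AFTER, skip_hosts=("cali016",))
ganesha_restarts = [name for name in changed if name.startswith("nfs.")]
print("changed:", changed)
print("ganesha restarts:", ganesha_restarts)
```

For these two snapshots the haproxy and keepalived containers on cali019 changed, while nfs.nfsganesha.1.0.cali019.jhcjcb kept its container ID, consistent with its "running (13m)" status. Under the same assumption, a non-empty ganesha_restarts list for snapshots taken around the failback would indicate the behavior reported here.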


Additional info:

Comment 6 errata-xmlrpc 2026-01-29 06:59:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536