
Bug 2405397

Summary: cephadm crashes and doesn't recover with ganesha-rados-grace tool failed: Failure: -126
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Omid Yoosefi <omidyoosefi>
Component: Cephadm Assignee: Shweta Bhosale <shbhosal>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: high Docs Contact:
Priority: unspecified    
Version: 8.1 CC: adking, akane, bkunal, cephqe-warriors, hacharya, shbhosal, spunadik, tserlin, vdas
Target Milestone: ---   
Target Release: 9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-20.1.0-126 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2421434 (view as bug list) Environment:
Last Closed: 2026-01-29 07:02:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2421434    

Description Omid Yoosefi 2025-10-21 17:46:41 UTC
Description of problem:

When provisioning and deprovisioning many NFS daemons with the concurrency changes, the NFS grace tool hits an exception and crashes the cephadm module. Afterwards, the orchestrator does not work until the mgr daemon is restarted or failed over.


Version-Release number of selected component (if applicable): 19.2.1-245.0.hotfix.BYOK.el9cp


How reproducible: 50%


Steps to Reproduce:
1. Provision many NFS daemons with a single spec apply
2. Delete all the daemons
3. Watch the mgr logs or ceph -s for HEALTH_ERR


Actual results:

The cephadm orchestrator stops working and needs a mgr restart/failover to continue.


Expected results:
The cephadm orchestrator should handle the error and retry without user intervention.
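
As a rough illustration of that expected behavior, the sketch below shows how a grace-tool invocation could be wrapped so that a nonzero exit is logged and retried rather than raising out of the serve loop. This is not the actual fix shipped in ceph-20.1.0-126; the function name and the pool/namespace/nodeid arguments are placeholders.

```
import logging
import subprocess
import time

log = logging.getLogger(__name__)

def run_grace_tool_with_retry(cmd, attempts=3, delay=5.0):
    """Run a ganesha-rados-grace command, retrying transient failures
    instead of letting an exception escape the orchestrator loop."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode == 0:
            return
        log.warning('ganesha-rados-grace failed (attempt %d/%d): %s',
                    attempt, attempts, result.stderr.decode('utf-8', 'replace'))
        time.sleep(delay)
    # Give up for now instead of crashing the module; a later serve
    # cycle (or an explicit retry) can attempt the removal again.
    log.error('ganesha-rados-grace still failing after %d attempts', attempts)

# Hypothetical usage (pool, namespace, and nodeid are placeholders):
# run_grace_tool_with_retry(['ganesha-rados-grace', '--pool', '.nfs',
#                            '--ns', 'mynfs', 'remove', 'nodeid'])
```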


Additional info:

```
2025-10-21T01:43:57.547+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.7.dal1-qz2-sr5-rk025-s28.dgynzp
2025-10-21T01:43:57.547+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.7.dal1-qz2-sr5-rk025-s28.dgynzp
2025-10-21T01:43:57.547+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.7.dal1-qz2-sr5-rk025-s28.dgynzp
2025-10-21T01:43:57.547+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.7.dal1-qz2-sr5-rk025-s28.dgynzp
2025-10-21T01:43:57.585+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.8.dal3-qz2-sr3-rk279-s28.qasdbj
2025-10-21T01:43:57.585+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.8.dal3-qz2-sr3-rk279-s28.qasdbj
2025-10-21T01:43:57.585+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.8.dal3-qz2-sr3-rk279-s28.qasdbj
2025-10-21T01:43:57.585+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.8.dal3-qz2-sr3-rk279-s28.qasdbj
2025-10-21T01:43:57.620+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.9.dal4-qz2-sr1-rk114-s48.isrjtf
2025-10-21T01:43:57.620+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.9.dal4-qz2-sr1-rk114-s48.isrjtf
2025-10-21T01:43:57.620+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.9.dal4-qz2-sr1-rk114-s48.isrjtf
2025-10-21T01:43:57.620+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.2.9.dal4-qz2-sr1-rk114-s48.isrjtf
2025-10-21T01:43:58.138+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.4.0.dal2-qz2-sr2-rk089-s28.maofjs
2025-10-21T01:43:58.138+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Fencing old nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.4.0.dal2-qz2-sr2-rk089-s28.maofjs
2025-10-21T01:43:58.138+0000 7f77c7153640  0 [cephadm INFO cephadm.services.nfs] Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.4.0.dal2-qz2-sr2-rk089-s28.maofjs
2025-10-21T01:43:58.138+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Removing key for client.nfs.r134-4bd98973-2380-4530-aa1e-6c838295028b.4.0.dal2-qz2-sr2-rk089-s28.maofjs
2025-10-21T01:43:58.144+0000 7f77c7153640  0 [cephadm INFO root] Removing 4 from the ganesha grace table
2025-10-21T01:43:58.144+0000 7f77c7153640  0 log_channel(cephadm) log [INF] : Removing 4 from the ganesha grace table
2025-10-21T01:43:58.150+0000 7f77d582b640  0 log_channel(cluster) log [DBG] : pgmap v38: 6721 pgs: 6721 active+clean; 2.3 TiB data, 9.4 TiB used, 177 TiB / 186 TiB avail; 16 KiB/s rd, 0 B/s wr, 19 op/s
2025-10-21T01:43:58.268+0000 7f77c7153640  0 [cephadm WARNING root] ganesha-rados-grace tool failed: Failure: -126

2025-10-21T01:43:58.269+0000 7f77c7153640  0 log_channel(cephadm) log [WRN] : ganesha-rados-grace tool failed: Failure: -126

2025-10-21T01:43:58.305+0000 7f77c7153640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'cephadm' while running on mgr.dal1-qz2-sr5-rk025-s38.vyhojz: grace tool failed: Failure: -126
2025-10-21T01:43:58.305+0000 7f77c7153640 -1 cephadm.serve:
2025-10-21T01:43:58.305+0000 7f77c7153640 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 865, in serve
    serve.serve()
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 120, in serve
    if self._apply_all_services():
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 764, in _apply_all_services
    svc.fence_old_ranks(spec, ranking_map, len(daemons))
  File "/usr/share/ceph/mgr/cephadm/services/nfs.py", line 315, in fence_old_ranks
    self.run_grace_tool(cast(NFSServiceSpec, spec), 'remove', nodeid)
  File "/usr/share/ceph/mgr/cephadm/services/nfs.py", line 874, in run_grace_tool
    raise RuntimeError(f'grace tool failed: {result.stderr.decode("utf-8")}')
RuntimeError: grace tool failed: Failure: -126
```
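
For context (an interpretation, not stated in the bug itself): Ceph tools generally report errors as negative errno values, and on Linux errno 126 is ENOKEY ("Required key not available"), which would be consistent with the client key having been removed just before the grace tool ran. A quick check with Python's standard library on a Linux host:

```
import errno
import os

# errno 126 on Linux is ENOKEY, so -126 from the grace tool would mean
# "Required key not available" (an assumption, not confirmed by the logs).
print(errno.errorcode[126])   # ENOKEY
print(os.strerror(126))       # Required key not available
```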

Comment 21 errata-xmlrpc 2026-01-29 07:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536

Comment 22 Red Hat Bugzilla 2026-02-06 04:26:50 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days or the product is inactive and locked.