
Bug 2372349

Summary: [8.1z backport] Parallel NFS daemon deploy processing does synchronous deletes during reschedules
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Omid Yoosefi <omidyoosefi>
Component: Cephadm
Assignee: Adam King <adking>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: medium
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 8.0
CC: bkunal, cephqe-warriors, mobisht, msaini, radhika.chirra, rpollack, tserlin
Target Milestone: ---
Flags: mobisht: needinfo+
Target Release: 8.1z1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-19.2.1-223
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned to: 2372438 (view as bug list)
Environment:
Last Closed: 2025-08-18 14:01:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2372438

Description Omid Yoosefi 2025-06-11 20:17:57 UTC
Description of problem:
This issue specifically concerns a 'reschedule' operation, where the placement criteria for a daemon change and the NFS daemons have to be moved off one node and recreated on another.

What we're observing is that instead of the daemons being handled by the parallel deletion processing, they end up being processed one by one.

This behavior has been traced to the code below, where keyrings are removed and fenced prior to the parallel daemon delete processing in apply_all_services.

https://github.com/ceph/ceph/blob/826cc55745323d68c78ac9587ddfeb47aad009c1/src/pybind/mgr/cephadm/serve.py#L1010
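For illustration, here is a minimal, hypothetical sketch of the pattern (not the actual serve.py code; remove_keyring, fence_exports, and remove_daemon are placeholder names, not cephadm APIs). When the blocking per-daemon cleanup runs in the scheduling loop before the removals are handed to the worker pool, the overall removal is effectively serialized; folding that cleanup into the per-daemon task would let all of the per-daemon work run in parallel.

from concurrent.futures import ThreadPoolExecutor

# Placeholder helpers standing in for the synchronous keyring removal,
# export fencing, and daemon removal steps.
def remove_keyring(daemon): ...
def fence_exports(daemon): ...
def remove_daemon(daemon): ...

# Observed (serial) shape: blocking cleanup runs daemon-by-daemon in the
# scheduling loop, so little work is left for the parallel step.
def reschedule_serial(daemons_to_remove):
    for d in daemons_to_remove:
        remove_keyring(d)   # blocks the loop for each daemon
        fence_exports(d)    # blocks the loop for each daemon
    with ThreadPoolExecutor() as pool:
        list(pool.map(remove_daemon, daemons_to_remove))

# Desired (parallel) shape: the cleanup is part of each per-daemon task, so
# the removals proceed concurrently.
def reschedule_parallel(daemons_to_remove):
    def _remove(d):
        remove_keyring(d)
        fence_exports(d)
        remove_daemon(d)
    with ThreadPoolExecutor() as pool:
        list(pool.map(_remove, daemons_to_remove))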

Version-Release number of selected component (if applicable): 8.0


How reproducible: 100%


Steps to Reproduce:
1. Use a label-based placement to deploy some NFS daemons across multiple hosts
2. Remove the placement label from one of the hosts
3. Watch active mgr logs to observe the synchronous behavior of the daemon removals

Actual results:
NFS daemons are removed synchronously/one-by-one from the node.

Expected results:
All daemons are removed in parallel from the node.

Additional info:

This is our host setup
[ceph: root@dal1-qz2-sr2-rk044-s28 /]# ceph orch host ls
HOST   ADDR          LABELS         STATUS
node1  10.22.19.127  _admin,nfs
node2  10.22.19.128  nfs
node3  10.22.65.7    nfs
node4  10.22.65.8    nfs
node5  10.22.67.31   nfs
node6  10.22.67.32   nfs

In this case we'd run `ceph orch host label rm node1 nfs` to trigger this behavior.

Comment 8 errata-xmlrpc 2025-08-18 14:01:59 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.1 security and bug fix updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:14015