Bug 2157581 - [RFE] UX : Handle orch host drain : Improve handling of drain for services with a defined host list
Keywords:
Status: NEW
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 6.2
Assignee: Adam King
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-02 06:32 UTC by Vasishta
Modified: 2023-07-06 17:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-5874 0 None None None 2023-01-02 06:45:21 UTC

Description Vasishta 2023-01-02 06:32:26 UTC
Description of problem:
As part of the host OS upgrade procedure, I tried a host drain on a host running services whose specs pin them to particular hosts.

ceph health reported the following warning:
>>    health: HEALTH_WARN
>>            Failed to apply 2 service(s): mds.cephfs,mon

The service specs of both of the above services had a defined list of hosts.

ceph health detail reported:

>> # ceph health detail
>> HEALTH_WARN Failed to apply 2 service(s): mds.cephfs,mon
>> [WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 2 service(s): mds.cephfs,mon
>>    mds.cephfs: Cannot place <MDSSpec for service_name=mds.cephfs> on e22-h18-b01-fc640.rdu2.scalelab.redhat.com: Unknown hosts
>>    mon: Cannot place <ServiceSpec for service_name=mon> on e22-h18-b01-fc640.rdu2.scalelab.redhat.com: Unknown hosts

This tracker is to improve the user experience here by one of the following:
-> Display an error message conveying that daemons on this node belong to services with fixed host names, and that the service specs need to be updated before the host can be drained safely.
OR
-> Improve the WARN message in ceph health to convey that the service deployment failed because the host now has the _no_schedule label.
OR
-> Provide an option for users to specify a backup list of hosts in the service spec to fail over to during maintenance operations like these.
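
To illustrate the first option, here is a minimal Python sketch of the kind of pre-drain check that could produce such a message. It is not cephadm's implementation. The function names (services_pinned_to_host, drain_warning) are hypothetical, and it assumes each spec is a plain dict with a service_name key and a placement.hosts list of bare host names, roughly the shape of an exported service spec.

```python
from typing import List


def services_pinned_to_host(specs: List[dict], host: str) -> List[str]:
    """Return the names of services whose placement explicitly lists `host`.

    Assumption: each spec is a dict with a "service_name" key and an
    optional "placement" dict holding a "hosts" list of bare host names.
    """
    pinned = []
    for spec in specs:
        hosts = spec.get("placement", {}).get("hosts", [])
        if host in hosts:
            pinned.append(spec["service_name"])
    return sorted(pinned)


def drain_warning(specs: List[dict], host: str) -> str:
    """Build a user-facing message for a drain request (hypothetical wording).

    Returns an empty string when no service spec names the host explicitly.
    """
    pinned = services_pinned_to_host(specs, host)
    if not pinned:
        return ""
    return (
        f"Host {host} is listed explicitly in the placement of: "
        f"{', '.join(pinned)}. Update these service specs before "
        "draining the host."
    )
```

With specs for mon and mds.cephfs that both list the host being drained, drain_warning would name both services, which is the information missing from the "Unknown hosts" warning above.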

Version-Release number of selected component (if applicable):
16.2.10-87.el8cp

How reproducible:
Tried once

Steps to Reproduce:
1. Configure a cluster with services whose service specifications list their hosts explicitly.
2. Drain one of those hosts from another admin node.

Actual results:
Host drain ends in failure, with an unhelpful warning message, when it is attempted on a host running services with a defined host list.

Expected results:
The user should be informed of the risks, or notified of the issue in an easily understandable error message.

