Bug 2157581

Summary: [RFE] UX : Handle orch host drain : Improvise handling drain of services with defined host list
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vasishta <vashastr>
Component: Cephadm Assignee: Adam King <adking>
Status: NEW --- QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 5.3 CC: cephqe-warriors, saraut, vereddy
Target Milestone: --- Keywords: FutureFeature
Target Release: 6.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Embargoed:

Description Vasishta 2023-01-02 06:32:26 UTC
Description of problem:
As part of the host OS upgrade procedure, tried a host drain on a host running services whose specs place them on particular hosts.

ceph health reported the following warning:
>>    health: HEALTH_WARN
>>            Failed to apply 2 service(s): mds.cephfs,mon

The service specs of the above services had a defined list of hosts.

ceph health detail reported:

>> # ceph health detail
>> HEALTH_WARN Failed to apply 2 service(s): mds.cephfs,mon
>> [WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 2 service(s): mds.cephfs,mon
>>    mds.cephfs: Cannot place <MDSSpec for service_name=mds.cephfs> on e22-h18-b01-fc640.rdu2.scalelab.redhat.com: Unknown hosts
>>    mon: Cannot place <ServiceSpec for service_name=mon> on e22-h18-b01-fc640.rdu2.scalelab.redhat.com: Unknown hosts

This tracker is to improve the user experience here by doing one of the following:
-> Display an error message stating that daemons on this node belong to services with fixed host lists, and that the service specs need to be updated before the host can be drained safely.
OR
-> Improve the WARN message in ceph health to convey that the service deployment failed because the host now carries the _no_schedule label.
OR
-> Provide an option for users to specify a backup list of hosts in the service spec to fail over to during maintenance operations like this one.
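
The first option above could be sketched roughly as follows. This is only an illustration of the check, not cephadm code: the Spec class, the function names, the host names and the message wording are all hypothetical, and a real implementation would read the stored service specs from the orchestrator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Spec:
    """Hypothetical, simplified stand-in for a cephadm service spec."""
    service_name: str
    # Explicit placement hosts; empty when the service is placed by label or count.
    hosts: List[str] = field(default_factory=list)


def services_pinned_to(host: str, specs: List[Spec]) -> List[str]:
    """Return the names of services whose explicit host list includes `host`."""
    return [s.service_name for s in specs if host in s.hosts]


def drain_warning(host: str, specs: List[Spec]) -> str:
    """Build a pre-drain message; an empty string means nothing is pinned to `host`."""
    pinned = services_pinned_to(host, specs)
    if not pinned:
        return ""
    return (
        f"Host {host} is listed explicitly in the placement of: "
        f"{', '.join(pinned)}. Update these service specs before draining; "
        "otherwise applying them fails once the host is marked _no_schedule."
    )


# Example data loosely modeled on this report (host names are made up).
specs = [
    Spec("mon", ["host1", "host2", "host3"]),
    Spec("mds.cephfs", ["host1", "host2"]),
    Spec("node-exporter"),  # no explicit host list
]
print(drain_warning("host1", specs))
```

Such a check could run before the drain starts, so the user sees which specs to edit instead of a later CEPHADM_APPLY_SPEC_FAIL warning.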

Version-Release number of selected component (if applicable):
16.2.10-87.el8cp

How reproducible:
Tried once

Steps to Reproduce:
1. Configure a cluster with services whose service specifications list specific hosts.
2. Drain one of those hosts from another admin node.

Actual results:
When a host running services with a defined host list is drained, applying those service specs fails and the resulting warning message is not helpful.

Expected results:
The user should be made aware of the risks, or notified of the issue through an easily understandable error message.