Today it is not possible to decommission specific nodes with OSPd. We need that capability to correctly scale down storage when required. The capability is needed beyond storage, but this bug specifically addresses the Ceph DFG needs.
The end goal for the Ceph DFG would be to delete a storage node without disruptions.
There are two scenarios we'd need to cover:
a) the storage node went down and can't be recovered
b) the storage node is purposely deleted
It seems to me that for both these scenarios we could delete the node from the stack using a command like the one we document for the compute nodes [ref1]. Is this correct?
Before deleting the node from the stack, though, the user needs to execute some manual steps to clean up (scenario A) or quiesce (scenario B) the pre-existing storage node, similar to what is documented for the compute nodes [ref2].
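For illustration only, here is a rough sketch of what the manual cleanup could look like on the Ceph side. It is not the procedure from [ref2]; it assumes the usual Ceph CLI sequence for removing an OSD (mark out, remove from the CRUSH map, delete the auth key, remove the OSD). The helper name ceph_osd_removal_cmds and the OSD ids 3 and 7 are made up for the example. The helper only prints the commands, so they can be reviewed and then run from a surviving node that holds an admin keyring, which is what scenario A requires.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the Ceph commands commonly used
# to remove a single OSD from the cluster. The printed commands are meant
# to be run from a surviving node with an admin keyring, not from the
# node that is being deleted.
ceph_osd_removal_cmds() {
    osd_id="$1"
    echo "ceph osd out ${osd_id}"               # stop placing data on the OSD
    echo "ceph osd crush remove osd.${osd_id}"  # drop it from the CRUSH map
    echo "ceph auth del osd.${osd_id}"          # delete its authentication key
    echo "ceph osd rm ${osd_id}"                # remove the OSD from the cluster
}

# Example: OSD ids 3 and 7 are assumed to have lived on the removed node.
for id in 3 7; do
    ceph_osd_removal_cmds "$id"
done
```

In scenario B the node is still reachable, so one would presumably wait for the cluster to finish rebalancing after the "out" step and stop the OSD daemons on the node before running the remaining commands.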
To fully automate the process we'll need to be able to:
1) trigger a command execution on DELETE before the resource is actually deleted
2) execute commands on a node other than the one targeted for deletion, to deal specifically with scenario A (where the node goes down without notice)
On scale down, the node to remove can be passed to the "overcloud node delete" command with:
$ openstack overcloud node delete $nova_node_id