Bug 1318389 - [RFE] Tool for putting node into maintenance mode
Summary: [RFE] Tool for putting node into maintenance mode
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Unclassified
Version: 1.3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 3.*
Assignee: ceph-eng-bugs
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-16 17:56 UTC by arkady kanevsky
Modified: 2019-01-30 14:59 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-30 14:59:18 UTC
Embargoed:



Description arkady kanevsky 2016-03-16 17:56:42 UTC
Description of problem:
Currently, customers who want to put a node into maintenance mode need to follow the set of instructions in chapter 17 of https://access.redhat.com/documentation/en/red-hat-ceph-storage/1.3/red-hat-ceph-administration-guide/part-v-adding-and-removing-osd-nodes.
Since this is a common procedure for node replacement and FW upgrades, having a tool that helps with it would be beneficial.
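
For reference, that manual procedure boils down to roughly the following sequence (a minimal sketch, assuming a systemd-based OSD host and a cluster healthy enough to tolerate the node being down; the linked guide remains the authoritative set of steps):

    # Tell the cluster not to mark OSDs "out" while the node is down,
    # so no data rebalancing is triggered during the maintenance window.
    ceph osd set noout

    # On the node being serviced, stop its OSD daemons.
    sudo systemctl stop ceph-osd.target

    # ... perform the maintenance (FW upgrade, component replacement, reboot) ...

    # Bring the OSDs back and clear the flag once the node has rejoined.
    sudo systemctl start ceph-osd.target
    ceph osd unset noout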

Version-Release number of selected component (if applicable):
1.3

How reproducible:
N/A

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 arkady kanevsky 2016-06-06 12:53:02 UTC
I do not have access to Ceph 2.0 documentation.
Assuming little changes for putting a node into maintenance mode in Ceph 2.x, the ask is for a new command of the form
          ceph fw-update --"url for FW version" --user --password
where the last two parameters are optional credentials for FW access.

The script would cycle through every node in the OSD cluster, take one node at a time into maintenance mode, and update the FW to the specified version (see the sketch at the end of this comment). Whether cluster rebuilding is disabled during the update is left to the implementer.
The script should first verify that the cluster has sufficient spare capacity to do this.

I recommend that the operation be asynchronous, since it takes a long time to complete.
As a bonus, an additional command could check the status of fw-update and show the percentage of nodes updated and the node currently being updated.
Ditto for the Ceph UI.
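
A rough sketch of what such a driver could look like (purely illustrative: update-firmware is a hypothetical vendor tool, the host list is supplied by the operator, and a real implementation would add the capacity check, error handling, and the async status reporting described above):

    #!/bin/bash
    # Hypothetical fw-update driver: cycles through the given OSD hosts
    # one at a time, putting each into "maintenance" via the noout flag.
    FW_URL="$1"; shift
    HOSTS="$@"                 # OSD host names supplied by the operator

    for host in $HOSTS; do
        # Wait for a healthy cluster before touching the next node.
        until ceph health | grep -q HEALTH_OK; do sleep 30; done

        ceph osd set noout                               # avoid rebalancing while the node is down
        ssh "$host" sudo systemctl stop ceph-osd.target
        ssh "$host" sudo update-firmware "$FW_URL"       # hypothetical vendor FW update tool
        ssh "$host" sudo systemctl start ceph-osd.target
        ceph osd unset noout

        # Let recovery settle before moving on to the next host.
        until ceph health | grep -q HEALTH_OK; do sleep 30; done
    done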

Comment 6 Ian Colle 2017-07-10 17:11:16 UTC
Arkady,

Please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1464945. Does this accomplish what you're looking for?

Comment 7 Ian Colle 2017-08-01 23:20:28 UTC
Closing as duplicate due to lack of response from originator.

*** This bug has been marked as a duplicate of bug 1464945 ***

Comment 8 arkady kanevsky 2017-08-07 13:48:39 UTC
It is not a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1464945.
This BZ is not specific to disk replacement even though some of the steps used in https://bugzilla.redhat.com/show_bug.cgi?id=1464945 will be applicable here.
Documentation will be very different. 
One needs a generic way to put a node into maintenance mode. The goal is to minimize data transfer and potentially create a new state for a node: "not available".
In maintenance mode we know that the node will be brought back online, so it should not be treated as a failure.

Once in maintenance mode, one can do whatever is required: for example, update the FW or BIOS of the node or any of its components, or replace any component, such as a disk, NIC, processor, or even the motherboard. Some specific steps may be required depending on which components were replaced.

Reopening.

Comment 9 John Spray 2017-08-07 14:09:13 UTC
So it sounds like you're looking for a host-wide equivalent of the "ceph osd add-noout" command?

Is there any behaviour you're looking for other than for the OSDs on a particular host to not be marked out?
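
For what it's worth, something close to that can already be scripted per host in Luminous and later (a sketch, assuming the per-OSD "ceph osd add-noout" command and "ceph osd crush ls" are available; the host name node1 below is just a placeholder):

    # Flag every OSD under CRUSH host "node1" as noout so the host can be
    # taken down for maintenance without triggering rebalancing.
    for osd in $(ceph osd crush ls node1); do
        ceph osd add-noout "$osd"
    done

    # ... perform maintenance on node1 ...

    # Clear the per-OSD flags once the host is back.
    for osd in $(ceph osd crush ls node1); do
        ceph osd rm-noout "$osd"
    done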

Comment 11 Drew Harris 2019-01-30 14:59:18 UTC
I have closed this issue because it has been inactive for some time now. If you feel it still deserves attention, feel free to reopen it.

