Bug 1210543
| Summary: | Replacing failed CEPH Node | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vasu Kulkarni <vakulkar> |
| Component: | Documentation | Assignee: | John Wilkins <jowilkin> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 1.3.0 | CC: | asriram, dgallowa, flucifre, hnallurv, jowilkin, kdreyer, shmohan, vakulkar, vashastr |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | 1.3.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-30 09:52:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Vasu Kulkarni
2015-04-10 03:09:42 UTC
I'm targeting this to 1.3.0. John, please feel free to re-target if that's not appropriate.

Will have to address this after the 1.3 release. Seems to be duplicated multiple times.

John, I checked that document. It explains a few things about adding and removing an OSD, which is a Ceph disk replacement, as compared to this bz, which is Ceph node replacement (i.e., all OSDs/Mons on a particular node). Also, I feel the doc link you sent doesn't completely explain how to remove and add a new OSD.

This document -- https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/add-remove-node.adoc -- is meant to provide high-level guidance to a user when changing an OSD node, but not a monitor node. Changing OSD nodes can't easily be made into a generic procedure, because it depends on the hardware configuration (which is unknown), the means by which the node was configured (also unknown), and the reason the node must be changed (e.g., motherboard failure, hardware upgrade, etc.), which is also unknown. I have stated this multiple times. I am not clear on how I can document the procedure for unknown hardware, configuration, and rationale for swapping out the node, so I have provided the high-level guidance a system administrator should have: what to expect as a performance impact, and the steps they can take to mitigate it. The doc provides a hyperlink to https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/replace-osds.adoc, which describes how to change an OSD that has failed, as well as to https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/cluster-size.adoc, which generically describes adding and removing OSDs. The add/remove doc has been available in that form for years. The doc on replacing an OSD disk that has failed is new, but tracks the add/remove OSD procedure fairly closely. Did you follow that procedure?
John, sorry I missed updating you earlier. I am not sure why changing an OSD is not a generic process; it shouldn't have a dependency on the drive or chassis. Detecting a new drive is OS-specific and could take different forms. The one you had here before is, I think, generic: https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/replace-osds.adoc. We need to document the process for replacing a mon, and I think the upstream document should be good enough.

Please re-open this bug for Infernalis (RHCS 2.0). We are going to rewrite the hardware guide and will address this with information we receive from the reference architecture team.

Hi, in the doc https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide/#adding_and_removing_osd_nodes, Section 8.3, step 3: `osd_recovery_priority = 1` is not a valid config setting.

    [root@magna009 ceph-config]# ceph tell osd.* injectargs '--osd_recovery_priority 1'
    osd.0: failed to parse arguments: --osd_recovery_priority,1

I think it should be:

    ceph tell osd.* injectargs '--osd-recovery-op-priority 1'

    [root@magna009 ceph-config]# ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
    osd.0: osd_recovery_op_priority = '1'
    osd.1: osd_recovery_op_priority = '1'

Moving this back. Thanks, Tejas

Fixed. There were three instances in the doc:

- https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide#recommendations
- https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/1.3/single/administration-guide/#changing_an_osd_drive
- https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/1.3/single/administration-guide/#adding_and_removing_osd_nodes

Fixed.

Sorry, but can we also document how to replace the "root" drive that holds the osd and mon db? I was hoping this would cover the scenario where the osd drive might be intact but the root drive needs replacement.

Vasu, this defect tracks node replacement.
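For reference, the exchange above converges on `osd_recovery_op_priority` as the valid option name. A minimal sketch of making the same setting persistent across OSD restarts, assuming a stock ceph.conf layout; the `[osd]` section placement and the `osd_max_backfills` line are illustrative assumptions, not recommendations from this bug:

```
# Illustrative ceph.conf fragment (assumption: standard ceph.conf on each OSD node).
[osd]
# Valid option; "osd_recovery_priority" is rejected by injectargs as shown above.
osd_recovery_op_priority = 1
# Assumption: backfill is often throttled alongside recovery priority when
# replacing a node; the value here is only an example, not a recommendation.
osd_max_backfills = 1
```

At runtime the equivalent is `ceph tell osd.* injectargs '--osd_recovery_op_priority 1'`; the Ceph CLI accepts underscores and hyphens interchangeably in option names, which is why both spellings in the transcript above refer to the same setting.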
Comment 20 may be applicable to 1210539 (Replacing failed disks on CEPH nodes). Can you please check and do the needful? We have almost completed verifying this defect.

Harish, that bz covers only the failed "osd" drives; it does not cover a failed "system" drive. Since the osd drives survive, we need to replace the system drive and check that the other services on the node (mon/osd/mds) come up after a quick restore of Ceph.

Should comment 22 be incorporated as part of this BZ? If yes, please move the defect to the assigned state.

Vasu, a gentle reminder: we have already completed verifying this defect and would like to move it to the verified state, but we can't without resolution on comments 21, 22, and 23. I feel comment 22 should be part of 1210539 (Replacing failed disks on CEPH nodes), and we should move this defect to the verified state.