Description of problem:
In large clusters with multiple disks on each node, disks can fail. We need clear documentation for customers on how to replace a bad disk with a new one and bring the cluster back to a good state (rebalance data). The ceph.com documentation lists adding/removing an OSD, which is not very helpful here.

One interesting blog I found online: http://karan-mj.blogspot.com/2014/03/admin-guide-replacing-failed-disk-in.html

I still think there should be a better way to replace a disk without affecting the CRUSH map, since the layout hasn't changed at all.

Current method:

(1) Find the down OSD:
# ceph osd tree | grep -i down
89      2.73            osd.89  up      1
99      2.73            osd.99  down    0

(2) Using the above info, log in to the OSD node that is down and check whether the OSD is mounted:
# df -h
/dev/sdd1       2.8T  197G  2.5T   8%  /var/lib/ceph/osd/ceph-30
/dev/sde1       2.8T  172G  2.6T   7%  /var/lib/ceph/osd/ceph-53

(3) Assuming a hot-swappable drive, replace the failed drive with the new drive.

(4) Take the OSD out and stop it:
# ceph osd out osd.99
osd.99 is already out.
# service ceph stop osd.99
/etc/init.d/ceph: osd.99 not found (/etc/ceph/ceph.conf defines osd.9 osd.30 osd.17 osd.128 osd.65 osd.141 osd.89 osd.53 osd.113 osd.78 , /var/lib/ceph defines osd.9 osd.30 osd.17 osd.128 osd.65 osd.141 osd.89 osd.53 osd.113 osd.78)
# service ceph status osd.99
/etc/init.d/ceph: osd.99 not found (/etc/ceph/ceph.conf defines osd.9 osd.30 osd.17 osd.128 osd.65 osd.141 osd.89 osd.53 osd.113 osd.78 , /var/lib/ceph defines osd.9 osd.30 osd.17 osd.128 osd.65 osd.141 osd.89 osd.53 osd.113 osd.78)
# service ceph status
=== osd.9 ===
osd.9: running {"version":"0.72.1"}
=== osd.30 ===
osd.30: running {"version":"0.72.1"}

(5) Remove the OSD from the CRUSH map:
# ceph osd crush remove osd.99
removed item id 99 name 'osd.99' from crush map

(6) Check the ceph status; Ceph will re-create the PG copies that were on the failed disk and place them on other disks:
# ceph status
Optionally check disk stats using your favourite disk I/O tool. Wait for the ceph status to return to the "active+clean" state:
# ceph status

(7) Remove the OSD keyring:
# ceph auth del osd.99
updated

(8) Remove the OSD:
# ceph osd rm osd.99
removed osd.99

(9) Check the ceph status for the number of OSDs and how many are up/in:
# ceph status
Optionally check ceph.conf on all nodes for the presence of this OSD and, if present, remove it. One can use the ceph admin command to push the configuration to all nodes.

Add the new drive to the Ceph cluster:

(10) Create a new OSD entry:
# ceph osd create
99

(11) Check the number of OSDs and how many are up/in:
# ceph status

(12) Zap the disk and deploy the new disk:
# ceph-deploy disk list nodename
# ceph-deploy disk zap nodename:sdi
# ceph-deploy --overwrite-conf osd prepare nodename:sdi
Check the new OSD:
# ceph osd tree
141     2.73            osd.141 up      1
99      2.73            osd.99  up      1

(13) ceph status will show the PGs rebalancing onto the new drive:
# ceph status

Version-Release number of selected component (if applicable):

How reproducible:
N/A

Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:
Need clear documentation.

Additional info:
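For quick reference, a condensed sketch of the method above, using placeholders; <id>, <node>, and <device> are hypothetical values for the failed OSD, its host, and the replacement drive. This is only a summary of the numbered steps, not a tested procedure:

# Condensed sketch of the replacement procedure above.
# <id>, <node>, <device> are placeholders for the failed OSD id, its host, and the new drive.
ceph osd out osd.<id>                    # mark the failed OSD out
service ceph stop osd.<id>               # stop the daemon if it is still defined on the node
ceph osd crush remove osd.<id>           # remove it from the CRUSH map
ceph auth del osd.<id>                   # delete its authentication key
ceph osd rm osd.<id>                     # remove the OSD from the cluster
umount /var/lib/ceph/osd/ceph-<id>       # unmount the failed drive if it is still mounted
# After physically swapping the drive, from the admin node:
ceph-deploy disk list <node>
ceph-deploy disk zap <node>:<device>
ceph-deploy --overwrite-conf osd prepare <node>:<device>
ceph status                              # watch the PGs rebalance onto the new drive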
I'm targeting this to 1.3.0. John please feel free to re-target if that's not appropriate.
Will address after 1.3. I will need physical hardware for this procedure. Also, we should ask whether we want this in generally available documentation or as a kb article.
Hi John,

IMO, this should go into the GA documents. You may want to have a section "Replacing Ceph hardware components" to describe the replacement procedures for all Ceph components.

Please note that QE is not going to test this defect for the 1.3.0 RHEL release.

Regards,
Harish
See https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/replace-osds.adoc
(In reply to John Wilkins from comment #3)
> Will address after 1.3. I will need physical hardware for this procedure.
> Also, we should ask whether we want this in generally available
> documentation or as a kb article.

Reading replace-osds.adoc, I don't see anything that specifically requires a physical host. Can this be tested with a VM?
Re-opening this bug as I see a minor documentation issue in point number 9:

9. From your admin node, find the OSD drive and zap it.

   ceph-deploy drive list <node-name>
   ceph-deploy drive zap <node-name>:</path/to/drive>

It should be "disk", not "drive". Correct commands:

ceph-deploy disk list cephqe5.lab.eng.blr.redhat.com
ceph-deploy disk zap cephqe5.lab.eng.blr.redhat.com:/dev/sdj
Fixed. See https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/541c25b137323dba9fbc798bc553ad6065e3f28a
Marking it Verified.
Why no "ceph-deploy --overwrite-conf osd activate <node-name>:</path/to/drive>" as in previous documentation? Using the older documetnation we've seen that the "ceph osd create" is redundant and causes the osd number to be wrong. I think step 10 should be removed unless we know that ceph-deploy as given here will NOT do the create again.
*** Bug 1275631 has been marked as a duplicate of this bug. ***
David, what kind of documentation changes will be required? Can you let John Wilkins know, so that he can update the document?
This bug is assigned to John so I assumed my comment above indicates what needs to be changed and re-tested.
Tanay, Can you test the steps above skipping the "ceph osd create" (step 10)?
Referring to the document below, "ceph osd create" is in step number 8:

8. Recreate the OSD.

   ceph osd create

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/replace-osds.adoc

I hope you meant this.
I've removed the ceph osd create step, and placed the disk zap command before activate. Please retest. See https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/6e5f3704f1842bb38d0ccb56fc1d9d422581efe7
I am still unable to make it run successfully.

The problem, I feel, is that after I delete the OSD entries from CRUSH (running steps 1-6), I still see the OSD being mounted:

/dev/sda1 on /var/lib/ceph/osd/ceph-6 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)

Now, once I replace the drive with a new drive, activation fails because the new drive I put in the server is not getting /dev/sda; rather, it is getting a new drive letter. (Is this the real problem?)

If yes: the question is how we can ensure the newly added drive gets the same drive letter.

Another correction in the document: steps 5 and 6 should be swapped.

Hi John,
Did the mentioned steps work for you?

Thanks,
Tanay
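For reference, what I can check on the node to find the new drive letter and clear the stale mount; this is a workaround sketch, not from the documented procedure, and osd.6 / /dev/sda1 are the values from my output above:

# Identify the device name the replacement drive received and clear the stale mount.
lsblk -o NAME,SIZE,MOUNTPOINT        # the new, empty drive is the one with no mountpoint
dmesg | tail                         # the kernel log shows the name assigned to the hot-added drive
umount /var/lib/ceph/osd/ceph-6      # unmount the stale mount left by the removed OSD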
(In reply to John Wilkins from comment #16)
> I've removed the ceph osd create step, and placed the disk zap command
> before activate. Please retest. See
>
> https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/6e5f3704f1842bb38d0ccb56fc1d9d422581efe7

Why did you remove the prepare step? I don't think that is right. I'll have Tanay test it out.
Tanay, I don't have a running cluster today, so I'm not able to test it. I've added a umount step, because the failed drive is mounted. That was a problem in https://bugzilla.redhat.com/show_bug.cgi?id=1278558 too. So we do need to umount. https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/24945382a649c830f43dda644c92f9cd75a302f2
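As a sketch, the umount step looks like this; ceph-6 is only the example OSD from comment 17, so substitute the actual OSD ID:

# Confirm the failed OSD is still mounted, then unmount it before zapping the drive.
df -h | grep ceph-6
umount /var/lib/ceph/osd/ceph-6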
(In reply to David Zafman from comment #18)
> (In reply to John Wilkins from comment #16)
> > I've removed the ceph osd create step, and placed the disk zap command
> > before activate. Please retest. See
> >
> > https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/6e5f3704f1842bb38d0ccb56fc1d9d422581efe7
>
> Why did you remove the prepare step? I don't think that is right. I'll
> have Tanay test it out.

I don't have a running cluster to test it right now. I'll restore prepare. At one point, prepare was activating an OSD erroneously, and that's why it got omitted.
Prepare restored. https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/ab31d98ff947ef92be325ee6b59add990b9f4e4d
(In reply to Tanay Ganguly from comment #17)
> I am still unable to make it run successfully.
>
> The problem, I feel, is that after I delete the OSD entries from CRUSH
> (running steps 1-6), I still see the OSD being mounted:
>
> /dev/sda1 on /var/lib/ceph/osd/ceph-6 type xfs
> (rw,noatime,seclabel,attr2,inode64,noquota)

I would have thought that if you put the replacement drive into the same physical slot in the machine, it would get the same device name. Linux is moving away from device names toward mounting by UUID, which is the better approach going forward. However, Ceph may not use fstab on some distributions.

> Now, once I replace the drive with a new drive, activation fails because
> the new drive I put in the server is not getting /dev/sda; rather, it is
> getting a new drive letter. (Is this the real problem?)
>
> If yes: the question is how we can ensure the newly added drive gets the
> same drive letter.

If you aren't physically changing drives in the system and rebooting, then manually unmount the old partition. Also, remove the fstab entry if one is present on your system. We might need instructions about that.

What if I have an unused drive on a node and want to use it to replace the bad drive (but leave the bad drive physically installed)? That would require a drive letter change, so there might be extra steps for that case. If so, we need to figure out what those are. For now, can you work around it by leaving the old drive in place as I've described? Then substitute the new drive letter in the command steps.

> Another correction in the document:
> Steps 5 and 6 should be swapped.

Yes, John should change the order to "ceph auth del" followed by "ceph osd rm".

Tanay,

If the instructions said to "ceph-deploy osd create" the OSD, it should handle the prepare and activate in one step. According to the ceph-deploy documentation, an osd create is the same as prepare followed by activate. Can you try an extra test using ceph-deploy osd create instead of the ceph-deploy osd activate from step 9, "Recreate the OSD"?

ceph-deploy --overwrite-conf osd create <node-name>:</path/to/drive>

Please re-test using https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/ab31d98ff947ef92be325ee6b59add990b9f4e4d/replace-osds.adoc and try again with my suggestion of replacing the prepare/activate with create in the current step #10.

David
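For the fstab check mentioned above, a minimal sketch; /var/lib/ceph/osd/ceph-6 is the example mount point from your output, and most ceph-deploy installs will have no such entry at all:

# Check for and remove a stale fstab entry for the old OSD mount, if one exists.
grep /var/lib/ceph/osd/ceph-6 /etc/fstab
cp /etc/fstab /etc/fstab.bak                          # keep a backup before editing
sed -i '\|/var/lib/ceph/osd/ceph-6|d' /etc/fstab      # delete the matching line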
Thanks John I see that the order of steps 5 and 6 are correct now.
It's not working. I followed the new steps mentioned in the document:
https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/replace-osds.adoc

I used osd.0 for removal as part of this process and followed steps 1-7 in sequence. After that I followed step 9:

ceph-deploy disk list <node-name>
ceph-deploy disk zap <node-name>:</path/to/drive>

I took a new drive rather than replacing the old drive (as that was not feasible now). Then I did not perform step 10 as written; instead, as David suggested, I used:

ceph-deploy --overwrite-conf osd create <node-name>:</path/to/drive>

But ceph osd tree shows:

# ceph osd tree
ID WEIGHT   TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 13.43994 root default
-2  0.35999     host cephqe11
 1  0.35999         osd.1          up  1.00000          1.00000
-3  4.35999     host cephqe8
 2  1.09000         osd.2          up  1.00000          1.00000
 3  1.09000         osd.3          up  1.00000          1.00000
 4  1.09000         osd.4          up  1.00000          1.00000
 5  1.09000         osd.5          up  1.00000          1.00000
-4  4.35999     host cephqe9
 6  1.09000         osd.6          up  1.00000          1.00000
 7  1.09000         osd.7          up  1.00000          1.00000
 8  1.09000         osd.8          up  1.00000          1.00000
 9  1.09000         osd.9          up  1.00000          1.00000
-5  4.35999     host cephqe10
10  1.09000         osd.10         up  1.00000          1.00000
11  1.09000         osd.11         up  1.00000          1.00000
12  1.09000         osd.12         up  1.00000          1.00000
13  1.09000         osd.13         up  1.00000          1.00000
 0        0 osd.0               down        0           1.00000

Some other information:
After I added the new drive, I still see osd.0 as down. And, strangely, I see /dev/sdb1 mounted on /var/lib/ceph/osd/ceph-0. But I used /dev/sdd as the replacement, so I would expect /dev/sdd to have been mounted.
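One way to narrow this down on the OSD node (a suggestion, not from the documented procedure; /dev/sdd, osd.0 are the values from this test, and it assumes the ceph-disk utility is available on the node):

# Check which device ceph-deploy actually prepared and mounted for osd.0.
ceph-disk list                          # lists partitions prepared/activated for Ceph
mount | grep /var/lib/ceph/osd/ceph-0   # confirms which device is mounted for osd.0
lsblk /dev/sdd                          # verify whether the intended replacement drive was partitioned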
Tanay,

I wanted you to try the "ceph-deploy ... osd create ..." as an extra test. According to Alfredo there is a RHEL bug with "create", so please retest using the procedure as described.

Thanks,
David
Tanay,

I've asked two other developers to look at the procedure and they think it looks good. Please send me a log of the entire shell session in which you are following these steps. As per my previous comment, use the exact instructions, not my suggestion.

David
Removed the section from the TOC. It will remain in the repo and we can continue to develop and test; then, republish when it's ready. https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/03c6f66246cfb60aad8b61e3b775baed4bd3eb0e
targeted for 1.3.2
Will re-test this in 1.3.3
https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide/#changing_an_osd_drive
@docs team,

Vasu has a request to add a section on replacing even the "system" disks. Please check these comments:

https://bugzilla.redhat.com/show_bug.cgi?id=1210543#c20
https://bugzilla.redhat.com/show_bug.cgi?id=1210543#c22

The comment https://bugzilla.redhat.com/show_bug.cgi?id=1210543#c26 has rough steps to replace a faulty 'system' disk.

As this bug is already tracking the replacement of failed drives, I feel Vasu's additional request to cover replacing 'system' disks can be accommodated here. Changing the state of this defect to Assigned to address the above request.

Thanks,
Harish