Bug 1210539
Summary: | Replacing failed disks on CEPH nodes | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vasu Kulkarni <vakulkar>
Component: | Documentation | Assignee: | John Wilkins <jowilkin>
Status: | CLOSED WONTFIX | QA Contact: | ceph-qe-bugs <ceph-qe-bugs>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 1.3.0 | CC: | anharris, dzafman, ealcaniz, flucifre, hnallurv, jowilkin, kdreyer, khartsoe, shmohan, vashastr, vikumar, vumrao
Target Milestone: | rc | Keywords: | Reopened
Target Release: | 1.3.4 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-02-20 20:59:06 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Vasu Kulkarni
2015-04-10 02:27:17 UTC
I'm targeting this to 1.3.0. John, please feel free to re-target if that's not appropriate.

Will address after 1.3. I will need physical hardware for this procedure. Also, we should ask whether we want this in generally available documentation or as a kb article.

Hi John,

IMO, this should go into the GA documents. You may want to have a section "replacing Ceph hardware components" to describe the replacement procedures for all Ceph components. Please note that QE is not going to test this defect for the 1.3.0 RHEL release.

Regards,
Harish

(In reply to John Wilkins from comment #3)
> Will address after 1.3. I will need physical hardware for this procedure.
> Also, we should ask whether we want this in generally available
> documentation or as a kb article.

Reading replace-osds.adoc, I don't see anything that specifically requires a physical host. Can this be tested with a VM?

Re-opening this bug as I see a minor documentation issue in point number 9:

9. From your admin node, find the OSD drive and zap it.

    ceph-deploy drive list <node-name>
    ceph-deploy drive zap <node-name>:</path/to/drive>

It should be disk, not drive. Correct commands:

    ceph-deploy disk list cephqe5.lab.eng.blr.redhat.com
    ceph-deploy disk zap cephqe5.lab.eng.blr.redhat.com:/dev/sdj

Marking it Verified.

Why no "ceph-deploy --overwrite-conf osd activate <node-name>:</path/to/drive>" as in the previous documentation? Using the older documentation we've seen that the "ceph osd create" is redundant and causes the OSD number to be wrong. I think step 10 should be removed unless we know that ceph-deploy as given here will NOT do the create again.

*** Bug 1275631 has been marked as a duplicate of this bug. ***

David,

What kind of documentation changes will be required? Can you let John Wilkins know, so that he can update the document?

This bug is assigned to John, so I assumed my comment above indicates what needs to be changed and re-tested. Tanay, can you test the steps above skipping the "ceph osd create" (step 10)?

Referring to the document below, "ceph osd create" is in step number 8:

8. Recreate the OSD.

    ceph osd create

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/replace-osds.adoc

I hope you meant this.

I've removed the ceph osd create step, and placed the disk zap command before activate. Please retest. See

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/6e5f3704f1842bb38d0ccb56fc1d9d422581efe7

I am still unable to make it run successfully.

The problem, I feel, is that after I delete the OSD entries from CRUSH (running steps 1-6), I still see the OSD being mounted:

    /dev/sda1 on /var/lib/ceph/osd/ceph-6 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)

Now, once I replace the drive with a new drive, activating fails because the new drive which I put in the server is not getting /dev/sda; rather, it's getting some new drive letter. (Is this the real problem?)

If yes: the question is how we can control the newly added drive getting the same drive letter.

Another correction in the document: step 5 should be swapped with step 6 and vice versa.

Hi John,

Did the mentioned steps work for you?

Thanks,
Tanay

(In reply to John Wilkins from comment #16)
> I've removed the ceph osd create step, and placed the disk zap command
> before activate. Please retest. See
>
> https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-
> guide/commit/6e5f3704f1842bb38d0ccb56fc1d9d422581efe7

Why did you remove the prepare step? I don't think that is right. I'll have Tanay test it out.
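For context, a minimal sketch of the removal phase that the comments refer to as steps 1-6. This is not quoted from replace-osds.adoc; the OSD id (osd.0) and the service command are illustrative assumptions, and the exact ordering in the guide is what the thread debates (auth del before osd rm, plus the umount step added later):

    # On the admin/monitor node: mark the failed OSD out so data rebalances off it
    ceph osd out osd.0

    # On the OSD node: stop the OSD daemon (the exact service command depends on the init system)
    sudo service ceph stop osd.0

    # Remove the OSD from the CRUSH map
    ceph osd crush remove osd.0

    # Delete the OSD's authentication key, then remove the OSD itself
    ceph auth del osd.0
    ceph osd rm osd.0

    # Unmount the failed drive's data directory (the umount step later added to the guide)
    sudo umount /var/lib/ceph/osd/ceph-0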
Tanay, I don't have a running cluster today, so I'm not able to test it.

I've added a umount step, because the failed drive is mounted. That was a problem in https://bugzilla.redhat.com/show_bug.cgi?id=1278558 too. So we do need to umount.

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/24945382a649c830f43dda644c92f9cd75a302f2

(In reply to David Zafman from comment #18)
> (In reply to John Wilkins from comment #16)
> > I've removed the ceph osd create step, and placed the disk zap command
> > before activate. Please retest. See
> >
> > https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-
> > guide/commit/6e5f3704f1842bb38d0ccb56fc1d9d422581efe7
>
> Why did you remove the prepare step? I don't think that is right. I'll
> have Tanay test it out.

I don't have a running cluster to test it right now. I'll restore prepare. At one point, prepare was activating an OSD erroneously, and that's why it got omitted.

Prepare restored.

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/ab31d98ff947ef92be325ee6b59add990b9f4e4d

(In reply to Tanay Ganguly from comment #17)
> I am still unable to make it run successfully.
>
> The problem, I feel, is that after I delete the OSD entries from CRUSH
> (running steps 1-6), I still see the OSD being mounted:
>
> /dev/sda1 on /var/lib/ceph/osd/ceph-6 type xfs
> (rw,noatime,seclabel,attr2,inode64,noquota)

I would have thought that if you put a replacement drive into the same physical location in a machine, it would get the same device name. Linux may be moving away from using device names in favor of mounting by UUID, which is the better approach going forward. However, Ceph may not use fstab on some distributions.

> Now, once I replace the drive with a new drive, activating fails because
> the new drive which I put in the server is not getting /dev/sda; rather,
> it's getting some new drive letter. (Is this the real problem?)
>
> If yes: the question is how we can control the newly added drive getting
> the same drive letter.

If you aren't physically changing drives in the system and rebooting, then manually unmount the old partition. Also, remove an fstab entry if one is present on your system. We might need instructions about that.

What if I have an unused drive on a node and want to replace the bad drive (but leave it physically installed)? That would require a drive letter change. So there might be extra steps for that case. If so, we need to figure out what those are. For now, can you work around it by leaving the old drive in place as I've described? Then substitute the new drive letter in the command steps.

> Another correction in the document: step 5 should be swapped with step 6
> and vice versa.

Yes, John should change the order to "ceph auth del" followed by "ceph osd rm".

Tanay,

If the instructions said to "ceph-deploy osd create" the OSD, it should handle the prepare and activate in one step. According to the ceph-deploy documentation, an osd create is the same as prepare then activate. Can you try an extra test using ceph-deploy osd create instead of the ceph-deploy osd activate from step 9, "Recreate the OSD"?

    ceph-deploy --overwrite-conf osd create <node-name>:</path/to/drive>

Please re-test using https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/ab31d98ff947ef92be325ee6b59add990b9f4e4d/replace-osds.adoc and try again with my suggestion of replacing the prepare/activate with create in current step #10.

David
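For context, a minimal sketch of the recreate phase being discussed, using a host and drive that appear later in the thread (cephqe9, /dev/sdd) purely as examples. The prepare-then-activate sequence reflects the restored guide as described in the comments, and the single create command is the extra test suggested above; the partition passed to activate is an assumption (prepare typically leaves the data partition as the first partition on the drive):

    # From the admin node: list the disks ceph-deploy can see, then zap the replacement drive
    ceph-deploy disk list cephqe9
    ceph-deploy disk zap cephqe9:/dev/sdd

    # Guide's sequence as discussed: prepare the drive, then activate the resulting data partition
    ceph-deploy --overwrite-conf osd prepare cephqe9:/dev/sdd
    ceph-deploy --overwrite-conf osd activate cephqe9:/dev/sdd1

    # Extra test suggested above: create performs prepare + activate in one step
    ceph-deploy --overwrite-conf osd create cephqe9:/dev/sdd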
Thanks, John. I see that the order of steps 5 and 6 is correct now.

It's not working. I followed the new steps mentioned in the document:

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/replace-osds.adoc

I used osd.0 to be removed as part of this process. I followed steps 1-7 in sequence. After that I followed step 9:

    ceph-deploy disk list <node-name>
    ceph-deploy disk zap <node-name>:</path/to/drive>

I took a new drive rather than replacing the old drive (as that was not feasible now). Then I didn't perform step 10 as written, but, as David suggested, used:

    ceph-deploy --overwrite-conf osd create <node-name>:</path/to/drive>

But ceph osd tree shows:

    # ceph osd tree
    ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1 13.43994 root default
    -2  0.35999     host cephqe11
     1  0.35999         osd.1           up  1.00000          1.00000
    -3  4.35999     host cephqe8
     2  1.09000         osd.2           up  1.00000          1.00000
     3  1.09000         osd.3           up  1.00000          1.00000
     4  1.09000         osd.4           up  1.00000          1.00000
     5  1.09000         osd.5           up  1.00000          1.00000
    -4  4.35999     host cephqe9
     6  1.09000         osd.6           up  1.00000          1.00000
     7  1.09000         osd.7           up  1.00000          1.00000
     8  1.09000         osd.8           up  1.00000          1.00000
     9  1.09000         osd.9           up  1.00000          1.00000
    -5  4.35999     host cephqe10
    10  1.09000         osd.10          up  1.00000          1.00000
    11  1.09000         osd.11          up  1.00000          1.00000
    12  1.09000         osd.12          up  1.00000          1.00000
    13  1.09000         osd.13          up  1.00000          1.00000
     0        0 osd.0                 down        0          1.00000

Some other information: after I added the new drive, I still see osd.0 as down. And, strangely, I see /dev/sdb1 mounted on /var/lib/ceph/osd/ceph-0 again. But I used /dev/sdd as the replacement, so I would expect /dev/sdd to have been mounted.

Tanay,

I wanted you to try the "ceph-deploy ... osd create ..." as an extra test. According to Alfredo there is a RHEL bug with "create." So please retest using the procedure as described.

Thanks,
David

Tanay,

I've asked two other developers to look at the procedure and they think it looks good. Please send me a log of the entire shell session in which you are following these steps. As per my previous comment, use the exact instructions, not my suggestion.

David

Removed the section from the TOC. It will remain in the repo and we can continue to develop and test; then republish when it's ready.

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/03c6f66246cfb60aad8b61e3b775baed4bd3eb0e

Targeted for 1.3.2.

Will re-test this in 1.3.3.

@docs team,

Vasu has a request to add a section on replacing even the "system" disks. Please check these comments:

https://bugzilla.redhat.com/show_bug.cgi?id=1210543#c20
https://bugzilla.redhat.com/show_bug.cgi?id=1210543#c22

The comment https://bugzilla.redhat.com/show_bug.cgi?id=1210543#c26 has rough steps to replace a faulty "system" disk.

As this bug is already tracking the replacement of failed drives, I feel Vasu's additional request of replacing "system" disks can be accommodated here. Changing the state of this defect to Assigned to address the above request.

Thanks,
Harish

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
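For reference, a minimal sketch of the post-replacement checks that the failed test above leans on; the OSD id and mount path are examples from this thread, not steps quoted from the guide:

    # Confirm the replacement OSD was recreated and reports up
    # (in the failed run above, osd.0 stays down with weight 0)
    ceph osd tree
    ceph -s

    # Confirm which device actually got mounted for the OSD's data directory
    mount | grep /var/lib/ceph/osd/ceph-0

    # If device naming after a swap is in doubt, identify partitions by UUID rather than drive letter
    sudo blkid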