Description of problem (please be as detailed as possible and provide log snippets):

Steps are missing from this doc for replacing failed drives: https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/3.11/html/operations_guide/chap-Documentation-Red_Hat_Gluster_Storage_Container_Native_with_OpenShift_Platform-Managing_Clusters#Replacing_Device

1) Step #5: if the migration is complete, correct the device entry in heketi by passing the '--force-forget' option to the 'heketi-cli device delete' command. Note that this can be dangerous if the failed device's data (bricks) has not yet been migrated to the other devices on the node, so it must be used with care.

2) performance.read-ahead must be disabled so the heal can complete:
   gluster volume set VOLUME performance.read-ahead off

3) Extra shd's (self-heal daemons) must be started if more than 100,000 volumes require healing. This KBase article should be included or linked in the docs: https://access.redhat.com/solutions/3794011

(See the command sketch for steps 1 and 2 at the end of this report.)

Version of all relevant components (if applicable):
3.4.11

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes.
Without #1 the heal never starts.
Without #2 the heal starts but halts.
Without #3 the heal progresses at a very slow rate and can take days to complete.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
5
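Command sketch for steps 1 and 2 (for reference only): the device ID and volume name below are placeholders, and the 'heketi-cli topology info' and 'gluster volume heal ... info' lines are added here only to illustrate how to look up the device and monitor the heal; they are not part of the proposed doc text. The '--force-forget' option must only be used after confirming the failed device's bricks have been migrated.

   # Step 1: remove the failed device entry from heketi once migration is complete.
   heketi-cli topology info                            # find <DEVICE_ID> under the affected node
   heketi-cli device delete <DEVICE_ID> --force-forget

   # Step 2: disable read-ahead on the affected volume so the heal can run to completion,
   # then monitor the remaining heal entries.
   gluster volume set <VOLUME_NAME> performance.read-ahead off
   gluster volume heal <VOLUME_NAME> info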
Hi, I have created a draft of the content to be added. Could you review it and let me know if any changes are needed? Link to the doc: https://docs.google.com/document/d/1hbG-B-7WDpv4_yNil9qHyHn-jyPMlia_TA-T0UDkoP8/edit Thank you! -Disha Walvekar
Disha - let's see if Yaniv and Anton can add more clarification to the "warning" section. Thanks, Dan