Bug 1432435

Summary: device remove should check there are no pending heals before proceeding with the brick replacement
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Humble Chirammal <hchiramm>
Component: doc-Container_Native_Storage_with_OpenShift
Assignee: Divya <divya>
Status: CLOSED CURRENTRELEASE
QA Contact: krishnaram Karthick <kramdoss>
Severity: high
Docs Contact:
Priority: unspecified
Version: cns-3.5
CC: annair, asriram, hchiramm, kramdoss, mliyazud, rcyriac, rhs-bugs, rtalur, storage-doc, storage-qa-internal
Target Milestone: ---
Target Release: CNS 3.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1432004
Environment:
Last Closed: 2017-11-17 05:16:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1432004    
Bug Blocks: 1415611    

Comment 5 Divya 2017-04-07 06:47:57 UTC
I have added a warning about checking the self-heal status before running the remove-brick operation.

Link to the doc: http://ccs-jenkins.gsslab.brq.redhat.com:8080/job/doc-Red_Hat_Gluster_Storage-3.5-Container_Native_Storage_with_OpenShift_Platform-branch-master/lastSuccessfulBuild/artifact/tmp/en-US/html-single/index.html#idm140465404343952

Please note that all the changes/updates to the section are yet to be merged to the master branch. 

Moving the bug ON_QA.

Comment 6 krishnaram Karthick 2017-04-10 06:40:52 UTC
Currently we have the following doc section for this bug.

===========================================================================
The remove device operation triggers a self-heal in the background. The time taken to complete the self-heal operation is proportional to the amount of data on the removed device. Before you perform the remove device operation, you must ensure that the self-heal operations are complete on all the volumes.
Run the following command to obtain the volume names:

# oc rsh <gluster_pod_name>

Run the following command on each volume to check the self-heal status:

# gluster volume heal <volname> info
===========================================================================

However, please note that there might be more than one volume configured from a device, so we need to state explicitly that heal info must be checked for all the volumes carved out of the device being replaced.

So, we should document the following here (see the sketch after this list):

1) A way to find the list of volumes carved out of a device
2) How to check heal info on a volume
3) Repeat the heal info check for all the volumes found in step 1.
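
For illustration only, here is a minimal sketch of how steps 1-3 could be combined from inside a gluster pod. It assumes that "gluster volume list" is available in the pod and that heal info reports pending entries per brick as "Number of entries: <n>"; treat it as a sketch, not the documented procedure.

# Rough sketch -- run inside the gluster pod, i.e. after: oc rsh <gluster_pod_name>
# For every volume, print the pending-heal counts reported per brick.
for vol in $(gluster volume list); do
    echo "=== ${vol} ==="
    # Every brick should report "Number of entries: 0" before removing the device.
    gluster volume heal "${vol}" info | grep "Number of entries:"
done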

Comment 7 Raghavendra Talur 2017-04-12 12:00:15 UTC
Determining the list of bricks (and hence volumes) that belong to a device is not possible through heketi-cli in this release. We have therefore asked users to check the self-heal status of all the volumes.

Karthick, what method did you use to determine the brick list for a device?

Comment 8 krishnaram Karthick 2017-04-13 04:41:04 UTC
(In reply to Raghavendra Talur from comment #7)
> Determining the list of bricks (and hence volumes) that belong to a device
> is not possible through heketi-cli in this release. We have therefore asked
> users to check the self-heal status of all the volumes.
> 
> Karthick, what method did you use to determine the brick list for a device?

I had to log in to the gluster pods and run heal info for each volume to check that there are no pending heals. I couldn't find a better way.

One way to find the list of bricks on a device is to use device info, but it doesn't show the volumes those bricks belong to. We might have to cross-reference the topology info to derive the volume information.

# heketi-cli device info 724d4c878d4f406cfeb4bca3bcc15bb0
Device Id: 724d4c878d4f406cfeb4bca3bcc15bb0
Name: /dev/sdd
State: online
Size (GiB): 99
Used (GiB): 10
Free (GiB): 89
Bricks:
Id:1f749f4f5dfab2ee2ac34b9fccbfdbe7   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_1f749f4f5dfab2ee2ac34b9fccbfdbe7/brick
Id:3c78af6a861180417f8763a1fbbaf8e6   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_3c78af6a861180417f8763a1fbbaf8e6/brick
Id:5a137d789c0837f93701228d4aea1ce0   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_5a137d789c0837f93701228d4aea1ce0/brick
Id:61493cbf7f58e3233fd1df0a4b2f5093   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_61493cbf7f58e3233fd1df0a4b2f5093/brick
Id:70aa47765d71c4b62328870e704e1506   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_70aa47765d71c4b62328870e704e1506/brick
Id:85a0e27494c162226321e3d48139fcda   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_85a0e27494c162226321e3d48139fcda/brick
Id:c1f712413bd9891219deeef98fe3cda9   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_c1f712413bd9891219deeef98fe3cda9/brick
Id:d450ab810010e4a509491beefb4d22e6   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_d450ab810010e4a509491beefb4d22e6/brick
Id:d7624a8b4ffd1312b38dbe855823f221   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_724d4c878d4f406cfeb4bca3bcc15bb0/brick_d7624a8b4ffd1312b38dbe855823f221/brick
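
As a purely illustrative, gluster-side workaround (not something heketi-cli offers in this release): the brick paths above embed the device's VG name (vg_<device id>), so grepping the gluster volume info output for that VG shows which volumes have bricks on the device. A rough sketch, assuming this path naming convention holds:

# Run inside a gluster pod (oc rsh <gluster_pod_name>).
# DEVICE_ID is the heketi device ID, e.g. 724d4c878d4f406cfeb4bca3bcc15bb0 above.
DEVICE_ID=724d4c878d4f406cfeb4bca3bcc15bb0

# Brick paths contain vg_<device id>, so map matching brick lines back to their volumes.
gluster volume info all | awk -v vg="vg_${DEVICE_ID}" '
    /^Volume Name:/ { vol = $3 }   # remember the current volume name
    $0 ~ vg         { print vol }  # a brick on the device belongs to this volume
' | sort -u

This only supplements the documented approach; since the mapping is not exposed by heketi-cli, checking the heal status of all volumes remains the safe default.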

Comment 9 Raghavendra Talur 2017-04-13 08:36:43 UTC
As Karthick has mentioned in the previous comment, it is not possible to determine the list of volumes corresponding to a device. The volume info command does not have sufficient information either. Hence, the existing doc text, which says that the user must check the self-heal status of all the volumes, remains the right approach.

Comment 10 Divya 2017-04-13 09:21:08 UTC
Based on Comment 9, moving the bug back to ON_QA.

Comment 12 krishnaram Karthick 2017-04-18 03:52:01 UTC
Looks good to me, moving the bug to verified.