Bug 852556 - Need information on retiring bricks/nodes
Summary: Need information on retiring bricks/nodes
Keywords:
Status: ASSIGNED
Alias: None
Product: Gluster-Documentation
Classification: Community
Component: Other
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Anjana Suparna Sriram
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-08-28 22:42 UTC by Shawn Heisey
Modified: 2023-01-04 04:36 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shawn Heisey 2012-08-28 22:42:52 UTC
Description of problem:
The documentation does not explain how to retire bricks/nodes.  This is a multi-step process: migrate the data off the brick(s), remove the brick(s) from the volume, and then optionally remove the node(s) from the cluster.  Here is the procedure for the bricks:

gluster volume remove-brick <volname> node1:/brick1 node2:/brick2 start
gluster volume remove-brick <volname> node1:/brick1 node2:/brick2 status

<repeat the status command until it shows the data migration has completed, then do the following>

gluster volume remove-brick <volname> node1:/brick1 node2:/brick2 commit
<answer "y" to the confirmation prompt>
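
Once the remove-brick has been committed and the node no longer hosts any bricks in any volume, the node itself can optionally be dropped from the trusted storage pool. A minimal sketch of that last step (the hostname is an example, not taken from a real setup):

gluster peer detach node2
gluster peer status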

Version-Release number of selected component (if applicable):
3.3.0

Comment 1 Shawn Heisey 2012-09-28 21:59:48 UTC
When I did this procedure before, I did not test to see whether the migrated data was accessible.  Today I tried a new test.  This is probably going to require a new bug on gluster itself rather than the documentation, but I wanted to get the info written down while it's fresh.

I loaded a small 4x2 volume to about two-thirds full and tried to gracefully remove the last set of bricks with the procedure outlined above.  All of the remaining bricks ran out of disk space during the migration, and there were thousands of migration failures in the log.  I restarted the remove-brick operation, and it ran out of disk space again.  A third attempt completed without migration errors in the log.  At this point, I had not yet issued the commit.
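
Based on that, it seems worth checking up front that the bricks staying in the volume have enough free space to absorb everything being migrated off the removed bricks before starting remove-brick. A rough sketch of that check (volume name and brick path are examples; the detail output should report free disk space per brick):

gluster volume status <volname> detail
df -h /bricks/remaining-brick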

After this, I tried to access files in the volume from a client mount.  Everything that originally existed on the removed bricks was inaccessible.
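
Roughly what I mean by "access files from a client mount", in case someone wants to reproduce the check (mount point and path are example values, not my exact ones):

ls -lR /mnt/testvol | head
cat /mnt/testvol/dir1/some-file-that-lived-on-the-removed-bricks > /dev/null

Before the commit, reads like the second one failed for anything that had originally lived on the removed bricks.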

Final status: once I issued the remove-brick commit, everything magically started working again.  I'm glad there was no actual data loss, but if I am removing a set of 4 TB bricks that are about 75% full, it will take a very long time for roughly 3 TB of data (millions of files) to migrate.  The files that are migrated first will be unavailable for the entire duration of the migration, which is unacceptable by any standard.

