Back to bug 1303125

Who When What Removed Added
Atin Mukherjee 2016-01-30 03:15:51 UTC Blocks 1303269
Atin Mukherjee 2016-01-30 04:08:17 UTC Status NEW POST
Bhaskarakiran 2016-01-30 10:43:20 UTC CC byarlaga
Rejy M Cyriac 2016-02-01 11:02:28 UTC CC rcyriac
Vivek Agarwal 2016-02-01 13:03:56 UTC CC vagarwal
Atin Mukherjee 2016-02-02 04:36:33 UTC Blocks 1302968
Gaurav Kumar Garg 2016-02-02 06:42:58 UTC CC ggarg
Atin Mukherjee 2016-02-05 06:05:18 UTC Blocks 1268895
Atin Mukherjee 2016-02-05 06:12:20 UTC Doc Text The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd goes down and needs to be restarted while the rebalance process is already ongoing, after the restart glusterd doesn't connect to the running rebalance process.

This results in rebalance continuing to run without communicating with glusterd. Therefore, any operation that requires communication between rebalance and glusterd fails.

Also, after the restart glusterd doesn't retain the decommission_is_in_progress flag set for the volume, which indicates that rebalance is still running. Hence, if remove-brick commit is triggered from the same node where glusterd was restarted, and rebalance has already finished on the other nodes, the commit goes through even though there is an ongoing rebalance process, which can potentially result in data loss.

To work around this issue, perform the following steps:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.

2. remove-brick commit can *only* be executed once remove-brick status shows that data migration is complete.
Doc Type Bug Fix Known Issue
Laura Bailey 2016-02-08 02:06:51 UTC CC amukherj
Flags needinfo?(amukherj)
Atin Mukherjee 2016-02-08 03:36:00 UTC CC lbailey
Flags needinfo?(amukherj) needinfo?(lbailey)
Laura Bailey 2016-02-08 04:06:30 UTC Flags needinfo?(lbailey)
Laura Bailey 2016-02-08 06:24:05 UTC Doc Text The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd goes down and needs to be restarted while the rebalance process is already ongoing, after the restart glusterd doesn't connect to the running rebalance process.

This results in rebalance continuing to run without communicating with glusterd. Therefore, any operation that requires communication between rebalance and glusterd fails.

Also, after the restart glusterd doesn't retain the decommission_is_in_progress flag set for the volume, which indicates that rebalance is still running. Hence, if remove-brick commit is triggered from the same node where glusterd was restarted, and rebalance has already finished on the other nodes, the commit goes through even though there is an ongoing rebalance process, which can potentially result in data loss.

To work around this issue, perform the following steps:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.

2. remove-brick commit can *only* be executed once remove-brick status shows that data migration is complete.
The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting:
- rebalance
- tier
- remove-brick

This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running.

If glusterd fails and restarts on a node where the rebalance process was already running, and remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss.

Workaround:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.
2. On the brick on which glusterd restarted, check the status of the remove-brick process. Only execute the remove-brick commit command when remove-brick status shows that data migration is complete.
Flags needinfo?(amukherj)
Atin Mukherjee 2016-02-08 07:13:02 UTC Flags needinfo?(amukherj)
Laura Bailey 2016-02-09 00:28:57 UTC Doc Text The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting:
- rebalance
- tier
- remove-brick

This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running.

If glusterd fails and restarts on a node where the rebalance process was already running, and remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss.

Workaround:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.
2. On the brick on which glusterd restarted, check the status of the remove-brick process. Only execute the remove-brick commit command when remove-brick status shows that data migration is complete.
The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting:
- rebalance
- tier
- remove-brick

This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running.

If glusterd fails and restarts on a node where remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss.

Workaround:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.
2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete.
Flags needinfo?(amukherj)
Atin Mukherjee 2016-02-09 04:12:55 UTC Flags needinfo?(amukherj)
Laura Bailey 2016-02-11 01:50:53 UTC Doc Text The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting:
- rebalance
- tier
- remove-brick

This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running.

If glusterd fails and restarts on a node where remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss.

Workaround:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.
2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete.
The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting:
- rebalance
- tier
- remove-brick

This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running.

If glusterd fails and restarts on a node where remove-brick was triggered and the rebalance process is not yet complete, but the rebalance process on other nodes has already completed, then the remove-brick commit operation succeeds because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss.

Workaround:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.
2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete.
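The workaround above can be sketched with the gluster CLI. The volume name, brick path, and service-manager invocation below are placeholders for illustration, not values taken from this bug; they require a live gluster deployment and must be adjusted to it.

```shell
# Known-issue workaround sketch (VOLNAME and the brick path are hypothetical).

# 1. Stop the rebalance process before restarting glusterd, so that a fresh
#    rebalance process is spawned when glusterd comes back up:
gluster volume rebalance VOLNAME stop
systemctl restart glusterd

# 2. On the node where glusterd restarted, check remove-brick status and run
#    commit only after the status output shows data migration as completed:
gluster volume remove-brick VOLNAME server1:/bricks/brick1 status
gluster volume remove-brick VOLNAME server1:/bricks/brick1 commit
```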
Flags needinfo?(amukherj)
Atin Mukherjee 2016-02-11 04:37:03 UTC Flags needinfo?(amukherj)
Bhaskarakiran 2016-02-11 09:05:46 UTC Keywords ZStream
Blocks 1299184
John Skeoch 2016-02-18 00:09:38 UTC CC vagarwal sankarshan
Atin Mukherjee 2016-02-23 05:42:57 UTC Blocks 1310972
Atin Mukherjee 2016-02-23 08:46:11 UTC QA Contact storage-qa-internal bsrirama
Alok 2016-03-03 09:35:06 UTC CC asrivast
Red Hat Bugzilla Rules Engine 2016-03-08 08:15:57 UTC Target Release --- RHGS 3.1.3
Atin Mukherjee 2016-03-22 12:02:45 UTC Status POST MODIFIED
errata-xmlrpc 2016-03-24 13:18:16 UTC Status MODIFIED ON_QA
Satish Mohan 2016-03-24 16:12:31 UTC Fixed In Version glusterfs-3.7.9-1
Byreddy 2016-04-06 04:21:48 UTC Status ON_QA VERIFIED
Atin Mukherjee 2016-05-03 11:02:58 UTC Doc Text The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting:
- rebalance
- tier
- remove-brick

This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running.

If glusterd fails and restarts on a node where remove-brick was triggered and the rebalance process is not yet complete, but the rebalance process on other nodes has already completed, then the remove-brick commit operation succeeds because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss.

Workaround:
1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts.
2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete.
Doc Type Known Issue Bug Fix
Atin Mukherjee 2016-05-13 04:28:54 UTC Doc Text Earlier, if glusterd was restarted on a node while rebalance was still in progress, a remove-brick commit command succeeded even though rebalance had not completed, which resulted in data loss.

With this fix, remove-brick commit fails, indicating that rebalance is still in progress.
John Skeoch 2016-06-05 23:39:03 UTC CC ggarg smohan
Laura Bailey 2016-06-06 06:59:21 UTC Doc Text Earlier, if glusterd was restarted on a node while rebalance was still in progress, a remove-brick commit command succeeded even though rebalance had not completed, which resulted in data loss.

With this fix, remove-brick commit fails, indicating that rebalance is still in progress.
Previously, when glusterd was restarted on a node while rebalance was still in progress, remove-brick commits succeeded even though rebalance was not yet complete. This resulted in data loss. This update ensures that remove-brick commits fail with appropriate log messages when rebalance is in progress.
Flags needinfo?(amukherj)
Atin Mukherjee 2016-06-06 07:01:20 UTC Flags needinfo?(amukherj)
errata-xmlrpc 2016-06-23 00:47:41 UTC Status VERIFIED RELEASE_PENDING
errata-xmlrpc 2016-06-23 05:05:52 UTC Status RELEASE_PENDING CLOSED
Resolution --- ERRATA
Last Closed 2016-06-23 01:05:52 UTC
Rejy M Cyriac 2016-09-17 16:32:34 UTC Sub Component glusterd
Component glusterd glusterd-transition
Rejy M Cyriac 2016-09-17 16:45:03 UTC CC rhs-bugs, storage-qa-internal, vbellur
Component glusterd-transition glusterd
