Back to bug 1303125
| Who | When | What | Removed | Added |
|---|---|---|---|---|
| Atin Mukherjee | 2016-01-30 03:15:51 UTC | Blocks | 1303269 | |
| Atin Mukherjee | 2016-01-30 04:08:17 UTC | Status | NEW | POST |
| Bhaskarakiran | 2016-01-30 10:43:20 UTC | CC | byarlaga | |
| Rejy M Cyriac | 2016-02-01 11:02:28 UTC | CC | rcyriac | |
| Vivek Agarwal | 2016-02-01 13:03:56 UTC | CC | vagarwal | |
| Atin Mukherjee | 2016-02-02 04:36:33 UTC | Blocks | 1302968 | |
| Gaurav Kumar Garg | 2016-02-02 06:42:58 UTC | CC | ggarg | |
| Atin Mukherjee | 2016-02-05 06:05:18 UTC | Blocks | 1268895 | |
| Atin Mukherjee | 2016-02-05 06:12:20 UTC | Doc Text | The defrag variable is not reinitialized during glusterd restart. This means that if glusterd goes down and needs to be restarted while the rebalance process is still ongoing, glusterd does not reconnect to the running rebalance process after restart. As a result, rebalance continues to run without communicating with glusterd, and any operation that requires communication between rebalance and glusterd fails. Additionally, after restart glusterd does not retain the decommission_is_in_progress flag set for the volume, which indicates that rebalance is still running. Hence, if remove-brick commit is triggered from the node where glusterd was restarted, and rebalance has finished on the other nodes, the commit goes through even though a rebalance process is still ongoing, which can potentially result in data loss. To work around this issue, perform the following steps: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. Execute remove-brick commit *only* once remove-brick status shows that data migration is completed. | |
| | | Doc Type | Bug Fix | Known Issue |
| Laura Bailey | 2016-02-08 02:06:51 UTC | CC | amukherj | |
| | | Flags | needinfo?(amukherj) | |
| Atin Mukherjee | 2016-02-08 03:36:00 UTC | CC | lbailey | |
| | | Flags | needinfo?(amukherj) | needinfo?(lbailey) |
| Laura Bailey | 2016-02-08 04:06:30 UTC | Flags | needinfo?(lbailey) | |
| Laura Bailey | 2016-02-08 06:24:05 UTC | Doc Text | The defrag variable is not reinitialized during glusterd restart. This means that if glusterd goes down and needs to be restarted while the rebalance process is still ongoing, glusterd does not reconnect to the running rebalance process after restart. As a result, rebalance continues to run without communicating with glusterd, and any operation that requires communication between rebalance and glusterd fails. Additionally, after restart glusterd does not retain the decommission_is_in_progress flag set for the volume, which indicates that rebalance is still running. Hence, if remove-brick commit is triggered from the node where glusterd was restarted, and rebalance has finished on the other nodes, the commit goes through even though a rebalance process is still ongoing, which can potentially result in data loss. To work around this issue, perform the following steps: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. Execute remove-brick commit *only* once remove-brick status shows that data migration is completed. | The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting: - rebalance - tier - remove-brick This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running. If glusterd fails and restarts on a node where the rebalance process was already running, and remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss. Workaround: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. On the brick on which glusterd restarted, check the status of the remove-brick process. Only execute the remove-brick commit command when remove-brick status shows that data migration is complete. |
| | | Flags | needinfo?(amukherj) | |
| Atin Mukherjee | 2016-02-08 07:13:02 UTC | Flags | needinfo?(amukherj) | |
| Laura Bailey | 2016-02-09 00:28:57 UTC | Doc Text | The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting: - rebalance - tier - remove-brick This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running. If glusterd fails and restarts on a node where the rebalance process was already running, and remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss. Workaround: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. On the brick on which glusterd restarted, check the status of the remove-brick process. Only execute the remove-brick commit command when remove-brick status shows that data migration is complete. | The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting: - rebalance - tier - remove-brick This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running. If glusterd fails and restarts on a node where remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss. Workaround: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete. |
| | | Flags | needinfo?(amukherj) | |
| Atin Mukherjee | 2016-02-09 04:12:55 UTC | Flags | needinfo?(amukherj) | |
| Laura Bailey | 2016-02-11 01:50:53 UTC | Doc Text | The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting: - rebalance - tier - remove-brick This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running. If glusterd fails and restarts on a node where remove-brick is triggered after the rebalance process on other nodes has already completed, then the remove-brick commit operation will succeed because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss. Workaround: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete. | The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting: - rebalance - tier - remove-brick This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running. If glusterd fails and restarts on a node where remove-brick was triggered and the rebalance process is not yet complete, but the rebalance process on other nodes has already completed, then the remove-brick commit operation succeeds because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss. Workaround: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete. |
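The documented workaround can be sketched as the following command sequence. This is a minimal illustration, not part of the bug record: the volume name `testvol` and brick `server1:/bricks/b1` are hypothetical placeholders, and the commands assume a systemd-managed glusterd as shipped in RHGS.

```shell
# Step 1: stop the ongoing rebalance before restarting glusterd, so that
# a fresh rebalance process is spawned when glusterd comes back up.
# ("testvol" is a hypothetical volume name.)
gluster volume rebalance testvol stop

# Restart the management daemon (assumes a systemd-based host).
systemctl restart glusterd

# Step 2: on the node where glusterd was restarted, check migration
# status for the brick being removed ("server1:/bricks/b1" is a
# hypothetical brick), and commit only once it reports completed.
gluster volume remove-brick testvol server1:/bricks/b1 status
gluster volume remove-brick testvol server1:/bricks/b1 commit
```

These commands require a running Gluster trusted storage pool, so they are shown for reference rather than as a standalone runnable script.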
| | | Flags | needinfo?(amukherj) | |
| Atin Mukherjee | 2016-02-11 04:37:03 UTC | Flags | needinfo?(amukherj) | |
| Bhaskarakiran | 2016-02-11 09:05:46 UTC | Keywords | ZStream | |
| | | Blocks | 1299184 | |
| John Skeoch | 2016-02-18 00:09:38 UTC | CC | vagarwal | sankarshan |
| Atin Mukherjee | 2016-02-23 05:42:57 UTC | Blocks | 1310972 | |
| Atin Mukherjee | 2016-02-23 08:46:11 UTC | QA Contact | storage-qa-internal | bsrirama |
| Alok | 2016-03-03 09:35:06 UTC | CC | asrivast | |
| Red Hat Bugzilla Rules Engine | 2016-03-08 08:15:57 UTC | Target Release | --- | RHGS 3.1.3 |
| Atin Mukherjee | 2016-03-22 12:02:45 UTC | Status | POST | MODIFIED |
| errata-xmlrpc | 2016-03-24 13:18:16 UTC | Status | MODIFIED | ON_QA |
| Satish Mohan | 2016-03-24 16:12:31 UTC | Fixed In Version | glusterfs-3.7.9-1 | |
| Byreddy | 2016-04-06 04:21:48 UTC | Status | ON_QA | VERIFIED |
| Atin Mukherjee | 2016-05-03 11:02:58 UTC | Doc Text | The defrag variable is not being reinitialized during glusterd restart. This means that if glusterd fails while the following processes are running, it does not reconnect to these processes after restarting: - rebalance - tier - remove-brick This results in these processes continuing to run without communicating with glusterd. Additionally, glusterd does not retain the decommission_is_in_progress flag that is set to indicate that the rebalance process is running. If glusterd fails and restarts on a node where remove-brick was triggered and the rebalance process is not yet complete, but the rebalance process on other nodes has already completed, then the remove-brick commit operation succeeds because glusterd cannot identify that there is an ongoing rebalance operation on the node. This can result in data loss. Workaround: 1. Stop or kill the rebalance process before restarting glusterd. This ensures that a new rebalance process is spawned when glusterd restarts. 2. On the node on which glusterd restarted, check the status of the remove-brick process. Only execute the 'remove-brick commit' command when 'remove-brick status' shows that data migration is complete. | |
| | | Doc Type | Known Issue | Bug Fix |
| Atin Mukherjee | 2016-05-13 04:28:54 UTC | Doc Text | Earlier, if glusterd was restarted on a node while rebalance was still in progress, a remove-brick commit command succeeded even though rebalance had not completed, which resulted in data loss. With this fix, remove-brick commit fails, indicating that rebalance is still in progress. | |
| John Skeoch | 2016-06-05 23:39:03 UTC | CC | ggarg | smohan |
| Laura Bailey | 2016-06-06 06:59:21 UTC | Doc Text | Earlier, if glusterd was restarted on a node while rebalance was still in progress, a remove-brick commit command succeeded even though rebalance had not completed, which resulted in data loss. With this fix, remove-brick commit fails, indicating that rebalance is still in progress. | Previously, when glusterd was restarted on a node while rebalance was still in progress, remove-brick commits succeeded even though rebalance was not yet complete. This resulted in data loss. This update ensures that remove-brick commits fail with appropriate log messages when rebalance is in progress. |
| | | Flags | needinfo?(amukherj) | |
| Atin Mukherjee | 2016-06-06 07:01:20 UTC | Flags | needinfo?(amukherj) | |
| errata-xmlrpc | 2016-06-23 00:47:41 UTC | Status | VERIFIED | RELEASE_PENDING |
| errata-xmlrpc | 2016-06-23 05:05:52 UTC | Status | RELEASE_PENDING | CLOSED |
| | | Resolution | --- | ERRATA |
| | | Last Closed | 2016-06-23 01:05:52 UTC | |
| Rejy M Cyriac | 2016-09-17 16:32:34 UTC | Sub Component | glusterd | |
| | | Component | glusterd | glusterd-transition |
| Rejy M Cyriac | 2016-09-17 16:45:03 UTC | CC | rhs-bugs, storage-qa-internal, vbellur | |
| | | Component | glusterd-transition | glusterd |