Description of problem:
glusterd is killed on a node, and remove-brick is then started on a volume that has bricks on the node where glusterd was brought down. After glusterd is brought back up, subsequent 'gluster volume status' commands for that volume fail
with the message -
Commit failed on localhost. Please check the log file for more details.
From the logs -
[2014-01-06 15:54:51.430841] E [glusterd-op-sm.c:2021:_add_remove_bricks_to_dict] 0-management: Failed to get brick count
[2014-01-06 15:54:51.430914] E [glusterd-op-sm.c:2085:_add_task_to_dict] 0-management: Failed to add remove bricks to dict
[2014-01-06 15:54:51.430927] E [glusterd-op-sm.c:2170:glusterd_aggregate_task_status] 0-management: Failed to add task details to dict
[2014-01-06 15:54:51.430938] E [glusterd-op-sm.c:4037:glusterd_op_ac_commit_op] 0-management: Commit of operation 'Volume Status' failed: -22
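The log trail suggests the restarted glusterd has no record of the in-flight remove-brick task, so the brick-count lookup fails and the commit returns -22 (-EINVAL). Below is a minimal Python model of that failure path, purely illustrative (function names, dict keys, and messages mirror the logs but are assumptions, not glusterd source):

```python
# Illustrative model only -- NOT glusterd code. It mimics the failure
# path seen in the logs: the restarted daemon has no remove-brick task
# state, so building the status reply fails and commit returns -EINVAL.
import errno

def add_remove_bricks_to_dict(reply, task_state):
    # glusterd looks up the brick count for the in-flight remove-brick
    # task; on a freshly restarted node that state was never repopulated.
    count = task_state.get("count")
    if count is None:
        print("E [_add_remove_bricks_to_dict] Failed to get brick count")
        return -errno.EINVAL  # surfaces as the -22 in the log
    reply["task.count"] = count
    return 0

def commit_volume_status(task_state):
    reply = {}
    ret = add_remove_bricks_to_dict(reply, task_state)
    if ret != 0:
        print("E Commit of operation 'Volume Status' failed: %d" % ret)
    return ret

# Node that kept its task state: commit succeeds.
print(commit_volume_status({"count": 2}))   # 0
# Restarted node with empty task state: commit fails with -22.
print(commit_volume_status({}))             # -22
```

The point of the sketch is that the failure is a missing-state problem, not a malformed request: the same commit succeeds on nodes whose glusterd was never killed.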
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume (one brick on each server in a 4-server cluster), start and mount it, and create data on the mount point.
2. Kill glusterd on node1 and node2 (these hold the bricks that form one replica pair).
3. Start remove-brick on the volume.
4. Start glusterd on node1 and node2.
5. Run 'gluster volume status' command for that volume on any of the nodes.
Actual results:
The volume status command fails with the message described above.

Expected results:
The volume status command should not fail.
Cloning this to 3.1. To be fixed in a future release.