Bug 764984 (GLUSTER-3252)

Summary: glusterd doesn't report replace-brick 'status' as expected, after being 'paused'.
Product: [Community] GlusterFS
Reporter: krishnan parthasarathi <kparthas>
Component: cli
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: mainline
CC: gluster-bugs, nsathyan, rabhat, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix

Description krishnan parthasarathi 2011-07-26 11:31:48 UTC
This bug tracks the following glusterd issues in the context of a replace-brick operation.

glusterd:
- reports the status as unknown after a replace-brick operation is paused.
- 'forgets' an ongoing replace-brick operation when it goes down.

Comment 1 Anand Avati 2011-07-27 05:26:04 UTC
CHANGE: http://review.gluster.com/110 (This change ensures that glusterd retains 'state' information) merged in release-3.1 by Anand Avati (avati)

Comment 2 Anand Avati 2011-07-27 05:26:27 UTC
CHANGE: http://review.gluster.com/107 (This change ensures that glusterd retains 'state' information) merged in release-3.2 by Anand Avati (avati)

Comment 3 krishnan parthasarathi 2011-07-27 05:42:27 UTC
*** Bug 3121 has been marked as a duplicate of this bug. ***

Comment 4 Anand Avati 2011-07-27 05:50:34 UTC
CHANGE: http://review.gluster.com/6 (This change ensures that glusterd retains 'state' information) merged in master by Anand Avati (avati)

Comment 5 krishnan parthasarathi 2011-07-27 05:59:56 UTC
Tests to verify fix:

Scenario 1:
- Start a replace-brick operation from glusterd.
- 'Pause' the replace-brick operation.
- Query the status of the replace-brick operation.

Earlier, the gluster CLI used to report "replace brick status unknown". With this fix, you should see the message "replace brick has been paused".
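Scenario 1 can be sketched as a small shell helper. This is only an illustration: the names RUN, rb, scenario1 and check_paused are hypothetical, the volume and brick arguments are placeholders, and the 'start' subcommand is assumed from the replace-brick CLI (the transcript in this bug only shows 'pause' and 'status'). Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of Scenario 1 (assumed helper names, not part of glusterfs).
# RUN is an optional command prefix; RUN=echo gives a dry run.
RUN=${RUN:-}

# rb <volume> <src-brick> <dst-brick> <subcommand>
rb() {
    $RUN gluster volume replace-brick "$1" "$2" "$3" "$4"
}

# Start the operation, pause it, then query its status.
scenario1() {
    rb "$1" "$2" "$3" start || return 1
    rb "$1" "$2" "$3" pause || return 1
    rb "$1" "$2" "$3" status
}

# Reads status output on stdin; succeeds if it carries the message
# expected after the fix.
check_paused() {
    grep -q 'replace brick has been paused'
}
```

On a test cluster this might be used as `scenario1 VOLUME SRC-BRICK DST-BRICK | check_paused`, which should fail on a glusterd without the fix.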

Scenario 2:
- Start a replace-brick operation from glusterd.
- Kill the (same) glusterd process.
- Restart glusterd and 'continue' the replace-brick operation from where it left off.

Earlier, resuming a replace-brick operation after a 'glusterd down' event was not possible. With this fix, glusterd must 'continue' the replace-brick operation after coming back up.
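Scenario 2 can be sketched the same way. Again this is a hedged illustration, not a supported test script: RUN, rb and scenario2 are hypothetical names, 'start' is an assumed subcommand, and 'killall glusterd' followed by a bare 'glusterd' simply mirrors how the daemon is stopped and restarted in the verification transcript. With RUN=echo the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of Scenario 2 (assumed helper names, not part of glusterfs).
# RUN is an optional command prefix; RUN=echo gives a dry run.
RUN=${RUN:-}

# rb <volume> <src-brick> <dst-brick> <subcommand>
rb() {
    $RUN gluster volume replace-brick "$1" "$2" "$3" "$4"
}

# Start the operation, kill and restart glusterd, then query status.
# After the fix, status should keep reporting migration progress.
scenario2() {
    rb "$1" "$2" "$3" start || return 1
    $RUN killall glusterd   || return 1
    $RUN glusterd           || return 1
    rb "$1" "$2" "$3" status
}
```

Only glusterd is killed here; the brick and client processes are left running, as in the transcript.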

Comment 6 Raghavendra Bhat 2011-07-29 05:31:04 UTC
# gluster volume replace-brick mirror bigbang:/d/glusterfs/export/export bigbang:/e/glusterfs/export/export8 pause
replace-brick paused successfully
# gluster volume replace-brick mirror bigbang:/d/glusterfs/export/export bigbang:/e/glusterfs/export/export8 status
replace brick has been paused

# gluster volume replace-brick mirror bigbang:/d/glusterfs/export/export bigbang:/e/glusterfs/export/export8 status
Number of files migrated = 157       Current file= /dir/linux-2.6.31.1/firmware/edgeport/down2.H16 
# gluster volume replace-brick mirror bigbang:/d/glusterfs/export/export bigbang:/e/glusterfs/export/export8 status
Number of files migrated = 466       Current file= /dir/linux-2.6.31.1/include/video/maxinefb.h 
# killall glusterd
# !ps
ps aux | grep gluster
raghu    10136  0.0  0.1  42688  2756 pts/0    S+   12:02   0:00 ssh shell.gluster.com -l raghavendrabhat
root     14777  4.4  0.8 148060 17036 ?        Ssl  12:26   3:39 /usr/local/sbin/glusterfsd --xlator-option mirror-server.listen-port=24012 -s localhost --volfile-id mirror.bigbang.e-glusterfs-export-export -p /etc/glusterd/vols/mirror/run/bigbang-e-glusterfs-export-export.pid --brick-name /e/glusterfs/export/export --brick-port 24012 -l /usr/local/var/log/glusterfs/bricks/e-glusterfs-export-export.log
root     14859  5.6  3.7 266560 74796 ?        Ssl  12:26   4:40 /usr/local/sbin/glusterfs --log-level=NORMAL --volfile-id=mirror --volfile-server=bigbang /mnt/client
root     30378 11.3  0.5  60908 12040 ?        Ssl  13:47   0:08 /usr/local/sbin/glusterfs -f /etc/glusterd/vols/mirror/rb_dst_brick.vol -p /etc/glusterd/vols/mirror/rb_dst_brick.pid --xlator-option src-server.listen-port=24013
root     30469 18.4  0.9 165384 18972 ?        Ssl  13:48   0:03 /usr/local/sbin/glusterfsd --xlator-option mirror-server.listen-port=24011 -s localhost --volfile-id mirror.bigbang.d-glusterfs-export-export -p /etc/glusterd/vols/mirror/run/bigbang-d-glusterfs-export-export.pid --brick-name /d/glusterfs/export/export --brick-port 24011 -l /usr/local/var/log/glusterfs/bricks/d-glusterfs-export-export.log
root     30473  0.1  1.9 165668 38844 ?        Ssl  13:48   0:00 /usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /usr/local/var/log/glusterfs/nfs.log
root     30521  0.0  0.0   7628   980 pts/5    S+   13:48   0:00 grep --color=auto gluster
# gluster volume replace-brick mirror bigbang:/d/glusterfs/export/export bigbang:/e/glusterfs/export/export8 status
Connection failed. Please check if gluster daemon is operational.
# glusterd
# gluster volume replace-brick mirror bigbang:/d/glusterfs/export/export bigbang:/e/glusterfs/export/export8 status
Number of files migrated = 1669       Current file= /dir/linux-2.6.31.1/include/linux/if_ec.h 
# gluster volume replace-brick mirror bigbang:/d/glusterfs/export/export bigbang:/e/glusterfs/export/export8 status
Number of files migrated = 1866       Current file= /dir/linux-2.6.31.1/include/linux/byteorder 



This seems to be fixed: killing glusterd and then restarting it resumed the replace-brick operation, and running replace-brick pause followed by status reported the correct state.
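The "migration kept going" check above was done by eye. A minimal sketch of automating it, assuming the status line keeps the "Number of files migrated = N" format shown in the transcript (migrated_count and made_progress are hypothetical helper names):

```shell
#!/bin/sh
# Extract the migrated-file count from a replace-brick status line
# read on stdin. Prints nothing if the line has a different format.
migrated_count() {
    sed -n 's/^Number of files migrated = \([0-9][0-9]*\).*/\1/p'
}

# made_progress <earlier-status-line> <later-status-line>
# Succeeds only if both lines parse and the later count is higher.
made_progress() {
    before=$(printf '%s\n' "$1" | migrated_count)
    after=$(printf '%s\n' "$2" | migrated_count)
    [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]
}
```

Feeding it the status output captured before killing glusterd and the output captured after the restart would confirm that the count went up, as it did from 466 to 1669 in the run above.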