Bug 857549 - brick/server replacement isn't working as documented....
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Assigned To: vsomyaju
Depends On:
Blocks:
 
Reported: 2012-09-14 15:45 EDT by Rob.Hendelman
Modified: 2015-03-04 19:06 EST
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-03-22 06:51:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rob.Hendelman 2012-09-14 15:45:05 EDT
Description of problem:
gluster brick replacement issue

Version-Release number of selected component (if applicable):
3.3.0-git-master-a032de191ec32be363d1feedfbd839f6dbde2579

How reproducible:

Steps to Reproduce:
1. Setup 2 nodes (A+B) with replicated bricks
2. Start rsyncing files via native gluster client
3. Shut down one gluster server (B) & unmount bricks.
4. Reformat bricks on B
5. apt-get purge glusterfs on B
6. Remove anything left over in /etc/glusterfs and /var/lib/glusterd on B
7. dpkg -i glusterpackage.deb on B
8. Reformat & mount bricks on B
9. Follow the procedure at http://gluster.org/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server (sketched below)
10. Gluster writes new files to both servers, but won't heal the existing files that are only on A
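
(For reference, the procedure linked in step 9 boils down to roughly the following on the rebuilt server B. The volume name "data" and the peer address of A are placeholders for this setup; the UUID has to be B's old UUID, which a surviving peer still has on file under /var/lib/glusterd/peers/.)

  # on B, after reinstalling the gluster packages and remounting the brick
  echo UUID=<old-uuid-of-B> > /var/lib/glusterd/glusterd.info
  glusterd
  # re-join the cluster and restart so the volume configuration syncs over
  gluster peer probe <address-of-A>
  killall -r gluster
  glusterd
  # kick off and monitor the self-heal
  gluster volume heal data full
  gluster volume heal data info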

Actual results:

New files are replicated, but old files don't heal:
on B:
# gluster volume heal data
operation failed
on A:
# gluster volume heal data
Launching Heal operation on volume data has been successful
Use heal info commands to check status
# gluster volume heal data info
# gluster volume heal data info
operation failed

Expected results:
healing to work

Additional info:
Comment 1 mailbox 2012-10-10 06:34:49 EDT
What if you issue a self-heal on all the files of a volume?

# gluster volume heal VOLNAME full
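
(If the full heal is accepted on one node but "operation failed" keeps coming back from the info command, the 3.3 CLI also has per-category views, and the self-heal daemon log is usually the next place to look. VOLNAME is a placeholder; the log path assumes a default packaged install.)

  gluster volume heal VOLNAME info healed
  gluster volume heal VOLNAME info heal-failed
  gluster volume heal VOLNAME info split-brain
  # self-heal daemon log on each server
  tail -f /var/log/glusterfs/glustershd.log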
Comment 2 Rob.Hendelman 2012-10-22 08:37:28 EDT
I get "operation failed" as well, IIRC.
Comment 3 Rob.Hendelman 2013-01-28 10:39:01 EST
I still have this issue in 3.3.1:

I have a quorum setup (fixed, 2 nodes) out of 3 possible nodes.

I shut down node 3, and rsync a bunch of stuff to nodes 1/2.

When I bring up node3, and try to get info, I get the following:

gluster> volume heal newdata info 
operation failed

gluster> volume heal newdata info 
operation failed

gluster> volume heal newdata full
Launching Heal operation on volume newdata has been successful
Use heal info commands to check status

gluster> volume heal newdata info 
gluster>
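
(For context, a fixed 2-of-3 quorum like the one described here is normally configured through the AFR client-quorum volume options; "newdata" is the volume from this comment, and the option names come from that feature rather than from this report.)

  gluster volume set newdata cluster.quorum-type fixed
  gluster volume set newdata cluster.quorum-count 2
  # the options show up under "Options Reconfigured"
  gluster volume info newdata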
Comment 4 Rob.Hendelman 2013-01-28 10:47:34 EST
Sorry, Comment #3 was meant to go to ticket #829170, my mistake.  Can an admin move it there?
Comment 5 Vijay Bellur 2013-02-03 14:38:03 EST
CHANGE: http://review.gluster.org/4451 (Tests: Basic pump test case) merged in master by Anand Avati (avati@redhat.com)
Comment 6 Vijay Bellur 2013-02-03 14:44:27 EST
CHANGE: http://review.gluster.org/4450 (Tests: util to check if replace brick completed) merged in master by Anand Avati (avati@redhat.com)
Comment 7 vsomyaju 2013-03-22 06:51:15 EDT
I am closing this bug, as I have replicated the same process as mentioned in the doc, along with the steps provided above, and it works for me.

root@venkatesh-U1:/home/venkatesh/mount_brick2# ps ax | grep glusterfsd
11049 ?        Ssl    0:04 /usr/local/sbin/glusterfsd -s localhost --volfile-id volume1.192.168.122.181.home-venkatesh-mount_brick2 -p /var/lib/glusterd/vols/volume1/run/192.168.122.181-home-venkatesh-mount_brick2.pid -S /tmp/7d80f62eed5c3d111659cca6f79602fe.socket --brick-name /home/venkatesh/mount_brick2 -l /usr/local/var/log/glusterfs/bricks/home-venkatesh-mount_brick2.log --xlator-option *-posix.glusterd-uuid=dcf618f8-f74c-4560-a654-e4923c37914f --brick-port 49152 --xlator-option volume1-server.listen-port=49152



root@venkatesh-U1:/home/venkatesh/mount_brick2# kill -9 11049


root@venkatesh-U1:/home/venkatesh/mount_brick2# killall -r gluster


root@venkatesh-U1:/home/venkatesh# umount /home/venkatesh/mount_brick2/


root@venkatesh-U1:/home/venkatesh# umount /home/venkatesh/mount_brick2/

root@venkatesh-U1:/home/venkatesh# mkfs -t ext3 -m 1 -v /dev/loop0
mke2fs 1.42.5 (29-Jul-2012)
  .....
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 

root@venkatesh-U1:/home/venkatesh# mount -t ext3 /dev/loop0 /home/venkatesh/mount_brick2/

root@venkatesh-U1:/home/venkatesh# rm -rf /var/lib/glusterd/
root@venkatesh-U1:/home/venkatesh# rm -rf /usr/local/var/log/glusterfs
root@venkatesh-U1:/home/venkatesh# rm -rf /usr/local/sbin/

root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# make install

root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# echo UUID=dcf618f8-f74c-4560-a654-e4923c37914f >/var/lib/glusterd/glusterd.info
root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# glusterd


root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# gluster peer probe 192.168.122.187
peer probe: success

root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# killall -r gluster

root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# glusterd

root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# gluster volume info
 
Volume Name: volume1
Type: Replicate
Volume ID: 18334691-b07a-4032-bffe-6b282b5f3c64
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.122.187:/home/venkatesh/mount_brick1
Brick2: 192.168.122.181:/home/venkatesh/mount_brick2


root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# killall -r gluster

root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# glusterd

root@venkatesh-U1:/home/venkatesh/glusterrep/glusterfs# cd /home/venkatesh/mount_brick2/


root@venkatesh-U1:/home/venkatesh/mount_brick2# ls | wc -l
1234

which is the same count as on brick1:
root@venkatesh-U2:/home/venkatesh/mount_brick1# ls | wc -l
1234
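
(Beyond comparing file counts, a quick check that the heal really finished is that the heal queue is empty and both bricks are online; "volume1" is the volume from the transcript above.)

  gluster volume heal volume1 info
  gluster volume status volume1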



Please feel free to reopen the bug, if the issue is found again.
