Description of problem:

Two nodes on CentOS 6.4:
1. stgnode01 (eth0: 192.168.13.51, eth1: 10.10.0.11)
2. stgnode02 (eth0: 192.168.13.52, eth1: 10.10.0.12)

with a replicated GlusterFS volume over eth1 (the nodes are connected by an Ethernet crossover cable) and CTDB (configured on eth0) providing the virtual IP 192.168.13.21. On a test node, the GlusterFS volume is mounted over NFS via 192.168.13.21. CTDB points the virtual IP at stgnode02.

When I copy a big file from the test node to the mounted NFS share and unplug the Ethernet crossover cable, the copy freezes and resumes after some time. I then replug the crossover cable between the two nodes while the file is still being copied. When the copy finishes, the problem appears :) On stgnode02 the file has the proper size; on stgnode01 there is only part of the file (it looks like it stopped growing after the cable was unplugged).

#######################################################
gluster vol heal datavol info
Gathering Heal info on volume datavol has been successful

Brick stgnode01:/bricks/data
Number of entries: 1
/test.iso

Brick stgnode02:/bricks/data
Number of entries: 1
/test.iso
#######################################################

The heal process has not finished after 12 hours. getfattr shows:

#######################################################
[root@stgnode01 glusterfs]# getfattr -d -m '.*' -e hex /bricks/data/test.iso
# file: bricks/data/test.iso
trusted.afr.datavol-client-0=0x000000010000000000000000
trusted.afr.datavol-client-1=0x000000010000000000000000
trusted.gfid=0x6ed4b51f90634fce993c097b8d77e4fe

[root@stgnode02 data]# getfattr -d -m '.*' -e hex /bricks/data/test.iso
# file: bricks/data/test.iso
trusted.afr.datavol-client-0=0x00009a230000000300000000
trusted.afr.datavol-client-1=0x000000000000000000000000
trusted.gfid=0x6ed4b51f90634fce993c097b8d77e4fe
#######################################################

Procedure that helps:
1. service glusterd stop
2. pkill glusterd
3. service glusterd start

After a few seconds, "gluster volume heal datavol info" shows 0 entries, the size of test.iso is identical on both nodes, and getfattr shows:

trusted.afr.datavol-client-0=0x000000000000000000000000
trusted.afr.datavol-client-1=0x000000000000000000000000

Version-Release number of selected component (if applicable):
glusterfs-3.4.0-0.5.beta2.el6.x86_64

GlusterFS volume created as:
vol create datavol replica 2 stgnode01:/bricks/data stgnode02:/bricks/data
vol set datavol nfs.port 2049
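For readers trying to interpret the trusted.afr.* values in the getfattr output above, here is a minimal sketch of how they can be decoded. It assumes the usual AFR changelog layout, in which each 12-byte value holds three big-endian 32-bit counters of pending data, metadata, and entry operations. The helper name decode_afr_changelog is hypothetical and not part of GlusterFS. The mapping of client-0 to stgnode01 and client-1 to stgnode02 is also an assumption, based on the brick order in the volume create command.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters.

    Assumed layout (not confirmed by this report): three big-endian
    unsigned 32-bit integers, in the order data, metadata, entry.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output in the report.
reported = {
    ("stgnode01", "client-0"): "0x000000010000000000000000",
    ("stgnode01", "client-1"): "0x000000010000000000000000",
    ("stgnode02", "client-0"): "0x00009a230000000300000000",
    ("stgnode02", "client-1"): "0x000000000000000000000000",
}

for (node, client), value in reported.items():
    print(node, client, decode_afr_changelog(value))
```

Under these assumptions, the copy on stgnode02 records 39459 pending data operations and 3 pending metadata operations against client-0, which would be consistent with stgnode01 holding the stale, shorter copy of test.iso. The copy on stgnode01 records one pending data operation against each client.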
Anuradha, I think this bug should already be fixed in 3.5.x and upstream. Could you please verify and close the bug if it is indeed working?

Pranith
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.