Bug 973183 - Network down and up on one brick causes self-healing to stop working until glusterd restart
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.4.0-beta
Hardware: x86_64 Linux
Priority: unspecified Severity: unspecified
Assigned To: Anuradha
Keywords: Triaged
Reported: 2013-06-11 08:36 EDT by Marcin Garski
Modified: 2016-09-19 22:00 EDT
Doc Type: Bug Fix
Last Closed: 2015-09-01 02:35:27 EDT
Type: Bug
Description Marcin Garski 2013-06-11 08:36:31 EDT
Description of problem:
Two nodes on CentOS 6.4:
1. stgnode01 (eth0: 192.168.13.51, eth1: 10.10.0.11)
2. stgnode02 (eth0: 192.168.13.52, eth1: 10.10.0.12)
with a replicated GlusterFS volume over eth1 (the nodes are connected by an Ethernet crossover cable) and CTDB (configured on eth0) providing the virtual IP 192.168.13.21.
On a test node the GlusterFS volume is mounted over NFS via 192.168.13.21. CTDB maps the virtual IP to stgnode02.

When I copy a big file from the test node to the mounted NFS share and unplug the Ethernet crossover cable, the copy freezes and after some time resumes. Then I replug the Ethernet crossover cable between the two nodes (while the file is still being copied).

When the copy finishes, the problem appears :) On stgnode02 the file has the proper size, but on stgnode01 there is only part of the file (it looks like it stopped growing after the cable was unplugged).

#######################################################
gluster vol heal datavol info
Gathering Heal info on volume datavol has been successful

Brick stgnode01:/bricks/data
Number of entries: 1
/test.iso

Brick stgnode02:/bricks/data
Number of entries: 1
/test.iso
#######################################################
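
To watch whether the heal backlog actually drains, the per-brick counts in the output above can be summed. This is only a minimal sketch: the function name count_heal_entries is made up for illustration, and it assumes the "Number of entries:" line format shown in this report (other GlusterFS versions may print it differently).

```shell
# Hypothetical helper: sum the "Number of entries:" lines from
# `gluster volume heal <vol> info` output given on stdin.
count_heal_entries() {
    # The count is the 4th whitespace-separated field of each matching line.
    awk '/^Number of entries:/ { total += $4 } END { print total + 0 }'
}

# Sample taken from the output in this report; prints 2.
count_heal_entries <<'EOF'
Brick stgnode01:/bricks/data
Number of entries: 1
/test.iso

Brick stgnode02:/bricks/data
Number of entries: 1
/test.iso
EOF
```

On a live node the same function could be fed with "gluster vol heal datavol info | count_heal_entries".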

The heal process has not finished after 12 hours; getfattr shows:

#######################################################
[root@stgnode01 glusterfs]# getfattr -d -m '.*' -e hex /bricks/data/test.iso
# file: bricks/data/test.iso
trusted.afr.datavol-client-0=0x000000010000000000000000
trusted.afr.datavol-client-1=0x000000010000000000000000
trusted.gfid=0x6ed4b51f90634fce993c097b8d77e4fe


[root@stgnode02 data]# getfattr -d -m '.*' -e hex /bricks/data/test.iso
# file: bricks/data/test.iso
trusted.afr.datavol-client-0=0x00009a230000000300000000
trusted.afr.datavol-client-1=0x000000000000000000000000
trusted.gfid=0x6ed4b51f90634fce993c097b8d77e4fe
#######################################################
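
For anyone reading the values above: as far as I understand the AFR changelog format, each trusted.afr.* value is three 32-bit big-endian counters of pending data, metadata and entry operations, in that order. The following is a sketch under that assumption; decode_afr_changelog is a made-up name, not a GlusterFS tool.

```shell
# Hypothetical helper: split one trusted.afr.* hex value into its three
# 32-bit counters (assumed order: data, metadata, entry).
# Note: plain POSIX sh, so the variables below are global.
decode_afr_changelog() {
    hex=${1#0x}                      # drop the optional 0x prefix
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # printf %d accepts 0x-prefixed hexadecimal constants.
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
}

# Value of trusted.afr.datavol-client-0 on stgnode02 from this report;
# prints: data=39459 metadata=3 entry=0
decode_afr_changelog 0x00009a230000000300000000
```

Read this way, stgnode02 records pending operations against client-0, which would be consistent with the copy on stgnode01 being the incomplete one.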

Procedure that helps:
1. service glusterd stop
2. pkill glusterd
3. service glusterd start

After a few seconds "gluster volume heal datavol info" shows 0 entries, the size of test.iso is identical on both nodes, and getfattr shows:
trusted.afr.datavol-client-0=0x000000000000000000000000
trusted.afr.datavol-client-1=0x000000000000000000000000
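
The "all zeros" check above can be scripted by filtering getfattr output. This is a rough sketch, assuming the "name=0x..." line format that getfattr -e hex prints in this report; afr_xattrs_clean is a hypothetical name.

```shell
# Hypothetical helper: read `getfattr -d -m '.*' -e hex <file>` output on
# stdin and succeed only if at least one trusted.afr.* attribute is present
# and every one of them is all zeros.
afr_xattrs_clean() {
    awk -F= '
        /^trusted\.afr\./ {
            seen = 1
            v = $2
            sub(/^0x/, "", v)             # strip the hex prefix
            if (v !~ /^0+$/) dirty = 1    # any non-zero digit means pending ops
        }
        END { if (seen && !dirty) exit 0; exit 1 }
    '
}
```

On a brick this could be used as "getfattr -d -m '.*' -e hex /bricks/data/test.iso 2>/dev/null | afr_xattrs_clean && echo clean".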

Version-Release number of selected component (if applicable):
glusterfs-3.4.0-0.5.beta2.el6.x86_64

GlusterFS volume created as:
vol create datavol replica 2 stgnode01:/bricks/data stgnode02:/bricks/data
vol set datavol nfs.port 2049
Comment 1 Pranith Kumar K 2014-07-13 03:22:05 EDT
Anuradha,
    I think this bug should already be fixed in 3.5.x and upstream. Could you please verify and close the bug if it is indeed working?

Pranith
Comment 2 Niels de Vos 2015-05-17 18:00:17 EDT
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained, at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify if newer versions are affected with the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs@gluster.org".

If there is no response by the end of the month, this bug will get automatically closed.
