Bug 764653 (GLUSTER-2921)

Summary: gfid not being replicated to all replicas in 4 x 3 volume
Product: [Community] GlusterFS
Component: replicate
Version: 3.1.4
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: urgent
Reporter: Joe Julian <joe>
Assignee: Pranith Kumar K <pkarampu>
CC: gluster-bugs, jdarcy, pierre.francois, sgowda, vijay
Doc Type: Bug Fix
Mount Type: fuse

Attachments:
Trace level log files and state dumps from the bricks and client.

Description Joe Julian 2011-05-20 20:11:43 UTC
Having upgraded from 3.0.7, I find that gfids are getting out of sync across the servers. My original volfile configuration is the well-known one from my blog ( http://goo.gl/EH4x ).

The new configuration matches the bricks:
Volume Name: gluster1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: ewcs2:/cluster/0
Brick2: ewcs4:/cluster/0
Brick3: ewcs7:/cluster/0
Brick4: ewcs2:/cluster/1
Brick5: ewcs4:/cluster/1
Brick6: ewcs7:/cluster/1
Brick7: ewcs2:/cluster/2
Brick8: ewcs4:/cluster/2
Brick9: ewcs7:/cluster/2
Brick10: ewcs2:/cluster/3
Brick11: ewcs4:/cluster/3
Brick12: ewcs7:/cluster/3
Options Reconfigured:
diagnostics.brick-log-level: INFO
network.frame-timeout: 600
diagnostics.client-log-level: INFO


(network.frame-timeout is overridden just for debugging purposes)

The attached trace level logs and state dumps were generated from the client command:
stat /mnt/glusterfs/centos/5.6/centosplus/i386/RPMS/xfsdump-2.2.46-1.el5.centos.i386.rpm

where /mnt/glusterfs was the client mountpoint.
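
For reference, the mismatch can be confirmed directly on the backend by comparing the trusted.gfid xattr of the file's copy on each brick of its replica set. A minimal sketch (run as root on each of ewcs2, ewcs4 and ewcs7; it assumes the file hashed to the /cluster/0 subvolume, so adjust the brick path as needed):

import os

# Backend copy of the affected file on this server's brick
# (assuming it hashed to the /cluster/0 replica set).
brick_copy = ("/cluster/0/centos/5.6/centosplus/i386/RPMS/"
              "xfsdump-2.2.46-1.el5.centos.i386.rpm")

# The gfid is stored in the trusted.gfid extended attribute of the
# backend file; reading trusted.* xattrs requires root.
gfid = os.getxattr(brick_copy, "trusted.gfid")
print(brick_copy, gfid.hex())

The three values (one per server) should be identical; differing values mean the gfid was not replicated consistently across the replica set. Running getfattr -n trusted.gfid -e hex on each brick shows the same thing.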

I don't believe this bug is related to bug 764196, but it might be worth looking into.

Comment 1 Pranith Kumar K 2011-05-23 08:51:25 UTC
(In reply to comment #0)

hi,
  We are looking into this with priority. Just wanted to check with you: is the volume being accessed by only one client or by more than one, i.e. are there multiple mount points for the same volume?

Pranith

Comment 2 Joe Julian 2011-05-23 11:32:52 UTC
/me tries to remember how he built his test at 3am...

I'm sure there were multiple clients, at least on multiple workstations. 

On this one workstation I would normally have two mountpoints (one of them mounted with debug logging, so I don't have to change log levels for the whole volume just for a quick test), but for these logs I mounted it only once.

Comment 3 Pranith Kumar K 2011-08-08 07:28:04 UTC
I think this bug is already fixed on master; I need to verify once and update the bug. This should no longer happen for the replicas, since the gfid assignment now takes entry locks on all the children and only then updates the gfid.
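
In outline, the new behaviour is roughly the following (a toy sketch of the locking order only, not the actual afr code; all names are illustrative):

import threading
import uuid

class Replica:
    # Stand-in for one replica child; keeps entry locks and gfids in memory.
    def __init__(self):
        self.locks = {}
        self.gfids = {}

    def entry_lock(self, name):
        return self.locks.setdefault(name, threading.Lock())

    def get_gfid(self, name):
        return self.gfids.get(name)

    def set_gfid(self, name, gfid):
        self.gfids[name] = gfid

def assign_gfid(children, name):
    # Lock the entry on every child first, so that two clients cannot
    # race and end up assigning different gfids on different bricks.
    locks = [child.entry_lock(name) for child in children]
    for lock in locks:
        lock.acquire()
    try:
        # Reuse a gfid that any child already has; otherwise generate one.
        existing = [c.get_gfid(name) for c in children if c.get_gfid(name)]
        gfid = existing[0] if existing else uuid.uuid4().bytes
        # Write the same value to every child while the locks are held.
        for child in children:
            child.set_gfid(name, gfid)
    finally:
        for lock in locks:
            lock.release()

As long as every client locks the children in the same order, only one of two racing creates proceeds at a time, so all bricks in the replica set end up with the same gfid.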

Comment 4 Pranith Kumar K 2011-10-19 10:20:53 UTC
This is fixed in both 3.2.5 and 3.3.