Bug 803588

Summary: Aggregation in geo-replication is not happening properly.
Product: [Community] GlusterFS
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: Vijaykumar Koppad <vkoppad>
Assignee: Venky Shankar <vshankar>
CC: bbandari, csaba, gluster-bugs
Doc Type: Bug Fix
Last Closed: 2012-03-16 09:51:42 UTC

Description Vijaykumar Koppad 2012-03-15 07:06:09 UTC
Description of problem:
It seems to me that aggregation is not happening properly: per-fs-entry xtimes
seem to be taken randomly from certain bricks, and xtime inversions (fs entries
with a greater xtime than their parent) occur.

Version-Release number of selected component (if applicable):
[fa5b0347193f8d1a4b917a2edb338423cb175e66]

How reproducible: Always

Steps to Reproduce:
1. Start a geo-replication session with a multi-brick master, or just enable indexing for a volume with multiple bricks.
2. Mount the volume on two clients: one with --client-pid=-1 and one as a normal glusterfs mount.
3. Run mkdir -p d1/d2/d3/d4/d5 on the normal mount point, then check the xtime of all the directories on the other client (see the sketch below).

The xtimes should be the same for all entries and equal to the greatest xtime among the bricks.
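
For reference, a minimal shell sketch of these steps. The volume name ("master"), server ("host1"), and mount points are illustrative placeholders, and the UUID-keyed name of the xtime xattr is an assumption, not confirmed in this report:

# 1. Enable indexing on the multi-brick volume (or start a geo-rep session).
gluster volume set master geo-replication.indexing on

# 2. One special mount (client-pid -1) and one normal glusterfs mount.
glusterfs --volfile-server=host1 --volfile-id=master --client-pid=-1 /mnt/special
mount -t glusterfs host1:/master /mnt/normal

# 3. Create the directory chain on the normal mount, then inspect xtime on the
#    special mount (assumed xattr key: trusted.glusterfs.<VOLUME-UUID>.xtime).
mkdir -p /mnt/normal/d1/d2/d3/d4/d5
getfattr -e hex -n trusted.glusterfs.<VOLUME-UUID>.xtime /mnt/special/d1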

Additional info:
| client1/
./ | 1331647790.22365
d1/ | 1331647790.22270
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22502
d1/d2/d3/d4/d5/ | 1331647790.22270

| /root/bricks/doa/d1
./ | 1331647790.22365
d1/ | 1331647790.22365
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22365
d1/d2/d3/d4/d5/ | 1331647790.22365

| /root/bricks/doa/d2
./ | 1331647790.22365
d1/ | 1331647790.22365
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22365
d1/d2/d3/d4/d5/ | 1331647790.22365

| /root/bricks/doa/d3
./ | 1331647790.22502
d1/ | 1331647790.22502
d1/d2/ | 1331647790.22502
d1/d2/d3/ | 1331647790.22502
d1/d2/d3/d4/ | 1331647790.22502
d1/d2/d3/d4/d5/ | 1331647790.22502

| /root/bricks/doa/d4
./ | 1331647790.22270
d1/ | 1331647790.22270
d1/d2/ | 1331647790.22270
d1/d2/d3/ | 1331647790.22270
d1/d2/d3/d4/ | 1331647790.22270
d1/d2/d3/d4/d5/ | 1331647790.22270

Comment 1 Venky Shankar 2012-03-16 07:18:35 UTC
OK, this looks interesting. Actually, it's not a problem but still it is a problem. So here it is:

All parts of the code related to the *.xtime xattr are guarded by conditional expressions that look something like this (in *_getxattr() for cluster translators):

if (*priv->vol_uuid) {   /* false if vol_uuid is still zero-filled */
    if ((match_uuid_local (name, priv->vol_uuid) == 0)
        && (-1 == frame->root->pid)) {

        [...]

        /* call cluster_getmarkerattr(), which winds to all subvolumes */

        [...]
    }
}


All of this looks OK (client-pid is -1, which Vijaykumar used to query xtime, and the uuid match should hold too). But somehow priv->vol_uuid is entirely filled with '\0', which makes this part of the code defunct and takes the normal xattr code path instead, which winds to only a single subvolume (for replicate, and likewise for the other cluster translators).

But when geo-rep does an aux-mount and queries xtime, priv->vol_uuid is correct (i.e. the relevant volume-id). Hence, geo-rep gets the correct (max) xtime on the fs tree.

The reason priv->vol_uuid is empty for Vijaykumar's mount is that it only gets initialized when you request the volume-mark xattr from the special mount. So first request "trusted.glusterfs.volume-mark" and then run the xtime.rb script; you should be happy :-)
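
In other words, something like the following on the special (--client-pid=-1) mount; the mount point and the UUID-keyed xtime xattr name are placeholders/assumptions, not taken from this report:

# Query volume-mark first so that priv->vol_uuid gets initialized ...
getfattr -e hex -n trusted.glusterfs.volume-mark /mnt/special
# ... and only then query xtime (or run xtime.rb); aggregation now works.
getfattr -e hex -n trusted.glusterfs.<VOLUME-UUID>.xtime /mnt/special/d1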

Comment 2 Csaba Henk 2012-03-24 17:14:51 UTC
Good analysis Venky -- sorry for taking up your time, I should have remembered this.

As I recall, we had some semantic issue prior to the first query for volume-mark, and given that it's used only within geo-rep, where we know that gsyncd starts with a volume-mark query, we could afford the dumb solution of having undefined behavior prior to the first volume-mark query.