Bug 803588 - Aggregation in geo-replication is not happening properly.
Summary: Aggregation in geo-replication is not happening properly.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Venky Shankar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-15 07:06 UTC by Vijaykumar Koppad
Modified: 2014-08-25 00:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-16 09:51:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Vijaykumar Koppad 2012-03-15 07:06:09 UTC
Description of problem:
It seems that aggregation is not happening properly: per-fs-entry xtimes
appear to be taken randomly from certain bricks, and xtime inversions (fs entries
with a greater xtime than their parent) occur.

Version-Release number of selected component (if applicable):
[fa5b0347193f8d1a4b917a2edb338423cb175e66]

How reproducible: Always

Steps to Reproduce:
1. Start a geo-replication session with a multi-brick master, or just enable indexing on a volume with multiple bricks.
2. Mount the volume on two clients: one with --client-pid=-1 and one as a normal glusterfs mount.
3. mkdir -p d1/d2/d3/d4/d5 on the normal mount point, then check the xtime of all the directories on the other (--client-pid=-1) client.
  
The xtimes of all the directories should be the same and should be the greatest value found among the bricks (a sketch of how to query them follows below).
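
For reference, here is a minimal sketch of what "check the xtime" means in step 3. It assumes (these are assumptions about the marker format, not something stated in this report) that the xtime xattr is named trusted.glusterfs.<volume-uuid>.xtime and that its value is two 32-bit network-order integers (seconds, sub-second part); run it from the root of the --client-pid=-1 mount:

#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <sys/xattr.h>

/* Hypothetical key: substitute the real volume UUID of the master volume. */
#define XTIME_KEY "trusted.glusterfs.<volume-uuid>.xtime"

static int
get_xtime (const char *path, uint32_t *sec, uint32_t *nsec)
{
        uint32_t buf[2];

        if (lgetxattr (path, XTIME_KEY, buf, sizeof (buf)) != (ssize_t) sizeof (buf))
                return -1;
        *sec  = ntohl (buf[0]);
        *nsec = ntohl (buf[1]);
        return 0;
}

int
main (void)
{
        const char *dirs[] = { ".", "d1", "d1/d2", "d1/d2/d3",
                               "d1/d2/d3/d4", "d1/d2/d3/d4/d5" };
        uint32_t    psec = 0, pnsec = 0;

        for (size_t i = 0; i < sizeof (dirs) / sizeof (dirs[0]); i++) {
                uint32_t sec, nsec;

                if (get_xtime (dirs[i], &sec, &nsec) != 0) {
                        fprintf (stderr, "no xtime on %s\n", dirs[i]);
                        continue;
                }
                printf ("%-20s | %u.%u\n", dirs[i], (unsigned) sec, (unsigned) nsec);

                /* xtime inversion: a child should never be newer than its parent */
                if (i > 0 && (sec > psec || (sec == psec && nsec > pnsec)))
                        printf ("  ^ inversion: newer than parent\n");

                psec  = sec;
                pnsec = nsec;
        }
        return 0;
}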

Additional info:
| client1/
./ | 1331647790.22365
d1/ | 1331647790.22270
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22502
d1/d2/d3/d4/d5/ | 1331647790.22270

| /root/bricks/doa/d1
./ | 1331647790.22365
d1/ | 1331647790.22365
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22365
d1/d2/d3/d4/d5/ | 1331647790.22365

| /root/bricks/doa/d2
./ | 1331647790.22365
d1/ | 1331647790.22365
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22365
d1/d2/d3/d4/d5/ | 1331647790.22365

| /root/bricks/doa/d3
./ | 1331647790.22502
d1/ | 1331647790.22502
d1/d2/ | 1331647790.22502
d1/d2/d3/ | 1331647790.22502
d1/d2/d3/d4/ | 1331647790.22502
d1/d2/d3/d4/d5/ | 1331647790.22502

| /root/bricks/doa/d4
./ | 1331647790.22270
d1/ | 1331647790.22270
d1/d2/ | 1331647790.22270
d1/d2/d3/ | 1331647790.22270
d1/d2/d3/d4/ | 1331647790.22270
d1/d2/d3/d4/d5/ | 1331647790.22270

Comment 1 Venky Shankar 2012-03-16 07:18:35 UTC
OK, this looks interesting. Strictly speaking it's not a bug, but it is still a problem. So here it is:

All the parts of the code related to the *.xtime xattr are guarded by conditional expressions that look something like this (in *_getxattr() for the cluster translators):

if (*priv->vol_uuid) {
    if ((match_uuid_local (name, priv->vol_uuid) == 0)
        && (-1 == frame->root->pid)) {

        [...]

        /* call cluster_getmarkerattr(), which winds to all subvolumes */

        [...]

    }
}


All of this looks OK (the client-pid is -1, which Vijaykumar used to query xtime, and the uuid should match as well). But somehow priv->vol_uuid is filled entirely with '\0', which makes this branch dead and sends the request down the normal xattr code path, which winds to only a single subvolume (for replicate, and likewise for the other cluster translators).

But when geo-rep does an aux-mount and queries xtime, priv->vol_uuid is correct (i.e. the relevant volume-id). Hence, geo-rep gets the correct (max) xtime on the fs tree.
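
For clarity, the "aggregation" being relied on here is just "newest xtime wins" across subvolumes. Here is an illustration of that fold (this is not the actual cluster_getmarkerattr() callback code, only a sketch of the semantics):

#include <stdint.h>

struct xtime {
        uint32_t sec;
        uint32_t nsec;
};

/* Fold one subvolume's reply into the running aggregate: keep whichever is newer. */
static void
xtime_aggregate (struct xtime *agg, const struct xtime *reply)
{
        if (reply->sec > agg->sec ||
            (reply->sec == agg->sec && reply->nsec > agg->nsec))
                *agg = *reply;
}

With a correctly initialized priv->vol_uuid, every brick's reply is folded this way, so the aux mount ends up with the maximum of the per-brick values (1331647790.22502 in the listing above); when the condition fails, a single subvolume's value is returned unmodified, which is what Vijaykumar's mount shows.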

The reason priv->vol_uuid is empty for Vijaykumar's mount is that it only gets initialized when the volume-mark xattr is requested on the special mount. So first request "trusted.glusterfs.volume-mark" and then run the xtime.rb script; you should be happy :-)
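
A minimal sketch of that workaround (illustrative only: the mount path below is a placeholder and error handling is minimal); the point is simply that the volume-mark query has to happen once on the special mount before any xtime queries:

#include <stdio.h>
#include <sys/xattr.h>

int
main (int argc, char *argv[])
{
        char        buf[512];
        const char *mnt = (argc > 1) ? argv[1] : "/mnt/master-aux";

        /*
         * Priming step: requesting the volume-mark xattr on the special
         * (--client-pid=-1) mount initializes priv->vol_uuid in the cluster
         * translators. The returned value itself is not needed here.
         */
        if (lgetxattr (mnt, "trusted.glusterfs.volume-mark",
                       buf, sizeof (buf)) < 0)
                perror ("trusted.glusterfs.volume-mark");

        /*
         * After this, xtime queries (xtime.rb, or the sketch in the
         * description above) are aggregated across all bricks instead of
         * being answered by a single subvolume.
         */
        return 0;
}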

Comment 2 Csaba Henk 2012-03-24 17:14:51 UTC
Good analysis, Venky -- sorry for taking up your time, I should have remembered this.

As I recall, we had a semantic issue prior to the first query for volume-mark, and given that it is used only within geo-rep, where we know that gsyncd starts with a vol-mark query, we could afford the dumb solution of having undefined behavior prior to the first query for vol-mark.

