Bug 803588 - Aggregation in geo-replication is not happening properly.
Status: CLOSED NOTABUG
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Venky Shankar
Reported: 2012-03-15 03:06 EDT by Vijaykumar Koppad
Modified: 2014-08-24 20:49 EDT
Doc Type: Bug Fix
Last Closed: 2012-03-16 05:51:42 EDT

Description Vijaykumar Koppad 2012-03-15 03:06:09 EDT
Description of problem:
It seems to me that aggregation is not happening properly: per-fs-entry xtimes
seem to be taken randomly from certain bricks, and xtime inversions (fs entries
with a greater xtime than their parent) occur.

Version-Release number of selected component (if applicable):
[fa5b0347193f8d1a4b917a2edb338423cb175e66]

How reproducible: Always

Steps to Reproduce:
1. Start a geo-replication session with a multi-brick master volume, or just enable indexing on that volume.
2. Mount the master volume on two clients: one with --client-pid=-1 and one as a normal glusterfs mount.
3. mkdir d1/d2/d3/d4/d5 on the normal mount point, and check the xtime of all the directories on the other (--client-pid=-1) client.

The xtimes should be the same for every directory and equal to the greatest xtime among the bricks (a sketch for scripting this check follows below).
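
The check in step 3 can be scripted. Below is a minimal sketch in C, assuming a Linux client and assuming the per-directory xtime is exposed as an xattr named trusted.glusterfs.<volume-uuid>.xtime whose value is two network-order 32-bit integers (seconds, microseconds); the mount point /mnt/client2 and the key name are placeholders, not values taken from this report.

/* Sketch (assumptions noted above): walk the directory chain on the
 * xtime-aware client and print each level's xtime so an inversion
 * (a child with a greater xtime than its parent) is easy to spot. */
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/xattr.h>

#define MNT       "/mnt/client2"                          /* hypothetical mount done with --client-pid=-1 */
#define XTIME_KEY "trusted.glusterfs.<volume-uuid>.xtime" /* substitute the real volume uuid */

static void print_xtime (const char *path)
{
    uint32_t ts[2]; /* assumed layout: htonl(sec), htonl(usec) */
    ssize_t  len = getxattr (path, XTIME_KEY, ts, sizeof (ts));

    if (len == sizeof (ts))
        printf ("%s | %u.%u\n", path, ntohl (ts[0]), ntohl (ts[1]));
    else
        printf ("%s | <no xtime>\n", path);
}

int main (void)
{
    const char *dirs[] = { MNT, MNT "/d1", MNT "/d1/d2", MNT "/d1/d2/d3",
                           MNT "/d1/d2/d3/d4", MNT "/d1/d2/d3/d4/d5" };

    for (unsigned i = 0; i < sizeof (dirs) / sizeof (dirs[0]); i++)
        print_xtime (dirs[i]);

    return 0;
}

Once aggregation works as expected, running this against the --client-pid=-1 mount should print the same (maximum) xtime for every level.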

Additional info:
client1/
./ | 1331647790.22365
d1/ | 1331647790.22270
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22502
d1/d2/d3/d4/d5/ | 1331647790.22270

/root/bricks/doa/d1
./ | 1331647790.22365
d1/ | 1331647790.22365
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22365
d1/d2/d3/d4/d5/ | 1331647790.22365

/root/bricks/doa/d2
./ | 1331647790.22365
d1/ | 1331647790.22365
d1/d2/ | 1331647790.22365
d1/d2/d3/ | 1331647790.22365
d1/d2/d3/d4/ | 1331647790.22365
d1/d2/d3/d4/d5/ | 1331647790.22365

/root/bricks/doa/d3
./ | 1331647790.22502
d1/ | 1331647790.22502
d1/d2/ | 1331647790.22502
d1/d2/d3/ | 1331647790.22502
d1/d2/d3/d4/ | 1331647790.22502
d1/d2/d3/d4/d5/ | 1331647790.22502

/root/bricks/doa/d4
./ | 1331647790.22270
d1/ | 1331647790.22270
d1/d2/ | 1331647790.22270
d1/d2/d3/ | 1331647790.22270
d1/d2/d3/d4/ | 1331647790.22270
d1/d2/d3/d4/d5/ | 1331647790.22270
Comment 1 Venky Shankar 2012-03-16 03:18:35 EDT
OK, this looks interesting. Strictly speaking it's not a bug, but it is still a problem in practice. So here it is:

All parts of the code related to the *.xtime xattr are guarded by conditional expressions that look something like this (in *_getxattr() for the cluster translators):

if (*priv->vol_uuid) {
    if ((match_uuid_local (name, priv->vol_uuid) == 0)
        && (-1 == frame->root->pid)) {

        [...]

        // call cluster_getmarkerattr() which winds to all subvolumes

        [...]
    }
}


All of this looks OK (the client-pid is -1, which is what Vijaykumar used to query xtime, and the uuid matches too). But somehow priv->vol_uuid is filled with '\0', which makes this branch dead and sends the request down the normal xattr code path, which winds to only a single subvolume (for replicate, and likewise for the other cluster translators).

But when geo-rep does an aux-mount and queries xtime, priv->vol_uuid is correct (i.e. the relevant volume-id). Hence, geo-rep gets the correct (max) xtime on the fs tree.

The reason priv->vol_uuid is empty for Vijaykumar's mount is that it only gets initialized when the volume-mark xattr is requested from the special mount. So first request "trusted.glusterfs.volume-mark" and then run the xtime.rb script; you should be happy :-)
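
For illustration, here is a minimal sketch of that workaround; as in the earlier sketch, the mount point and the xtime key name are placeholders, not values confirmed by this report:

/* Sketch of the workaround: prime priv->vol_uuid by reading the
 * volume-mark xattr once from the special (--client-pid=-1) mount,
 * then query xtime; the xtime request should now wind to all
 * subvolumes and return the aggregated (max) value. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

#define MNT       "/mnt/client2"                          /* hypothetical special mount */
#define XTIME_KEY "trusted.glusterfs.<volume-uuid>.xtime" /* substitute the real volume uuid */

int main (void)
{
    char    buf[512];
    ssize_t len;

    /* Step 1: the volume-mark request initializes priv->vol_uuid. */
    len = getxattr (MNT, "trusted.glusterfs.volume-mark", buf, sizeof (buf));
    if (len < 0)
        perror ("volume-mark");

    /* Step 2: xtime queries are now aggregated across subvolumes. */
    len = getxattr (MNT "/d1/d2", XTIME_KEY, buf, sizeof (buf));
    if (len < 0)
        perror ("xtime");
    else
        printf ("got %zd bytes of xtime data\n", len);

    return 0;
}

Equivalently, a single getfattr -n trusted.glusterfs.volume-mark on the mount root before running the xtime.rb script should have the same priming effect.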
Comment 2 Csaba Henk 2012-03-24 13:14:51 EDT
Good analysis Venky -- sorry for taking up your time, I should have remembered this.

As I recall, we had some semantic issue prior to the first query for volume-mark, and given that it's used only within geo-rep, where we know that gsyncd starts with a vol-mark query, we could afford the dumb solution of leaving the behavior undefined before the first vol-mark query.
