Description of problem:
Xtime aggregation does not appear to happen correctly: per-fs-entry xtimes seem to be taken from arbitrary bricks, and xtime inversions (fs entries with a greater xtime than their parent) occur.

Version-Release number of selected component (if applicable):
[fa5b0347193f8d1a4b917a2edb338423cb175e66]

How reproducible:
Always

Steps to Reproduce:
1. Start a geo-replication session with a multi-brick master, or just enable indexing on a multi-brick volume.
2. Mount it on two clients: one with --client-pid=-1 and one as a normal glusterfs mount.
3. mkdir d1/d2/d3/d4/d5 on the normal mount point, then check the xtime of all the directories on the other client. The xtimes should all be the same, and equal to the greatest value among the bricks.

Additional info:

client1:
  ./                1331647790.22365
  d1/               1331647790.22270
  d1/d2/            1331647790.22365
  d1/d2/d3/         1331647790.22365
  d1/d2/d3/d4/      1331647790.22502
  d1/d2/d3/d4/d5/   1331647790.22270

/root/bricks/doa/d1:
  ./                1331647790.22365
  d1/               1331647790.22365
  d1/d2/            1331647790.22365
  d1/d2/d3/         1331647790.22365
  d1/d2/d3/d4/      1331647790.22365
  d1/d2/d3/d4/d5/   1331647790.22365

/root/bricks/doa/d2:
  ./                1331647790.22365
  d1/               1331647790.22365
  d1/d2/            1331647790.22365
  d1/d2/d3/         1331647790.22365
  d1/d2/d3/d4/      1331647790.22365
  d1/d2/d3/d4/d5/   1331647790.22365

/root/bricks/doa/d3:
  ./                1331647790.22502
  d1/               1331647790.22502
  d1/d2/            1331647790.22502
  d1/d2/d3/         1331647790.22502
  d1/d2/d3/d4/      1331647790.22502
  d1/d2/d3/d4/d5/   1331647790.22502

/root/bricks/doa/d4:
  ./                1331647790.22270
  d1/               1331647790.22270
  d1/d2/            1331647790.22270
  d1/d2/d3/         1331647790.22270
  d1/d2/d3/d4/      1331647790.22270
  d1/d2/d3/d4/d5/   1331647790.22270
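To make the expected behavior concrete, here is a minimal sketch (not GlusterFS code; the function name is made up for illustration) of what xtime aggregation across bricks should do: the value reported to the client should be the maximum xtime over all subvolumes, so a parent can never have a smaller xtime than its children.

```python
def aggregate_xtime(brick_xtimes):
    """Return the greatest (sec, frac) xtime tuple across all bricks."""
    return max(brick_xtimes)

# xtimes of d1/ on the four bricks, from the listing above
bricks = [(1331647790, 22365), (1331647790, 22365),
          (1331647790, 22502), (1331647790, 22270)]
print(aggregate_xtime(bricks))  # → (1331647790, 22502)
```

In the buggy client1 listing above, d1/ instead shows 1331647790.22270, i.e. the value from a single brick (d4) rather than the aggregated maximum.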
OK, this looks interesting. Actually, it's not a problem, but it still is a problem. So here it is: all parts of the code that deal with the *.xtime xattr are guarded by conditional expressions that look something like this (in *_getxattr() for cluster translators):

    if (*priv->vol_uuid) {
            if ((match_uuid_local (name, priv->vol_uuid) == 0)
                && (-1 == frame->root->pid)) {
                    [...]
                    /* call cluster_getmarkerattr(), which winds to
                       all subvolumes */
                    [...]
            }
    }

All of this looks OK: the client pid is -1 (which Vijaykumar used to query xtime) and the uuid matches too. But somehow priv->vol_uuid is all filled with '\0', which makes this part of the code defunct, so the request takes the normal xattr code path, which winds to only a single subvolume (for replicate, and likewise for the other cluster translators).

When geo-rep does an aux-mount and queries xtime, however, priv->vol_uuid is correct (i.e. it holds the relevant volume-id). Hence, geo-rep gets the correct (maximum) xtime over the fs tree.

The reason priv->vol_uuid is empty for Vijaykumar's mount is that it only gets initialized when the volume-mark xattr is requested from the special mount. So first request "trusted.glusterfs.volume-mark" and then run the xtime.rb script; you should be happy :-)
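The effect of that guard can be modeled with a small sketch (assumed names, not actual GlusterFS code): an uninitialized vol_uuid silently routes the xtime query down the single-subvolume path, even when the pid and xattr name would otherwise qualify for aggregation.

```python
def getxattr_path(vol_uuid, name, client_pid):
    """Model of the guard in *_getxattr(): aggregation happens only when
    vol_uuid is initialized, the name is an xtime xattr, and pid is -1."""
    if vol_uuid and name.endswith(".xtime") and client_pid == -1:
        return "wind-to-all-subvols"   # marker aggregation path
    return "wind-to-one-subvol"        # normal xattr path

xattr = "trusted.glusterfs.<uuid>.xtime"
print(getxattr_path("", xattr, -1))          # → wind-to-one-subvol (uuid not yet initialized)
print(getxattr_path("deadbeef", xattr, -1))  # → wind-to-all-subvols (after volume-mark query)
```

This is why the same query returns per-brick values before the volume-mark request and aggregated values after it.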
Good analysis, Venky -- sorry for robbing your time; I should have remembered this. As I recall, we had some semantic issue prior to the first query for volume-mark, and given that this path is used only within geo-rep, where we know gsyncd starts with a volume-mark query, we could afford the dumb solution of leaving behavior undefined before the first volume-mark query.