Description of problem
======================

When a gluster client asks for the timestamp of a particular file stored on
an n-way replicated gluster volume, it can obtain different values depending
on which brick it talks to.

This means that a process accessing a gluster volume can't expect file
timestamp values to be consistent across the trusted storage pool, and so it
can't reliably compare full timestamps of files stored on a glusterfs
replicated volume.

Version-Release number of selected component (if applicable)
============================================================
glusterfs-3.7.1-4.el6rhs.x86_64

How reproducible
================
100 %

Steps to Reproduce
==================
1. Create a 2-way replicated gluster volume (just 2 bricks are enough, but
   each brick *must be* hosted on a different machine).
2. Mount this volume from a client machine outside of the trusted storage pool.
3. Make sure ntpd is configured and time is synchronized on all machines.
4. On the client machine, create a new file on the volume.
5. On the client machine, check the timestamp of this new file using the
   `stat` tool.
6. Compare the timestamps of this file from each peer node hosting the bricks
   (the replicated pair which stores this particular file).

Summary of the minimal environment to reproduce this issue:

* 2 gluster peer nodes (aka storage nodes)
* each peer node hosts one brick
* one client machine outside of the trusted storage pool

(A command sketch of these steps is included at the end of the Additional
info section below.)

Actual results
==============

On the glusterfs client machine, creating a new file on the gluster volume
and checking its timestamp:

~~~
[root@dhcp-37-182 timestamp]# uname -a > file01
[root@dhcp-37-182 timestamp]# stat file01
  File: `file01'
  Size: 130             Blocks: 1          IO Block: 131072 regular file
Device: 12h/18d         Inode: 10336758825892691187  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.885332481 +0200
Modify: 2015-07-07 16:53:07.889332479 +0200
Change: 2015-07-07 16:53:07.889332479 +0200
~~~

Checking the timestamps from each node (each hosts one brick of the 2-way
replicated pair):

~~~
[root@dhcp-37-194 timestamp]# stat file01
  File: `file01'
  Size: 130             Blocks: 1          IO Block: 131072 regular file
Device: 12h/18d         Inode: 10336758825892691187  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.878828824 +0200
Modify: 2015-07-07 16:53:07.881828822 +0200
Change: 2015-07-07 16:53:07.882828821 +0200
~~~

~~~
[root@dhcp-37-195 timestamp]# stat file01
  File: `file01'
  Size: 130             Blocks: 1          IO Block: 131072 regular file
Device: 12h/18d         Inode: 10336758825892691187  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.885332481 +0200
Modify: 2015-07-07 16:53:07.889332479 +0200
Change: 2015-07-07 16:53:07.889332479 +0200
~~~

As you can see, the results differ across the cluster:

* the client (dhcp-37-182) reports the same values as the 2nd node (dhcp-37-195)
* the 1st node (dhcp-37-194) reports a different value

The difference is quite small; e.g. for the
modify time of the file above it is
|889332479 - 881828822| * 10^(-9) s = 0.007503657 s.

This is likely caused by the fact that each gluster peer simply uses the
values reported by the underlying xfs filesystem on its local brick:

~~~
[root@dhcp-37-194 timestamp]# stat /mnt/brick1/HadoopVol1/tmp/timestamp/file01
  File: `/mnt/brick1/HadoopVol1/tmp/timestamp/file01'
  Size: 130             Blocks: 8          IO Block: 4096   regular file
Device: fd04h/64772d    Inode: 34041155    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.878828824 +0200
Modify: 2015-07-07 16:53:07.881828822 +0200
Change: 2015-07-07 16:53:07.882828821 +0200
~~~

~~~
[root@dhcp-37-195 timestamp]# stat /mnt/brick1/HadoopVol1/tmp/timestamp/file01
  File: `/mnt/brick1/HadoopVol1/tmp/timestamp/file01'
  Size: 130             Blocks: 8          IO Block: 4096   regular file
Device: fd04h/64772d    Inode: 29365018    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.885332481 +0200
Modify: 2015-07-07 16:53:07.889332479 +0200
Change: 2015-07-07 16:53:07.889332479 +0200
~~~

So it seems that for an n-way replicated volume there are n values of the
timestamps for a particular file, and the client uses whichever value is
reported by the brick it talks to.

Expected results
================

All machines (both the client and the peer nodes hosting the bricks) report
the same file timestamp values.

Additional info
===============

Output of `gluster volume info HadoopVol1` (the volume used in the examples
above):

~~~
Volume Name: HadoopVol1
Type: Replicate
Volume ID: 7b64d606-d1fe-47a1-8dbd-9ca2714051d4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dhcp-37-194.example.com:/mnt/brick1/HadoopVol1
Brick2: dhcp-37-195.example.com:/mnt/brick1/HadoopVol1
Options Reconfigured:
performance.readdir-ahead: on
cluster.eager-lock: on
performance.quick-read: off
performance.stat-prefetch: off
~~~
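For reference, a minimal command sketch of the reproduction steps above. The
hostnames, volume name and brick paths are taken from this report; the client
mount point /mnt/glusterfs and the tmp/timestamp subdirectory are assumptions:

~~~
# On one of the peer nodes: create and start a 2-way replicated volume
# (each brick must live on a different machine).
gluster volume create HadoopVol1 replica 2 \
    dhcp-37-194.example.com:/mnt/brick1/HadoopVol1 \
    dhcp-37-195.example.com:/mnt/brick1/HadoopVol1
gluster volume start HadoopVol1

# On the client machine (outside the trusted storage pool):
mount -t glusterfs dhcp-37-194.example.com:/HadoopVol1 /mnt/glusterfs
mkdir -p /mnt/glusterfs/tmp/timestamp
cd /mnt/glusterfs/tmp/timestamp
uname -a > file01
stat file01     # note the nanosecond part of Access/Modify/Change

# On each peer node: compare with the brick-level values.
stat /mnt/brick1/HadoopVol1/tmp/timestamp/file01
~~~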
This issue is the root cause behind rhs-hadoop BZ 1182628 (hadoop checks the full timestamp of a file to make sure that the file hasn't been changed).
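To illustrate why such a check breaks, a hypothetical shell sketch (the paths
are taken from this report, not from BZ 1182628; the remount is just one way
a different replica may end up answering the stat):

~~~
# Record the full (nanosecond) modify time of the file...
before=$(stat -c '%y' /mnt/glusterfs/tmp/timestamp/file01)
# ...later the same stat may be answered by the other brick of the replica
# pair (e.g. after a remount or failover), which stores a slightly different
# mtime...
after=$(stat -c '%y' /mnt/glusterfs/tmp/timestamp/file01)
# ...so a byte-for-byte comparison of the timestamps reports a change even
# though the file was never modified.
[ "$before" != "$after" ] && echo "file01 appears modified (false positive)"
~~~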
Changing the assignee to Rafi since he is working on it. http://lists.gluster.org/pipermail/gluster-devel/2017-February/052190.html
Sorry for the confusion. This is a valid bug; it got closed because I looked at all bugs with a PM Score < 0 that were opened 2+ years ago and closed them in bulk. This is a feature we are working on upstream, and the plan is to have it in the product by RHGS 4.0 (or 4.1 in the worst case). More on this can be found at https://github.com/gluster/glusterfs/issues/208
We have implemented the 'ctime'-based xlator work upstream, and the best thing for us is to validate the feature upstream with a Solr-based testbed, and then, if everything is fixed, work towards getting it downstream. Note that the feature needed some extra fields sent on the wire, and hence will need protocol version changes, so the recommendation is to wait for the next major RHGS release.

The engineering update on this is that it will get fixed when the next rebase happens. The best thing for QE is to validate it upstream to get confirmation on the bug, so we can say it is ready when we rebase. (Note: use at least version glusterfs-5.0 or above.)

Closing this bug as a duplicate of bug 1314508.

*** This bug has been marked as a duplicate of bug 1314508 ***
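A minimal sketch of how QE could validate this against an upstream build
(glusterfs >= 5.0). The option name for the consistent-time-attributes
feature is assumed here to be `ctime` and should be verified against the
release under test:

~~~
# Enable the ctime feature on the volume (assumed option name; on newer
# upstream releases it may already be enabled by default):
gluster volume set HadoopVol1 ctime on

# Repeat the reproduction steps: create a file through a glusterfs client
# mount, then stat it again after a remount (or from a second client) so a
# different replica may serve the lookup. The full nanosecond
# Access/Modify/Change values should now be identical in every case.
uname -a > /mnt/glusterfs/tmp/timestamp/file02
stat /mnt/glusterfs/tmp/timestamp/file02
~~~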