Bug 1240739

Summary: timestamps for files stored on replicated volume are not consistent
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Martin Bukatovic <mbukatov>
Component: replicate
Assignee: Mohammed Rafi KC <rkavunga>
Status: CLOSED DUPLICATE
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: atumball, kdhananj, nbalacha, ravishankar, rhs-bugs, sgraf
Target Milestone: ---
Keywords: Reopened, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-19 04:00:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1182628

Description Martin Bukatovic 2015-07-07 15:50:46 UTC
Description of problem
======================

When a gluster client asks for the timestamp of a particular file stored on an
n-way replicated gluster volume, it can obtain different values depending on
which brick it talks to.

This means that a process accessing a gluster volume can't expect file
timestamp values to be consistent across the trusted storage pool, and so it
can't reliably compare the full timestamps of files stored on a glusterfs
replicated volume.

Version-Release number of selected component (if applicable)
============================================================

glusterfs-3.7.1-4.el6rhs.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Create a 2-way replicated gluster volume (just 2 bricks are enough, but
   each brick *must be* hosted on a different machine).
2. Mount this volume from a client machine outside of the trusted storage pool.
3. Make sure ntpd is configured and time is synchronized on all machines.
4. On the client machine, create a new file on the volume.
5. On the client machine, check the timestamp of this new file using the `stat`
   tool.
6. Compare the timestamps of this file as reported by each peer node hosting a
   brick of the replicated pair which stores this particular file.

Summary of the minimal environment needed to reproduce this issue (a shell
sketch of these steps follows the list):

 * 2 gluster peer nodes (aka storage nodes)
 * each peer node hosts one brick
 * one client machine outside of trusted storage pool
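
A minimal sketch of the steps above, assuming the volume and brick layout shown
in the examples below; node1, node2 and the mount point are illustrative
placeholders:

~~~
# On one peer node: create and start a 2-way replicated volume
# (node1, node2 and the brick paths are placeholders).
gluster volume create HadoopVol1 replica 2 \
    node1:/mnt/brick1/HadoopVol1 node2:/mnt/brick1/HadoopVol1
gluster volume start HadoopVol1

# On the client machine (outside the trusted storage pool):
# mount the volume, create a file and record its timestamps.
mount -t glusterfs node1:/HadoopVol1 /mnt/gluster
uname -a > /mnt/gluster/file01
stat /mnt/gluster/file01

# On each peer node: stat the same file directly on the brick and compare.
stat /mnt/brick1/HadoopVol1/file01
~~~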

Actual results
==============

On the glusterfs client machine, creating a new file on the gluster volume and
checking its timestamps:

~~~
[root@dhcp-37-182 timestamp]# uname -a > file01
[root@dhcp-37-182 timestamp]# stat file01
  File: `file01'
  Size: 130             Blocks: 1          IO Block: 131072 regular file
Device: 12h/18d Inode: 10336758825892691187  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.885332481 +0200
Modify: 2015-07-07 16:53:07.889332479 +0200
Change: 2015-07-07 16:53:07.889332479 +0200
~~~

Checking the timestamps from each node (each one hosts one brick of the 2-way
replicated pair):

~~~
[root@dhcp-37-194 timestamp]# stat file01 
  File: `file01'
  Size: 130             Blocks: 1          IO Block: 131072 regular file
Device: 12h/18d Inode: 10336758825892691187  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.878828824 +0200
Modify: 2015-07-07 16:53:07.881828822 +0200
Change: 2015-07-07 16:53:07.882828821 +0200
~~~

~~~
[root@dhcp-37-195 timestamp]# stat file01 
  File: `file01'
  Size: 130             Blocks: 1          IO Block: 131072 regular file
Device: 12h/18d Inode: 10336758825892691187  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.885332481 +0200
Modify: 2015-07-07 16:53:07.889332479 +0200
Change: 2015-07-07 16:53:07.889332479 +0200
~~~

As you can see, the results differ across the cluster:

 * the client (dhcp-37-182) reports the same values as the 2nd node (dhcp-37-195)
 * the 1st node (dhcp-37-194) reports different values

The difference is quite small, e.g. for the modify time of the file above:

    abs(881828822 - 889332479) × 10^(-9) s = 0.007503657 s
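
For a quick check, the same delta can be computed from the nanosecond fields of
the stat output above (a minimal sketch using bc; note that bc prints the
result without a leading zero):

~~~
# Nanosecond fractions of the modify times reported by the two bricks.
echo "scale=9; (889332479 - 881828822) / 1000000000" | bc
# prints: .007503657
~~~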

This is likely caused by the fact that each gluster peer simply uses the
values reported by the underlying XFS filesystem on its local brick:

~~~
[root@dhcp-37-194 timestamp]# stat /mnt/brick1/HadoopVol1/tmp/timestamp/file01 
  File: `/mnt/brick1/HadoopVol1/tmp/timestamp/file01'
  Size: 130             Blocks: 8          IO Block: 4096   regular file
Device: fd04h/64772d    Inode: 34041155    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.878828824 +0200
Modify: 2015-07-07 16:53:07.881828822 +0200
Change: 2015-07-07 16:53:07.882828821 +0200
~~~

~~~
[root@dhcp-37-195 timestamp]# stat /mnt/brick1/HadoopVol1/tmp/timestamp/file01 
  File: `/mnt/brick1/HadoopVol1/tmp/timestamp/file01'
  Size: 130             Blocks: 8          IO Block: 4096   regular file
Device: fd04h/64772d    Inode: 29365018    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-07 16:53:07.885332481 +0200
Modify: 2015-07-07 16:53:07.889332479 +0200
Change: 2015-07-07 16:53:07.889332479 +0200
~~~

So it appears that for an n-way replicated volume there are n values of the
file timestamps for a particular file, and the client uses the value reported
by the brick it talks to.

Expected results
================

All machines (the client as well as the peer nodes hosting the bricks) report
the same file timestamp values.

Additional info
===============

Output of `gluster volume info HadoopVol1` (the volume used in the examples
above):

~~~
Volume Name: HadoopVol1
Type: Replicate
Volume ID: 7b64d606-d1fe-47a1-8dbd-9ca2714051d4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dhcp-37-194.example.com:/mnt/brick1/HadoopVol1
Brick2: dhcp-37-195.example.com:/mnt/brick1/HadoopVol1
Options Reconfigured:
performance.readdir-ahead: on
cluster.eager-lock: on
performance.quick-read: off
performance.stat-prefetch: off
~~~

Comment 1 Martin Bukatovic 2015-07-07 15:59:34 UTC
This issue is the root cause behind rhs-hadoop BZ 1182628 (Hadoop checks the
full timestamp of a file to make sure that the file hasn't been changed).

Comment 2 Krutika Dhananjay 2017-05-16 06:11:05 UTC
Changing the assignee to Rafi since he is working on it.

http://lists.gluster.org/pipermail/gluster-devel/2017-February/052190.html

Comment 5 Amar Tumballi 2018-04-17 06:35:16 UTC
Sorry for the confusion. This is a valid bug; it got closed because I looked
at all bugs with PM Score < 0 that had been opened 2+ years earlier and closed
them in bulk.

This is a feature we are working on upstream, and the plan is to have it in
the product by RHGS 4.0 (or 4.1 in the worst case). More on this can be found
at https://github.com/gluster/glusterfs/issues/208

Comment 6 Mohammed Rafi KC 2018-11-19 04:00:44 UTC
We have implemented the 'ctime'-based xlator work upstream, and the best thing
for us is to validate the feature upstream, with a Solr-based testbed, and
then, if everything is fixed, work towards ways of getting it downstream.

Note that the feature needs some extra fields sent on the wire, and hence
requires protocol version changes, so the recommendation is to wait for the
next major RHGS release.


The engineering update on this is that it will get fixed when the next rebase
happens. The best thing for QE is to validate it upstream to get confirmation
on the bug, so we can say it is ready when we rebase.

(Note: use glusterfs-5.0 or later.)
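
For reference, a hedged sketch of how the upstream ctime feature is enabled per
volume; the option names are taken from the upstream glusterfs-5.0 release
notes and are an assumption here, not a verified downstream procedure:

~~~
# Enable the consistent-time (ctime) feature on the volume (glusterfs-5.0+).
# Option names (utime, ctime) are assumed from upstream documentation.
gluster volume set HadoopVol1 utime on
gluster volume set HadoopVol1 ctime on
gluster volume get HadoopVol1 ctime
~~~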

Closing this bug as a duplicate of 1314508.

*** This bug has been marked as a duplicate of bug 1314508 ***