Bug 1409808 - [Mdcache] clients being served wrong information about a file, can lead to file inconsistency
Summary: [Mdcache] clients being served wrong information about a file, can lead to file inconsistency
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Poornima G
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1410375 1410688
 
Reported: 2017-01-03 13:44 UTC by Nag Pavan Chilakam
Modified: 2018-11-30 05:39 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.8.4-11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1410375 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:01:38 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Nag Pavan Chilakam 2017-01-03 13:44:25 UTC
Description of problem:
========================
If I create a 1GB file from one client, the other client shows the file size as 250MB, i.e. only a quarter of the actual size, which is the size of the file's data on a single brick (in a (4+2) disperse subvolume each data brick holds 1/4 of the file).
I don't think caching is the only culprit: while the file is being created from c1, c2 shows the size it had at an older time (not the real-time size).
However, once the file is completely created on c1, a stat on c2 shows the size updated immediately, but instead of the correct size it shows only 1/4th of the actual size.

I was able to reproduce this consistently on my setup.




I have a 2x(4+2) EC volume mounted on NFS-Ganesha clients.
The setup is as follows:
node1: 10.70.35.37  VIP: 10.70.35.50
node2: 10.70.35.116 VIP: 10.70.35.60
node3: 10.70.35.239 VIP: 10.70.35.70
node4: 10.70.35.135 ==> not part of the ganesha cluster
node5: 10.70.35.8   ==> not part of the ganesha cluster
node6: 10.70.35.196 VIP: 10.70.35.80

I mounted the volume via NFS-Ganesha on two clients as below:
c1:rhs-client45(el7.2):10.70.35.50:/disperse on /mnt/disperse type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.70.36.69,local_lock=none,addr=10.70.35.50)

c2:dhcp35-107:(el6.8):10.70.35.80:disperse on /mnt/disperse type nfs (rw,vers=4,addr=10.70.35.80,clientaddr=10.70.35.107)
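For reference, the mounts above roughly correspond to commands like the following; the VIPs and export name are the ones from this setup, and the exact NFS options may differ slightly:

# on c1 (el7.2)
mount -t nfs -o vers=4.0 10.70.35.50:/disperse /mnt/disperse
# on c2 (el6.8)
mount -t nfs -o vers=4 10.70.35.80:/disperse /mnt/disperse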

The same inconsistency can be observed even with md5sum


Volume Name: disperse
Type: Distributed-Disperse
Volume ID: ea0b329a-f2ff-462d-a3a2-457a80b35e08
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@dhcp35-37 log]# gluster v status disperse
Status of volume: disperse
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick2/disperse      49153     0          Y       21634
Brick 10.70.35.116:/rhs/brick2/disperse     49153     0          Y       17741
Brick 10.70.35.239:/rhs/brick2/disperse     49153     0          Y       8056 
Brick 10.70.35.135:/rhs/brick2/disperse     49154     0          Y       31840
Brick 10.70.35.8:/rhs/brick2/disperse       49154     0          Y       31773
Brick 10.70.35.196:/rhs/brick2/disperse     49152     0          Y       6865 
Brick 10.70.35.37:/rhs/brick3/disperse      49154     0          Y       21654
Brick 10.70.35.116:/rhs/brick3/disperse     49154     0          Y       17761
Brick 10.70.35.239:/rhs/brick3/disperse     49154     0          Y       8076 
Brick 10.70.35.135:/rhs/brick3/disperse     49155     0          Y       31860
Brick 10.70.35.8:/rhs/brick3/disperse       49155     0          Y       31793
Brick 10.70.35.196:/rhs/brick3/disperse     49153     0          Y       6885 
Self-heal Daemon on localhost               N/A       N/A        Y       21675
Self-heal Daemon on dhcp35-8.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       31815
Self-heal Daemon on dhcp35-196.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       6914 
Self-heal Daemon on dhcp35-135.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       31881
Self-heal Daemon on dhcp35-116.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       17794
Self-heal Daemon on dhcp35-239.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       8121 
 
Task Status of Volume disperse
------------------------------------------------------------------------------
There are no active volume tasks
 


Version-Release number of selected component (if applicable):
=====================================
[root@dhcp35-37 log]# rpm -qa|grep gluster
glusterfs-geo-replication-3.8.4-10.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
glusterfs-3.8.4-10.el7rhgs.x86_64
python-gluster-3.8.4-10.el7rhgs.noarch
glusterfs-cli-3.8.4-10.el7rhgs.x86_64
glusterfs-rdma-3.8.4-10.el7rhgs.x86_64
glusterfs-api-3.8.4-10.el7rhgs.x86_64
glusterfs-server-3.8.4-10.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-libs-3.8.4-10.el7rhgs.x86_64
glusterfs-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-10.el7rhgs.x86_64
glusterfs-events-3.8.4-10.el7rhgs.x86_64




Steps to Reproduce:
=====================
1. Create a distributed-disperse (EC) volume and mount it on two different clients using different VIPs of the ganesha cluster
2. From c1, start creating a file of, say, 1GB (e.g. with dd, as sketched below)
3. While the file is being created, do a lookup on client c2
4. Once c1 completes, immediately issue a stat of the file on c1; it shows the size as 1GB
5. Now issue a stat on c2; it shows the size as 250MB (instead of 1GB)
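
A minimal command-level sketch of the steps above, assuming the volume is mounted at /mnt/disperse on both clients (the file name testfile is just an example):

# on c1: create a 1GB file
dd if=/dev/zero of=/mnt/disperse/testfile bs=1M count=1024

# on c2: while the dd is running, and again immediately after it completes
stat /mnt/disperse/testfile
ls -l /mnt/disperse/testfile

# once the write has finished, compare checksums from c1 and c2 (per this bug, they differ)
md5sum /mnt/disperse/testfile    # run on both c1 and c2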

This is wrong, as it will lead to inconsistent data for applications relying on these details.

Even an md5sum on c2 is calculated against the cached/250MB view of the file, which is wrong.

Note that the file updates to the correct data after some time (hence I am not sure whether the md-cache timeout settings are playing a role here).

Comment 2 Nag Pavan Chilakam 2017-01-04 08:04:00 UTC
I disabled the md-cache settings, after which I don't see this problem.
Hence, in all likelihood, this is an md-cache issue.
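
For clarity, by "md-cache settings" I mean the md-cache related options shown in the volume info above. A sketch of reverting them (I may have used "gluster volume set <option> off" for the booleans instead; "gluster volume reset" simply restores an option's default):

gluster volume reset disperse performance.md-cache-timeout
gluster volume reset disperse performance.cache-invalidation
gluster volume reset disperse performance.stat-prefetch
gluster volume reset disperse features.cache-invalidation-timeout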

Comment 3 Nag Pavan Chilakam 2017-01-04 10:42:47 UTC
I am seeing this issue even with the fuse protocol, hence changing the title to something more generic (removing ganesha).
This is seen only with the md-cache settings enabled.
Also, note why this can escape observation on a fuse mount: while the dd is running from the other client, a lookup (e.g. ls -l) of the file on the fuse mount returns up-to-date information. Only after the dd completes does ls -l drop to reporting just the size of the file on one brick.
With ganesha, ls -l does not always serve updated size information, probably due to ganesha's own caching, so the discrepancy is easier to catch by eye.
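
A quick way to see the "size of the file on one brick" effect described above (testfile is just an example name; the brick paths are the ones from this volume, and the file lands under brick2 or brick3 depending on which DHT subvolume it hashes to):

stat -c %s /mnt/disperse/testfile           # logical size as seen by the client
stat -c %s /rhs/brick2/disperse/testfile    # on a brick node: the fragment size, roughly 1/4 of the logical size for a (4+2) subvolume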

Comment 5 Atin Mukherjee 2017-01-05 12:17:41 UTC
upstream mainline patch http://review.gluster.org/16329 posted for review.

Comment 8 Nag Pavan Chilakam 2017-01-09 11:04:45 UTC
QATP:
Ran the test case mentioned while raising the bug; the case passed.

Other cases:
Saw a problem with truncate setting the atime of a file to the start of the epoch; raised a new bug https://bugzilla.redhat.com/show_bug.cgi?id=1411277 "truncate setting access time of a file (atime) to unix epoch beginning time (01-01-1970)"

Also noticed some atime inconsistencies, like the one below, on the ganesha setup with the md-cache settings enabled:
when we access the file, the atime changes on the client, but the change is not persisted on the brick side, so the atime is lost later.
I will discuss this and raise a new bug accordingly.
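
A sketch of how the atime can be compared (testfile is just an example name):

stat -c '%x' /mnt/disperse/testfile           # atime as seen on the client after accessing the file
stat -c '%x' /rhs/brick2/disperse/testfile    # atime stored on the brick (run on a brick node)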

Moving to verified 


[root@dhcp35-37 ~]# gluster v status disperse
Status of volume: disperse
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick2/disperse      49153     0          Y       20857
Brick 10.70.35.116:/rhs/brick2/disperse     49153     0          Y       18063
Brick 10.70.35.239:/rhs/brick2/disperse     49153     0          Y       18140
Brick 10.70.35.135:/rhs/brick2/disperse     49154     0          Y       4732 
Brick 10.70.35.8:/rhs/brick2/disperse       49154     0          Y       4737 
Brick 10.70.35.196:/rhs/brick2/disperse     49152     0          Y       17851
Brick 10.70.35.37:/rhs/brick3/disperse      49154     0          Y       20877
Brick 10.70.35.116:/rhs/brick3/disperse     49154     0          Y       18083
Brick 10.70.35.239:/rhs/brick3/disperse     49154     0          Y       18160
Brick 10.70.35.135:/rhs/brick3/disperse     49155     0          Y       4752 
Brick 10.70.35.8:/rhs/brick3/disperse       49155     0          Y       4757 
Brick 10.70.35.196:/rhs/brick3/disperse     49153     0          Y       17871
Self-heal Daemon on localhost               N/A       N/A        Y       8229 
Self-heal Daemon on 10.70.35.135            N/A       N/A        Y       5022 
Self-heal Daemon on 10.70.35.196            N/A       N/A        Y       5010 
Self-heal Daemon on 10.70.35.239            N/A       N/A        Y       5265 
Self-heal Daemon on 10.70.35.8              N/A       N/A        Y       5042 
Self-heal Daemon on 10.70.35.116            N/A       N/A        Y       5273 
 
Task Status of Volume disperse
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-37 ~]# gluster v info disperse
 
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: 53f82e48-d215-4301-be59-5f3e58f8b846
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
cluster.shd-max-threads: 5
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@dhcp35-37 ~]# rpm -qa|egrep "ganesha|gluster"
glusterfs-events-3.8.4-11.el7rhgs.x86_64
glusterfs-rdma-3.8.4-11.el7rhgs.x86_64
glusterfs-api-3.8.4-11.el7rhgs.x86_64
glusterfs-server-3.8.4-11.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.1-3.el7rhgs.x86_64
glusterfs-libs-3.8.4-11.el7rhgs.x86_64
glusterfs-cli-3.8.4-11.el7rhgs.x86_64
nfs-ganesha-2.4.1-3.el7rhgs.x86_64
glusterfs-3.8.4-11.el7rhgs.x86_64
glusterfs-fuse-3.8.4-11.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-11.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-11.el7rhgs.x86_64
python-gluster-3.8.4-11.el7rhgs.noarch
glusterfs-ganesha-3.8.4-11.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-11.el7rhgs.x86_64
[root@dhcp35-37 ~]#

Comment 9 Nag Pavan Chilakam 2017-01-23 13:55:13 UTC
TestCaseId:RHG3-11784

Comment 11 errata-xmlrpc 2017-03-23 06:01:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

