1393743 – the return size of fstat sometime is not correct while write-behind feature enabled

Bug 1393743 - the return size of fstat sometime is not correct while write-behind feature enabled

Summary: the return size of fstat sometime is not correct while write-behind feature e...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	md-cache
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Assignee:	Poornima G
QA Contact:
Docs Contact:
URL:
Whiteboard:	GLUSTERFS_METADATA_INCONSISTENCY
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2016-11-10 09:00 UTC by George
Modified:	2019-05-11 03:50 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-05-11 03:50:09 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description George 2016-11-10 09:00:23 UTC

Description of problem:

sometimes the size of fstat is incorrect, it could be test by tail application
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.make 2 SN with replicated.
2.make a CLI host mounted to the SN, let's say /mnt/log
3. on CLI host start the rsyslog servie and let the output store in /mnt/log, let's say /mnt/log/syslog
4. tail -f /mnt/log/syslog >/devnull on CLI host
5. on CLI host lanuch 4 process 
  a) logger 51 bytes every 101 ms
  b) logger 101 bytes every 111 ms
  c) logger 301 bytes every 121 ms
  d) logger 501 bytes every 131 ms

Actual results:
"syslog: file truncated" printer on terminal about ten minutes

Expected results:
tail -f should be no error information printed on terminal

Additional info:

has set trace log and add some private log in glusterfs, some userful log , also include some of my private comments with begin of [George/C].

[2016-11-10 03:11:31.686813] T [fuse-bridge.c:2304:fuse_write_resume] 0-glusterfs-fuse: 93101: WRITE (0xa4757c, size=50, offset=39739375)

[George/C] the size should be 39739375+50 now

[2016-11-10 03:11:31.679749] T [fuse-bridge.c:832:fuse_getattr_resume] 0-glusterfs-fuse: 93099: FGETATTR 139805905838312 (/messages/0xa47654)
[2016-11-10 03:11:31.687328] W [md-cache.c:937:mdc_fstat_cbk] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7f272c86bebc] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/md-cache.so(mdc_fstat_cbk+0x7b)[0x7f271a8b28bb] (--> /lib64/libglusterfs.so.0(default_fstat_cbk+0xac)[0x7f272c872c5c] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/read-ahead.so(ra_attr_cbk+0xac)[0x7f271b0e08ac] (--> /lib64/libglusterfs.so.0(default_fstat_cbk+0xac)[0x7f272c872c5c] ))))) 0-mdc-cache: mdc_fstat from storage server stat size[39739375]
[George/C] the size of fstat is still 39739375, 50 bytes not included, why?

[2016-11-10 03:11:31.687694] T [fuse-bridge.c:832:fuse_getattr_resume] 0-glusterfs-fuse: 93102: FGETATTR 139805905838312 (/messages/0xa47654)
[2016-11-10 03:11:31.687893] W [md-cache.c:961:mdc_fstat] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7f272c86bebc] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/md-cache.so(mdc_fstat+0x1b7)[0x7f271a8b3977] (--> /usr/lib64/glusterfs/3.6.9/xlator/debug/io-stats.so(io_stats_fstat+0x149)[0x7f271a6977c9] (--> /lib64/libglusterfs.so.0(default_fstat+0x5c)[0x7f272c87015c] (--> /usr/lib64/glusterfs/3.6.9/xlator/meta.so(meta_fstat+0x2e)[0x7f271a4811ce] ))))) 0-mdc-cache: mdc_fstat from cache stat size[39739375]

[2016-11-10 03:11:31.696637] T [fuse-bridge.c:832:fuse_getattr_resume] 0-glusterfs-fuse: 93105: FGETATTR 139805905838312 (/messages/0xa47654)
[2016-11-10 03:11:31.696780] W [md-cache.c:961:mdc_fstat] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7f272c86bebc] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/md-cache.so(mdc_fstat+0x1b7)[0x7f271a8b3977] (--> /usr/lib64/glusterfs/3.6.9/xlator/debug/io-stats.so(io_stats_fstat+0x149)[0x7f271a6977c9] (--> /lib64/libglusterfs.so.0(default_fstat+0x5c)[0x7f272c87015c] (--> /usr/lib64/glusterfs/3.6.9/xlator/meta.so(meta_fstat+0x2e)[0x7f271a4811ce] ))))) 0-mdc-cache: mdc_fstat from cache stat size[39739375]

[2016-11-10 03:11:31.696818] T [fuse-bridge.c:755:fuse_attr_cbk] 0-glusterfs-fuse: 93105: FSTAT() /messages => 12997464502055181759

[2016-11-10 03:11:31.697940] T [fuse-bridge.c:2174:fuse_readv_cbk] 0-glusterfs-fuse: 93106: READ => 33/4096,39739392/0

[George/C] the read from 39739392 and read 33bytes, 

[2016-11-10 03:11:31.698039] T [fuse-bridge.c:832:fuse_getattr_resume] 0-glusterfs-fuse: 93107: FGETATTR 139805905838312 (/messages/0xa47654)
[2016-11-10 03:11:31.698170] W [md-cache.c:961:mdc_fstat] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7f272c86bebc] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/md-cache.so(mdc_fstat+0x1b7)[0x7f271a8b3977] (--> /usr/lib64/glusterfs/3.6.9/xlator/debug/io-stats.so(io_stats_fstat+0x149)[0x7f271a6977c9] (--> /lib64/libglusterfs.so.0(default_fstat+0x5c)[0x7f272c87015c] (--> /usr/lib64/glusterfs/3.6.9/xlator/meta.so(meta_fstat+0x2e)[0x7f271a4811ce] ))))) 0-mdc-cache: mdc_fstat from cache stat size[39739375]
[George/C] the fstat still return 39739375, still 50 bytes less, Why?

[2016-11-10 03:11:31.698188] T [fuse-bridge.c:755:fuse_attr_cbk] 0-glusterfs-fuse: 93107: FSTAT() /messages => 12997464502055181759

Comment 1 George 2016-11-11 08:22:40 UTC

I seems find the root cause of this issue.

T0: f_stat is called with mdc-cache not hit, so called from server via RPC
T1: write is called
T2: mdc_writev_cbk is called , and call mdc_inode_iatt_set_validate , due to  “(!iatt || !iatt->ia_ctime)” set mdc->ia_time=0 , this will let is_md_cache_iatt_valid return faluse, am I right?
T3: f_stat_cbk is called and call mdc_inode_iatt_set_validate, and overwrite  mdc->ia_time  with “time (&mdc->ia_time);”
T4: another fstat is called, mdc-cache hit, so just get the old st_size from the old value stored by previous f_stat, but it not include the write of T1

T0<T1<T2<T3<T4,
In this case, is there a bug here?

And my fix is like below:
                mdc_from_iatt (mdc, iatt);

-                time (&mdc->ia_time);
+               if(mdc->ia_time)  time (&mdc->ia_time);  /*if mdc->ia_time is 0, don’t update it , so just let the stat md_cache_iatt_valid  is invalidate */

Now I am test the change , and will update the result when the time is long enough.

Could you please confirm my findings and the code change for this issue? is there any risk for this change?

Comment 2 George 2016-11-14 01:14:52 UTC

with the verification of the changes about more than one days, the issue is not reproduced with the changes to md-cache.c.
and if not with the changes, the issue will be easy reproduced about 10 minute or late.

so , could anyone help me to review the changes, is there any risk for this changes?

Thanks a lots in advance

Comment 3 Soumya Koduri 2016-11-15 12:11:05 UTC

Could you please send out for code review using review.gluster.org site..

Comment 4 George 2016-11-16 01:24:25 UTC

sorry, I meet an issue on git commit -S
gpg: skipped "GeorgeLian <george.lian>": No secret key
gpg: signing failed: No secret key
error: gpg failed to sign the data
fatal: failed to write commit object

I would like put the git diff messages here , and could you please take a look first?

--- a/xlators/performance/md-cache/src/md-cache.c
+++ b/xlators/performance/md-cache/src/md-cache.c
@@ -382,8 +382,14 @@ is_md_cache_iatt_valid (xlator_t *this, struct md_cache *mdc)
         LOCK (&mdc->lock);
         {
                 ret = __is_cache_valid (this, mdc->ia_time);
+
+                /* differ with mdc->ia_time =0  which set in mdc_inode_iatt_set_validate
+                   set mdc->ia_time = 1 mean 0 is handled here, and indicate the value could be reset
+                   in mdc_inode_iatt_set_validate
+                */
+
                 if (ret == _gf_false)
-                        mdc->ia_time = 0;
+                        mdc->ia_time = 1;
         }
         UNLOCK (&mdc->lock);

@@ -520,6 +526,15 @@ mdc_inode_iatt_set_validate(xlator_t *this, inode_t *inode, struct iatt *prebuf,

                 mdc_from_iatt (mdc, iatt);

+                /* Not reset mdc->ia_time while mdc->ia_time is 0,
+                 * which is the flag set by write-behind wb_cbk, if reset here,
+                 * which mean the flag is over-writed here,
+                 * that will cause the meta-data cache in mdc-cache is stale data
+                 * Make sure the 0 is read by is_md_cache_iatt_valid then reset mdc->ia_time
+                 */
+
+                if(!mdc->ia_time) goto unlock;
+
                 time (&mdc->ia_time);
                 gf_msg_callingfn ("md-cache", GF_LOG_TRACE, 0,
                                   MD_CACHE_MSG_CACHE_UPDATE, "Updated iatt(%s)"

Comment 5 George 2016-11-17 00:05:06 UTC

draft code has sent for review:
http://review.gluster.org/#/c/15862/

Comment 6 Raghavendra G 2018-06-29 16:06:05 UTC

Correct fix can be found at:
https://review.gluster.org/20410

Comment 7 Raghavendra G 2019-05-11 03:50:09 UTC

https://review.gluster.org/#/c/glusterfs/+/20549/

Note You need to log in before you can comment on or make changes to this bug.