Bug 1730686

Summary: [Ganesha] du -sh giving inconsistent output when lookups are running in parallel
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Manisha Saini <msaini>
Component: nfs-ganesha    Assignee: Daniel Gryniewicz <dang>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: urgent Docs Contact:
Priority: high    
Version: rhgs-3.5    CC: amukherj, dang, ffilz, grajoria, jthottan, mbenjamin, pasik, rcyriac, rhs-bugs, skoduri, storage-qa-internal, vdas
Target Milestone: ---   
Target Release: RHGS 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nfs-ganesha-2.7.3-7 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-30 12:15:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1475699, 1696809    
Attachments:
Description                                            Flags
Du output when linux untar was running in parallel     none
Du output when linux untar was running in parallel     none

Description Manisha Saini 2019-07-17 11:07:32 UTC
Description of problem:
=====================
Hit this issue on the same setup as https://bugzilla.redhat.com/show_bug.cgi?id=1730654

du -sh gives inconsistent output while ls -lRt and find (named and unnamed) lookups are running.

Note:
The Linux untars errored out, but lookups kept running from the other clients. No new I/O was triggered when the output of du -sh was captured.

Version-Release number of selected component (if applicable):
==============================
# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-5.el7rhgs.x86_64
glusterfs-ganesha-6.0-7.el7rhgs.x86_64


How reproducible:
================
1/1


Steps to Reproduce:
=================
1. Create an 8-node ganesha cluster
2. Create an 8*3 Distributed-Replicate volume
3. Export the volume via ganesha
4. Mount the volume on 5 clients via v4.1
5. Run the following workload (a command sketch of the client loops follows this list):
Client 1: Linux untars for large dirs
Client 2: du -sh in loop
Client 3: ls -lRt in loop
Client 4: find . -mindepth 1 -type f -name _04_* in loop
Client 5: find . -mindepth 1 -type f in loop
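
For reference, a rough sketch of the client-side loops in step 5, assuming every client has the export mounted at /mnt/ganesha and a kernel tarball staged at /root/linux.tar.xz; the mount point, tarball path, and directory names are illustrative:

# All clients run from inside the mount
cd /mnt/ganesha

# Client 1: repeated linux untars into separate directories
i=0; while true; do i=$((i+1)); mkdir -p untar.$i && tar -xf /root/linux.tar.xz -C untar.$i; done

# Client 2: summary disk usage in a loop
while true; do du -sh; done

# Client 3: recursive long listing in a loop
while true; do ls -lRt; done

# Client 4: named find in a loop (pattern quoted so the shell does not expand it)
while true; do find . -mindepth 1 -type f -name "_04_*"; done

# Client 5: unnamed find in a loop
while true; do find . -mindepth 1 -type f; done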

Actual results:
=============
Linux untar errored out - BZ 1730654
Ran 3 iterations of du -sh from 2 clients on the same setup (no new I/O was triggered):

Client 1:
---------
[root@f12-h08-000-1029u ganesha]# du -sh
49G     .
[root@f12-h08-000-1029u ganesha]# du -sh
85G     .
[root@f12-h08-000-1029u ganesha]# du -sh
439G   


Client 2:
--------
[root@f12-h12-000-1029u ganesha]# while true;do du -sh;done
43G     .
34G

Expected results:
===========

du -sh output should be consistent

Additional info:

Comment 11 Daniel Gryniewicz 2019-07-19 14:39:40 UTC
Looking this over, I think there's enough debugging, as long as NFS_READDIR is at FULL_DEBUG.
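
For anyone re-running this with debug data collection, the readdir component can be raised to full debug via the LOG block in /etc/ganesha/ganesha.conf (a minimal sketch, assuming the stock config layout; nfs-ganesha needs a restart afterwards, e.g. systemctl restart nfs-ganesha):

LOG {
    # Leave everything else at the default level
    Default_Log_Level = EVENT;
    COMPONENTS {
        # Full debug only for readdir, per the comment above
        NFS_READDIR = FULL_DEBUG;
    }
}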

Comment 19 Manisha Saini 2019-07-29 12:41:38 UTC
Ran the test mentioned in comment 0 of this BZ with the test builds for nfs-ganesha and kernel provided in comment 17:

#  rpm -qa | grep ganesha
nfs-ganesha-gluster-2.7.3-6.el7rhgs.TESTFIX1.x86_64
nfs-ganesha-2.7.3-6.el7rhgs.TESTFIX1.x86_64
glusterfs-ganesha-6.0-9.el7rhgs.TESTFIX.bz1730654.x86_64
nfs-ganesha-debuginfo-2.7.3-6.el7rhgs.TESTFIX1.x86_64

#  rpm -qa | grep kernel
kernel-3.10.0-1062.el7.bz1732427.x86_64
kernel-3.10.0-1058.el7.x86_64
kernel-3.10.0-1061.el7.x86_64
abrt-addon-kerneloops-2.1.11-55.el7.x86_64
kernel-tools-3.10.0-1062.el7.bz1732427.x86_64
kernel-tools-libs-3.10.0-1062.el7.bz1732427.x86_64


Ran the test over the weekend. While the linux untar was in progress, a minor inconsistency was observed in du (screenshot attached). No files were deleted while the test was in progress. Let me know if this is expected?

Once the linux untar completed, du -sh gave consistent output when run with parallel lookups.

Terminal output-
-------
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
-------

Comment 20 Manisha Saini 2019-07-29 12:43:44 UTC
Created attachment 1594257 [details]
Du output when linux untar was running in parallel

Comment 21 Manisha Saini 2019-07-29 12:58:19 UTC
Created attachment 1594263 [details]
Du output when linux untar was running in parallel

Comment 23 Manisha Saini 2019-08-12 10:06:20 UTC
Verified this BZ with

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-7.el7rhgs.x86_64
glusterfs-ganesha-6.0-11.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-7.el7rhgs.x86_64


Steps:
========

1. Create a 4-node ganesha cluster
2. Create a 4*3 Distributed-Replicate volume
3. Export the volume via ganesha
4. Mount the volume on 3 clients via v4.1
5. Run the following workload (a setup and mount command sketch follows this list):
Client 1: Linux untars for large dirs
Client 2: du -sh in loop
Client 3: ls -lRt in loop
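
For reference, a rough sketch of steps 1-4 (setup, export, and mount), assuming node names node1..node4, brick paths under /bricks, a volume named testvol, and a cluster VIP of ganesha-vip.example.com; all of these names are illustrative, and ganesha-ha.conf is assumed to be configured already:

# On one storage node: create and start the 4x3 distributed-replicate volume
gluster volume create testvol replica 3 \
    node{1..4}:/bricks/brick1/testvol \
    node{1..4}:/bricks/brick2/testvol \
    node{1..4}:/bricks/brick3/testvol
gluster volume start testvol

# Bring up the ganesha cluster and export the volume
gluster nfs-ganesha enable
gluster volume set testvol ganesha.enable on

# On each client: mount over NFSv4.1
mount -t nfs -o vers=4.1 ganesha-vip.example.com:/testvol /mnt/ganesha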


=======
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
========


du -sh output is consistent. Moving this BZ to verified state.

Comment 25 errata-xmlrpc 2019-10-30 12:15:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3252