Bug 1730686 - [Ganesha] du -sh giving inconsistent output where lookups are running in parallel
Summary: [Ganesha] du -sh giving inconsistent output where lookups are running in parallel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Daniel Gryniewicz
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks: 1475699 1696809
 
Reported: 2019-07-17 11:07 UTC by Manisha Saini
Modified: 2019-10-30 12:15 UTC
CC List: 12 users

Fixed In Version: nfs-ganesha-2.7.3-7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 12:15:39 UTC
Target Upstream Version:


Attachments
Du output when linux untar was running in parallel (1.75 MB, image/jpeg), 2019-07-29 12:43 UTC, Manisha Saini
Du output when linux untar was running in parallel (1.32 MB, image/jpeg), 2019-07-29 12:58 UTC, Manisha Saini


Links
Red Hat Product Errata RHEA-2019:3252, last updated 2019-10-30 12:15:52 UTC

Description Manisha Saini 2019-07-17 11:07:32 UTC
Description of problem:
=====================
Hit this issue on the same setup as https://bugzilla.redhat.com/show_bug.cgi?id=1730654

du -sh was giving inconsistent output while ls -lRt and find (named/unnamed) lookups were running.

Note:
The Linux untars errored out, but lookups were still running from the other clients. No new I/O was triggered while the du -sh output was captured.

Version-Release number of selected component (if applicable):
==============================
# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.7.3-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-5.el7rhgs.x86_64
glusterfs-ganesha-6.0-7.el7rhgs.x86_64


How reproducible:
================
1/1


Steps to Reproduce:
=================
1. Create an 8-node ganesha cluster
2. Create an 8x3 Distributed-Replicate volume
3. Export the volume via ganesha
4. Mount the volume on 5 clients via v4.1
5. Run the following workload (a shell sketch of these client loops follows below):
Client 1: Linux untars of large directories
Client 2: du -sh in a loop
Client 3: ls -lRt in a loop
Client 4: find . -mindepth 1 -type f -name "_04_*" in a loop
Client 5: find . -mindepth 1 -type f in a loop
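A minimal shell sketch of the per-client workload, assuming the volume is exported as /testvol and mounted at /mnt/ganesha; the server address, tarball path, script name, and role arguments are placeholders, not taken from this report:

-------
#!/bin/bash
# workload.sh - run on each client with its role as argument, e.g. "./workload.sh du".
# GANESHA_VIP, EXPORT, MNT, and the tarball path are assumptions for illustration.
GANESHA_VIP=ganesha-vip.example.com
EXPORT=/testvol
MNT=/mnt/ganesha

# Mount the ganesha export over NFSv4.1 if it is not mounted already
mountpoint -q "${MNT}" || mount -t nfs -o vers=4.1 "${GANESHA_VIP}:${EXPORT}" "${MNT}"
cd "${MNT}" || exit 1

case "$1" in
  untar)       # Client 1: repeated kernel-source untars into separate large dirs
    for i in $(seq 1 5); do
      mkdir -p "untar_${i}"
      tar -xf /root/linux.tar.xz -C "untar_${i}"
    done ;;
  du)          # Client 2: du -sh in a loop
    while true; do du -sh; done ;;
  ls)          # Client 3: recursive long listing in a loop
    while true; do ls -lRt > /dev/null; done ;;
  find-named)  # Client 4: named find in a loop
    while true; do find . -mindepth 1 -type f -name "_04_*" > /dev/null; done ;;
  find)        # Client 5: unnamed find in a loop
    while true; do find . -mindepth 1 -type f > /dev/null; done ;;
esac
-------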

Actual results:
=============
The Linux untar errored out (BZ 1730654).
Captured 3 iterations of du -sh from 2 clients on the same setup (no new I/O was triggered):

Client 1:
---------
[root@f12-h08-000-1029u ganesha]# du -sh
49G     .
[root@f12-h08-000-1029u ganesha]# du -sh
85G     .
[root@f12-h08-000-1029u ganesha]# du -sh
439G   


Client 2:
--------
[root@f12-h12-000-1029u ganesha]# while true;do du -sh;done
43G     .                           
34G  

Expected results:
===========

du -sh output should be consistent

Additional info:

Comment 11 Daniel Gryniewicz 2019-07-19 14:39:40 UTC
Looking this over, I think there's enough debugging, as long as NFS_READDIR is at FULL_DEBUG.
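
For anyone repeating the debug capture, a hedged sketch of raising NFS_READDIR to FULL_DEBUG through the LOG/COMPONENTS block in ganesha.conf; the config path and restart command assume a default single-node layout and may differ on an RHGS HA cluster:

-------
# Append a LOG block to ganesha.conf (path assumed) and restart the service.
cat >> /etc/ganesha/ganesha.conf <<'EOF'
LOG {
    COMPONENTS {
        NFS_READDIR = FULL_DEBUG;
    }
}
EOF
systemctl restart nfs-ganesha
-------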

Comment 19 Manisha Saini 2019-07-29 12:41:38 UTC
Ran the test mentioned in comment 0 of this BZ with the test builds for nfs-ganesha and kernel provided in comment 17.

#  rpm -qa | grep ganesha
nfs-ganesha-gluster-2.7.3-6.el7rhgs.TESTFIX1.x86_64
nfs-ganesha-2.7.3-6.el7rhgs.TESTFIX1.x86_64
glusterfs-ganesha-6.0-9.el7rhgs.TESTFIX.bz1730654.x86_64
nfs-ganesha-debuginfo-2.7.3-6.el7rhgs.TESTFIX1.x86_64

#  rpm -qa | grep kernel
kernel-3.10.0-1062.el7.bz1732427.x86_64
kernel-3.10.0-1058.el7.x86_64
kernel-3.10.0-1061.el7.x86_64
abrt-addon-kerneloops-2.1.11-55.el7.x86_64
kernel-tools-3.10.0-1062.el7.bz1732427.x86_64
kernel-tools-libs-3.10.0-1062.el7.bz1732427.x86_64


Ran the test over the weekend. While the Linux untar was in progress, a minor inconsistency was observed in du (screenshot attached). No files were deleted while the test was in progress. Let me know if this is expected?

Once the Linux untar completed, du -sh gave consistent output when run with parallel lookups.

Terminal output-
-------
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
-------

Comment 20 Manisha Saini 2019-07-29 12:43:44 UTC
Created attachment 1594257 [details]
Du output when linux untar was running in parallel

Comment 21 Manisha Saini 2019-07-29 12:58:19 UTC
Created attachment 1594263 [details]
Du output when linux untar was running in parallel

Comment 23 Manisha Saini 2019-08-12 10:06:20 UTC
Verified this BZ with

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-7.el7rhgs.x86_64
glusterfs-ganesha-6.0-11.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-7.el7rhgs.x86_64


Steps:
========

1. Create a 4-node ganesha cluster
2. Create a 4x3 Distributed-Replicate volume
3. Export the volume via ganesha
4. Mount the volume on 3 clients via v4.1
5. Run the following workload:
Client 1: Linux untars of large directories
Client 2: du -sh in a loop (see the capture sketch after this list)
Client 3: ls -lRt in a loop
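
To compare the du -sh samples across clients once the untar finishes, a timestamped capture loop like the one below could be used; the sampling interval and log path are arbitrary choices, not taken from this report:

-------
# Run from the mount point on each client; diff the logs afterwards.
while true; do
    printf '%s %s\n' "$(date -u +%FT%TZ)" "$(du -sh . | awk '{print $1}')" \
        | tee -a /var/tmp/du_consistency.log
    sleep 30
done
-------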


=======
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
11G     .
========


du -sh output is consistent. Moving this BZ to the verified state.

Comment 25 errata-xmlrpc 2019-10-30 12:15:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3252

