Bug 1613890 - [Ganesha] Wrong file count is showing on Ganesha mount for some clients, post running du -sh command
Summary: [Ganesha] Wrong file count is showing on Ganesha mount for some clients, post...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 1613273
Blocks: 1503137
 
Reported: 2018-08-08 13:21 UTC by Manisha Saini
Modified: 2018-09-24 05:03 UTC
CC: 13 users

Fixed In Version: nfs-ganesha-2.5.5-10
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 06:55:16 UTC
Embargoed:




Links:
Red Hat Product Errata RHEA-2018:2610 (last updated 2018-09-04 06:56:53 UTC)

Description Manisha Saini 2018-08-08 13:21:22 UTC
Description of problem:

Hit this issue while running the use-case reported in BZ 1613273.

Issue: A wrong file count is shown on the mount point after running the du command on the
       Ganesha mount point.


Create 3 directories on the mount point. Inside each directory, create 1 lakh (100,000) files from 3 different clients (NFSv4).

Run du -sh on the mount point from one of the clients.

While checking the total number of files on the mount point, 1 out of the 3 clients shows a wrong file count.
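
For reference, a minimal sketch of the per-client workload described above, assuming a mount path of /mnt/ganesha and a hypothetical file-naming scheme (neither is taken from this report):

------
# From one client: create the 3 directories on the Ganesha mount.
mkdir -p /mnt/ganesha/dir1 /mnt/ganesha/dir2 /mnt/ganesha/dir3

# From each of the 3 clients: create 1 lakh (100,000) files in every directory.
for d in dir1 dir2 dir3; do
    for i in $(seq 1 100000); do
        touch "/mnt/ganesha/$d/$(hostname -s)_file_$i"
    done
done
------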

snippet of "ls | wc -l" command on all the 3 directories from 3 different clients

---------
Client 1:
---------
[root@rhs-client6 dir1]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
21928

[root@rhs-client6 dir2]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
26416

[root@rhs-client6 dir3]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
40306

---------
Client 2:
---------
**********
Without running the du command on client 2
**********
[root@rhs-client8 dir1]# ls | wc -l
100001


[root@rhs-client8 dir2]# ls | wc -l
100001

[root@rhs-client8 dir3]# ls | wc -l
100001

***********
After running the du command on the mount point
***********
[root@rhs-client8 dir1]#  du -sh
98G	.
[root@rhs-client8 dir1]# ls | wc -l
24961
[root@rhs-client8 dir1]# ls | wc -l
24961
[root@rhs-client8 dir1]#  du -sh
25G	.

---------
Client 3:
---------

[root@rhs-client9 dir1]# ls | wc -l
100001

[root@rhs-client9 dir2]# ls | wc -l
100001

[root@rhs-client9 dir2]# ls | wc -l
100001


---------------------------------

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-2.5.5-9.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-9.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-15.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-9.el7rhgs.x86_64


How reproducible:
2/2

Steps to Reproduce:
1. Create a 6-node Ganesha cluster.
2. Create a 6x3 distributed-replicate volume. Export the volume via Ganesha.
3. Mount the volume on 3 clients via 3 different VIPs (NFSv4 protocol).
4. Create 3 directories on the mount point.
5. Create 1 lakh (100,000) files in each directory from the 3 clients.
6. Run # ls | wc -l to check the number of files present -> shows the correct count.
7. Check disk usage with du -sh.
8. Run # ls | wc -l to check the number of files present -> shows an incorrect count.
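
A minimal client-side sketch of steps 3 and 6-8, assuming a placeholder VIP plus a hypothetical volume name (ganeshavol) and mount path, none of which are taken from this report:

------
VIP=192.0.2.10    # placeholder virtual IP of one of the Ganesha nodes
mount -t nfs -o vers=4.0 "$VIP":/ganeshavol /mnt/ganesha    # step 3

cd /mnt/ganesha/dir1
ls | wc -l    # step 6: shows the correct count
du -sh .      # step 7: check disk usage
ls | wc -l    # step 8: on the affected client this returned a much lower count,
              # along with "ls: reading directory .: Too many levels of symbolic links"
------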



Actual results:

ls | wc -l gives wrong output when run after du -sh on the mount point

[root@rhs-client6 dir1]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
21928

Expected results:

It should show the correct number of files present on the mount point.

Additional info:

This is a regression, as the issue is not observed with the 3.3.1 Ganesha bits.

Comment 3 Manisha Saini 2018-08-08 13:55:54 UTC
Marking this as a blocker since it has an application-side impact (fewer files are shown than are actually present on the mount point) and it is also a regression.

Comment 4 Manisha Saini 2018-08-08 13:57:47 UTC
Gluster-health-check-report


------------
[root@moonshine gluster-health-report-master]#  gluster-health-report

Loaded reports: coredump, disk_usage, errors_in_logs, firewall-check, georep, gfid-mismatch-dht-report, glusterd-op-version, glusterd-peer-disconnect, glusterd, glusterd_volume_version_cksum_errors, kernel_issues, memory_usage, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=18
[     OK] Disk used percentage  path=/var  percentage=18
[     OK] Disk used percentage  path=/tmp  percentage=18
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=31302  max_op_version=31302
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      14172/glusterd      

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      13152/glusterfsd    
4:tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      19962/glusterfsd    

[  ERROR] Report failure  report=report_check_worker_restarts
[     OK] Glusterd is running  uptime_sec=602773
[WARNING] Errors in Glusterd log file  num_errors=33
[WARNING] Warnings in Glusterd log file  num_warning=11
[     OK] No errors seen at network card
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=76
[WARNING] Warnings in Glusterd log file num_warnings=31

---------------------


[root@tettnang gluster-health-report-master]#  gluster-health-report

Loaded reports: coredump, disk_usage, errors_in_logs, firewall-check, georep, gfid-mismatch-dht-report, glusterd-op-version, glusterd-peer-disconnect, glusterd, glusterd_volume_version_cksum_errors, kernel_issues, memory_usage, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=45
[     OK] Disk used percentage  path=/var  percentage=45
[     OK] Disk used percentage  path=/tmp  percentage=45
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=31302  max_op_version=31302
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      16591/glusterd      

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      7949/glusterfsd     
4:tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      19648/glusterfsd    

[  ERROR] Report failure  report=report_check_worker_restarts
[     OK] Glusterd is running  uptime_sec=602772
[WARNING] Errors in Glusterd log file  num_errors=29
[WARNING] Warnings in Glusterd log file  num_warning=11
[ NOT OK] Receive errors in "ifconfig eno2" output
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=77
[WARNING] Warnings in Glusterd log file num_warnings=31


--------------------


[root@zod gluster-health-report-master]#  gluster-health-report

Loaded reports: coredump, disk_usage, errors_in_logs, firewall-check, georep, gfid-mismatch-dht-report, glusterd-op-version, glusterd-peer-disconnect, glusterd, glusterd_volume_version_cksum_errors, kernel_issues, memory_usage, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=30
[     OK] Disk used percentage  path=/var  percentage=30
[     OK] Disk used percentage  path=/tmp  percentage=30
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=31302  max_op_version=31302
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      13327/glusterd      

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      10933/glusterfsd    
4:tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      16570/glusterfsd    

[  ERROR] Report failure  report=report_check_worker_restarts
[     OK] Glusterd is running  uptime_sec=602772
[WARNING] Errors in Glusterd log file  num_errors=20
[WARNING] Warnings in Glusterd log file  num_warning=13
[     OK] No errors seen at network card
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=63
[WARNING] Warnings in Glusterd log file num_warnings=33



---------------------------

[root@yarrow gluster-health-report-master]#  gluster-health-report

Loaded reports: coredump, disk_usage, errors_in_logs, firewall-check, georep, gfid-mismatch-dht-report, glusterd-op-version, glusterd-peer-disconnect, glusterd, glusterd_volume_version_cksum_errors, kernel_issues, memory_usage, errors_in_logs, ifconfig, nic-health, process_status

[ NOT OK] Disk used percentage is exceeding threshold, consider deleting unnecessary data  path=/  percentage=90
[ NOT OK] Disk used percentage is exceeding threshold, consider deleting unnecessary data  path=/var  percentage=90
[ NOT OK] Disk used percentage is exceeding threshold, consider deleting unnecessary data  path=/tmp  percentage=90
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=31302  max_op_version=31302
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1909/glusterd       

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      2680/glusterfsd     

[  ERROR] Report failure  report=report_check_worker_restarts
[     OK] Glusterd is running  uptime_sec=602772
[WARNING] Errors in Glusterd log file  num_errors=22
[WARNING] Warnings in Glusterd log file  num_warning=10
[     OK] No errors seen at network card
[     OK] No errors seen at network card
3
[WARNING] Errors in Glusterd log file num_errors=78
[WARNING] Warnings in Glusterd log file num_warnings=31

Comment 6 Manisha Saini 2018-08-09 17:42:17 UTC
Tested the same with readdir plus disabled in the volume export block. With readdir plus disabled, I am able to reproduce the issue again.

Details are updated in BZ 1613273.

Comment 9 Daniel Gryniewicz 2018-08-14 12:26:23 UTC
This is the same issue as bug #1613273

Comment 10 Frank Filz 2018-08-14 16:57:46 UTC
I'd like to add that very likely bug #1558974 is also the same issue

Comment 12 Manisha Saini 2018-08-16 09:07:52 UTC
Verified this with

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64

Counting files on the mount point after running du -sh now shows the correct file count on each iteration:

[root@rhs-client8 dir1]# ls | wc -l
99999
[root@rhs-client8 dir1]# ls | wc -l
99999
[root@rhs-client8 dir1]# ls | wc -l
99999
[root@rhs-client8 dir1]# ls | wc -l
99999
[root@rhs-client8 dir1]# ls | wc -l
99999
[root@rhs-client8 dir1]# ls | wc -l
99999
[root@rhs-client8 dir1]# ls | wc -l
99999
[root@rhs-client8 dir1]# ls | wc -l
99999


Moving this BZ to verified state.

Comment 13 Manisha Saini 2018-08-17 11:11:53 UTC
I am observing this issue when I count the files on the mount point after running du -sh on the Ganesha mount.

If I don't run du -sh on the mount point and only run ls | wc -l, it shows the correct number of files on the mount point.


Only running ls | wc -l:

------
[root@rhs-client6 dir1]# ls | wc -l
100001
[root@rhs-client6 dir1]# ls | wc -l
100001
[root@rhs-client6 dir1]# ls | wc -l
100001
[root@rhs-client6 dir1]# ls | wc -l
100001
-------

After running du -sh and then counting files:


------
[root@rhs-client6 dir2]# du -sh
13G     .
[root@rhs-client6 dir2]# du -sh
15G     .
[root@rhs-client6 dir2]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
15004
[root@rhs-client6 dir2]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
15004
[root@rhs-client6 dir2]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
15004
[root@rhs-client6 dir2]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
15004
[root@rhs-client6 dir2]# ls | wc -l
ls: reading directory .: Too many levels of symbolic links
15004
-----

Comment 14 Rahul Hinduja 2018-08-17 14:24:45 UTC
(In reply to Manisha Saini from comment #13)
> I am observing this issue if I count the files on mount point post running
> du -sh on Ganesha mount.
> 

Just to rephrase: the exercise for comment 13 was done with readdir enabled.

readdir = enabled
Scenario 1: running "wc -l" or "ls -lrt" lists all the entries consistently.
Scenario 2: run du -sh and then run "wc -l" or "ls -lrt". In this case the entries are inconsistent.
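
A minimal shell sketch of the two scenarios, assuming a hypothetical mount path of /mnt/ganesha (not taken from this report):

------
cd /mnt/ganesha/dir2

# Scenario 1: repeated counts with no du in between -> counts stay consistent.
for i in 1 2 3 4; do ls | wc -l; done

# Scenario 2: run du -sh first, then count again -> on the affected client the
# counts drop and the entries become inconsistent.
du -sh .
for i in 1 2 3 4; do ls | wc -l; done
------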

Comment 16 errata-xmlrpc 2018-09-04 06:55:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610

