Bug 1067256 - ls on some directories takes minutes to complete
Summary: ls on some directories takes minutes to complete
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.4.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1099270 1105883
 
Reported: 2014-02-20 03:07 UTC by Franco Broi
Modified: 2015-05-14 17:42 UTC (History)
6 users

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-14 17:25:36 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
log file from gluster client (672.18 KB, text/x-log), 2014-02-20 03:09 UTC, Franco Broi
extract from logfile showing glusterfs crash (9.40 KB, text/x-log), 2014-02-25 01:10 UTC, Franco Broi

Description Franco Broi 2014-02-20 03:07:24 UTC
Description of problem:

Running ls on some directories takes minutes to complete. The problem appears to be in readdir pausing before returning NULL for the last call.


Version-Release number of selected component (if applicable):
glusterfs 3.4.1


How reproducible:
Happens every time on affected directories.


Steps to Reproduce:

Running this Perl script illustrates the problem:

$dir=shift();

opendir(D, "$dir");
$then=time();
while($file=readdir(D)){
    print "$file ",time()-$then,"\n";
    $then=time();
}
print  "last ",time()-$then,"\n";
print $?, "\n";

Actual results:

[franco@charlie9 02_postwib]$ perl ~/dir.pl .
. 0
.. 0
12578248 0
32958783 0
98841022 0
87252299 0
90090733 0
79566203 0
47155616 0
72790643 0
....
98841018 0
15174257 0
24310895 0
last 106
0

Expected results:

I'm not sure what the last call to readdir is doing, but I would expect it to return immediately.

Additional info:

From a logfile posted to the Gluster-users group and attached below:

This is the READDIRP that returned the last entry in the directory:

[2014-02-17 01:21:20.828742] T [fuse-bridge.c:2594:fuse_readdirp_cbk] 0-glusterfs-fuse: 34: READDIRP => 11/4096,8498796000
[2014-02-17 01:21:20.828877] T [fuse-bridge.c:2672:fuse_readdirp_resume] 0-glusterfs-fuse: 35: READDIRP (0x10ddb2c, size=4096, offset=8581931696)


and this is the READDIRP that returns NULL:

[2014-02-17 01:22:53.823017] T [fuse-bridge.c:2594:fuse_readdirp_cbk] 0-glusterfs-fuse: 35: READDIRP => 0/4096,8581931696
[2014-02-17 01:22:53.823396] T [fuse-bridge.c:2720:fuse_releasedir] 0-glusterfs-fuse: 36: RELEASEDIR 0x10ddb2c

Note how long it took to complete: about 93 seconds elapsed between the READDIRP that returned the last entry and the one that returned NULL.


The directory being read contains about 600 other directories.

Here's another example of what I'm seeing:

[franco@charlie9 02_postwib]$ time set list=*
0.006u 0.030s 3:31.36 0.0%      0+0k 0+0io 0pf+0w

[franco@charlie9 02_postwib]$ echo $list | wc -w
607

This took 3.5 minutes to list the ~600 directories.

[franco@charlie9 02_postwib]$ time ls -d $list
....
0.012u 0.024s 0:33.54 0.0%      0+0k 0+0io 0pf+0w

Listing the directory contents explicitly takes 34 seconds. All the wasted time is spent waiting for the last readdir call to return NULL.
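The same end-of-directory stall can be timed with a plain shell sketch (POSIX tools only, nothing gluster-specific; the DIR argument is an assumption here, point it at any suspect directory):

```shell
#!/bin/sh
# Time one readdir-driven pass over a directory. On an affected
# mount, nearly all of the wall time is spent in the final readdir
# call, after the last real entry has already been returned.
DIR=${1:-.}
start=$(date +%s)
count=$(ls -a "$DIR" | wc -l)
end=$(date +%s)
echo "$count entries listed in $((end - start))s"
```

Comparing this figure against `time ls -d <names>` on the same directory, as above, separates readdir cost from stat cost.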


I think this problem could have something to do with the directory being expanded using fix-layout; we added 8 bricks to an existing 8-brick volume.

I haven't seen the problem with recently created directories, so I tried moving all the files out, recreating the directory, and moving the files back, but it made no difference.

We are going to be adding more bricks soon and I am worried that all the existing directories will suffer the same fate.
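For reference, a layout-only rebalance after adding bricks is normally kicked off as sketched below, using the volume name from this report. The sketch is in dry-run form: the `run` wrapper only prints each command; remove it to execute against a live cluster.

```shell
#!/bin/sh
# Dry-run wrapper: print the command instead of executing it,
# so this sketch is safe to run anywhere.
run() { echo "+ $*"; }

# Recompute directory layouts across all bricks without moving
# file data, then poll until every node reports completion.
run gluster volume rebalance data fix-layout start
run gluster volume rebalance data status
```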

Volume Name: data
Type: Distribute
Volume ID: 11d03f34-cc91-469f-afc3-35005db0faef
Status: Started
Number of Bricks: 16
Transport-type: tcp
Bricks:
Brick1: nas1-10g:/data1/gvol
Brick2: nas2-10g:/data5/gvol
Brick3: nas1-10g:/data2/gvol
Brick4: nas2-10g:/data6/gvol
Brick5: nas1-10g:/data3/gvol
Brick6: nas2-10g:/data7/gvol
Brick7: nas1-10g:/data4/gvol
Brick8: nas2-10g:/data8/gvol
Brick9: nas3-10g:/data9/gvol
Brick10: nas3-10g:/data10/gvol
Brick11: nas3-10g:/data11/gvol
Brick12: nas3-10g:/data12/gvol
Brick13: nas4-10g:/data13/gvol
Brick14: nas4-10g:/data14/gvol
Brick15: nas4-10g:/data15/gvol
Brick16: nas4-10g:/data16/gvol
Options Reconfigured:
network.frame-timeout: 10800
cluster.min-free-disk: 5%
nfs.export-volumes: on
nfs.disable: on

Comment 1 Franco Broi 2014-02-20 03:09:17 UTC
Created attachment 865325 [details]
log file from gluster client

Comment 2 Franco Broi 2014-02-22 05:05:57 UTC
Setting cluster.readdir-optimize, as suggested by Vijay Bellur on the gluster mailing list, appears to have fixed the problem.
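For reference, cluster.readdir-optimize is a per-volume option; toggling it is sketched below in dry-run form (the `run` wrapper only prints the commands), with the volume name taken from this report. Note that a later comment retracts this as a fix.

```shell
#!/bin/sh
# Dry-run wrapper: print each command rather than executing it.
run() { echo "+ $*"; }

# Enable the option on the "data" volume:
run gluster volume set data cluster.readdir-optimize on

# Revert to the default if it causes trouble:
run gluster volume reset data cluster.readdir-optimize
```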

Comment 3 Franco Broi 2014-02-24 00:56:55 UTC
(In reply to Franco Broi from comment #2)
> setting cluster.readdir-optimize as suggested by Vijay Bellur on the gluster
> mailing list appears to have fixed the problem.

Strike that. All the client filesystems crashed and coredumped. Unfortunately I didn't save any of the core dump files; I was in such a panic to get things back up, and the core files were filling the ramdisk root filesystem on each machine. I'm not even sure now that it made any difference to the ls problem; my quick check after setting cluster.readdir-optimize was flawed.

Comment 4 Vijay Bellur 2014-02-24 01:21:55 UTC
(In reply to Franco Broi from comment #3)
> 
> Strike that. All the client filesystems crashed and coredumped.
> Unfortunately I didn't save any of the core dump files, was in such a panic
> to get things back up and the core file was filling the ramdisk root
> filesystem on each machine. I'm not even sure now that it made any
> difference to the ls problem, my quick check after setting the
> cluster.readdir-optimize was flawed.

Sorry to hear that. Do you happen to have logs from the clients at the time of the crash? cluster.readdir-optimize is used by the rebalance operation, and I could not hit any problems in my tests with the option enabled. The log files usually capture the backtrace and can possibly help in understanding the problem better.

Comment 5 Franco Broi 2014-02-25 01:08:40 UTC
(In reply to Vijay Bellur from comment #4)
> (In reply to Franco Broi from comment #3)
> > 
> > Strike that. All the client filesystems crashed and coredumped.
> > Unfortunately I didn't save any of the core dump files, was in such a panic
> > to get things back up and the core file was filling the ramdisk root
> > filesystem on each machine. I'm not even sure not that it made any
> > difference to the ls problem, my quick check after setting the
> > cluster.readdir-optimize was flawed.
> 
> Sorry to hear that. Do you happen to have logs from the clients at the time
> of the crash? cluster.readdir-optimize is used by the rebalance operation,
> and I could not hit any problems in my tests with the option enabled.
> The log files usually capture the backtrace and can possibly help in
> understanding the problem better.


I found a log on one of the machines and it has a huge number of
messages about mismatching layouts; it looks like this has been going on
for quite a while without me realising it. I've cut out a bit around
the crash.

PS. I replied via email but it didn't make it to this list.

Comment 6 Franco Broi 2014-02-25 01:10:32 UTC
Created attachment 867218 [details]
extract from logfile showing glusterfs crash

Comment 7 Anand Avati 2014-05-21 05:19:13 UTC
REVIEW: http://review.gluster.org/7816 (nfs/drc: Fix memory corruptions) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 8 Franco Broi 2014-05-21 06:39:39 UTC
Is the above relevant to this particular bug? This bug report is about ls being slow with the Gluster FUSE clients.

Comment 9 santosh pradhan 2014-05-21 09:00:15 UTC
(In reply to Franco Broi from comment #8)
> Is the above relevant to this particular bug? This bug report is about ls
> being slow with the Gluster Fuse clients.

No; the patch in comment #7 needs to be modified to point to the correct BZ number, i.e. BZ 1099270.

Comment 10 Anand Avati 2014-06-30 05:03:23 UTC
REVIEW: http://review.gluster.org/7816 (nfs/drc: Fix memory corruptions) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 11 Anand Avati 2014-07-30 13:54:07 UTC
COMMIT: http://review.gluster.org/7816 committed in master by Niels de Vos (ndevos) 
------
commit bb385390a945fe755a302a011aa7a2ec05941fad
Author: Raghavendra G <rgowdapp>
Date:   Wed May 21 10:22:22 2014 +0530

    nfs/drc: Fix memory corruptions
    
    * A wrong memory allocator was used to (de)allocate nodes (not data in
    them) of rb tree. This patch uses default allocator, since that suits
    our purpose.
    * Fix reference counting of client, though hitting the codepath
    containing this bug is highly unlikely.
    
    Change-Id: I7692097351d6e54288fee01da5af18e761fd0e8c
    Signed-off-by: Raghavendra G <rgowdapp>
    BUG: 1067256
    Reviewed-on: http://review.gluster.org/7816
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Santosh Pradhan <spradhan>
    Reviewed-by: Niels de Vos <ndevos>

Comment 12 Niels de Vos 2015-05-14 17:25:36 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


