Description of problem:

Running ls on some directories takes minutes to complete. The problem
appears to be readdir pausing before returning NULL on the last call.

Version-Release number of selected component (if applicable):

glusterfs 3.4.1

How reproducible:

Happens every time on affected directories.

Steps to Reproduce:

Running this Perl script illustrates the problem:

$dir = shift();
opendir(D, $dir);
$then = time();
# print each entry along with the seconds spent in that readdir() call
while ($file = readdir(D)) {
    print "$file ", time() - $then, "\n";
    $then = time();
}
# the final readdir() call (the one that returns undef) is timed here
print "last ", time() - $then, "\n";
print $?, "\n";

Actual results:

[franco@charlie9 02_postwib]$ perl ~/dir.pl .
. 0
.. 0
12578248 0
32958783 0
98841022 0
87252299 0
90090733 0
79566203 0
47155616 0
72790643 0
....
98841018 0
15174257 0
24310895 0
last 106
0

Expected results:

I'm not sure what the last call to readdir is doing, but I would expect it
to return immediately.

Additional info:

From a logfile posted to the Gluster users group and attached below. This is
the READDIRP that returned the last entry in the directory:

[2014-02-17 01:21:20.828742] T [fuse-bridge.c:2594:fuse_readdirp_cbk] 0-glusterfs-fuse: 34: READDIRP => 11/4096,8498796000
[2014-02-17 01:21:20.828877] T [fuse-bridge.c:2672:fuse_readdirp_resume] 0-glusterfs-fuse: 35: READDIRP (0x10ddb2c, size=4096, offset=8581931696)

and this is the READDIRP that returns NULL:

[2014-02-17 01:22:53.823017] T [fuse-bridge.c:2594:fuse_readdirp_cbk] 0-glusterfs-fuse: 35: READDIRP => 0/4096,8581931696
[2014-02-17 01:22:53.823396] T [fuse-bridge.c:2720:fuse_releasedir] 0-glusterfs-fuse: 36: RELEASEDIR 0x10ddb2c

Note how long that took to complete: about 93 seconds between the two
READDIRPs. The directory being read contains about 600 other directories.

Here's another example of what I'm seeing:

[franco@charlie9 02_postwib]$ time set list=*
0.006u 0.030s 3:31.36 0.0%      0+0k 0+0io 0pf+0w
[franco@charlie9 02_postwib]$ echo $list | wc -w
607

This took 3.5 minutes to list the ~600 directories.

[franco@charlie9 02_postwib]$ time ls -d $list
....
0.012u 0.024s 0:33.54 0.0%      0+0k 0+0io 0pf+0w

Listing the directory contents explicitly takes 34 seconds. All the wasted
time is spent waiting for the last readdir call to return NULL.

I think this problem could have something to do with the directory layout
having been expanded using fix-layout; we added 8 bricks to an existing
8-brick volume. I haven't seen the problem with recently created
directories, so I tried moving all the files out, recreating the directory,
and moving the files back, but it made no difference. We are going to be
adding more bricks soon and I am worried that all the existing directories
will suffer the same fate.

Volume Name: data
Type: Distribute
Volume ID: 11d03f34-cc91-469f-afc3-35005db0faef
Status: Started
Number of Bricks: 16
Transport-type: tcp
Bricks:
Brick1: nas1-10g:/data1/gvol
Brick2: nas2-10g:/data5/gvol
Brick3: nas1-10g:/data2/gvol
Brick4: nas2-10g:/data6/gvol
Brick5: nas1-10g:/data3/gvol
Brick6: nas2-10g:/data7/gvol
Brick7: nas1-10g:/data4/gvol
Brick8: nas2-10g:/data8/gvol
Brick9: nas3-10g:/data9/gvol
Brick10: nas3-10g:/data10/gvol
Brick11: nas3-10g:/data11/gvol
Brick12: nas3-10g:/data12/gvol
Brick13: nas4-10g:/data13/gvol
Brick14: nas4-10g:/data14/gvol
Brick15: nas4-10g:/data15/gvol
Brick16: nas4-10g:/data16/gvol
Options Reconfigured:
network.frame-timeout: 10800
cluster.min-free-disk: 5%
nfs.export-volumes: on
nfs.disable: on
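For reference, here is an equivalent C reproducer, a direct translation of
the Perl script above (nothing glusterfs-specific; compile with
"cc -o dirtime dirtime.c", adding -lrt on older glibc for clock_gettime):

/* dirtime.c: time every readdir() call, including the final one that
 * returns NULL. */
#include <dirent.h>
#include <stdio.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    DIR *d = opendir(argc > 1 ? argv[1] : ".");
    struct dirent *e;
    double then;

    if (!d) {
        perror("opendir");
        return 1;
    }
    then = now();
    while ((e = readdir(d)) != NULL) {
        printf("%s %.3f\n", e->d_name, now() - then);
        then = now();
    }
    /* The pause reported in this bug shows up here: the loop only ends
     * once the final READDIRP returns zero entries. */
    printf("last %.3f\n", now() - then);
    closedir(d);
    return 0;
}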
Created attachment 865325 [details] log file from gluster client
Setting cluster.readdir-optimize, as suggested by Vijay Bellur on the
Gluster mailing list, appears to have fixed the problem.
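For reference, the option is set with the standard volume-set command
("data" being the volume name from the description above):

gluster volume set data cluster.readdir-optimize on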
(In reply to Franco Broi from comment #2)
> Setting cluster.readdir-optimize, as suggested by Vijay Bellur on the
> Gluster mailing list, appears to have fixed the problem.

Strike that. All the client filesystems crashed and dumped core.
Unfortunately I didn't save any of the core files; I was in such a panic to
get things back up, and the core files were filling the ramdisk root
filesystem on each machine. I'm not even sure now that it made any
difference to the ls problem, as my quick check after setting
cluster.readdir-optimize was flawed.
(In reply to Franco Broi from comment #3)
> Strike that. All the client filesystems crashed and dumped core.
> Unfortunately I didn't save any of the core files; I was in such a panic to
> get things back up, and the core files were filling the ramdisk root
> filesystem on each machine. I'm not even sure now that it made any
> difference to the ls problem, as my quick check after setting
> cluster.readdir-optimize was flawed.

Sorry to hear that. Do you happen to have logs from the clients at the time
of the crash? cluster.readdir-optimize is used by the rebalance operation,
and I could not hit any problems in my tests with the option enabled. The
log files usually capture the backtrace and can help in understanding the
problem better.
(In reply to Vijay Bellur from comment #4)
> Sorry to hear that. Do you happen to have logs from the clients at the time
> of the crash? cluster.readdir-optimize is used by the rebalance operation,
> and I could not hit any problems in my tests with the option enabled. The
> log files usually capture the backtrace and can help in understanding the
> problem better.

I found a log on one of the machines and it has a huge number of messages
about mismatching layouts; it looks like this has been going on for quite a
while without me realising it. I've cut out a bit around the crash.

PS. I replied via email but it didn't make it to this list.
Created attachment 867218 [details] extract from logfile showing glusterfs crash
REVIEW: http://review.gluster.org/7816 (nfs/drc: Fix memory corruptions) posted (#1) for review on master by Raghavendra G (rgowdapp)
Is the above relevant to this particular bug? This bug report is about ls
being slow with the Gluster FUSE client.
(In reply to Franco Broi from comment #8)
> Is the above relevant to this particular bug? This bug report is about ls
> being slow with the Gluster FUSE client.

No, the patch in comment #7 needs to be modified to point to the correct BZ
number, i.e. BZ 1099270.
REVIEW: http://review.gluster.org/7816 (nfs/drc: Fix memory corruptions) posted (#2) for review on master by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/7816 committed in master by Niels de Vos (ndevos)
------
commit bb385390a945fe755a302a011aa7a2ec05941fad
Author: Raghavendra G <rgowdapp>
Date:   Wed May 21 10:22:22 2014 +0530

    nfs/drc: Fix memory corruptions

    * A wrong memory allocator was used to (de)allocate nodes (not the
      data in them) of the rb tree. This patch uses the default
      allocator, since that suits our purpose.

    * Fix reference counting of the client, though hitting the codepath
      containing this bug is highly unlikely.

    Change-Id: I7692097351d6e54288fee01da5af18e761fd0e8c
    Signed-off-by: Raghavendra G <rgowdapp>
    BUG: 1067256
    Reviewed-on: http://review.gluster.org/7816
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Santosh Pradhan <spradhan>
    Reviewed-by: Niels de Vos <ndevos>
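As an illustration only (this is not the actual glusterfs code, and all
names below are made up for the sketch), the bug class described in the
commit message looks like this in C: an object must be released by the same
allocator that created it, and handing a pool-allocated tree node to the
default free() corrupts the heap.

/* alloc_mismatch.c: demo of the mixed-allocator bug class. */
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator standing in for a custom mem-pool
 * (no alignment handling; demo only). */
static char pool[4096];
static size_t pool_off;

static void *pool_alloc(size_t n)
{
    void *p = &pool[pool_off];
    pool_off += n;
    return p;
}

struct demo_node {
    struct demo_node *left, *right;
    void *data;
};

int main(void)
{
    /* Bug pattern: the node comes from the pool allocator ... */
    struct demo_node *n = pool_alloc(sizeof(*n));
    memset(n, 0, sizeof(*n));
    /* ... but is later handed to the default allocator. free(n) would
     * be undefined behavior because n never came from malloc(); this
     * is the kind of mismatch the patch removes. */
    /* free(n); */

    /* Fix pattern: allocate and free the node with one allocator. */
    struct demo_node *m = calloc(1, sizeof(*m));
    free(m);
    return 0;
}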
This bug is getting closed because a release has been made available that
should address the reported issue. If the problem is still not fixed with
glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1];
packages for several distributions should become available in the near
future. Keep an eye on the Gluster Users mailing list [2] and the update
infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user