Bug 1332397 - posix: Set correct d_type for readdirp() calls
Summary: posix: Set correct d_type for readdirp() calls
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: 3.7.11
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: ---
Assignee: Prashanth Pai
QA Contact:
URL:
Whiteboard:
Depends On: 1175711
Blocks: 1332396
TreeView+ depends on / blocked
 
Reported: 2016-05-03 06:30 UTC by Prashanth Pai
Modified: 2016-06-28 12:16 UTC (History)
3 users (show)

Fixed In Version: glusterfs-3.7.12
Doc Type: Bug Fix
Doc Text:
Clone Of: 1175711
Environment:
Last Closed: 2016-06-28 11:09:26 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Prashanth Pai 2016-05-03 06:30:11 UTC
+++ This bug was initially created as a clone of Bug #1175711 +++

Description of problem:
os.walk() in Python walks the entire path given to it. It internally does a stat to determine if a file is a file or directory. This additional stat is not required to determine of a file is a file/directory. An alternative implementation called "scandir.walk()" exists which is at least 2-3 times faster. This is because "scandir.walk()" reads the d_type member of dirent structure returned by readdir(). GlusterFS posix xlator does properly populate the d_type member. Hence it can be accessed/consumed by applications.

https://github.com/benhoyt/scandir

Version-Release number of selected component (if applicable):
GlusterFS master branch

How reproducible:
Run the benchmark script on glusterfs mount point vs on a xfs mountpoint.
https://github.com/benhoyt/scandir/blob/master/benchmark.py


Actual results:
On XFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.035s, scandir.walk took 0.019s -- 1.9x as fast

On GlusterFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.845s, scandir.walk took 0.864s -- 1.0x as fast


Expected results:
scandir.walk() to be faster than os.walk() as it only does readdir() without doing stat() on each file.

TODO:
Retry with all performance xlators disabled.

--- Additional comment from Niels de Vos on 2014-12-23 07:38:50 EST ---

We need to verify if the 'struct dirent'->d_type is retrieved correctly over the fuse filesystem. In case it is not, this would be a bug in fuse.

--- Additional comment from Prashanth Pai on 2014-12-23 09:54:04 EST ---

I did check that (using following script) on latest master branch code. It does fill that.

#!/usr/bin/env python

# Return status indicates if d_type returned

import ctypes
import sys

(DT_UNKNOWN, DT_DIR,) = (0, 4,)

class dirent(ctypes.Structure):
  _fields_ = [
    ("d_ino", ctypes.c_long),
    ("d_off", ctypes.c_long),
    ("d_reclen", ctypes.c_ushort),
    ("d_type", ctypes.c_ubyte),
    ("d_name", ctypes.c_char*256)]

direntp = ctypes.POINTER(dirent)

libc = ctypes.cdll.LoadLibrary("libc.so.6")
libc.readdir.restype = direntp

dirp = libc.opendir(".")
if dirp:
  ep = libc.readdir(dirp)
else:
  sys.exit(1)

print ep.contents.d_type

--- Additional comment from Prashanth Pai on 2016-04-27 07:28:51 EDT ---

I was wrong. The above script failed to detect it because d_type is always set correctly for "." and ".." entries. GlusterFS correctly propagates d_type from posix xlator up the stack till FUSE.

It turns out that XFS does't fill correct d_type until recently (Linux>=3.15 and xfsprogs>=3.2.0). If one formats his/her filesystem with XFS's newer version 5 on-disk format, d_type is rightly set.

Example: mkfs.xfs -m crc=1 /srv/disk1

However, GlusterFS can support filling the right d_type in readdirp() responses even if XFS doesn't using the pre-fetched stat information.

--- Additional comment from Vijay Bellur on 2016-04-27 10:14:05 EDT ---

REVIEW: http://review.gluster.org/14095 (posix: Set correct d_type for readdirp() calls) posted (#1) for review on master by Prashanth Pai (ppai)

--- Additional comment from Prashanth Pai on 2016-04-27 10:32:35 EDT ---

Created a nested fs tree of depth = 4 on glusterfs mountpoint.

In the below example: ls command from coreutils is capable of avoiding additional lstat() if it finds d_type to be set correctly.

BEFORE http://review.gluster.org/14095:

root# strace -fc -e getdents,lstat ls -fR /mnt/gluster-object/gsmetadata >> /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.95    0.672307          30     22226           getdents
 44.05    0.529388          24     22224           lstat
------ ----------- ----------- --------- --------- ----------------
100.00    1.201695                 44450           total


AFTER http://review.gluster.org/14095:
root# strace -fc -e getdents,lstat ls -fR /mnt/gluster-object/gsmetadata >> /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.595680          27     22226           getdents
------ ----------- ----------- --------- --------- ----------------
100.00    0.595680                 22226           total

--- Additional comment from Vijay Bellur on 2016-05-02 01:32:25 EDT ---

REVIEW: http://review.gluster.org/14095 (posix: Set correct d_type for readdirp() calls) posted (#2) for review on master by Prashanth Pai (ppai)

--- Additional comment from Vijay Bellur on 2016-05-02 07:48:49 EDT ---

COMMIT: http://review.gluster.org/14095 committed in master by Jeff Darcy (jdarcy) 
------
commit 77def44d497d090ef3f393b6d9403c1a29dcf993
Author: Prashanth Pai <ppai>
Date:   Wed Apr 27 13:37:07 2016 +0530

    posix: Set correct d_type for readdirp() calls
    
    dirent.d_type can contain the type of the directory entry. The 'd_type'
    struct member in dirent is present in Linux and many BSD flavours.
    However, filling d_type with correct value requires support from the
    underlying filesystem. If not, d_type is set to DT_UNKNOWN. XFS added
    support for d_type as part of their newer version 5 on-disk format.
    However, this requires Linux >= 3.15, xfsprogs >= 3.2.0 and the bricks
    to be formatted using the new format.
    
    This patch enables posix xlator to set d_type to the right value even
    when the underlying filesystem does not support it. d_type can be set
    using information previously fetched by stat() on the dir entry.
    This will aid FUSE applications to leverage d_type to avoid the expense
    of calling lstat() if further actions depend on the type of the file.
    
    Refer `man 3 readdir` and `man 2 getdents`
    
    BUG: 1175711
    Change-Id: Ic5a262fe4c64122726b4fae2d1bea375c559ca04
    Signed-off-by: Prashanth Pai <ppai>
    Reviewed-on: http://review.gluster.org/14095
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>

Comment 1 Vijay Bellur 2016-05-03 06:37:13 UTC
REVIEW: http://review.gluster.org/14176 (posix: Set correct d_type for readdirp() calls) posted (#1) for review on release-3.7 by Prashanth Pai (ppai)

Comment 2 Vijay Bellur 2016-05-11 11:09:40 UTC
COMMIT: http://review.gluster.org/14176 committed in release-3.7 by Niels de Vos (ndevos) 
------
commit a0ae826a7413e1ad0a5796201d156d8d915c93ad
Author: Prashanth Pai <ppai>
Date:   Wed Apr 27 13:37:07 2016 +0530

    posix: Set correct d_type for readdirp() calls
    
    dirent.d_type can contain the type of the directory entry. The 'd_type'
    struct member in dirent is present in Linux and many BSD flavours.
    However, filling d_type with correct value requires support from the
    underlying filesystem. If not, d_type is set to DT_UNKNOWN. XFS added
    support for d_type as part of their newer version 5 on-disk format.
    However, this requires Linux >= 3.15, xfsprogs >= 3.2.0 and the bricks
    to be formatted using the new format.
    
    This patch enables posix xlator to set d_type to the right value even
    when the underlying filesystem does not support it. d_type can be set
    using information previously fetched by stat() on the dir entry.
    This will aid FUSE applications to leverage d_type to avoid the expense
    of calling lstat() if further actions depend on the type of the file.
    
    Refer `man 3 readdir` and `man 2 getdents`
    
    > Change-Id: Ic5a262fe4c64122726b4fae2d1bea375c559ca04
    > Signed-off-by: Prashanth Pai <ppai>
    > Reviewed-on: http://review.gluster.org/14095
    > Smoke: Gluster Build System <jenkins.com>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.com>
    > Reviewed-by: Jeff Darcy <jdarcy>
    
    (cherry picked from commit 77def44d497d090ef3f393b6d9403c1a29dcf993)
    
    Change-Id: I8de1e643dbe88c57eb7a946357283f46c30ae701
    BUG: 1332397
    Signed-off-by: Prashanth Pai <ppai>
    Reviewed-on: http://review.gluster.org/14176
    Smoke: Gluster Build System <jenkins.com>
    Tested-by: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Niels de Vos <ndevos>

Comment 3 Kaushal 2016-06-28 12:16:31 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


Note You need to log in before you can comment on or make changes to this bug.