Bug 1175711 - posix: Set correct d_type for readdirp() calls
Summary: posix: Set correct d_type for readdirp() calls
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Prashanth Pai
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1332396 1332397
 
Reported: 2014-12-18 12:49 UTC by Prashanth Pai
Modified: 2016-11-22 13:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1332396 1332397 (view as bug list)
Environment:
Last Closed: 2016-11-22 13:04:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: v3.9.0
Embargoed:



Description Prashanth Pai 2014-12-18 12:49:33 UTC
Description of problem:
os.walk() in Python walks the entire path given to it. Internally it calls stat on each entry to determine whether the entry is a file or a directory. This additional stat is not required for that purpose. An alternative implementation called "scandir.walk()" exists which is at least 2-3 times faster, because it reads the d_type member of the dirent structure returned by readdir(). GlusterFS posix xlator does properly populate the d_type member. Hence it can be accessed/consumed by applications.

https://github.com/benhoyt/scandir
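
For illustration, the same approach is available in the Python 3.5+ standard library as os.scandir(), which was derived from the scandir project linked above. The sketch below classifies entries from the directory listing itself. count_tree is a hypothetical helper name used only for this example; it is not part of scandir or GlusterFS. When the filesystem returns DT_UNKNOWN, DirEntry.is_dir() has to fall back to a stat call, which is the slow path described in this bug.

```python
import os

def count_tree(top):
    """Return (dirs, files) found under top, using os.scandir()."""
    dirs = files = 0
    stack = [top]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            for entry in it:
                # is_dir() uses d_type from readdir() when the filesystem
                # provides it, and only stats the entry on DT_UNKNOWN.
                if entry.is_dir(follow_symlinks=False):
                    dirs += 1
                    stack.append(entry.path)
                else:
                    files += 1
    return dirs, files
```

Running count_tree() on a mount point under strace -c should show whether lstat/stat calls accompany the getdents calls.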

Version-Release number of selected component (if applicable):
GlusterFS master branch

How reproducible:
Run the benchmark script on a GlusterFS mount point and on an XFS mount point, then compare the results.
https://github.com/benhoyt/scandir/blob/master/benchmark.py


Actual results:
On XFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.035s, scandir.walk took 0.019s -- 1.9x as fast

On GlusterFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.845s, scandir.walk took 0.864s -- 1.0x as fast


Expected results:
scandir.walk() should be faster than os.walk(), as it only does readdir() without doing stat() on each file.

TODO:
Retry with all performance xlators disabled.

Comment 1 Niels de Vos 2014-12-23 12:38:50 UTC
We need to verify if the 'struct dirent'->d_type is retrieved correctly over the fuse filesystem. In case it is not, this would be a bug in fuse.

Comment 2 Prashanth Pai 2014-12-23 14:54:04 UTC
I did check that (using the following script) on the latest master branch code. It does fill that.

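#!/usr/bin/env python

# Print the d_type of the first entry returned by readdir() on "."

import ctypes
import sys

(DT_UNKNOWN, DT_DIR,) = (0, 4,)

# Layout of struct dirent on 64-bit Linux
class dirent(ctypes.Structure):
  _fields_ = [
    ("d_ino", ctypes.c_ulong),
    ("d_off", ctypes.c_long),
    ("d_reclen", ctypes.c_ushort),
    ("d_type", ctypes.c_ubyte),
    ("d_name", ctypes.c_char*256)]

direntp = ctypes.POINTER(dirent)

libc = ctypes.cdll.LoadLibrary("libc.so.6")
# Declare pointer types explicitly: the default int return type would
# truncate the DIR* returned by opendir() on 64-bit systems.
libc.opendir.restype = ctypes.c_void_p
libc.opendir.argtypes = [ctypes.c_char_p]
libc.readdir.restype = direntp
libc.readdir.argtypes = [ctypes.c_void_p]

dirp = libc.opendir(b".")
if dirp:
  ep = libc.readdir(dirp)
else:
  sys.exit(1)

print(ep.contents.d_type)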

Comment 3 Prashanth Pai 2016-04-27 11:28:51 UTC
I was wrong. The above script failed to detect it because d_type is always set correctly for "." and ".." entries. GlusterFS correctly propagates d_type from posix xlator up the stack till FUSE.
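
A check that avoids this pitfall has to look at every entry and skip "." and "..". The following is a minimal sketch of such a check, assuming 64-bit Linux and the same struct dirent layout as the script above; list_d_types is a hypothetical helper name introduced for this example.

```python
import ctypes
import os

DT_UNKNOWN = 0

# Assumed layout of struct dirent on 64-bit Linux.
class Dirent(ctypes.Structure):
    _fields_ = [
        ("d_ino", ctypes.c_ulong),
        ("d_off", ctypes.c_long),
        ("d_reclen", ctypes.c_ushort),
        ("d_type", ctypes.c_ubyte),
        ("d_name", ctypes.c_char * 256),
    ]

# CDLL(None) exposes the C library already loaded into the process.
libc = ctypes.CDLL(None, use_errno=True)
libc.opendir.restype = ctypes.c_void_p
libc.opendir.argtypes = [ctypes.c_char_p]
libc.readdir.restype = ctypes.POINTER(Dirent)
libc.readdir.argtypes = [ctypes.c_void_p]
libc.closedir.restype = ctypes.c_int
libc.closedir.argtypes = [ctypes.c_void_p]

def list_d_types(path):
    """Return {name: d_type} for every entry except "." and ".."."""
    dirp = libc.opendir(os.fsencode(path))
    if not dirp:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    result = {}
    try:
        while True:
            ep = libc.readdir(dirp)
            if not ep:  # NULL pointer: end of directory (or an error)
                break
            name = ep.contents.d_name
            if name in (b".", b".."):
                continue
            result[os.fsdecode(name)] = ep.contents.d_type
    finally:
        libc.closedir(dirp)
    return result
```

Any entry reported with a value of DT_UNKNOWN (0) means the filesystem under that directory did not fill in d_type for it.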

It turns out that XFS didn't fill in the correct d_type until recently (Linux >= 3.15 and xfsprogs >= 3.2.0). If the filesystem is formatted with XFS's newer version 5 on-disk format, d_type is set correctly.

Example: mkfs.xfs -m crc=1 /srv/disk1

However, GlusterFS can fill in the right d_type in readdirp() responses even if XFS doesn't, by using the pre-fetched stat information.
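
As a rough illustration of how a d_type can be derived from stat information, the sketch below mirrors glibc's IFTODT() macro in Python. It is not the posix xlator code, which is written in C and may differ; d_type_from_mode is a hypothetical name used only here.

```python
import stat

# DT_* values as defined in <dirent.h> on Linux.
DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK = 0, 4, 8, 10

def d_type_from_mode(st_mode):
    """Derive a dirent d_type value from a stat st_mode.

    Same arithmetic as glibc's IFTODT(): the file-type bits of
    st_mode, shifted right by 12, give the matching DT_* constant.
    """
    return stat.S_IFMT(st_mode) >> 12
```

For example, d_type_from_mode(os.lstat(path).st_mode) gives DT_DIR for a directory and DT_REG for a regular file, without relying on what the underlying filesystem put in d_type.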

Comment 4 Vijay Bellur 2016-04-27 14:14:05 UTC
REVIEW: http://review.gluster.org/14095 (posix: Set correct d_type for readdirp() calls) posted (#1) for review on master by Prashanth Pai (ppai)

Comment 5 Prashanth Pai 2016-04-27 14:32:35 UTC
Created a nested fs tree of depth = 4 on glusterfs mountpoint.

In the example below, the ls command from coreutils avoids the additional lstat() when it finds d_type set correctly.

BEFORE http://review.gluster.org/14095:

root# strace -fc -e getdents,lstat ls -fR /mnt/gluster-object/gsmetadata >> /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.95    0.672307          30     22226           getdents
 44.05    0.529388          24     22224           lstat
------ ----------- ----------- --------- --------- ----------------
100.00    1.201695                 44450           total


AFTER http://review.gluster.org/14095:
root# strace -fc -e getdents,lstat ls -fR /mnt/gluster-object/gsmetadata >> /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.595680          27     22226           getdents
------ ----------- ----------- --------- --------- ----------------
100.00    0.595680                 22226           total

Comment 6 Vijay Bellur 2016-05-02 05:32:25 UTC
REVIEW: http://review.gluster.org/14095 (posix: Set correct d_type for readdirp() calls) posted (#2) for review on master by Prashanth Pai (ppai)

Comment 7 Vijay Bellur 2016-05-02 11:48:49 UTC
COMMIT: http://review.gluster.org/14095 committed in master by Jeff Darcy (jdarcy) 
------
commit 77def44d497d090ef3f393b6d9403c1a29dcf993
Author: Prashanth Pai <ppai>
Date:   Wed Apr 27 13:37:07 2016 +0530

    posix: Set correct d_type for readdirp() calls
    
    dirent.d_type can contain the type of the directory entry. The 'd_type'
    struct member in dirent is present in Linux and many BSD flavours.
    However, filling d_type with correct value requires support from the
    underlying filesystem. If not, d_type is set to DT_UNKNOWN. XFS added
    support for d_type as part of their newer version 5 on-disk format.
    However, this requires Linux >= 3.15, xfsprogs >= 3.2.0 and the bricks
    to be formatted using the new format.
    
    This patch enables posix xlator to set d_type to the right value even
    when the underlying filesystem does not support it. d_type can be set
    using information previously fetched by stat() on the dir entry.
    This will aid FUSE applications to leverage d_type to avoid the expense
    of calling lstat() if further actions depend on the type of the file.
    
    Refer `man 3 readdir` and `man 2 getdents`
    
    BUG: 1175711
    Change-Id: Ic5a262fe4c64122726b4fae2d1bea375c559ca04
    Signed-off-by: Prashanth Pai <ppai>
    Reviewed-on: http://review.gluster.org/14095
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>

