Bug 1529072

Summary: parallel-readdir = TRUE prevents directories listing
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Raghavendra G <rgowdapp>
Component: distribute
Assignee: Poornima G <pgurusid>
Status: CLOSED ERRATA
QA Contact: Prasad Desala <tdesala>
Severity: high
Priority: unspecified
Version: rhgs-3.3
CC: bugs, mailinglists, nbalacha, pgurusid, ravishankar, rgowdapp, rhinduja, rhs-bugs, sheggodu, storage-qa-internal
Target Release: RHGS 3.4.0
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.12.2-5
Clone Of: 1512437
Last Closed: 2018-09-04 06:40:20 UTC
Type: Bug
Mount Type: fuse
Bug Depends On: 1512371, 1512437
Bug Blocks: 1503137

Description Raghavendra G 2017-12-26 09:17:45 UTC
+++ This bug was initially created as a clone of Bug #1512437 +++

+++ This bug was initially created as a clone of Bug #1512371 +++

Description of problem:

We found that our Gluster clients couldn't see directories when running `ls` or `find`.

- They could create directories (which could not be seen after the fact).
- They could enter the directories they couldn't see with `cd`.
- They could create and see files.
- The hosts could see the directories.

After disabling `performance.parallel-readdir` on each volume, the problem went away.

As per the docs, prior to enabling `performance.parallel-readdir` I had enabled `performance.readdir-ahead`.


We suspect that, as our topology is replica 3, arbiter 1, _perhaps_ the read operations are also happening on the arbiter node, where of course data doesn't _really_ exist, only the metadata?


Version-Release number of selected component (if applicable):

- CentOS 7 x64
- Gluster Versions 3.12.1 and 3.12.2
- Gluster Client Versions 3.12.1 and 3.12.2

How reproducible:

- Always

Steps to Reproduce:

1. Set up a CentOS 7, 3 replica, 1 arbiter node cluster running 3.12.2 (I'm assuming .3 will also have the problem)
2. Create a volume for use by the native Gluster FUSE client
3. Enable performance.parallel-readdir on the volume
4. Mount the volume on a client using the native fuse client
5. Create a directory within the volume

Actual results:

- The directory cannot be seen with `ls`, `find`, etc.

Expected results:

- The directory should show up
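
The actual/expected results above can be checked with a small script. This is a minimal sketch, not part of the original report: the function name `directory_is_listed` is hypothetical, and a local temporary directory stands in for the FUSE mount point. It creates a directory, confirms it exists via `stat` (the "can `cd` into it" case), and then checks whether a readdir-based listing shows it.

```python
import os
import tempfile
import uuid


def directory_is_listed(parent: str) -> bool:
    """Create a uniquely named directory under `parent` and report
    whether a directory listing of `parent` shows it."""
    name = f"readdir-check-{uuid.uuid4().hex}"
    path = os.path.join(parent, name)
    os.mkdir(path)
    try:
        # The directory is reachable by path, matching "cd works".
        if not os.path.isdir(path):
            raise RuntimeError(f"{path} was created but cannot be stat'ed")
        # os.listdir relies on readdir, like `ls` and `find`.
        return name in os.listdir(parent)
    finally:
        os.rmdir(path)


if __name__ == "__main__":
    # Assumption: replace this temp directory with the real mount point.
    with tempfile.TemporaryDirectory() as mount_point:
        print(directory_is_listed(mount_point))
```

On a healthy filesystem the function returns `True`. On a mount showing the behaviour described in this report, it would presumably return `False`, since the directory is created and reachable but missing from listings.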


Additional info:

Example volume (in the broken state):


# gluster volume info dev_static

Volume Name: dev_static
Type: Replicate
Volume ID: e5042a4d-9ee8-42e4-a4b2-fd66c3e8cb39
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: int-gluster-01.fqdn:/mnt/gluster-storage/dev_static
Brick2: int-gluster-02.fqdn:/mnt/gluster-storage/dev_static
Brick3: int-gluster-03.fqdn:/mnt/gluster-storage/dev_static
Options Reconfigured:
performance.parallel-readdir: true
performance.cache-refresh-timeout: 2
performance.write-behind-window-size: 2MB
server.event-threads: 10
performance.stat-prefetch: true
performance.io-thread-count: 32
performance.cache-size: 128MB
network.ping-timeout: 10
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.use-compound-fops: true
cluster.readdir-optimize: true
cluster.lookup-optimize: true
cluster.favorite-child-policy: size
client.event-threads: 10
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: true
cluster.brick-multiplex: enable
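
For checking many volumes, the "Options Reconfigured" section of output like the above can be parsed mechanically. The following is a rough sketch only: `parse_reconfigured_options` and `is_enabled` are hypothetical helpers, the sample text is an abbreviated copy of the output above, and the set of accepted boolean spellings is an assumption.

```python
from typing import Dict, Optional


def parse_reconfigured_options(volume_info: str) -> Dict[str, str]:
    """Return the key/value pairs listed after 'Options Reconfigured:'."""
    options: Dict[str, str] = {}
    in_options = False
    for raw in volume_info.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_options = True
            continue
        if not in_options:
            continue
        if not line:
            break  # a blank line ends the options section
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


def is_enabled(value: Optional[str]) -> bool:
    # Assumption: these spellings cover the boolean values in this report.
    return value is not None and value.lower() in {"true", "on", "enable", "yes", "1"}


SAMPLE = """\
Volume Name: dev_static
Type: Replicate
Options Reconfigured:
performance.parallel-readdir: true
cluster.readdir-optimize: true
nfs.disable: on
"""

opts = parse_reconfigured_options(SAMPLE)
print(is_enabled(opts.get("performance.parallel-readdir")))
```

A volume where this reports `performance.parallel-readdir` as enabled is a candidate for the workaround described earlier (disabling the option).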

--- Additional comment from Poornima G on 2017-11-12 23:39:29 EST ---

I tried this on my local setup and couldn't reproduce the issue; the directories were listed for me.

We will need more details. From which version did you upgrade to 3.12.1? On which version were these directories created? Does `ls` fail to show even newly created directories? From which version did you start enabling parallel-readdir? Is it possible to unmount and remount the volume? That is not strictly required; creating another mount point locally and trying `ls` there will also do.

--- Additional comment from Ravishankar N on 2017-11-13 00:18:20 EST ---

I was able to hit the issue on a 1 brick distribute volume also, so this is not related to AFR or arbiter as such. When parallel-readdir is enabled, the skip_dirs flag for posix_fill_readdir is set to true. My volinfo:

Volume Name: testvol
Type: Distribute
Volume ID: 0c3b3c49-db17-4c14-95f8-e0e3f8b3f071
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 127.0.0.2:/bricks/brick1
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.parallel-readdir: off
performance.client-io-threads: true
client.event-threads: 10
cluster.favorite-child-policy: size
cluster.lookup-optimize: true
cluster.readdir-optimize: true
cluster.use-compound-fops: true
performance.cache-size: 128MB
performance.io-thread-count: 32
performance.stat-prefetch: true
server.event-threads: 10
performance.write-behind-window-size: 2MB
performance.cache-refresh-timeout: 2
transport.address-family: inet
nfs.disable: on

--- Additional comment from Worker Ant on 2017-11-13 02:28:20 EST ---

REVIEW: https://review.gluster.org/18723 (dht: Fill fist_up_subvol before use in dht_opendir) posted (#1) for review on master by Poornima G

--- Additional comment from Worker Ant on 2017-11-13 04:10:23 EST ---

REVIEW: https://review.gluster.org/18723 (dht: Fill fist_up_subvol before use in dht_opendir) posted (#2) for review on master by Poornima G

--- Additional comment from Worker Ant on 2017-12-15 00:10:34 EST ---

COMMIT: https://review.gluster.org/18723 committed in master by "Poornima G" <pgurusid> with a commit message- dht: Fill first_up_subvol before use in dht_opendir

Reported by: Sam McLeod

Change-Id: Ic8f9b46b173796afd70aff1042834b03ac3e80b2
BUG: 1512437
Signed-off-by: Poornima G <pgurusid>
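
To illustrate why an unset `first_up_subvol` could hide directories while leaving files visible, here is a toy model. It is based only on the patch title and the `skip_dirs` remark earlier in this thread; it is not GlusterFS code, and `list_entries`, `SUBVOLS`, and the brick name are hypothetical. The assumption is that every subvolume other than the "first up" one is asked to skip directories, so if that reference is never filled in, every subvolume skips them.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, bool]  # (name, is_directory)


def list_entries(subvols: Dict[str, List[Entry]], first_up: Optional[str]) -> List[str]:
    """Merge per-subvolume listings, taking directories only from `first_up`."""
    seen: List[str] = []
    for name, entries in subvols.items():
        # Every subvolume except the first-up one is told to skip directories.
        skip_dirs = name != first_up
        for entry_name, is_dir in entries:
            if is_dir and skip_dirs:
                continue
            if entry_name not in seen:
                seen.append(entry_name)
    return seen


SUBVOLS = {"brick1": [("file.txt", False), ("newdir", True)]}

# first_up never filled in: no subvolume matches, so all of them skip directories.
broken = list_entries(SUBVOLS, first_up=None)
# first_up filled in before use: directories come from that subvolume.
fixed = list_entries(SUBVOLS, first_up="brick1")

print(broken)
print(fixed)
```

In this model the unset case lists only `file.txt`, which matches the reported symptom (files visible, directories missing), even with a single brick.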

Comment 7 Prasad Desala 2018-05-22 09:11:47 UTC
Could not reproduce this issue on glusterfs version 3.12.2-11.el7rhgs.x86_64 by following the steps mentioned in the description. 

Moving this BZ to Verified.

Comment 9 errata-xmlrpc 2018-09-04 06:40:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607