Bug 1512371 - parallel-readdir = TRUE prevents directories listing
Summary: parallel-readdir = TRUE prevents directories listing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.12
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Poornima G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1512437 1529072
 
Reported: 2017-11-13 03:06 UTC by Sam McLeod
Modified: 2018-10-23 14:21 UTC
CC: 5 users

Fixed In Version: glusterfs-3.12.15
Clone Of:
Clones: 1512437
Environment:
Last Closed: 2018-10-23 14:21:35 UTC
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Sam McLeod 2017-11-13 03:06:58 UTC
Description of problem:

We found that our Gluster clients couldn't see directories when running `ls` or `find`.

- They could create directories (which could not be seen after the fact).
- They could enter the directories they couldn't see with `cd`.
- They could create and see files.
- The hosts could see the directories.

After disabling `performance.parallel-readdir` on each volume, the problem went away.

As per the docs, prior to enabling `performance.parallel-readdir` I had enabled `performance.readdir-ahead`.
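For reference, the option changes involved look like this (a minimal sketch, using the dev_static volume shown below; run from any Gluster server node):

# gluster volume set dev_static performance.readdir-ahead on
# gluster volume set dev_static performance.parallel-readdir on

and the workaround that made the problem go away:

# gluster volume set dev_static performance.parallel-readdir off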


We suspect that, as our topology is replica 3 with arbiter 1, the read operations are _perhaps_ also happening on the arbiter node, where of course the data doesn't _really_ exist, only the metadata.


Version-Release number of selected component (if applicable):

- CentOS 7 x64
- Gluster server versions 3.12.1 and 3.12.2
- Gluster Client Versions 3.12.1 and 3.12.2

How reproducible:

- Always

Steps to Reproduce:

1. Set up a CentOS 7 cluster with a replica 3, arbiter 1 volume running 3.12.2 (I'm assuming .3 will also have the problem)
2. Create a volume for use by the native Gluster FUSE client
3. Enable performance.parallel-readdir on the volume
4. Mount the volume on a client using the native fuse client
5. Create a directory within the volume (a command-line sketch of these steps follows)
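A command-line sketch of the steps above (host and path names are placeholders modelled on our setup):

# gluster volume create dev_static replica 3 arbiter 1 \
      int-gluster-01.fqdn:/mnt/gluster-storage/dev_static \
      int-gluster-02.fqdn:/mnt/gluster-storage/dev_static \
      int-gluster-03.fqdn:/mnt/gluster-storage/dev_static
# gluster volume start dev_static
# gluster volume set dev_static performance.readdir-ahead on
# gluster volume set dev_static performance.parallel-readdir on

Then, on a client:

# mount -t glusterfs int-gluster-01.fqdn:/dev_static /mnt/dev_static
# mkdir /mnt/dev_static/testdir
# ls /mnt/dev_static          <- testdir does not appear in the listing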

Actual results:

- The directory cannot be seen with `ls`, `find`, etc.

Expected results:

- The directory should show up


Additional info:

Example volume (in the broken state):


# gluster volume info dev_static

Volume Name: dev_static
Type: Replicate
Volume ID: e5042a4d-9ee8-42e4-a4b2-fd66c3e8cb39
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: int-gluster-01.fqdn:/mnt/gluster-storage/dev_static
Brick2: int-gluster-02.fqdn:/mnt/gluster-storage/dev_static
Brick3: int-gluster-03.fqdn:/mnt/gluster-storage/dev_static
Options Reconfigured:
performance.parallel-readdir: true
performance.cache-refresh-timeout: 2
performance.write-behind-window-size: 2MB
server.event-threads: 10
performance.stat-prefetch: true
performance.io-thread-count: 32
performance.cache-size: 128MB
network.ping-timeout: 10
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.use-compound-fops: true
cluster.readdir-optimize: true
cluster.lookup-optimize: true
cluster.favorite-child-policy: size
client.event-threads: 10
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: true
cluster.brick-multiplex: enable

Comment 1 Poornima G 2017-11-13 04:39:29 UTC
I tried this on my local setup and couldn't reproduce the issue; the directories were listed for me.

I will need more details: from which version did you upgrade to 3.12.1? On which version were these directories created? Does ls fail to see even the newly created directories? From which version onwards did you enable parallel-readdir? Is it possible to unmount and remount the volume? This is not strictly required; creating another mount point locally and trying ls will also do.
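For example (a sketch, using the host and volume names from the report):

# mkdir -p /tmp/gluster-test
# mount -t glusterfs int-gluster-01.fqdn:/dev_static /tmp/gluster-test
# ls -la /tmp/gluster-test
# umount /tmp/gluster-test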

Comment 2 Ravishankar N 2017-11-13 05:18:20 UTC
I was able to hit the issue on a 1-brick distribute volume as well, so this is not related to AFR or arbiter as such. When parallel-readdir is enabled, the skip_dirs flag for posix_fill_readdir is set to true. My volinfo:

Volume Name: testvol
Type: Distribute
Volume ID: 0c3b3c49-db17-4c14-95f8-e0e3f8b3f071
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 127.0.0.2:/bricks/brick1
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.parallel-readdir: off
performance.client-io-threads: true
client.event-threads: 10
cluster.favorite-child-policy: size
cluster.lookup-optimize: true
cluster.readdir-optimize: true
cluster.use-compound-fops: true
performance.cache-size: 128MB
performance.io-thread-count: 32
performance.stat-prefetch: true
server.event-threads: 10
performance.write-behind-window-size: 2MB
performance.cache-refresh-timeout: 2
transport.address-family: inet
nfs.disable: on

Comment 3 Worker Ant 2017-11-13 07:28:20 UTC
REVIEW: https://review.gluster.org/18723 (dht: Fill first_up_subvol before use in dht_opendir) posted (#1) for review on master by Poornima G

Comment 4 Worker Ant 2017-11-13 09:10:21 UTC
REVISION POSTED: https://review.gluster.org/18723 (dht: Fill first_up_subvol before use in dht_opendir) posted (#2) for review on master by Poornima G

Comment 5 Sam McLeod 2017-11-20 06:18:10 UTC
I just met someone else who was caught unaware by this bug.

I was wondering, until it's fixed - is there something you can set on the Gluster server side to prevent clients from ever trying to use the arbiter node?

Comment 6 Ravishankar N 2017-11-20 06:25:41 UTC
(In reply to Sam McLeod from comment #5)
> I just met someone else who was caught unaware by this bug.
> 
> I was wondering, until it's fixed - is there something you can set on the
> Gluster server side to prevent clients from ever trying to use the arbiter
> node?

Sam, this is not related to arbiter (see comment #2). Setting performance.parallel-readdir to off on the volumes should serve as a workaround.
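For example, per affected volume (a sketch; gluster volume get shows the value currently in effect):

# gluster volume set dev_static performance.parallel-readdir off
# gluster volume get dev_static performance.parallel-readdir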

Comment 7 Poornima G 2017-11-20 06:39:46 UTC
This is hit only when parallel-readdir is enabled, and in your use case there is only one distribute subvolume, so there is not much to parallelise. I would suggest disabling parallel-readdir in this case.

Did you actually see a performance improvement after enabling parallel-readdir on your setup?

Comment 10 Nithya Balachandran 2018-10-08 05:32:28 UTC
The fix for this is part of releases 4.1 and 5.
As there will be no more 3.12.x releases once version 5 is released, I will close this BZ once 5 is out.

Comment 11 Worker Ant 2018-10-08 08:21:44 UTC
REVIEW: https://review.gluster.org/21364 (dht: Fill first_up_subvol before use in dht_opendir) posted (#1) for review on release-3.12 by N Balachandran

Comment 12 Worker Ant 2018-10-10 05:23:22 UTC
COMMIT: https://review.gluster.org/21364 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message- dht: Fill first_up_subvol before use in dht_opendir

Reported by: Sam McLeod

Change-Id: Ic8f9b46b173796afd70aff1042834b03ac3e80b2
BUG: 1512371
Signed-off-by: Poornima G <pgurusid>

Comment 13 Shyamsundar 2018-10-23 14:21:35 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.15, please open a new bug report.

glusterfs-3.12.15 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000114.html
[2] https://www.gluster.org/pipermail/gluster-users/
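To confirm that the installed packages carry the fix (a sketch, assuming an RPM-based distribution such as CentOS):

# rpm -q glusterfs glusterfs-fuse
# glusterfs --version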

