Bug 1369447 - readdir false-failure with non-Linux
Summary: readdir false-failure with non-Linux
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: 3.8
Hardware: Unspecified
OS: FreeBSD
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1297203 1369448
Blocks:
Reported: 2016-08-23 12:57 UTC by hari gowtham
Modified: 2017-11-07 10:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1297203
Environment:
Last Closed: 2017-11-07 10:42:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description hari gowtham 2016-08-23 12:57:22 UTC
+++ This bug was initially created as a clone of Bug #1297203 +++

Description of problem:
With GlusterFS on FreeBSD and VMware ESXi as the NFS client, I encountered the following errors, and accessing the storage takes a long time.

[2016-01-10 20:20:41.485157] E [posix.c:4902:posix_fill_readdir] 0-gv0-posix: seekdir(0x5) failed on dir=0x806813940: Invalid argument (offset reused from another DIR * structure?)
[2016-01-10 20:20:41.485283] I [server-rpc-fops.c:1882:server_readdirp_cbk] 0-gv0-server: 36451: READDIRP -2 (d3608328-6167-42fe-8ec8-e6cde384e1ab) ==> (Invalid argument)

How reproducible:
Always.

Steps to Reproduce:
1. Build GlusterFS in a FreeBSD 10.1-amd64 environment.
2. Create a single-brick volume on a UFS or ZFS pool.
3. Connect from VMware ESXi 5.1 to GlusterFS, using it as NFS storage.

Actual results:
The errors above are logged, and accessing the storage from ESXi takes too long.

Expected results:
No errors logged.

Additional info:
Reading the GlusterFS posix storage code, I found two problems.
The first is that __posix_fd_ctx_get (in xlators/storage/posix/src/posix-helpers.c) does not set pfd->dir_eof; the second is that posix_fill_readdir does not check whether pfd->dir_eof is set.

dir_eof was added for the NetBSD port, with bug 1129939 and review http://review.gluster.org/8926.
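
A minimal sketch of the two fixes described above, under stated assumptions. It assumes dir_eof holds the telldir() offset recorded when the stream reached end of directory; the report does not show the field's type, so that is a guess. struct dir_ctx and the dir_ctx_* helpers are hypothetical stand-ins for illustration and are not the GlusterFS code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-fd context; NOT the real GlusterFS structure. */
struct dir_ctx {
    DIR *dir;
    long dir_eof; /* assumed: telldir() value seen at end of directory, -1 if unknown */
};

/* Fix 1: initialise dir_eof explicitly when the context is created. */
static void dir_ctx_init(struct dir_ctx *ctx, DIR *dir)
{
    assert(ctx != NULL);
    ctx->dir = dir;
    ctx->dir_eof = -1;
}

/* Read to the end of the stream and remember the end-of-directory offset.
 * Returns the number of entries consumed. */
static int dir_ctx_drain(struct dir_ctx *ctx)
{
    int n = 0;

    while (readdir(ctx->dir) != NULL)
        n++;
    ctx->dir_eof = telldir(ctx->dir);
    return n;
}

/* Fix 2: position the stream at off, and only report a failed seek when
 * off is not the recorded end-of-directory offset.
 * Returns 0 on success, -1 with errno set to EINVAL otherwise. */
static int dir_ctx_seek(struct dir_ctx *ctx, long off)
{
    if (off == 0) {
        rewinddir(ctx->dir);
        return 0;
    }
    seekdir(ctx->dir, off);
    if (telldir(ctx->dir) != off && off != ctx->dir_eof) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

A caller would run dir_ctx_init once after opendir(), then call dir_ctx_seek before each batch of readdir() calls.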

--- Additional comment from Pranith Kumar K on 2016-06-15 03:20:25 EDT ---

hi,
     Is this bug still giving you a problem? I do not have any FreeBSD machines, but if you could help, I will be happy to work with you to fix this issue correctly for you. Sorry, I was busy with other commitments, so I couldn't spend time on this issue.

Pranith

--- Additional comment from 2510 on 2016-06-15 08:58:30 EDT ---

Hello,

Yes, it is still a 'problem' for me.


Recently I found a more important problem: glusterfs reuses a value ('cookie') returned by telldir for another DIR (opendir'ed for the same directory) when it is accessed through gluster's NFS.
This problem also blocks me from using glusterfs.

I'm working on these bugs on GitHub. Here is a patch that fixes the problem above, but it is only for FreeBSD+UFS, and it does not fix the root problem.

https://github.com/2510/glusterfs-freebsd/blob/develop/patches/glusterfs-3.6.8.patch3
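
For context on the cookie problem: POSIX only defines seekdir() for a value that telldir() returned on the same directory stream, so a cookie carried over to another DIR has no guaranteed meaning. The sketch below shows the supported pattern only (save and resume on one stream); resume_on_same_stream is a hypothetical helper written for illustration and is not taken from the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if seeking back to a telldir() cookie on the SAME stream yields
 * the same entry again, 0 if it does not, and -1 if the directory cannot be
 * opened or holds fewer than two entries. */
static int resume_on_same_stream(const char *path)
{
    char expected[1024];
    struct dirent *ent;
    long cookie;
    int same;
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    /* Consume one entry, then remember the position that follows it. */
    if (readdir(dir) == NULL) {
        closedir(dir);
        return -1;
    }
    cookie = telldir(dir);

    /* Record the name of the entry found at that position. */
    ent = readdir(dir);
    if (ent == NULL) {
        closedir(dir);
        return -1;
    }
    snprintf(expected, sizeof(expected), "%s", ent->d_name);

    /* Seek back on the same stream; the same entry should come back. */
    seekdir(dir, cookie);
    ent = readdir(dir);
    same = (ent != NULL && strcmp(ent->d_name, expected) == 0);

    closedir(dir);
    return same;
}
```

Passing the cookie to a second DIR opened on the same directory is exactly the case this helper avoids, because its result depends on the platform and filesystem.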

--- Additional comment from Pranith Kumar K on 2016-06-15 10:16:16 EDT ---

Would it be possible for you to work with me to find the root cause and a fix that works for you as well? Do you think you can share some test machines so that we can work on them together to fix this? I mainly develop on Linux. Do you think it is possible to recreate this bug on a VM with FreeBSD?

Pranith

--- Additional comment from 2510 on 2016-06-15 10:31:38 EDT ---

Yes, it is reproducible on a VM, but we need a test NFS client that sends readdirs.
Please wait for a while. I'll prepare the machine.

--- Additional comment from Pranith Kumar K on 2016-06-15 10:36:09 EDT ---

Hey,
    I work in the India time zone. It is 8 PM here and I have some personal work now. Do you mind if we catch up on this tomorrow? Please let me know your time zone as well so that we can find a time that works for both of us.

Pranith

--- Additional comment from 2510 on 2016-06-15 10:59:30 EDT ---

Hello.

I am in the Asia/Tokyo (Japan) time zone (UTC+9), and I am working on this problem personally (not for a company or as business work).

And, okay, preparing a dev VM will take some more time.
Can you send me your SSH public key so I can set up an account for you?

Thanks,
2510

--- Additional comment from 2510 on 2016-06-16 08:45:58 EDT ---

A dev VM is ready.

--- Additional comment from Niklaas Baudet von Gersdorff on 2016-07-18 03:06:07 EDT ---

I ran into the same and other problems when using gluster on FreeBSD. At some point accessing the gluster volume becomes very slow and my systems slow down tremendously. Have a look at this log: http://sprunge.us/MMhM

Unlike the setup described above, I disabled NFS because it didn't work on FreeBSD either. I already filed a bug report on FreeBSD's bugzilla: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209752

--- Additional comment from Pranith Kumar K on 2016-07-18 06:31:52 EDT ---

(In reply to 2510 from comment #7)
> A dev VM is ready.

I can come online, sorry I missed this email. And because of the update on comment-8 I came to know of this again. Please let me know.

--- Additional comment from 2510 on 2016-07-24 05:04:10 EDT ---

(In reply to Pranith Kumar K from comment #9)
> I can come online, sorry I missed this email. And because of the update on
> comment-8 I came to know of this again. Please let me know.

Can you tell me your SSH public key?
I'll create your account.

--- Additional comment from Pranith Kumar K on 2016-07-25 21:57:26 EDT ---

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDkduuGBq++zm/JKYVUcfM6YOqzYp2Dj0ag3OvlkFTXyNZ1QVOoEWuH9MAeF/MlHd14nLvFKSdpI+qr+faY+Wtyt/Za09YnizyMBuEo9hIw307EwynOdfAO8N/PKLAvtsNQ7Xk3UHUfHrvVuJr5qZFs1sWNau67/DBxd3bUO/FUl3FZoZqWg3/qsG8ZTCVEPc4N0qY9xiDFxgDh81lmK8t24S8d9RfMrKtpPbSe75HW1CxqM6AGLpQtDscIydGqmRYYcYSn9box4T3erbVxNpcpSlk6K1akMJhbuNoEbDfD7n4t8X/BLj/h3gJIUTlrXnpPj+hluiHDmeBlhu7a7ctd pk.eng.blr.redhat.com

--- Additional comment from 2510 on 2016-07-27 09:21:50 EDT ---

Okay, I created your account and VM is ready for dev.

------------------------------------------------------------------------

* ssh pk.154.119 with your key.

* Here's my work. You may copy to your home directory.

/home/2510/glusterfs-3.6.8-orig
  - Original glusterfs-3.6.8, without any patches.
/home/2510/glusterfs-3.6.8-patched
  - with some patches, from /home/2510/glusterfs-freebsd/patches/*

* You can use sudo (for mount/umount, install/start/stop glusterfs)
* To mount or umount NFS directory:

$ sudo mount -t nfs 10.0.0.1:/testvol /mnt/testvol
$ sudo umount /mnt/testvol

* To start/stop glusterfs service:

$ sudo service glusterfsd start
$ sudo killall glusterd glusterfsd

* This VM is built for this ticket.
  I do not use for other purposes.

* Device for a brick is /dev/vtbd0p4, and mounted on /mnt/brick.

------------------------------------------------------------------------

Currently, I suspect that glusterfs NFS reuses a cookie returned by telldir across different DIR streams.
(This means the subject I gave this ticket, a false failure, is wrong.)

This can be tested as follows:

1) Install glusterfs and run.
2) Create a volume (with a brick), then mount gluster volume via nfs.

# mkdir /mnt/testvol
# mount -t nfs 10.0.0.1:/testvol /mnt/testvol

3) Create many files on it.

# sudo sh -c 'for i in `seq 0 1 200`; do touch /mnt/testvol/$i; done'

4) List directory entries.

# ls /mnt/testvol

With original glusterfs, only 51 entries are listed. (entries numbered 48 to 200 are missing)
With patched glusterfs, all entries are listed.
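
The listing in step 4 can also be checked by counting entries directly rather than reading ls output. This is a minimal sketch; count_entries is a hypothetical helper, and /mnt/testvol is the mount point used in the steps above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Count the entries in path, excluding "." and "..".
 * Returns the count, or -1 if the directory cannot be opened or read. */
static long count_entries(const char *path)
{
    struct dirent *ent;
    long n = 0;
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    for (;;) {
        /* readdir() returns NULL both at end of directory and on error;
         * clearing errno first lets the two cases be told apart. */
        errno = 0;
        ent = readdir(dir);
        if (ent == NULL)
            break;
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        n++;
    }
    if (errno != 0)
        n = -1;

    closedir(dir);
    return n;
}
```

After step 3, count_entries("/mnt/testvol") should return 201 if every file created by the loop (0 through 200) is visible through the NFS mount.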

--- Additional comment from Pranith Kumar K on 2016-07-29 14:18:58 EDT ---

(In reply to 2510 from comment #12)

Thanks for this VM. I may have to show this problem to some of our NFS devs too. Please keep the VM until this bug is completely fixed. I will let you know once we have found the root cause.

Comment 1 Niels de Vos 2016-09-12 05:38:19 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 2 Niels de Vos 2017-11-07 10:42:53 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

