Bug 845330 - RHS volume mounted as NFS causing a lot of readdir loop messages
Summary: RHS volume mounted as NFS causing a lot of readdir loop messages
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Vivek Agarwal
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks: 858436
Reported: 2012-08-02 16:56 UTC by Veda Shankar
Modified: 2016-06-12 23:18 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 858436 (view as bug list)
Environment:
Last Closed: 2013-07-12 16:44:29 UTC
Embargoed:



Description Veda Shankar 2012-08-02 16:56:21 UTC
Description of problem:
The customer has mounted the RHS volume over NFS and is seeing the following messages in dmesg on most of his clients:

NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dtÿÿÿÿ@Ìÿÿÿÿ ÌÿÿÿÿPÿÿÿÿ.mozilla@Ìÿÿÿÿ Ìÿÿÿÿ.baklinux-x86-64.so.2 has duplicate cookie 99
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dtÿÿÿÿ@Ìÿÿÿÿ ÌÿÿÿÿPÿÿÿÿ.mozilla@Ìÿÿÿÿ Ìÿÿÿÿ.baklinux-x86-64.so.2 has duplicate cookie 99
 
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: dead.letterÿÿÿÿ Ìÿÿÿÿ.tkcvsnasas.hmac-sha-1 has duplicate cookie 204
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: Documents2-gnome2 has duplicate cookie 88
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: Documents2-gnome2 has duplicate cookie 88
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: .localpplet.conf has duplicate cookie 47
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: .localpplet.conf has duplicate cookie 47
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: .gqviewrc-defaultlMediaSDK_60e has duplicate cookie 245
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: .gqviewrc-defaultlMediaSDK_60e has duplicate cookie 245
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: .gqviewrc-defaultlMediaSDK_60e has duplicate cookie 245
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: .gqviewrc-defaultlMediaSDK_60e has duplicate cookie 245
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: old_src_for_production_pre_bfgs_update_moved_04_03_2012.sav{v? has duplicate cookie 109
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: old_src_for_production_pre_bfgs_update_moved_04_03_2012.sav{v? has duplicate cookie 109
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: old_src_for_production_pre_bfgs_update_moved_04_03_2012.sav{v? has duplicate cookie 109
NFS: directory /simon contains a readdir loop.Please contact your server vendor.  The file: old_src_for_production_pre_bfgs_update_moved_04_03_2012.sav{v? has duplicate cookie 109
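
The messages above repeat many times per file, so a per-cookie count is easier to compare across clients. A minimal sketch, assuming the dmesg output has been saved to a text file; the function name and the regular expression are illustrative and only match the message format quoted above:

```python
import re
from collections import Counter

# Matches the kernel message quoted above. File names may contain
# arbitrary characters, so the name is matched greedily up to the
# fixed " has duplicate cookie N" suffix at the end of the line.
LOOP_RE = re.compile(
    r"NFS: directory (?P<dir>\S+) contains a readdir loop\."
    r".*?The file: (?P<name>.*) has duplicate cookie (?P<cookie>\d+)\s*$"
)


def summarize_readdir_loops(lines):
    """Count readdir-loop messages per (directory, cookie) pair."""
    counts = Counter()
    for line in lines:
        match = LOOP_RE.search(line)
        if match:
            key = (match.group("dir"), int(match.group("cookie")))
            counts[key] += 1
    return counts
```

For example, `summarize_readdir_loops(open("dmesg.txt", errors="replace"))` (with `dmesg.txt` as a hypothetical capture file) returns a `Counter` keyed by directory and cookie value.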

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Amar Tumballi 2012-08-02 17:05:24 UTC
Want to double-check whether the backend is XFS. If it's ext4, there is a possibility of running into the 64-bit d_off issue.

Kris, can you have a look at this one and help Veda with a solution?

Comment 2 Veda Shankar 2012-08-02 17:11:41 UTC
The backend is XFS.

Comment 4 Vijay Bellur 2012-08-02 17:34:59 UTC
Could be related to https://bugzilla.redhat.com/show_bug.cgi?id=770250

Comment 5 Krishna Srinivas 2012-08-03 08:02:40 UTC
Veda,

We hit this too in our testing: https://bugzilla.redhat.com/show_bug.cgi?id=814052, which points to bug 790729.

Can you check whether he is seeing this on RHEL 6.2 clients only, or on other clients too?

Comment 6 Veda Shankar 2012-08-10 04:02:00 UTC
1) The RHEL bug report says that this error is fixed in kernel-2.6.32-235.el6. The customer is seeing these messages even on clients running a newer kernel, 2.6.32-279.1.1.

2) The customer sees these errors on both Red Hat 6.2 and 6.3 clients.

3) On clients running Red Hat 5.5, we see the following error messages:
lockd: unexpected unlock status: 1
lockd: unexpected unlock status: 1
lockd: unexpected unlock status: 1
lockd: unexpected unlock status: 1
lockd: unexpected unlock status: 1
lockd: unexpected unlock status: 1
lockd: unexpected unlock status: 1
lockd: unexpected unlock status: 1

Comment 7 Krishna Srinivas 2012-08-14 09:39:05 UTC
(In reply to comment #6)
> 1) 
> The RHEL bug report says that this error is fixed in kernel-2.6.32-235.el6.
> The customer is seeing these messages even on clients running a newer kernel
> 2.6.32-279.1.1.
> 
> 2)The customer sees these errors on both Redhat 6.2 and 6.3 clients.
> 
> 3)
> On clients running Redhat 5.5, we see the following error messages:
> lockd: unexpected unlock status: 1
> lockd: unexpected unlock status: 1
> lockd: unexpected unlock status: 1
> lockd: unexpected unlock status: 1
> lockd: unexpected unlock status: 1
> lockd: unexpected unlock status: 1
> lockd: unexpected unlock status: 1
> lockd: unexpected unlock status: 1

So the customer has not seen the readdir loop error on RH 5.5? Can you check the NFS log on the server side for more info on the lockd/unlock error messages?

Do you know of a way to reproduce the readdir loop error in-house? Do you know what application runs on the mount point to cause this error?

On the mount point, does running "ls" in the affected directory (/simon) result in an error? If so, can you run "echo 3 > /proc/sys/vm/drop_caches" and see whether "ls" still shows the problem?
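
The "ls" check can also be scripted so that it is easy to run on several clients and to repeat after dropping caches. This is only a sketch of a userspace check, not a glusterfs or kernel tool; the function name is made up for illustration. It lists the directory once and reports any error raised while listing, plus any entry name returned more than once:

```python
import os
from collections import Counter


def check_directory_listing(path):
    """List `path` once and report any error or repeated entry names.

    Returns (error, duplicates): `error` is the OSError raised while
    listing (or None), and `duplicates` maps each name seen more than
    once to the number of times it appeared.
    """
    names = Counter()
    error = None
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                names[entry.name] += 1
    except OSError as exc:
        # Keep whatever was listed before the failure and report the error.
        error = exc
    duplicates = {name: n for name, n in names.items() if n > 1}
    return error, duplicates
```

Running `check_directory_listing("/simon")` against the affected directory on the NFS mount, before and after the drop_caches step, would show whether the listing itself fails or repeats names from the application's point of view.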

Comment 8 Veda Shankar 2012-08-15 02:38:49 UTC
Hi Krishna,

Could you please respond to item #1 and explain why this happens even with the newer version of the kernel?

Comment 9 Krishna Srinivas 2012-08-16 06:38:53 UTC
Veda, because it might be a bug in glusterfs and not in the kernel. We need to dig deeper to find out (hence my questions in the previous comment).

Comment 10 Krishna Srinivas 2012-11-02 12:06:00 UTC
We need more info on my previous questions:

------------

So the customer has not seen the readdir loop error on RH 5.5? Can you check the NFS log on the server side for more info on the lockd/unlock error messages?

Do you know of a way to reproduce the readdir loop error in-house? Do you know what application runs on the mount point to cause this error?

On the mount point, does running "ls" in the affected directory (/simon) result in an error? If so, can you run "echo 3 > /proc/sys/vm/drop_caches" and see whether "ls" still shows the problem?
------------

If this really is a bug in glusterfs, there is one situation that might cause it: suppose a readdirp is in progress and, meanwhile, a file is deleted and a new one is created. If the new file gets the same offset as the deleted file (which readdirp had already returned), we might see this behavior. I am not sure, though; we need a way to reproduce this bug to be certain.
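
To make the hypothesis concrete, here is a toy model of it. It is not glusterfs code and not the kernel's detection logic: `ToyDirectory` and `list_with_loop_check` are invented for illustration, and the model assumes that a freed offset is handed to the next created file and that listing order is creation order rather than offset order.

```python
class ToyDirectory:
    """Toy server-side directory; NOT how glusterfs assigns offsets."""

    def __init__(self, names):
        self.entries = []        # (name, cookie) pairs in listing order
        self.free_cookies = []   # cookies released by deleted files
        self.next_cookie = 1
        for name in names:
            self.create(name)

    def create(self, name):
        # Assumption: a freed cookie is reused before a new one is issued.
        if self.free_cookies:
            cookie = self.free_cookies.pop()
        else:
            cookie = self.next_cookie
            self.next_cookie += 1
        self.entries.append((name, cookie))

    def delete(self, name):
        for i, (entry_name, cookie) in enumerate(self.entries):
            if entry_name == name:
                del self.entries[i]
                self.free_cookies.append(cookie)
                return
        raise FileNotFoundError(name)

    def readdir(self, after_cookie, count):
        """Return up to `count` entries following `after_cookie` (0 = start)."""
        start = 0
        if after_cookie:
            for i, (_, cookie) in enumerate(self.entries):
                if cookie == after_cookie:
                    start = i + 1
                    break
        return self.entries[start:start + count]


def list_with_loop_check(directory, page_size, after_first_page=None):
    """Page through `directory`, remembering every cookie seen.

    Returns (names, duplicate), where `duplicate` is the (name, cookie)
    pair that repeated an already-seen cookie, or None if none did.
    `after_first_page` is an optional callback used to change the
    directory while the listing is still in progress.
    """
    seen = set()
    names = []
    cookie = 0
    first_page = True
    while True:
        page = directory.readdir(cookie, page_size)
        if not page:
            return names, None
        for name, entry_cookie in page:
            if entry_cookie in seen:
                return names, (name, entry_cookie)
            seen.add(entry_cookie)
            names.append(name)
        cookie = page[-1][1]
        if first_page and after_first_page is not None:
            after_first_page(directory)
        first_page = False
```

With a directory of four files listed two at a time, deleting the first file and creating a new one after the first page makes the new file come back with a cookie the client has already seen, which is the situation described above. Whether glusterfs can actually produce this is exactly what still needs a reproducer.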

Comment 11 Vivek Agarwal 2013-07-12 16:44:29 UTC
Based on discussion with Sayan and Scott, closing this.

