Bug 905933 - GlusterFS 3.3.1: NFS Too many levels of symbolic links/duplicate cookie
Summary: GlusterFS 3.3.1: NFS Too many levels of symbolic links/duplicate cookie
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: GlusterFS Bugs list
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-30 14:01 UTC by Justin Albstmeijer
Modified: 2014-10-20 14:07 UTC
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-10-20 14:07:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Justin Albstmeijer 2013-01-30 14:01:48 UTC
Description of problem:

While running NFS failover tests on two replicated GlusterFS nodes, the NFS clients report these errors:

ls: reading directory test1/: Too many levels of symbolic links
Jan 30 13:41:58 xx kernel: [492498.250168] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359547768463822182 has duplicate cookie 1150239009500529518

Version-Release number of selected component (if applicable):

kernel-2.6.32-279.19.1.el6
glusterfs-3.3.1-1.el6.x86_64
glusterfs-geo-replication-3.3.1-1.el6.x86_64
glusterfs-server-3.3.1-1.el6.x86_64
glusterfs-rdma-3.3.1-1.el6.x86_64
glusterfs-fuse-3.3.1-1.el6.x86_64


How reproducible:

Reproducible by running a 'touch' loop on the clients while moving the VIPs between the GlusterFS nodes.

Steps to Reproduce:

start state:

gluster node1 has vip1
gluster node2 has vip2

nfs test client1 mounts from vip1
nfs test client2 mounts from vip2
nfs test client3 mounts from vip1
nfs test client4 mounts from vip2

nfs test client1 runs a 'while true' touch loop in nfs folder test1
nfs test client2 runs a 'while true' touch loop in nfs folder test2
nfs test client3 runs a 'watch' loop counting the entries (ls | wc -l) in nfs folders test1 and test2
nfs test client4 runs a 'watch' loop counting the entries (ls | wc -l) in nfs folders test1 and test2

test commands:
while true; do touch "$(date +%s%N)"; sleep 1; done
watch 'echo -n "test1 "; ls test1/ | wc -l; echo -n "test2 "; ls test2/ | wc -l'
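The two test commands above do not keep a record of when an 'ls' failure happened. A bounded variant, sketched below, adds a timestamped note for each failed 'ls' so the failures can be matched against the VIP moves. The function names 'touch_loop' and 'count_entries' and their parameters are hypothetical (they are not part of the original test setup), and 'date +%s%N' assumes GNU date, as the original loop already does.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original report; assumes GNU date.

# touch_loop DIR COUNT [INTERVAL]: create COUNT files in DIR, each named after
# a nanosecond timestamp, sleeping INTERVAL seconds (default 1) between them.
touch_loop() {
  local dir="$1" count="$2" interval="${3:-1}" i=0
  while [ "$i" -lt "$count" ]; do
    touch -- "$dir/$(date +%s%N)" || return 1
    sleep "$interval"
    i=$((i + 1))
  done
}

# count_entries DIR: print "DIR <number of entries>" on success. If ls fails
# (e.g. "Too many levels of symbolic links"), ls writes its own message to
# stderr, a timestamped note is added to stderr, and the function returns 1.
count_entries() {
  local dir="$1" listing
  if listing=$(ls -- "$dir"); then
    # grep -c '' counts lines; it prints 0 for an empty listing
    printf '%s %s\n' "$dir" "$(printf '%s' "$listing" | grep -c '')"
  else
    printf '%s ls failed on %s\n' "$(date '+%F %T')" "$dir" >&2
    return 1
  fi
}
```

For example, client1 could run 'touch_loop test1 3600', and client3 could run 'while true; do count_entries test1; count_entries test2; sleep 2; done 2>>ls-errors.log'.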

tests:

move vip1 to gluster node2
move vip2 to gluster node1

move vip1 to gluster node1
move vip2 to gluster node2
  
Actual results:

Clients do not lose their NFS mount, although they sometimes hang for a few minutes before recovering. However, the following errors are reported:


ls: reading directory test1/: Too many levels of symbolic links
ls: reading directory test1/: Too many levels of symbolic links
ls: reading directory test1/: Too many levels of symbolic links
ls: reading directory test1/: Too many levels of symbolic links
ls: reading directory test2/: Too many levels of symbolic links
ls: reading directory test2/: Too many levels of symbolic links
ls: reading directory test2/: Too many levels of symbolic links
ls: reading directory test2/: Too many levels of symbolic links
ls: reading directory test2/: Too many levels of symbolic links


Jan 30 13:38:40 xx kernel: [492300.447793] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359548658577264637-64.so.2 has duplicate cookie 2975285876436019500
Jan 30 13:38:40 xx kernel: [492300.448100] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359548658577264637-64.so.2 has duplicate cookie 2975285876436019500
Jan 30 13:41:58 xx kernel: [492498.250168] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359547768463822182 has duplicate cookie 1150239009500529518
Jan 30 13:41:58 xx kernel: [492498.250377] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359547768463822182 has duplicate cookie 1150239009500529518
Jan 30 13:42:47 xx kernel: [492547.683880] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359549713888590789<F9>Lq<99>><FB><CF><BA>    ^X<98><AD>^\1359549372849716119^A<A0><C3>^D<9E><FC>f<B8>[ has duplicate cookie 5323278086631868562
Jan 30 13:42:47 xx kernel: [492547.683987] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359549713888590789<F9>Lq<99>><FB><CF><BA>    ^X<98><AD>^\1359549372849716119^A<A0><C3>^D<9E><FC>f<B8>[ has duplicate cookie 5323278086631868562
Jan 30 13:42:50 xx kernel: [492550.672743] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359549516038123212<E6> has duplicate cookie 7680743801850642182
Jan 30 13:42:50 xx kernel: [492550.672904] NFS: directory content/test1 contains a readdir loop.Please contact your server vendor.  The file: 1359549516038123212<E6> has duplicate cookie 7680743801850642182
Jan 30 13:42:51 xx kernel: [492551.182458] NFS: directory content/test2 contains a readdir loop.Please contact your server vendor.  The file: 1359547974591455205-64.so.2 has duplicate cookie 8076066796481300716
Jan 30 13:42:51 xx kernel: [492551.182569] NFS: directory content/test2 contains a readdir loop.Please contact your server vendor.  The file: 1359547974591455205-64.so.2 has duplicate cookie 8076066796481300716
Jan 30 13:43:00 xx kernel: [492560.072294] NFS: directory content/test2 contains a readdir loop.Please contact your server vendor.  The file: 1359548346453256184 has duplicate cookie 1703179529881495508
Jan 30 13:43:00 xx kernel: [492560.072600] NFS: directory content/test2 contains a readdir loop.Please contact your server vendor.  The file: 1359548346453256184 has duplicate cookie 1703179529881495508
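Both symptoms appear to come from the same client-side check: when the Linux NFS client sees a directory cookie it has already seen, it logs the "readdir loop" message and fails the read with ELOOP, which 'ls' prints as "Too many levels of symbolic links". Each kernel message above is logged twice. The filter below is a sketch that assumes the syslog line format shown here; it reduces the log to one count per directory and duplicate cookie. 'LC_ALL=C' is set because some of the reported file names contain bytes that are not valid UTF-8, which 'sed' may otherwise fail to match. The function name 'summarize_readdir_loops' is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize_readdir_loops [FILE...]
# Reads kernel log lines (from FILE or stdin) and prints, for every
# "readdir loop" message, how often each directory/cookie pair occurs.
summarize_readdir_loops() {
  # LC_ALL=C: treat the log as raw bytes so garbled file names still match
  LC_ALL=C sed -n \
    's/.*NFS: directory \([^ ]*\) contains a readdir loop\..* has duplicate cookie \([0-9]*\).*/\1 \2/p' \
    "$@" | LC_ALL=C sort | uniq -c
}
```

For example, 'summarize_readdir_loops /var/log/messages' (the log path is an assumption; it depends on the syslog setup).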

Expected results:

Possibly some hiccups during failover, but no errors.

Additional info:

I'm testing on Amazon EC2, so the IP failover is done by detaching and attaching secondary IP addresses.
The file system on the bricks is ext4.
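For reference, moving a secondary private IP between instances in the way described above could be scripted with the AWS CLI roughly as follows. This is a sketch, not the reporter's actual failover script: the function names 'run' and 'move_secondary_ip', the DRY_RUN switch, and the example IDs and address are hypothetical. The address also has to be configured (or removed) inside each guest OS, which the sketch does not cover.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; all IDs and addresses are placeholders.
# With DRY_RUN set, commands are printed instead of executed.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# move_secondary_ip IP FROM_ENI TO_ENI
move_secondary_ip() {
  local ip="$1" from_eni="$2" to_eni="$3"
  # Detach the secondary IP from the interface of the node giving up the VIP...
  run aws ec2 unassign-private-ip-addresses \
    --network-interface-id "$from_eni" --private-ip-addresses "$ip" || return 1
  # ...then attach it to the interface of the node taking over.
  run aws ec2 assign-private-ip-addresses \
    --network-interface-id "$to_eni" --private-ip-addresses "$ip"
}
```

For example, 'DRY_RUN=1 move_secondary_ip 10.0.0.10 eni-aaaa eni-bbbb' only prints the two commands. The assign call also accepts '--allow-reassignment', which moves an address that is still assigned elsewhere in a single step.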

Comment 2 Justin Albstmeijer 2013-09-12 08:54:19 UTC
We switched to failing over by detaching a dedicated NFS network interface (ENI) and attaching it to the other server, instead of only detaching and attaching the IP address.
This failover method does not show the problem.
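The ENI-based failover that avoided the problem could be sketched with the AWS CLI as below. This is an illustration under assumptions, not the reporter's script: the function names 'run' and 'move_eni', the DRY_RUN switch, the default device index 1, and all IDs are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: move a dedicated NFS interface (ENI) to another instance.
# All IDs are placeholders; with DRY_RUN set, commands are printed, not run.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# move_eni ENI_ID ATTACHMENT_ID TARGET_INSTANCE_ID [DEVICE_INDEX]
move_eni() {
  local eni="$1" attachment="$2" target="$3" index="${4:-1}"
  # Detach the interface from the node that is giving up NFS service.
  run aws ec2 detach-network-interface --attachment-id "$attachment" || return 1
  # Wait until EC2 reports the interface as available again.
  run aws ec2 wait network-interface-available --network-interface-ids "$eni" || return 1
  # Attach it to the node that takes over.
  run aws ec2 attach-network-interface --network-interface-id "$eni" \
    --instance-id "$target" --device-index "$index"
}
```

The current attachment ID can be looked up with 'aws ec2 describe-network-interfaces --network-interface-ids ENI_ID --query NetworkInterfaces[0].Attachment.AttachmentId --output text'.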

If anyone is still interested in debugging the issue, I'm willing to test a GlusterFS version that includes a patch that might fix the original problem.

Comment 3 Vivek Agarwal 2013-09-12 09:15:20 UTC
We investigated this and could not reproduce the problem: with kernel-2.6.32-358.el6.x86_64 the issue is not seen.

The "Too many levels of symbolic links" error is most likely due to a bug in the NFS client. Can you please check whether you still see the issue with the latest kernel?

Comment 4 Niels de Vos 2014-09-30 14:45:46 UTC
This could indeed be an issue with the NFS client (provided by the Linux kernel). Could you let us know whether you can still reproduce this problem with more recent kernel versions?

If you can reproduce this, please let us know the exact steps. One of the most important details is the number of files in the directory. A tcpdump capture on the NFS client (with "-s 0") together with the matching logs would also help in analysing this behaviour.
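A capture along the lines requested could look like the sketch below. '-s 0' sets the full snapshot length so READDIR replies, including their cookies, are not truncated, and '-w' writes raw packets to a file for later analysis. The function name 'capture_nfs', the DRY_RUN switch, the default interface 'eth0', and the output file name are hypothetical, and the 'port 2049' filter assumes the NFS traffic uses the default NFS port. Capturing normally requires root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: capture_nfs [IFACE] [OUTFILE]
# Runs (or, with DRY_RUN set, prints) a full-packet NFS capture on the client.
capture_nfs() {
  local iface="${1:-eth0}" outfile="${2:-nfs-client.pcap}"
  # -s 0: full snapshot length; -w: write raw packets to OUTFILE
  # port 2049: assumed NFS port
  set -- tcpdump -i "$iface" -s 0 -w "$outfile" port 2049
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Starting the capture before moving the VIPs and stopping it (Ctrl-C) after the first "readdir loop" message makes it easier to line up packets with the kernel log timestamps.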

Comment 5 Justin Albstmeijer 2014-10-20 13:45:27 UTC
I no longer have a test setup to reproduce these tests.
For me the problem was solved by using a different way of moving the IP between the NFS servers, as described in comment 2 (2013-09-12).

Comment 6 Niels de Vos 2014-10-20 14:07:42 UTC
Okay, thanks. I'll close this out for now. If the issue returns, please open a new bug.

