Bug 1357257 - observing "Too many levels of symbolic links" after adding bricks and then issuing a replace brick
Summary: observing "Too many levels of symbolic links" after adding bricks and then issuing a replace brick
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact:
URL:
Whiteboard:
Depends On: 1356068
Blocks: glusterfs-3.8.2
 
Reported: 2016-07-17 09:57 UTC by Niels de Vos
Modified: 2016-08-12 09:47 UTC
CC List: 3 users

Fixed In Version: glusterfs-3.8.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1356068
Environment:
Last Closed: 2016-08-12 09:47:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Niels de Vos 2016-07-17 09:57:18 UTC
+++ This bug was initially created as a clone of Bug #1356068 +++

Description of problem:
======================
As part of verifying bz#1278284, I was trying to do an add-brick on an EC volume followed by a replace-brick. On doing that I observed "Too many levels of symbolic links" on the RHEL 6.7 client.

When we do an add-brick on a pure EC volume, then create a new directory and files under that directory, and then do a replace-brick on any of the bricks (old set or new set), we see the above-mentioned error on the client.

I observed this only on one of the clients (RHEL 6.7).

I was able to reproduce this consistently, and did not observe it when a replace-brick was issued on a pure EC volume.

Version-Release number of selected component (if applicable):
all?

Steps to Reproduce:
==================
1. Create a pure EC volume (or even a distributed one), say 1x(4+2).
2. Mount the volume on two clients (in my case one was RHEL 7.2 and the other RHEL 6.7), both using different NFS servers.
3. Create some files and directories from both mounts.
4. Add bricks, in this case (4+2), i.e. 6 bricks.
5. Create a new directory, and in that directory create another directory for each client; pump I/O from both clients into their respective directories (use the dd command).
6. While I/O is in progress, replace any of the bricks; on the RHEL 6.7 client you will see "Too many levels of symbolic links".



--- Additional comment from Ashish Pandey on 2016-05-20 05:48:02 EDT ---


I could recreate this issue with a plain EC volume AND a plain distributed volume.

I placed different gf_msg calls in posix_create and _open.
I can see that the moment the replace-brick happens, the connections to *ALL* the bricks restart. So at that point in time NO brick was actually connected to the client.

I mounted the volume using NFS and started creating files in a loop, file-1 to file-100.
The moment the replace-brick happens, the file which should be created at that point in the loop will NOT be created. It is this file for which we get the "Too many symbolic ..." error message.
In the brick logs we can see that file-75 was created, then "disconnect happens on all the bricks", then no log message for file-76, then file-77 created, and so on.

These are the logs from the brick which was NOT replaced.

[2016-05-20 07:29:27.942286] E [MSGID: 113069] [posix.c:3113:posix_writev] 0-plain-posix: ************writev on 15  [Invalid argument]
[2016-05-20 07:29:28.019197] E [MSGID: 113069] [posix.c:2604:posix_create] 0-plain-posix: ************Created  /brick/gluster/p2/TEST/118/file1-74 fd = 15  [No such file or directory]
[2016-05-20 07:29:28.205203] E [MSGID: 113069] [posix.c:3113:posix_writev] 0-plain-posix: ************writev on 15  [Invalid argument]
[2016-05-20 07:29:28.685664] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-plain-server: disconnecting connection from rhs3-16301-2016/05/20-07:27:01:823285-plain-client-1-0-0
[2016-05-20 07:29:28.685827] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-plain-server: Shutting down connection rhs3-16301-2016/05/20-07:27:01:823285-plain-client-1-0-0
[2016-05-20 07:29:29.693590] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-05-20 07:29:29.748094] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-05-20 07:29:30.592393] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-05-20 07:29:30.592638] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-05-20 07:29:30.663883] I [login.c:81:gf_auth] 0-auth/login: allowed user names: d02e404c-4feb-48fb-990d-619f6685d477
[2016-05-20 07:29:30.663934] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-plain-server: accepted client from rhs3-16396-2016/05/20-07:29:29:744958-plain-client-1-0-0 (version: 3.7.5)
[2016-05-20 07:29:31.856194] I [dict.c:473:dict_get] (-->/usr/local/lib/libglusterfs.so.0(default_getxattr_cbk+0xc2) [0x7f80d7f31362] -->/usr/local/lib/glusterfs/3.7.5/xlator/features/marker.so(marker_getxattr_cbk+0xde) [0x7f80cc4adebe] -->/usr/local/lib/libglusterfs.so.0(dict_get+0x63) [0x7f80d7f1ad43] ) 0-dict: !this || key=() [Invalid argument]
[2016-05-20 07:29:32.693587] E [MSGID: 113069] [posix.c:2604:posix_create] 0-plain-posix: ************Created  /brick/gluster/p2/TEST/118/file1-81 fd = 15  [No such file or directory]
[2016-05-20 07:29:32.868530] E [MSGID: 113069] [posix.c:3113:posix_writev] 0-

  --------------------------------------------
512+0 records out
524288 bytes (524 kB) copied, 0.601281 s, 872 kB/s
dd: closing output file `TEST/118/file1-75': Stale file handle
dd: opening `TEST/118/file1-76': Too many levels of symbolic links
512+0 records in
512+0 records out
----------------------------------------------


I also tried it with a PLAIN distributed volume and I get the same error message on the mount point and in the brick logs.
In fact, the above logs are for the plain distributed volume.


I think this is happening because the glusterd_svcs_stop function is getting called by the glusterd_op_replace_brick function.

glusterd_svcs_stop will stop the nfs service along with shd, quotad and others.

This causes all the connections to the bricks to be disconnected; they will be reconnected, but for that duration the bricks will be unreachable.

Do we need to stop and start nfs for a replace-brick?

--- Additional comment from Atin Mukherjee on 2016-05-24 05:43:45 EDT ---

(In reply to Ashish Pandey from comment #9)
> [Ashish's analysis, brick logs and dd output, quoted verbatim from above; snipped]
> 
> I think this is happening because glusterd_svcs_stop function is getting
> called by glusterd_op_replace_brick function.
> 
> glusterd_svcs_stop will stop nfs service along with shd, quotad and others.
We'd need to restart all the daemons in that case; that is why glusterd issues glusterd_svcs_stop () before performing the replace-brick and then invokes glusterd_svcs_manager () once the replace-brick operation completes, so that all the clients get notified about the topology change.
> 
> This causes all the connection to the bricks to be disconnected which will
> be connected again.
> But for this duration bricks will be unreachable. 
> 
> Do we need to stop and start the nfs for replace bricks?
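
In other words, the sequence glusterd runs for a replace-brick looks roughly like this. The following is a minimal conceptual sketch only, with stub functions; the real glusterd_svcs_stop (), glusterd_op_replace_brick () and glusterd_svcs_manager () take arguments and do far more than shown here.

#include <stdio.h>

/* Stubs standing in for the glusterd routines named above; the real
 * signatures and logic are intentionally omitted. */
static void svcs_stop (void)
{
        /* gluster-NFS, shd, quotad etc. are stopped here, so every
         * client connection through those daemons drops. */
        printf ("stopping nfs, shd, quotad, ...\n");
}

static void replace_brick (void)
{
        printf ("replacing the brick and regenerating volfiles\n");
}

static void svcs_manager (void)
{
        /* The daemons are restarted with the new volfile so that
         * clients learn about the changed topology. */
        printf ("restarting nfs, shd, quotad, ...\n");
}

int main (void)
{
        /* The window between stop and restart is when the bricks are
         * unreachable through gluster-NFS; the failed file creation in
         * the dd loop above falls into exactly this window. */
        svcs_stop ();
        replace_brick ();
        svcs_manager ();
        return 0;
}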

--- Additional comment from Ashish Pandey on 2016-05-24 05:59:35 EDT ---

This issue is happening with a plain distributed volume too.
Somehow, nfs is propagating the wrong errno, ELOOP, instead of ENOTCONN, which gives this error message.

--- Additional comment from Soumya Koduri on 2016-06-13 07:16:30 EDT ---

The gluster-NFS server does not seem to send the ELOOP error at any time; it is not a standard NFSv3 error. Most probably this error is generated on the NFS-client. One of the reasons could be the NFS-client detecting a loop during readdir (bug 770250), i.e. when the directory cookie (dir_cookie) of a child directory is the same as any of its parent directory's cookies in the client page cache.

I am guessing that when the gluster-NFS server is restarted, the cookie generated and sent by the server post restart could have collided with a value it had sent before. A packet trace collected from the client should help us here.

Also, please check whether the below log message appears in the kernel logs on the client machine.

[root@skoduri n]# dmesg | grep readdir
[21615.224218] NFS: directory /n contains a readdir loop.Please contact your server vendor.  The file: . has duplicate cookie 1457221137058955403
[root@skoduri n]#

--- Additional comment from Ashish Pandey on 2016-06-15 05:32:05 EDT ---

Soumya,

This issue is happening with a plain distributed volume also; no AFR or EC volume is needed.
Just create a distributed volume, mount it using NFS and do I/O. Replace a brick as mentioned above and you can hit the issue.

I could have provided all the info you asked for, but I think if you recreate it, you can check all of this from the nfs-ganesha point of view in real time.

--- Additional comment from Kaleb KEITHLEY on 2016-06-15 09:50:24 EDT ---

Please provide a test case

--- Additional comment from Soumya Koduri on 2016-07-12 06:17:20 EDT ---

After doing some experiments, here is what seems to have happened: the client throws this error when the server does not respond to its LOOKUP request for a file. I ran the following test to confirm that -

Created a 1x2 volume -

Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.122.201:/vol1                 49152     0          Y       4249 
Brick 192.168.122.201:/vol1_r               49153     0          Y       4254 
Self-heal Daemon on localhost               N/A       N/A        N       N/A  
 


[root@skoduri mnt]# for i in `seq 1 100`; do mkdir dir.$i ; for j in `seq 1 100`; do dd if=/dev/urandom of=dir.$i/testfile.$i.$j bs=1k count=$j & done ;wait ; done


Now set a breakpoint in the nfs process, in the lookup code path "nfs_lookup()" -

(gdb) b nfs_lookup
Breakpoint 1 at 0x7f603dae81a0: file nfs-generics.c, line 82.
(gdb) c
Continuing.
[Switching to Thread 0x7f603e48e700 (LWP 7737)]

Breakpoint 1, nfs_lookup (nfsx=0x7f6038019610, xl=0x7f603800ebc0, nfu=nfu@entry=0x7f603e48bae0, 
    pathloc=pathloc@entry=0x7f60340b9d10, cbk=cbk@entry=0x7f603daf4030 <nfs3svc_lookup_cbk>, 
    local=local@entry=0x7f60340b9860) at nfs-generics.c:82
warning: Source file is more recent than executable.
82	        if ((!nfsx) || (!xl) || (!pathloc) || (!nfu))
(gdb) p *pathloc
$1 = {path = 0x7f60381d1d60 "/dir.4/testfile.4.10", name = 0x7f60381d1d67 "testfile.4.10", inode = 0x7f603515c9f4, 
  parent = 0x7f603515bfac, gfid = '\000' <repeats 15 times>, pargfid = "[Z-\317$/LV\262\374\005\214\360Y\246g"}
(gdb)   

Now restart nfs process -

[root@dhcp35-197 glusterfs]# gluster v start vol1 force
volume start: vol1: success
[root@dhcp35-197 glusterfs]# 


I could see ELOOP error thrown -

[skoduri@skoduri ~]$ sudo tail -f /var/log/messages | grep nfs | grep loop

Jul 12 15:32:30 skoduri kernel: [149340.627189] VFS: Lookup of 'testfile.4.10' in nfs 0:57 would have caused loop


[root@skoduri skoduri]# tshark -r /tmp/nfs-ELOOP.pcap nfs.name == "testfile.4.10"
Running as user "root" and group "root". This could be dangerous.
11807  21.644349 192.168.122.1 -> 192.168.122.201 NFS 232 V3 LOOKUP Call, DH: 0xac64f17e/testfile.4.10
12032  41.336656 192.168.122.1 -> 192.168.122.201 NFS 14548 V3 WRITE Call, FH: 0x98e32213 Offset: 0 Len: 18432 FILE_SYNC
[root@skoduri skoduri]# 

Note: for some weird reason, the second WRITE call seems to be an amalgamation of two different packets (the first being the WRITE on FH 0x98e32213, followed by the LOOKUP of 0xac64f17e/testfile.4.10).

As can be seen, there is no reply sent to the LOOKUP on 0xac64f17e/testfile.4.10.

Will attach pkt trace and rpcdebug log messages.


--- Additional comment from Soumya Koduri on 2016-07-13 05:09:09 EDT ---

Inputs from Niels - 

>>>
The affected file is testfile.4.10. In Wireshark we can find the LOOKUP
procedure with a filter like 'nfs.name == "testfile.4.10"'. This results
in the LOOKUP Call, twice...

- frame 11807 is a plain LOOKUP
- frame 12032 is a WRITE Call, but it also carries the same LOOKUP again

The reply on the LOOKUP, found with "rpc.xid == 0xf4ef4e62" is in frame
12049.

The contents of the LOOKUP Reply in the obj_attributes have the details
from a directory, whereas testfile.4.10 really should be a file, or
probably NFS3ERR_NOENT. The directory has the same 'fileid' (used as
inode by the NFS-client) as the parent directory of testfile.4.10. This
obviously is an incorrect reply.

The fileid/inode from the LOOKUP Reply matches the one in the messages
file too (12897189535768225383).

This looks like a bug in Gluster/NFS to me... Hooray?

If we can capture a tcpdump on the NFS-server, we can see the GlusterFS
protocol too. This then makes it possible to dissect the filehandle in
volume-id + GFID, which we then can match with the replies that the
bricks send.
<<<

--- Additional comment from Soumya Koduri on 2016-07-13 05:37:14 EDT ---

As suggested by Niels above, I have taken a packet trace on the server side (copied to the same machine mentioned above: "/root/bug1328451/nfs-ELOOP-server.pcap").

568 152.036090 192.168.122.1 -> 192.168.122.201 NFS 228 V3 LOOKUP Call, DH: 0x84b243c9/testfile.1
569 152.036116 192.168.122.201 -> 192.168.122.1 TCP 68 2049→666 [ACK] Seq=1 Ack=161 Win=30080 Len=0 TSval=4495843 TSecr=10310504
570 152.036717 192.168.122.201 -> 192.168.122.201 GlusterFS 340 V330 LOOKUP Call, Filename: (nameless, by GFID)
571 152.037099 192.168.122.201 -> 192.168.122.201 GlusterFS 360 V330 LOOKUP Reply (Call In 570)
572 152.037126 192.168.122.201 -> 192.168.122.201 TCP 68 49145→49155 [ACK] Seq=2045 Ack=1549 Win=50176 Len=0 TSval=4495845 TSecr=4495845
573 152.037331 192.168.122.201 -> 192.168.122.201 GlusterFS 296 V330 LOOKUP Call, Filename: (nameless, by GFID)
574 152.037643 192.168.122.201 -> 192.168.122.201 GlusterFS 360 V330 LOOKUP Reply (Call In 573)
575 152.037845 192.168.122.201 -> 192.168.122.1 NFS 332 V3 LOOKUP Reply (Call In 568), FH: 0x84b243c9


From the packet trace, it looks like when the nfs server received the LOOKUP of 0x84b243c9/testfile.1, it performed a GlusterFS nameless LOOKUP of the root gfid followed by the dir1 gfid, but no subsequent LOOKUP was done on the testfile. That is the reason we see dir1's attributes in the NFS LOOKUP reply.

From gdb(/code), 

Post nameless lookup of dir1 gfid, 

(gdb) bt
#0  nfs3_lookup_resume (carg=0x7fffe07b906c) at nfs3.c:1438
#1  0x00007fffea5358f4 in nfs3_fh_resolve_entry_hard (cs=cs@entry=0x7fffe07b906c) at nfs3-helpers.c:3789
#2  0x00007fffea535b19 in nfs3_fh_resolve_inode_lookup_cbk (frame=<optimized out>, cookie=<optimized out>, 
    this=0x7fffe4019610, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, 
    buf=0x7fffe82a6104, xattr=0x7ffff55581e4, postparent=0x7fffe82a6334) at nfs3-helpers.c:3714
#3  0x00007fffea50d611 in nfs_fop_lookup_cbk (frame=0x7ffff5db8a5c, cookie=0x7fffe4018650, this=<optimized out>, 
    op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7fffe82a6104, 
    xattr=0x7ffff55581e4, postparent=0x7fffe82a6334) at nfs-fops.c:430
#4  0x00007ffff7e1a897 in io_stats_lookup_cbk (frame=0x7ffff5db8b30, cookie=<optimized out>, this=<optimized out>, 
    op_ret=0, op_errno=117, inode=0x7fffe8f97128, buf=0x7fffe82a6104, xdata=0x7ffff55581e4, 
    postparent=0x7fffe82a6334) at io-stats.c:2116
#5  0x00007fffea59a710 in dht_discover_complete (this=this@entry=0x7fffe4016a30, 
    discover_frame=discover_frame@entry=0x7ffff5db87e0) at dht-common.c:383
#6  0x00007fffea59b38b in dht_discover_cbk (frame=0x7ffff5db87e0, cookie=<optimized out>, this=0x7fffe4016a30, 
    op_ret=<optimized out>, op_errno=22, inode=0x7fffe8f97128, stbuf=0x7fffdaefda70, xattr=0x7ffff55581e4, 
    postparent=0x7fffdaefdae0) at dht-common.c:512
#7  0x00007fffea6832dc in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, 
    myframe=0x7ffff5db8490) at client-rpc-fops.c:2937
#8  0x00007ffff7efde90 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fffe405b0f0, pollin=pollin@entry=0x7fffd4003aa0)
    at rpc-clnt.c:759
#9  0x00007ffff7efe16c in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fffe405b120, event=<optimized out>, 
    data=0x7fffd4003aa0) at rpc-clnt.c:920
#10 0x00007ffff7efa543 in rpc_transport_notify (this=this@entry=0x7fffe406ae30, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fffd4003aa0) at rpc-transport.c:541
#11 0x00007ffff7e431a3 in socket_event_poll_in (this=this@entry=0x7fffe406ae30) at socket.c:2238
#12 0x00007ffff7e45b68 in socket_event_handler (fd=<optimized out>, idx=9, data=0x7fffe406ae30, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2351
#13 0x00007ffff7f8388a in event_dispatch_epoll_handler (event=0x7fffdaefdf10, event_pool=0x44ceb0)
    at event-epoll.c:571
#14 event_dispatch_epoll_worker (data=0x7fffe4041bf0) at event-epoll.c:674
#15 0x00007ffff72a260a in start_thread (arg=0x7fffdaefe700) at pthread_create.c:334
---Type <return> to continue, or q <return> to quit--- 
#16 0x00007ffff6b8fbbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) n
1441	        nfs_user_t                      nfu = {0, };
(gdb) 
1438	{
(gdb) 
1441	        nfs_user_t                      nfu = {0, };
(gdb) 
1438	{
(gdb) 
1445	        if (!carg) {
(gdb) 
1441	        nfs_user_t                      nfu = {0, };
(gdb) 
1443	        struct nfs3_fh                  newfh = {{0},};
(gdb) 
1441	        nfs_user_t                      nfu = {0, };
(gdb) 
1443	        struct nfs3_fh                  newfh = {{0},};
(gdb) 
1441	        nfs_user_t                      nfu = {0, };
(gdb) 
1443	        struct nfs3_fh                  newfh = {{0},};
(gdb) 
1445	        if (!carg) {
(gdb) 
1452	        nfs3_check_fh_auth_status (cs, stat, _gf_false, nfs3err);
(gdb) 
1453	        nfs3_check_fh_resolve_status (cs, stat, nfs3err);
(gdb) p cs->resolventry
$3 = 0x7fffe4131a70 "testfile.1"
(gdb) p cs->resolvefh
$4 = {ident = ":OGL", exportid = "\205\376E\272\361+HI\241\256\060t\005\024\066\242", 
  gfid = "\374K\332D\t\310Ho\263-\263\241%R\260\277", mountid = "\265B\202\327", '\000' <repeats 11 times>, 
  padding = '\000' <repeats 11 times>}
(gdb) n
1454	        cs->parent = cs->resolvefh;
(gdb) 
1456		if (cs->hardresolved) {
(gdb) 
1458			nfs3_fh_build_child_fh (&cs->parent, &cs->stbuf, &newfh);
(gdb) 
1440	        int                             ret = -EFAULT;
(gdb) p cs->hardresolved
$5 = 1
(gdb) n
1457			stat = NFS3_OK;
(gdb) 
1458			nfs3_fh_build_child_fh (&cs->parent, &cs->stbuf, &newfh);
(gdb) 
1459			goto nfs3err;
(gdb) 
1470	                nfs3_log_common_res (rpcsvc_request_xid (cs->req),
(gdb) 
1473	                nfs3_lookup_reply (cs->req, stat, &newfh, &cs->stbuf,
(gdb) 
1475	                nfs3_call_state_wipe (cs);
(gdb) 



In 

int
nfs3_fh_resolve_entry_hard (nfs3_call_state_t *cs)
{
        int             ret = -EFAULT;
        nfs_user_t      nfu = {0, };

        if (!cs)
                return ret;

        nfs_loc_wipe (&cs->resolvedloc);
        nfs_user_root_create (&nfu);
        gf_msg_trace (GF_NFS3, 0, "FH hard resolution: gfid: %s "
                      ", entry: %s", uuid_utoa (cs->resolvefh.gfid),
                      cs->resolventry);

        ret = nfs_entry_loc_fill (cs->nfsx, cs->vol->itable, cs->resolvefh.gfid,
                                  cs->resolventry, &cs->resolvedloc,
                                  NFS_RESOLVE_CREATE);

>>>>> Since the entry is not present in the inode table, ret is '-2'.

        if (ret == -2) {
                gf_msg_trace (GF_NFS3, 0, "Entry needs lookup: %s",
                              cs->resolvedloc.path);
                /* If the NFS op is lookup, let the resume callback
                 * handle the sending of the lookup fop. Similarly,
                 * if the NFS op is create, let the create call
                 * go ahead in the resume callback so that an EEXIST gets
                 * handled at posix without an extra fop at this point.
                 */
                if (nfs3_lookup_op (cs) ||
                    (nfs3_create_op (cs) && !nfs3_create_exclusive_op (cs))) {
                        cs->lookuptype = GF_NFS3_FRESH;
                        cs->resolve_ret = 0;
                        nfs3_call_resume (cs);

>>>>> Since it is a LOOKUP fop, we call the resume_fn, nfs3_lookup_resume(), which responds with the stat of the pargfid ('dir1' in this case). 

                } else {
                        cs->hardresolved = 1;
                        nfs_lookup (cs->nfsx, cs->vol, &nfu, &cs->resolvedloc,
                                    nfs3_fh_resolve_entry_lookup_cbk, cs);
                }
                ret = 0;


The fix seems to be either to fall through to the above else case for the LOOKUP fop, or to check for cs->resolventry and resolve it in "nfs3_lookup_resume" as well.
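
For reference, here is a sketch of the change the eventual patch describes (see the commit message below), applied against the fragment quoted above. This is not the verbatim patch; the idea is to reset the hard-resolution flag so that "nfs3_lookup_resume" sends an explicit LOOKUP for the entry instead of replying with the parent's attributes. The flag appears as cs->hardresolved in the gdb session, while the commit message refers to it as 'resolvedhard'.

        if (ret == -2) {
                gf_msg_trace (GF_NFS3, 0, "Entry needs lookup: %s",
                              cs->resolvedloc.path);
                if (nfs3_lookup_op (cs) ||
                    (nfs3_create_op (cs) && !nfs3_create_exclusive_op (cs))) {
                        cs->lookuptype = GF_NFS3_FRESH;
                        cs->resolve_ret = 0;
                        /* Proposed reset (sketch): without this,
                         * nfs3_lookup_resume() still sees the flag set during
                         * the earlier hard resolution of the parent and
                         * replies early with the parent's stat, which is what
                         * triggers ELOOP on the client. */
                        cs->hardresolved = 0;
                        nfs3_call_resume (cs);
                } else {
                        cs->hardresolved = 1;
                        nfs_lookup (cs->nfsx, cs->vol, &nfu, &cs->resolvedloc,
                                    nfs3_fh_resolve_entry_lookup_cbk, cs);
                }
                ret = 0;
        }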

--- Additional comment from Soumya Koduri on 2016-07-13 13:05:53 CEST ---

This seems to be a bug in the nameless lookup resolution code-path of the Gluster/NFS server (it typically happens post restart). The NFS xlator bails out after the LOOKUP of the parent directory FH (without resolving the child file entry), resulting in the stat of the parent directory being sent.

The NFS-client, on receiving attributes for an entry that match those of its parent, threw the error "Too many levels of symbolic links" to the application.
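
As a conceptual illustration of why the client maps this to ELOOP (hypothetical names below; this is not the Linux NFS-client source): the LOOKUP reply carried the same fileid that the client already uses as the inode number of the parent directory, and accepting that result as a child of its own directory would create a loop, so the lookup fails with ELOOP, which applications see as "Too many levels of symbolic links".

#include <errno.h>
#include <stdint.h>

/* Minimal stand-in for the attributes carried in a LOOKUP reply. */
struct fattr {
        uint64_t fileid;   /* used as the inode number by the NFS-client */
};

/* Hypothetical check: a child whose fileid equals its parent's (as in the
 * reply in frame 12049 above) would form a directory loop, so the lookup
 * is failed with ELOOP. */
static int validate_lookup_reply (const struct fattr *parent,
                                  const struct fattr *child)
{
        if (child->fileid == parent->fileid)
                return -ELOOP;   /* "Too many levels of symbolic links" */
        return 0;
}

int main (void)
{
        /* fileid taken from the LOOKUP reply analysed above. */
        struct fattr dir   = { .fileid = 12897189535768225383ULL };
        struct fattr child = { .fileid = 12897189535768225383ULL };

        return validate_lookup_reply (&dir, &child) == -ELOOP ? 0 : 1;
}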

--- Additional comment from Vijay Bellur on 2016-07-17 09:58:54 CEST ---

COMMIT: http://review.gluster.org/14911 committed in master by Niels de Vos (ndevos) 
------
commit 3c485cb896837c8e362fd0b094325002ce806ac4
Author: Soumya Koduri <skoduri>
Date:   Wed Jul 13 16:24:31 2016 +0530

    nfs: Reset cs->resolvedhard while resolving an entry
    
    If an entry is not found in the inode table, nfs xlator should be
    resolving it by sending an explicit lookup to the brick process.
    But currently its broken in case of NFS3_LOOKUP fop where in the server
    bails out early resulting in sending pargfid attributes to the client.
    To fix the same reset 'cs->resolvedhard' so that an explicit lookup
    is done for the entry in the resume_fn "nfs3_lookup_resume()".
    
    Change-Id: I999f8bca7ad008526c174d13f69886dc809d9552
    Signed-off-by: Soumya Koduri <skoduri>
    BUG: 1356068
    Reviewed-on: http://review.gluster.org/14911
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>

Comment 1 Vijay Bellur 2016-07-18 10:11:26 UTC
REVIEW: http://review.gluster.org/14941 (nfs: Reset cs->resolvedhard while resolving an entry) posted (#2) for review on release-3.8 by soumya k (skoduri)

Comment 2 Vijay Bellur 2016-07-18 20:56:23 UTC
COMMIT: http://review.gluster.org/14941 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit 951aa4d0837007249046b8ca8362d0d2024cb252
Author: Soumya Koduri <skoduri>
Date:   Wed Jul 13 16:24:31 2016 +0530

    nfs: Reset cs->resolvedhard while resolving an entry
    
    If an entry is not found in the inode table, nfs xlator should be
    resolving it by sending an explicit lookup to the brick process.
    But currently its broken in case of NFS3_LOOKUP fop where in the server
    bails out early resulting in sending pargfid attributes to the client.
    To fix the same reset 'cs->resolvedhard' so that an explicit lookup
    is done for the entry in the resume_fn "nfs3_lookup_resume()".
    
    This is backport of the below mainline patch -
    http://review.gluster.org/14911
    
    BUG: 1357257
    
    >Change-Id: I999f8bca7ad008526c174d13f69886dc809d9552
    >Signed-off-by: Soumya Koduri <skoduri>
    >BUG: 1356068
    >Reviewed-on: http://review.gluster.org/14911
    >CentOS-regression: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Smoke: Gluster Build System <jenkins.org>
    >Reviewed-by: Niels de Vos <ndevos>
    >(cherry picked from commit 3c485cb896837c8e362fd0b094325002ce806ac4)
    
    Change-Id: Ifeb21887810115369ca2ae6c8c3d3619d4e6c066
    Reviewed-on: http://review.gluster.org/14941
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Tested-by: soumya k <skoduri>
    Reviewed-by: jiademing.dd <iesool>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 3 Niels de Vos 2016-08-12 09:47:30 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.2, please open a new bug report.

glusterfs-3.8.2 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000058.html
[2] https://www.gluster.org/pipermail/gluster-users/

