Description of problem:
======================
As part of verification of bz#1278284, I was trying an add-brick on an EC volume followed by a replace-brick. On doing that I observed "Too many levels of symbolic links" on the RHEL 6.7 client.

When we do an add-brick to a pure EC volume, then create a new dir and files under that new dir, and then do a replace-brick of any of the bricks (old set or new set), we see the above-mentioned error on the client. I observed this only on one of the clients (seen on RHEL 6.7). I was able to reproduce it consistently, and did not observe it when a replace-brick was issued on a pure EC volume without the prior add-brick.

Version-Release number of selected component (if applicable):
===================
server side:
[root@network bricks]# rpm -qa|grep gluster
glusterfs-client-xlators-3.7.9-1.el7rhgs.x86_64
glusterfs-server-3.7.9-1.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-1.el7rhgs.x86_64
glusterfs-api-3.7.9-1.el7rhgs.x86_64
glusterfs-cli-3.7.9-1.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-1.el7rhgs.x86_64
glusterfs-debuginfo-3.7.9-1.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-libs-3.7.9-1.el7rhgs.x86_64
glusterfs-fuse-3.7.9-1.el7rhgs.x86_64
glusterfs-rdma-3.7.9-1.el7rhgs.x86_64
[root@network bricks]# cat /etc/redhat-*
cat: /etc/redhat-access-insights: Is a directory
Red Hat Enterprise Linux Server release 7.2 (Maipo)

client side:
[root@nchilaka-rhel6 67]# cat /etc/redhat-*
cat: /etc/redhat-access-insights: Is a directory
cat: /etc/redhat-lsb: Is a directory
Red Hat Enterprise Linux Server release 6.7 (Santiago)
(another client being used: RHEL 7.2)

Steps to Reproduce:
==================
1. Create a pure EC volume (or even distributed), say 1x(4+2).
2. Mount the volume on two clients (in my case one RHEL 7.2 and the other RHEL 6.7), both using different NFS servers.
3. Create some files and dirs from both mounts.
4. Add bricks, in this case (4+2), i.e. 6 bricks.
5. Create a new dir, and in that dir create a sub-dir for each client; pump in I/O from both clients into their respective dirs (use the dd command).
6. While I/O is in progress, replace any of the bricks; on the RHEL 6.7 client you then get "Too many levels of symbolic links".

Attached are sosreports.
By mistake I added "3.1.3" to the Fixed In Version field; I should be putting that in the internal whiteboard for tracking. Hence making the required changes.
Logs available @ rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/nchilaka/bug.1328451

I have also put a snippet of the log of the NFS server through which the volume was mounted on RHEL 6.7. There is also a log of the commands executed in the same location.
I was able to reproduce the issue (this time the client being 7.2).

Cluster nodes: 10.70.35.191/27/98/64/44/144
nfs clients:
  6.7: 10.70.42.106 (nfs server was 10.70.35.114)
  7.2: 10.70.35.103 (nfs server was 10.70.35.64)

Replaced the below brick:
gluster v replace-brick pure-ec 10.70.35.27:/rhs/brick4/pure-ec 10.70.35.27:/rhs/brick3/pure-ec commit force

Client-side dd output around the failure:
1024000 bytes (1.0 MB) copied, 0.081829 s, 12.5 MB/s
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB) copied, 0.0848642 s, 12.1 MB/s
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB) copied, 6.112 s, 168 kB/s
dd: failed to open ‘replace72.177’: Too many levels of symbolic links
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB) copied, 0.111717 s, 9.2 MB/s
1000+0 records in
This issue is happening with a plain distribute volume too. Somehow, NFS is propagating the wrong errno, ELOOP, instead of ENOTCONN, which gives this error message.
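For reference, the error string seen on the client maps directly to ELOOP. A quick Python check (illustrative only, not part of any fix; the message strings come from libc on Linux) shows the two errnos and their standard messages:

```python
import errno
import os

# ELOOP is what the client ends up reporting; ENOTCONN is the errno
# the underlying condition should have produced.
print(errno.ELOOP, os.strerror(errno.ELOOP))      # "Too many levels of symbolic links" on Linux
print(errno.ENOTCONN, os.strerror(errno.ENOTCONN))  # "Transport endpoint is not connected" on Linux
```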
The gluster-NFS server doesn't seem to be sending an ELOOP error at any point; it is not a standard NFSv3 error. Most probably this error is generated on the NFS client. One of the reasons could be that the NFS client detects a loop during readdir, as in bug 770250, i.e., if the directory cookie (dir_cookie) of a child directory is the same as any of the parent directory's cookies in the client page cache.

I am guessing that when the gluster-NFS server is restarted, a cookie generated and sent by the server post-restart could have collided with a value it had sent before. A packet trace collected from the client should help us here.

Also, please check if there is a log message like the below in the kernel logs on the client machine:

[root@skoduri n]# dmesg | grep readdir
[21615.224218] NFS: directory /n contains a readdir loop.Please contact your server vendor. The file: . has duplicate cookie 1457221137058955403
[root@skoduri n]#
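The duplicate-cookie check described above can be modeled with a small sketch (the cookie values are hypothetical; the real detection lives in the Linux NFS client's readdir page-cache code):

```python
def find_duplicate_cookie(cookies):
    """Return the first directory cookie seen twice, or None.

    Mirrors the loop-detection idea: while walking readdir results,
    a cookie that repeats means the client would revisit an entry,
    which the kernel reports as a readdir loop.
    """
    seen = set()
    for c in cookies:
        if c in seen:
            return c
        seen.add(c)
    return None

# Hypothetical cookie stream where the server reused a value after restart:
stream = [101, 102, 103, 102, 104]
print(find_duplicate_cookie(stream))  # -> 102
```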
Soumya,
This issue is happening with a plain distributed volume also; no need for an AFR or EC volume. Just create a distributed volume, mount it using NFS, and do I/O. Replace a brick as mentioned above and you can hit the issue. I could have provided all the info you asked for, but I think if you recreate it, you can check all the things from the nfs-ganesha point of view in real time.
Please provide a test case
After doing some experiments, here is what seems to have happened: the client throws this error when the server doesn't respond (correctly) to its LOOKUP request for a file. I ran the following test to confirm that.

Created a 1*2 volume:

Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.122.201:/vol1                 49152     0          Y       4249
Brick 192.168.122.201:/vol1_r               49153     0          Y       4254
Self-heal Daemon on localhost               N/A       N/A        N       N/A

[root@skoduri mnt]# for i in `seq 1 100`; do mkdir dir.$i ; for j in `seq 1 100`; do dd if=/dev/urandom of=dir.$i/testfile.$i.$j bs=1k count=$j & done ;wait ; done

Now put a breakpoint in the nfs process in the lookup code path, "nfs_lookup()":

(gdb) b nfs_lookup
Breakpoint 1 at 0x7f603dae81a0: file nfs-generics.c, line 82.
(gdb) c
Continuing.
[Switching to Thread 0x7f603e48e700 (LWP 7737)]

Breakpoint 1, nfs_lookup (nfsx=0x7f6038019610, xl=0x7f603800ebc0, nfu=nfu@entry=0x7f603e48bae0, pathloc=pathloc@entry=0x7f60340b9d10, cbk=cbk@entry=0x7f603daf4030 <nfs3svc_lookup_cbk>, local=local@entry=0x7f60340b9860) at nfs-generics.c:82
warning: Source file is more recent than executable.
82              if ((!nfsx) || (!xl) || (!pathloc) || (!nfu))
(gdb) p *pathloc
$1 = {path = 0x7f60381d1d60 "/dir.4/testfile.4.10", name = 0x7f60381d1d67 "testfile.4.10", inode = 0x7f603515c9f4, parent = 0x7f603515bfac, gfid = '\000' <repeats 15 times>, pargfid = "[Z-\317$/LV\262\374\005\214\360Y\246g"}
(gdb)

Now restart the nfs process:

[root@dhcp35-197 glusterfs]# gluster v start vol1 force
volume start: vol1: success
[root@dhcp35-197 glusterfs]#

I could see the ELOOP error thrown:

[skoduri@skoduri ~]$ sudo tail -f /var/log/messages | grep nfs | grep loop
Jul 12 15:32:30 skoduri kernel: [149340.627189] VFS: Lookup of 'testfile.4.10' in nfs 0:57 would have caused loop

[root@skoduri skoduri]# tshark -r /tmp/nfs-ELOOP.pcap nfs.name == "testfile.4.10"
Running as user "root" and group "root". This could be dangerous.
11807  21.644349 192.168.122.1 -> 192.168.122.201 NFS 232 V3 LOOKUP Call, DH: 0xac64f17e/testfile.4.10
12032  41.336656 192.168.122.1 -> 192.168.122.201 NFS 14548 V3 WRITE Call, FH: 0x98e32213 Offset: 0 Len: 18432 FILE_SYNC
[root@skoduri skoduri]#

Note that, for some weird reason, the second WRITE call seems to be an amalgamation of two different packets (the first being the WRITE on FH 0x98e32213, followed by the lookup of 0xac64f17e/testfile.4.10). As can be seen, there is no reply sent to the LOOKUP on 0xac64f17e/testfile.4.10.

Will attach the packet trace and rpcdebug log messages.
Inputs from Niels -
>>>
The affected file is testfile.4.10. In Wireshark we can find the LOOKUP procedure with a filter like 'nfs.name == "testfile.4.10"'. This results in the LOOKUP Call, twice...
- frame 11807 is a plain LOOKUP
- frame 12032 is a WRITE Call, but it also carries the same LOOKUP again

The reply on the LOOKUP, found with "rpc.xid == 0xf4ef4e62", is in frame 12049. The contents of the LOOKUP Reply in the obj_attributes have the details from a directory, whereas testfile.4.10 really should be a file, or probably NFS3ERR_NOENT. The directory has the same 'fileid' (used as inode by the NFS-client) as the parent directory of testfile.4.10. This obviously is an incorrect reply. The fileid/inode from the LOOKUP Reply matches the one in the messages file too (12897189535768225383).

This looks like a bug in Gluster/NFS to me... Hooray?

If we can capture a tcpdump on the NFS-server, we can see the GlusterFS protocol too. This then makes it possible to dissect the filehandle into volume-id + GFID, which we can then match with the replies that the bricks send.
<<<
As suggested by Niels above, I have taken a packet trace on the server side (copied to the same machine mentioned above, "/root/bug1328451/nfs-ELOOP-server.pcap"):

568 152.036090 192.168.122.1   -> 192.168.122.201 NFS 228 V3 LOOKUP Call, DH: 0x84b243c9/testfile.1
569 152.036116 192.168.122.201 -> 192.168.122.1   TCP 68 2049→666 [ACK] Seq=1 Ack=161 Win=30080 Len=0 TSval=4495843 TSecr=10310504
570 152.036717 192.168.122.201 -> 192.168.122.201 GlusterFS 340 V330 LOOKUP Call, Filename: (nameless, by GFID)
571 152.037099 192.168.122.201 -> 192.168.122.201 GlusterFS 360 V330 LOOKUP Reply (Call In 570)
572 152.037126 192.168.122.201 -> 192.168.122.201 TCP 68 49145→49155 [ACK] Seq=2045 Ack=1549 Win=50176 Len=0 TSval=4495845 TSecr=4495845
573 152.037331 192.168.122.201 -> 192.168.122.201 GlusterFS 296 V330 LOOKUP Call, Filename: (nameless, by GFID)
574 152.037643 192.168.122.201 -> 192.168.122.201 GlusterFS 360 V330 LOOKUP Reply (Call In 573)
575 152.037845 192.168.122.201 -> 192.168.122.1   NFS 332 V3 LOOKUP Reply (Call In 568), FH: 0x84b243c9

From the packet trace, it looks like when the nfs server received the LOOKUP of 0x84b243c9/testfile.1, it performed a glusterfs nameless LOOKUP of the root gfid followed by the dir1 gfid, but there is no subsequent LOOKUP done on the testfile.
That's the reason we could see dir1 attributes in the NFS LOOKUP reply.

From gdb (/code), post nameless lookup of the dir1 gfid:

(gdb) bt
#0  nfs3_lookup_resume (carg=0x7fffe07b906c) at nfs3.c:1438
#1  0x00007fffea5358f4 in nfs3_fh_resolve_entry_hard (cs=cs@entry=0x7fffe07b906c) at nfs3-helpers.c:3789
#2  0x00007fffea535b19 in nfs3_fh_resolve_inode_lookup_cbk (frame=<optimized out>, cookie=<optimized out>, this=0x7fffe4019610, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7fffe82a6104, xattr=0x7ffff55581e4, postparent=0x7fffe82a6334) at nfs3-helpers.c:3714
#3  0x00007fffea50d611 in nfs_fop_lookup_cbk (frame=0x7ffff5db8a5c, cookie=0x7fffe4018650, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7fffe82a6104, xattr=0x7ffff55581e4, postparent=0x7fffe82a6334) at nfs-fops.c:430
#4  0x00007ffff7e1a897 in io_stats_lookup_cbk (frame=0x7ffff5db8b30, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=117, inode=0x7fffe8f97128, buf=0x7fffe82a6104, xdata=0x7ffff55581e4, postparent=0x7fffe82a6334) at io-stats.c:2116
#5  0x00007fffea59a710 in dht_discover_complete (this=this@entry=0x7fffe4016a30, discover_frame=discover_frame@entry=0x7ffff5db87e0) at dht-common.c:383
#6  0x00007fffea59b38b in dht_discover_cbk (frame=0x7ffff5db87e0, cookie=<optimized out>, this=0x7fffe4016a30, op_ret=<optimized out>, op_errno=22, inode=0x7fffe8f97128, stbuf=0x7fffdaefda70, xattr=0x7ffff55581e4, postparent=0x7fffdaefdae0) at dht-common.c:512
#7  0x00007fffea6832dc in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7ffff5db8490) at client-rpc-fops.c:2937
#8  0x00007ffff7efde90 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fffe405b0f0, pollin=pollin@entry=0x7fffd4003aa0) at rpc-clnt.c:759
#9  0x00007ffff7efe16c in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fffe405b120, event=<optimized out>, data=0x7fffd4003aa0) at rpc-clnt.c:920
#10 0x00007ffff7efa543 in rpc_transport_notify (this=this@entry=0x7fffe406ae30, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fffd4003aa0) at rpc-transport.c:541
#11 0x00007ffff7e431a3 in socket_event_poll_in (this=this@entry=0x7fffe406ae30) at socket.c:2238
#12 0x00007ffff7e45b68 in socket_event_handler (fd=<optimized out>, idx=9, data=0x7fffe406ae30, poll_in=1, poll_out=0, poll_err=0) at socket.c:2351
#13 0x00007ffff7f8388a in event_dispatch_epoll_handler (event=0x7fffdaefdf10, event_pool=0x44ceb0) at event-epoll.c:571
#14 event_dispatch_epoll_worker (data=0x7fffe4041bf0) at event-epoll.c:674
#15 0x00007ffff72a260a in start_thread (arg=0x7fffdaefe700) at pthread_create.c:334
#16 0x00007ffff6b8fbbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Stepping through nfs3_lookup_resume() (repeated line hops from the optimized build trimmed):

(gdb) n
1441            nfs_user_t nfu = {0, };
1443            struct nfs3_fh newfh = {{0},};
1445            if (!carg) {
1452            nfs3_check_fh_auth_status (cs, stat, _gf_false, nfs3err);
1453            nfs3_check_fh_resolve_status (cs, stat, nfs3err);
(gdb) p cs->resolventry
$3 = 0x7fffe4131a70 "testfile.1"
(gdb) p cs->resolvefh
$4 = {ident = ":OGL", exportid = "\205\376E\272\361+HI\241\256\060t\005\024\066\242", gfid = "\374K\332D\t\310Ho\263-\263\241%R\260\277", mountid = "\265B\202\327", '\000' <repeats 11 times>, padding = '\000' <repeats 11 times>}
(gdb) n
1454            cs->parent = cs->resolvefh;
1456            if (cs->hardresolved) {
(gdb) p cs->hardresolved
$5 = 1
(gdb) n
1457            stat = NFS3_OK;
1458            nfs3_fh_build_child_fh (&cs->parent, &cs->stbuf, &newfh);
1459            goto nfs3err;
1470            nfs3_log_common_res (rpcsvc_request_xid (cs->req),
1473            nfs3_lookup_reply (cs->req, stat, &newfh, &cs->stbuf,
1475            nfs3_call_state_wipe (cs);
(gdb)

In nfs3_fh_resolve_entry_hard():

int
nfs3_fh_resolve_entry_hard (nfs3_call_state_t *cs)
{
        int             ret = -EFAULT;
        nfs_user_t      nfu = {0, };

        if (!cs)
                return ret;

        nfs_loc_wipe (&cs->resolvedloc);
        nfs_user_root_create (&nfu);
        gf_msg_trace (GF_NFS3, 0, "FH hard resolution: gfid: %s "
                      ", entry: %s", uuid_utoa (cs->resolvefh.gfid),
                      cs->resolventry);

        ret = nfs_entry_loc_fill (cs->nfsx, cs->vol->itable,
                                  cs->resolvefh.gfid, cs->resolventry,
                                  &cs->resolvedloc, NFS_RESOLVE_CREATE);

>>>>> Since the entry is not present in the inode table, ret is '-2'.

        if (ret == -2) {
                gf_msg_trace (GF_NFS3, 0, "Entry needs lookup: %s",
                              cs->resolvedloc.path);
                /* If the NFS op is lookup, let the resume callback
                 * handle the sending of the lookup fop. Similarly,
                 * if the NFS op is create, let the create call
                 * go ahead in the resume callback so that an EEXIST gets
                 * handled at posix without an extra fop at this point.
                 */
                if (nfs3_lookup_op (cs) ||
                    (nfs3_create_op (cs) && !nfs3_create_exclusive_op (cs))) {
                        cs->lookuptype = GF_NFS3_FRESH;
                        cs->resolve_ret = 0;
                        nfs3_call_resume (cs);

>>>>> Since it is a LOOKUP fop, we call the resume_fn, nfs3_lookup_resume(),
>>>>> which responds with the stat of the pargfid ('dir1' in this case).

                } else {
                        cs->hardresolved = 1;
                        nfs_lookup (cs->nfsx, cs->vol, &nfu, &cs->resolvedloc,
                                    nfs3_fh_resolve_entry_lookup_cbk, cs);
                }
                ret = 0;

The fix seems to be either to fall through to the above else case for the LOOKUP fop, or to check for cs->resolventry and resolve it in "nfs3_lookup_resume" as well.
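To make the faulty control flow concrete, here is a small Python model of the resolution logic described above. This is only an illustration of the bug and the proposed fix, not the actual Gluster/NFS C code; the names `inode_table`, `resolve_entry`, and the fileid scheme are invented for the sketch:

```python
# Model: resolving "parent-fh/name" for an NFS3 LOOKUP.
# When the entry is missing from the inode table (the ret == -2 case),
# the buggy path resumed the LOOKUP immediately and replied with the
# *parent's* stat, so the client received a directory's fileid for a
# regular file. The fixed path performs a real lookup on the child first.

def resolve_entry(inode_table, parent, name, op, fixed=True):
    if (parent, name) in inode_table:
        return inode_table[(parent, name)]  # entry already cached
    # Entry needs lookup (ret == -2 in the C code).
    if op == "LOOKUP" and not fixed:
        # Bug: reply built from the parent's attributes.
        return inode_table[parent]
    # Fix: do a hard lookup on the child entry instead.
    child = {"name": name, "fileid": hash((parent, name)) & 0xFFFF}
    inode_table[(parent, name)] = child
    return child

table = {"dir1": {"name": "dir1", "fileid": 7}}
buggy = resolve_entry(dict(table), "dir1", "testfile.1", "LOOKUP", fixed=False)
good = resolve_entry(dict(table), "dir1", "testfile.1", "LOOKUP", fixed=True)
print(buggy["fileid"] == 7)  # True: the parent dir's fileid leaked into the reply
print(good["name"])          # testfile.1
```

In the buggy path the duplicate fileid is exactly what the NFS client's loop detection then trips over, producing ELOOP.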
Upstream mainline: http://review.gluster.org/14911
Upstream 3.8: http://review.gluster.org/14941

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
QATP:
======
TC#1 (from my understanding of http://review.gluster.org/#/c/14941/):
1) Create a distributed volume with 2 bricks and mount the volume on two gNFS clients, say c1 and c2.
2) Create a dir, say dir1.
3) Now create files f{1..10}, and on the backend check which file hashes to which brick.
4) Delete all files.
5) Now from the backend create a file f1 on the right brick (the one it hashes to, as found in step 3).
6) Now from clients c1 and c2 do an "ls".
   Expected: ls must not return anything.
7) Now on c1 do "ls f1".
   Expected: ls must not report that the file is missing, or any other error; instead it should display f1, and the file must be visible on all clients post this op.

TC#2:
Same as steps 1, 2, 3, 4 from TC#1.
5) Now from the backend create a file f1 on the brick which it DOESN'T hash to.
6) Now from clients c1 and c2 do "ls f1".
   Expected: the lookup must pass, and the backend brick which the file was supposed to hash to must have a linkto file created.
7) Now delete f1 from a client.
8) Then do "ls f1".
   Expected: no EIO error must be seen.

TC#3:
====
1) Mount a distributed volume on gNFS clients c1 and c2.
2) Now from c1 create a file f1 under dirs a/b/c/d.
3) Now restart the gNFS servers using a volume force start.
4) Now from c2 do an "ls a/b/c/d/f1".
   Expected: no "too many levels of symbolic links" errors must be seen.

TC#4 (the case on which the bz is raised):
=====
1) Create a pure EC volume (or even distributed), say 1x(4+2).
2) Mount the volume on two clients (in my case one RHEL 7.2 and the other RHEL 6.7), both using different NFS servers.
3) Now create some files and dirs from both mounts.
4) Now add bricks, in this case (4+2), i.e. 6 bricks.
5) Now create a new dir, and in that dir create another dir for each client; pump in I/O from both clients into their respective dirs (use the dd command).
6) While I/O is in progress, replace any of the bricks.
   Expected: no symbolic link errors must be seen.

TC#5:
Just create a distributed volume, mount it using NFS, and do I/O. Replace a brick while I/O is happening.
Expected: no symbolic link errors must be seen.
Validation of QATP:
=================
TC#1 ==> passed (however, doing just an "ls" doesn't display the file, unlike a FUSE mount --> will be raising a bug for the same)
TC#2 ==> fail, as doing an "rm -rf" of that file then displays an input/output error on c2; will raise a bug
TC#3 ==> pass
TC#4 ==> pass
TC#5 ==> pass

As 3, 4, 5 passed, which are mainly the steps for the bug, moving to verified.

[root@dhcp35-37 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-server-3.8.4-3.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-3.el7rhgs.x86_64
glusterfs-api-3.8.4-3.el7rhgs.x86_64
glusterfs-libs-3.8.4-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-3.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
python-gluster-3.8.4-3.el7rhgs.noarch
glusterfs-devel-3.8.4-3.el7rhgs.x86_64
glusterfs-events-3.8.4-3.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-3.el7rhgs.x86_64
glusterfs-fuse-3.8.4-3.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-3.el7rhgs.x86_64
glusterfs-rdma-3.8.4-3.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
glusterfs-3.8.4-3.el7rhgs.x86_64
1396355  stat of a deleted file throwing Input/output error
1396354  negative case: lookup using ls is not detecting a file created from backend (unlike fuse mount)
1396113  many file related inconsistencies with gnfs (probably due to caching and invalidating cache)

Raised the above bugs.
For the EC test case in Polarion (TC#4), refer to Polarion TC id RHG3-11788 - BZ#1328451 - after adding bricks and then issuing a replace brick, IO must pass without any errors like observing "Too many levels of symbolic links".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html