Description of problem:
2x2 distributed-replicate volume, with one FUSE client and one NFS client. The FUSE client was running the sanity script, and the NFS client was running rdd and fs-perf-test one after another in a loop. A brick was brought down and, after some time, brought back up, while volume set operations were running in parallel. glustershd crashed with the backtrace below.

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /etc/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f5aacc291a3 in __is_dentry_cyclic (dentry=0x0) at ../../../libglusterfs/src/inode.c:217
217                             dentry->inode);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x00007f5aacc291a3 in __is_dentry_cyclic (dentry=0x0) at ../../../libglusterfs/src/inode.c:217
#1  0x00007f5aacc2a7ce in __inode_link (inode=0x7f5a84964c6c, parent=0x9c189c, name=0x0, iatt=0x7f5a8c2014f0) at ../../../libglusterfs/src/inode.c:816
#2  0x00007f5aacc2a8cc in inode_link (inode=0x7f5a84964c6c, parent=0x9c189c, name=0x0, iatt=0x7f5a8c2014f0) at ../../../libglusterfs/src/inode.c:847
#3  0x00007f5aa866a5de in _process_entries (this=0x98ded0, parentloc=0x7f5a8c201730, entries=0x7f5a8c201620, offset=0x7f5a8c2016d0, crawl_data=0x7f5a8c0013a0) at ../../../../../xlators/cluster/afr/src/afr-self-heald.c:754
#4  0x00007f5aa866a93b in _crawl_directory (fd=0x9c759c, loc=0x7f5a8c201730, crawl_data=0x7f5a8c0013a0) at ../../../../../xlators/cluster/afr/src/afr-self-heald.c:814
#5  0x00007f5aa866aea0 in afr_dir_crawl (data=0x7f5a8c0013a0) at ../../../../../xlators/cluster/afr/src/afr-self-heald.c:943
#6  0x00007f5aa866b16a in afr_dir_exclusive_crawl (data=0x7f5a8c0013a0) at ../../../../../xlators/cluster/afr/src/afr-self-heald.c:996
#7  0x00007f5aacc54753 in synctask_wrap (old_task=0x7f5a8c001400) at ../../../libglusterfs/src/syncop.c:144
#8  0x00000034c2243690 in ?? () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()

(gdb) f 0
#0  0x00007f5aacc291a3 in __is_dentry_cyclic (dentry=0x0) at ../../../libglusterfs/src/inode.c:217
217                             dentry->inode);
(gdb) p dentry
$1 = (dentry_t *) 0x0
(gdb) f 1
#1  0x00007f5aacc2a7ce in __inode_link (inode=0x7f5a84964c6c, parent=0x9c189c, name=0x0, iatt=0x7f5a8c2014f0) at ../../../libglusterfs/src/inode.c:816
816             if (old_inode && __is_dentry_cyclic (dentry)) {
(gdb) l
811             if (parent) {
812                     old_dentry = __dentry_grep (table, parent, name);
813
814                     if (!old_dentry || old_dentry->inode != link_inode) {
815                             dentry = __dentry_create (link_inode, parent, name);
816                             if (old_inode && __is_dentry_cyclic (dentry)) {
817                                     __dentry_unset (dentry);
818                                     return NULL;
819                             }
820                             __dentry_hash (dentry);
(gdb) p link_inode
$2 = (inode_t *) 0x9c2020
(gdb) p parent
$3 = (inode_t *) 0x9c189c
(gdb) p name
$4 = 0x0
(gdb) p *link_inode
$5 = {table = 0x9c1690, gfid = "\377A\024L听\022l>\325\321", <incomplete sequence \344>, lock = 1, nlookup = 0, ref = 1, ia_type = IA_IFDIR, fd_list = {next = 0x9c2050, prev = 0x9c2050}, dentry_list = {next = 0x9c2060, prev = 0x9c2060}, hash = {next = 0x7f5aa3586e70, prev = 0x7f5aa3586e70}, list = {next = 0x9c1a24, prev = 0x9c5238}, _ctx = 0x7f5a84000c30}
(gdb) f 3
#3  0x00007f5aa866a5de in _process_entries (this=0x98ded0, parentloc=0x7f5a8c201730, entries=0x7f5a8c201620, offset=0x7f5a8c2016d0, crawl_data=0x7f5a8c0013a0) at ../../../../../xlators/cluster/afr/src/afr-self-heald.c:754
754                             inode_link (entry_loc.inode, parentloc->inode, NULL, &iattr);
(gdb) p entry_loc
$6 = {path = 0x7f5a840219d0 "/run6686", name = 0x7f5a840219d1 "run6686", inode = 0x7f5a84964c6c, parent = 0x9c189c, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, "\001"}

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. With clients running some tests, bring a brick down.
2. Bring the brick back up and let self-heal happen.
3. In parallel, keep running volume set operations while the above is going on.

Actual results:
glustershd crashed.

Expected results:
glustershd should not crash.

Additional info:
... path: /sync_field/file-965, reason: lookup detected pending operations
[2012-03-07 07:46:24.969746] I [afr-self-heal-algorithm.c:131:sh_loop_driver_done] 0-mirror-replicate-0: diff self-heal on /sync_field/file-965: completed. (21 blocks of 51 were different (41.18%))
[2012-03-07 07:46:24.972670] I [afr-self-heal-common.c:2028:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background data self-heal completed on /sync_field/file-965
[2012-03-07 07:46:24.974567] I [afr-common.c:1290:afr_launch_self_heal] 0-mirror-replicate-0: background data self-heal triggered. path: /sync_field/file-974, reason: lookup detected pending operations
[2012-03-07 07:46:26.251466] I [afr-self-heal-algorithm.c:131:sh_loop_driver_done] 0-mirror-replicate-0: diff self-heal on /sync_field/file-974: completed. (22 blocks of 50 were different (44.00%))
[2012-03-07 07:46:26.294914] I [afr-self-heal-common.c:2028:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background data self-heal completed on /sync_field/file-974
[2012-03-07 07:46:26.507573] I [afr-self-heald.c:949:afr_dir_crawl] 0-mirror-replicate-0: Crawl completed on mirror-replicate-0
[2012-03-07 07:46:26.509695] I [afr-self-heald.c:890:afr_find_child_position] 0-mirror-replicate-0: child mirror-client-0 is local
[2012-03-07 07:46:26.739686] W [inode.c:487:__dentry_create] (-->/usr/local/lib/glusterfs/3.3.0qa25/xlator/cluster/replicate.so(+0x5e5de) [0x7f5aa866a5de] (-->/usr/local/lib/libglusterfs.so.0(inode_link+0xc2) [0x7f5aacc2a8cc] (-->/usr/local/lib/libglusterfs.so.0(+0x347b7) [0x7f5aacc2a7b7]))) 0-mirror-replicate-0: inode || parent || name not found
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-07 07:46:26
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa25
/lib64/libc.so.6[0x34c2232980]
/usr/local/lib/libglusterfs.so.0(+0x331a3)[0x7f5aacc291a3]
/usr/local/lib/libglusterfs.so.0(+0x347ce)[0x7f5aacc2a7ce]
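The gdb session and the __dentry_create warning together show why this segfaults: _process_entries() calls inode_link() with name=NULL (afr-self-heald.c:754), __dentry_create() rejects the NULL name and returns NULL (the "inode || parent || name not found" warning above), and __inode_link() then hands that NULL dentry straight to __is_dentry_cyclic(), which dereferences it at inode.c:217. A minimal compilable sketch of that flow follows; the types and the cyclic check are simplified stand-ins, not the actual GlusterFS structures:

    /* Minimal sketch of the crash path; stand-in types, not GlusterFS code. */
    #include <stddef.h>
    #include <stdio.h>

    typedef struct _inode  { int unused; } inode_t;
    typedef struct _dentry {
            inode_t *inode;     /* __is_dentry_cyclic() reads this field */
            inode_t *parent;
    } dentry_t;

    /* Stand-in for __dentry_create(): validates its arguments and returns
     * NULL when any is missing, logging as inode.c:487 does. */
    static dentry_t *
    dentry_create (inode_t *inode, inode_t *parent, const char *name)
    {
            if (!inode || !parent || !name) {
                    fprintf (stderr, "inode || parent || name not found\n");
                    return NULL;        /* <- returned for name == NULL */
            }
            /* allocation elided in this sketch */
            return NULL;
    }

    /* Stand-in for __is_dentry_cyclic(): dereferences dentry with no NULL
     * check, like inode.c:217 in the backtrace (real logic walks parents). */
    static int
    is_dentry_cyclic (dentry_t *dentry)
    {
            return (dentry->inode == dentry->parent);  /* SIGSEGV if dentry == NULL */
    }

    int
    main (void)
    {
            inode_t parent = {0}, child = {0};

            /* self-heal daemon path: inode_link (..., NULL, ...) */
            dentry_t *dentry = dentry_create (&child, &parent, NULL);

            /* __inode_link() does not check dentry before the cyclic test,
             * so this crashes just like the core above */
            return is_dentry_cyclic (dentry);
    }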
Please update these bugs with respect to 3.3.0qa27; this needs to be worked on as per the target milestone set.
CHANGE: http://review.gluster.com/2893 (cluster/afr: handle sending NULL dentry name for inode link in self-heal-daemon) merged in master by Vijay Bellur (vijay)
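For context, a hedged sketch of the shape such a fix can take. The commit title suggests the actual change handles the NULL name on the self-heal-daemon caller side; the inode_link_guarded() below shows the equivalent callee-side guard for illustration only (all names and types are stand-ins, not the merged patch):

    /* Illustrative guard only; the authoritative change is review 2893. */
    #include <stddef.h>

    typedef struct _inode  { int unused; } inode_t;
    typedef struct _dentry { inode_t *inode; inode_t *parent; } dentry_t;

    /* trivial stubs standing in for __dentry_create()/__is_dentry_cyclic() */
    static dentry_t *
    dentry_create (inode_t *inode, inode_t *parent, const char *name)
    {
            static dentry_t d;
            if (!inode || !parent || !name)
                    return NULL;
            d.inode = inode;
            d.parent = parent;
            return &d;
    }

    static int
    is_dentry_cyclic (dentry_t *dentry)
    {
            return (dentry->inode == dentry->parent);
    }

    /* the guard: fail the link instead of dereferencing a NULL dentry */
    static dentry_t *
    inode_link_guarded (inode_t *inode, inode_t *parent, const char *name)
    {
            dentry_t *dentry = dentry_create (inode, parent, name);
            if (!dentry)
                    return NULL;        /* bail out cleanly */
            if (is_dentry_cyclic (dentry))
                    return NULL;        /* cleanup elided in this sketch */
            return dentry;
    }

    int
    main (void)
    {
            inode_t parent = {0}, child = {0};
            /* a NULL name no longer reaches the cyclic check */
            return inode_link_guarded (&child, &parent, NULL) ? 1 : 0;
    }

Either direction closes the window: the caller stops handing inode_link() a NULL name, or the callee stops dereferencing the NULL dentry that __dentry_create() returns for it.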
Verified with glusterfs-3.3.0qa33: glustershd no longer crashes due to the NULL dentry.