Created attachment 1231575 [details]
BT for one of the cores

Description of problem:
=======================
With a data set of a deep directory of depth 800 and a file at each level, added a few bricks to the volume. After add-brick, only the newly added brick processes were up; the remaining brick processes crashed, generating cores.

Version-Release number of selected component (if applicable):
=============================================================
3.8.4-8.el7rhgs.x86_64

How reproducible:
=================
Only once

Steps to Reproduce:
===================
1) Create a Distributed-Disperse volume and start it.
2) FUSE mount it on a client.
3) Create a deep directory of depth 800 with a file at each level.
4) Add a few bricks to the volume.

Actual results:
===============
After add-brick, only the newly added bricks were up and running. The remaining brick processes crashed, generating cores.

Expected results:
=================
There should not be any crashes.

The generated bt has more than 240 lines, so it is attached to this BZ.
(gdb) bt
#0  posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, path=path@entry=0x7fc77dff66d0 "",
    pathsize=pathsize@entry=4097, head=head@entry=0x0, type=type@entry=1,
    gfid=gfid@entry=0x7fc77defda80 "\325\366U\373@\267E\233\267h\037\203\216\016\217\350\071\071",
    handle_size=handle_size@entry=72, priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1",
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8,
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374) at posix-handle.c:171
#1  0x00007fc7f299a936 in posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, path=path@entry=0x7fc77dff66d0 "",
    pathsize=pathsize@entry=4097, head=head@entry=0x0, type=type@entry=1,
    gfid=gfid@entry=0x7fc77defec30 "\347F\337\314\374\363H\b\243|\022\322Lrٙ8f",
    handle_size=handle_size@entry=72, priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1",
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8,
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374) at posix-handle.c:193
#2  0x00007fc7f299a936 in posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, path=path@entry=0x7fc77dff66d0 "",
    pathsize=pathsize@entry=4097, head=head@entry=0x0, type=type@entry=1,
    gfid=gfid@entry=0x7fc77deffde0 "0\273\017\343\332\315L`\234\206\177\034\357\237\027\217a6",
    handle_size=handle_size@entry=72, priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1",
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8,
---Type <return> to continue, or q <return> to quit---

Based on the above stack trace and a discussion with Raghavendra G, assigning to the Quota team to take the first look. Feel free to assign to posix if you find that the issue is in posix.
This is a segfault due to stack overflow.

│0x7fc7f299a7d8 <posix_make_ancestryfromgfid+488>  andq  $0xfffffffffffffff0,-0xf8(%rbp)
>│0x7fc7f299a7e0 <posix_make_ancestryfromgfid+496>  callq 0x7fc7f2974270 <uuid_utoa@plt>
│0x7fc7f299a7e5 <posix_make_ancestryfromgfid+501>  mov   %rax,0x18(%rsp)
│0x7fc7f299a7ea <posix_make_ancestryfromgfid+506>  lea   0x7d2d(%rip),%r8   # 0x7fc7f29a251e

The alloca calls (4k each) in the recursive posix_make_ancestryfromgfid have contributed to the stack usage. Replacing alloca with malloc/free should sufficiently free up the stack.
Hi,

As per the core dump, glusterfsd crashed in an iot_worker thread. The currently used stack is more than 1 MB (1046528 bytes), and per the io-threads source the configured stack size for iot_worker is 1 MB (IOT_THREAD_STACK_SIZE), so it crashed on reaching the stack size limit.

(gdb) f 0
#0  posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, path=path@entry=0x7fc77dff66d0 "",
    pathsize=pathsize@entry=4097, head=head@entry=0x0, type=type@entry=1,
    gfid=gfid@entry=0x7fc77defda80 "\325\366U\373@\267E\233\267h\037\203\216\016\217\350\071\071",
    handle_size=handle_size@entry=72, priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1",
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8,
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374) at posix-handle.c:171
171             snprintf (dir_handle, handle_size, "%s/%s/%02x/%02x/%s",
(gdb) p $sp
$1 = (void *) 0x7fc77defb780
(gdb) f 244
#244 0x00007fc7fed4973d in clone () from /lib64/libc.so.6
(gdb) p $sp
$2 = (void *) 0x7fc77dffaf80
(gdb) p 0x7fc77dffaf80 - 0x7fc77defb780
$3 = 1046528
(gdb)

I think we need to increase the stack size to avoid this crash.

Regards
Mohit Agrawal
Hi,

I think the correct way to resolve the problem is to call sys_lstat before sys_readlink to get information about the link. Once we know the link size from the stat buffer (sb), we can alloca only the required size to store the link target. The current code blindly allocates 4k (PATH_MAX) on the stack for linkname instead of checking how much space is actually required, which wastes stack space; calling sys_lstat before sys_readlink would avoid that.

>>>>>>>>>>>>>>>>>>>>>>>>
        dir_handle = alloca (handle_size);
        linkname   = alloca (PATH_MAX);
        snprintf (dir_handle, handle_size, "%s/%s/%02x/%02x/%s",
                  priv_base_path, GF_HIDDEN_PATH, gfid[0], gfid[1],
                  uuid_utoa (gfid));

        len = sys_readlink (dir_handle, linkname, PATH_MAX);
        if (len < 0) {
                gf_msg (this->name, (errno == ENOENT || errno == ESTALE)
                        ? GF_LOG_DEBUG : GF_LOG_ERROR, errno,
                        P_MSG_READLINK_FAILED,
                        "could not read the link from "
                        "the gfid handle %s ", dir_handle);
                ret = -1;
                *op_errno = errno;
                goto out;
        }
>>>>>>>>>>>>>>>>>>>>>>>>

Regards
Mohit Agrawal
Doing a sys_lstat before sys_readlink would add to the number of syscalls we make, and a 4k buffer won't hurt as long as we don't reallocate linkname in each frame. Considering two alternatives here:

1) Make the function iterative and reuse the same buffer. We will have to maintain our own stack/list of gfid values.

2) Add linkname to the arguments of posix_make_ancestryfromgfid and call alloca only if linkname is NULL, so recursive calls won't reallocate it; they will reuse the buffer allocated in the bottom-most frame.

Sticking with (1) as it is the cleaner fix.
Upstream patch posted for review: http://review.gluster.org/#/c/16192
downstream patch : https://code.engineering.redhat.com/gerrit/93563
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html