Bug 1404633 - GlusterFS process crashed after add-brick
Summary: GlusterFS process crashed after add-brick
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: quota
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Sanoj Unnikrishnan
QA Contact: Anil Shah
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1405775
 
Reported: 2016-12-14 10:10 UTC by Prasad Desala
Modified: 2017-03-23 05:56 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.8.4-10
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1405775
Environment:
Last Closed: 2017-03-23 05:56:52 UTC
Embargoed:


Attachments (Terms of Use)
BT for one of the core (137.44 KB, text/plain)
2016-12-14 10:10 UTC, Prasad Desala
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Prasad Desala 2016-12-14 10:10:39 UTC
Created attachment 1231575 [details]
BT for one of the core

Description of problem:
=======================
With a data set consisting of a deep directory tree of depth 800 with a file at each level, a few bricks were added to the volume. After the add-brick, only the newly added brick processes were up; the remaining brick processes crashed, generating cores.

Version-Release number of selected component (if applicable):
3.8.4-8.el7rhgs.x86_64

How reproducible:
=================
Only once

Steps to Reproduce:
===================
1) Create a Distributed-Disperse volume and start it.
2) FUSE mount it on a client.
3) Create a deep directory of depth 800 and a file in each level. 
4) Add a few bricks to the volume.

Actual results:
===============
After add-brick, only the newly added bricks were up and running. The remaining brick processes crashed, generating cores.

Expected results:
=================
There should not be any crashes.

The generated backtrace has more than 240 lines, so it is attached to this BZ.

Comment 3 Pranith Kumar K 2016-12-14 10:24:55 UTC
(gdb) bt
#0  posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, 
    path=path@entry=0x7fc77dff66d0 "", pathsize=pathsize@entry=4097, head=head@entry=0x0, 
    type=type@entry=1, 
    gfid=gfid@entry=0x7fc77defda80 "\325\366U\373@\267E\233\267h\037\203\216\016\217\350\071\071", handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374)
    at posix-handle.c:171
#1  0x00007fc7f299a936 in posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, 
    path=path@entry=0x7fc77dff66d0 "", pathsize=pathsize@entry=4097, head=head@entry=0x0, 
    type=type@entry=1, 
    gfid=gfid@entry=0x7fc77defec30 "\347F\337\314\374\363H\b\243|\022\322Lrٙ8f", 
    handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374)
    at posix-handle.c:193
#2  0x00007fc7f299a936 in posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, 
    path=path@entry=0x7fc77dff66d0 "", pathsize=pathsize@entry=4097, head=head@entry=0x0, 
    type=type@entry=1, 
    gfid=gfid@entry=0x7fc77deffde0 "0\273\017\343\332\315L`\234\206\177\034\357\237\027\217a6", handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
---Type <return> to continue, or q <return> to quit---

Based on the above stack trace and a discussion with Raghavendra G, assigning this to the Quota team to take a first look.
Feel free to reassign to posix if you find that the issue is in posix.

Comment 4 Sanoj Unnikrishnan 2016-12-14 12:40:14 UTC
This is a segfault due to stack overflow.
 
   │0x7fc7f299a7d8 <posix_make_ancestryfromgfid+488> andq   $0xfffffffffffffff0,-0xf8(%rbp)            │
  >│0x7fc7f299a7e0 <posix_make_ancestryfromgfid+496> callq  0x7fc7f2974270 <uuid_utoa@plt>             │
   │0x7fc7f299a7e5 <posix_make_ancestryfromgfid+501> mov    %rax,0x18(%rsp)                            │
   │0x7fc7f299a7ea <posix_make_ancestryfromgfid+506> lea    0x7d2d(%rip),%r8        # 0x7fc7f29a251e   

The alloca calls (4 KB each) from the recursive posix_make_ancestryfromgfid have contributed to the stack usage.
Replacing alloca with malloc/free should sufficiently free up the stack.
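
To make the proposed swap concrete, here is a hypothetical before/after sketch; it is not the actual GlusterFS patch, and with_alloca/with_malloc and the literal handle_size of 72 (taken from the backtrace above) are illustrative stand-ins:

>>>>>>>>>>>>>>>>>>>>>>>>

#include <alloca.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Each recursion level burns handle_size + PATH_MAX (4 KB) of stack, and the
 * alloca'd memory is only reclaimed when the whole call chain unwinds. */
static void with_alloca (size_t handle_size)
{
        char *dir_handle = alloca (handle_size);
        char *linkname   = alloca (PATH_MAX);

        snprintf (dir_handle, handle_size, "sketch");
        snprintf (linkname, PATH_MAX, "sketch");
}

/* Heap allocation keeps the per-frame stack cost small and constant. */
static void with_malloc (size_t handle_size)
{
        char *dir_handle = malloc (handle_size);
        char *linkname   = malloc (PATH_MAX);

        if (!dir_handle || !linkname)
                goto out;

        snprintf (dir_handle, handle_size, "sketch");
        snprintf (linkname, PATH_MAX, "sketch");
out:
        free (dir_handle);
        free (linkname);
}

int main (void)
{
        with_alloca (72);
        with_malloc (72);
        return 0;
}

>>>>>>>>>>>>>>>>>>>>>>>>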

Comment 5 Mohit Agrawal 2016-12-15 13:05:36 UTC
Hi,

As per the core dump, glusterfsd crashed in an iot_worker thread. The stack usage at the time of the crash is close to 1 MB (1046528 bytes), and as per the io-threads source code the configured stack size for iot workers is 1 MB (IOT_THREAD_STACK_SIZE), so it crashed because it hit the stack size limit.


(gdb) f 0
#0  posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, path=path@entry=0x7fc77dff66d0 "", 
    pathsize=pathsize@entry=4097, head=head@entry=0x0, type=type@entry=1, 
    gfid=gfid@entry=0x7fc77defda80 "\325\366U\373@\267E\233\267h\037\203\216\016\217\350\071\071", 
    handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374) at posix-handle.c:171
171	        snprintf (dir_handle, handle_size, "%s/%s/%02x/%02x/%s",
(gdb) p $sp
$1 = (void *) 0x7fc77defb780
(gdb) f 244
#244 0x00007fc7fed4973d in clone () from /lib64/libc.so.6
(gdb) p $sp
$2 = (void *) 0x7fc77dffaf80
(gdb) p 0x7fc77dffaf80 - 0x7fc77defb780
$3 = 1046528
(gdb) 


I think we need to increase the stack size to avoid this crash.
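
For reference, a back-of-the-envelope check of those numbers (the per-frame cost below is simply derived from the figures in the gdb session above):

>>>>>>>>>>>>>>>>>>>>>>>>

#include <stdio.h>

int main (void)
{
        const long used_bytes = 1046528;   /* usable stack measured in the core */
        const long frames     = 244;       /* recursion depth seen in the backtrace */

        printf ("approx. bytes per frame: %ld\n", used_bytes / frames);    /* ~4289 */
        printf ("frames that fit in a 1 MB stack at 4 KB each: %ld\n",
                (1L << 20) / 4096);                                        /* 256 */
        return 0;
}

>>>>>>>>>>>>>>>>>>>>>>>>

At roughly 4 KB per frame, the 800-level directory tree in this reproducer would need on the order of 3.2 MB of stack, so the increase would have to be substantial.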


Regards
Mohit Agrawal

Comment 6 Mohit Agrawal 2016-12-16 04:06:12 UTC
Hi,

I think the correct way to resolve the problem is to call sys_lstat before sys_readlink to check information about the link.

After knowing the buffer (sb) size, we can call alloca with the required size to store the link.

In the current code, 4 KB (PATH_MAX) is blindly allocated on the stack to store the linkname instead of checking how much space is actually required. This wastes space, so to save it we can call sys_lstat before sys_readlink.

>>>>>>>>>>>>>>>>>>>>>>>>

        dir_handle = alloca (handle_size);
        linkname   = alloca (PATH_MAX);
        snprintf (dir_handle, handle_size, "%s/%s/%02x/%02x/%s",
                  priv_base_path, GF_HIDDEN_PATH, gfid[0], gfid[1],
                  uuid_utoa (gfid));

        len = sys_readlink (dir_handle, linkname, PATH_MAX);
        if (len < 0) {
                gf_msg (this->name, (errno == ENOENT || errno == ESTALE)
                        ? GF_LOG_DEBUG:GF_LOG_ERROR, errno,
                        P_MSG_READLINK_FAILED, "could not read the link from "
                        "the gfid handle %s ", dir_handle);
                ret = -1;
                *op_errno = errno;
                goto out;
        }

>>>>>>>>>>>>>>>>>>>>>>>>
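
A minimal standalone sketch of that idea (not the shipped fix; it just resolves whatever symlink path is passed on the command line, with lstat/readlink standing in for sys_lstat/sys_readlink):

>>>>>>>>>>>>>>>>>>>>>>>>

#include <alloca.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main (int argc, char **argv)
{
        const char  *dir_handle = argc > 1 ? argv[1] : "/proc/self/exe";
        struct stat  sb;

        if (lstat (dir_handle, &sb) < 0) {          /* sys_lstat in GlusterFS terms */
                perror ("lstat");
                return 1;
        }

        /* For a symlink, st_size is the length of the target path, so the
         * buffer can be sized exactly; /proc symlinks report 0, hence the
         * fallback to 4 KB. */
        size_t  bufsize  = (sb.st_size > 0 ? (size_t) sb.st_size : 4096) + 1;
        char   *linkname = alloca (bufsize);

        ssize_t len = readlink (dir_handle, linkname, bufsize - 1);
        if (len < 0) {                              /* sys_readlink in GlusterFS terms */
                perror ("readlink");
                return 1;
        }
        linkname[len] = '\0';

        printf ("%s -> %s\n", dir_handle, linkname);
        return 0;
}

>>>>>>>>>>>>>>>>>>>>>>>>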


Regards
Mohit Agrawal

Comment 7 Sanoj Unnikrishnan 2016-12-16 05:46:15 UTC
Doing a sys_lstat before sys_readlink would add to the number of syscalls we make.
If we don't reallocate linkname in each frame, using a 4 KB buffer won't hurt.

Considering 2 alternatives here:
1) Make the function iterative and reuse the same buffer. We will have to maintain our own stack/list of gfid values.
2) Add linkname to the arguments of posix_make_ancestryfromgfid and call alloca only if linkname is NULL. Recursive calls then won't reallocate linkname; they will use the same buffer allocated in the bottom-most frame.

Sticking with (1) as it is the cleaner fix; a minimal sketch of that approach follows below.
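
A minimal sketch of approach (1): resolve_parent() and entry_t below are hypothetical stand-ins for the real gfid-handle readlink and the GlusterFS list types; the point is just that one reusable buffer plus an explicit list replaces the per-level recursion.

>>>>>>>>>>>>>>>>>>>>>>>>

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct entry {
        char          name[NAME_MAX + 1];
        struct entry *next;
} entry_t;

/* Hypothetical stand-in for reading the ".glusterfs" gfid symlink: returns
 * the parent id and the entry name for 'id', or -1 once the root is reached. */
static int resolve_parent (int id, int *parent, char *name, size_t size)
{
        if (id == 0)
                return -1;
        snprintf (name, size, "dir%d", id);
        *parent = id - 1;
        return 0;
}

int main (void)
{
        char     linkname[PATH_MAX];   /* one buffer, reused on every iteration */
        entry_t *head = NULL;
        int      id = 5, parent = 0;

        /* Iterative walk: no per-level stack frame, no per-level alloca. */
        while (resolve_parent (id, &parent, linkname, sizeof (linkname)) == 0) {
                entry_t *e = calloc (1, sizeof (*e));
                if (!e)
                        goto out;
                snprintf (e->name, sizeof (e->name), "%s", linkname);
                e->next = head;        /* prepend, so the entry nearest the root ends up first */
                head    = e;
                id      = parent;
        }

        for (entry_t *e = head; e; e = e->next)
                printf ("/%s", e->name);
        printf ("\n");
out:
        while (head) {
                entry_t *next = head->next;
                free (head);
                head = next;
        }
        return 0;
}

>>>>>>>>>>>>>>>>>>>>>>>>

The explicit list plays the role of the recursion's call stack, but its nodes live on the heap, so the depth is no longer bounded by the 1 MB io-thread stack.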

Comment 10 Atin Mukherjee 2016-12-19 06:42:53 UTC
upstream patch http://review.gluster.org/#/c/16192 posted for review

Comment 11 Atin Mukherjee 2016-12-22 06:59:08 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/93563

Comment 15 errata-xmlrpc 2017-03-23 05:56:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

