Bug 1404633 - GlusterFS process crashed after add-brick
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: quota
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Sanoj Unnikrishnan
QA Contact: Anil Shah
Depends On:
Blocks: 1351528 1405775
 
Reported: 2016-12-14 05:10 EST by Prasad Desala
Modified: 2017-03-23 01:56 EDT
CC List: 6 users

See Also:
Fixed In Version: glusterfs-3.8.4-10
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1405775
Environment:
Last Closed: 2017-03-23 01:56:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
BT for one of the core (137.44 KB, text/plain)
2016-12-14 05:10 EST, Prasad Desala


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 05:18:45 EDT

Description Prasad Desala 2016-12-14 05:10:39 EST
Created attachment 1231575 [details]
BT for one of the core

Description of problem:
=======================
With a data set consisting of a deep directory tree of depth 800 and a file at each level, a few bricks were added to the volume. After the add-brick, only the newly added brick processes were up; the remaining brick processes crashed, generating cores.

Version-Release number of selected component (if applicable):
3.8.4-8.el7rhgs.x86_64

How reproducible:
=================
Only once

Steps to Reproduce:
===================
1) Create a Distributed-Disperse volume and start it.
2) FUSE mount it on a client.
3) Create a deep directory of depth 800 and a file in each level. 
4) Add a few bricks to the volume.

Actual results:
===============
After the add-brick, only the newly added bricks were up and running. The remaining brick processes crashed, generating cores.

Expected results:
=================
There should not be any crashes.

The generated backtrace is more than 240 lines long, so it is attached to this BZ.
Comment 3 Pranith Kumar K 2016-12-14 05:24:55 EST
(gdb) bt
#0  posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, 
    path=path@entry=0x7fc77dff66d0 "", pathsize=pathsize@entry=4097, head=head@entry=0x0, 
    type=type@entry=1, 
    gfid=gfid@entry=0x7fc77defda80 "\325\366U\373@\267E\233\267h\037\203\216\016\217\350\071\071", handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374)
    at posix-handle.c:171
#1  0x00007fc7f299a936 in posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, 
    path=path@entry=0x7fc77dff66d0 "", pathsize=pathsize@entry=4097, head=head@entry=0x0, 
    type=type@entry=1, 
    gfid=gfid@entry=0x7fc77defec30 "\347F\337\314\374\363H\b\243|\022\322Lrٙ8f", 
    handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374)
    at posix-handle.c:193
#2  0x00007fc7f299a936 in posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, 
    path=path@entry=0x7fc77dff66d0 "", pathsize=pathsize@entry=4097, head=head@entry=0x0, 
    type=type@entry=1, 
    gfid=gfid@entry=0x7fc77deffde0 "0\273\017\343\332\315L`\234\206\177\034\357\237\027\217a6", handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
---Type <return> to continue, or q <return> to quit---

Based on the above stack trace and a discussion with Raghavendra G, assigning this to the Quota team to take the first look.
Feel free to assign it to posix if you find that the issue is in posix.
Comment 4 Sanoj Unnikrishnan 2016-12-14 07:40:14 EST
This is a segfault due to stack overflow.
 
   │0x7fc7f299a7d8 <posix_make_ancestryfromgfid+488> andq   $0xfffffffffffffff0,-0xf8(%rbp)            │
  >│0x7fc7f299a7e0 <posix_make_ancestryfromgfid+496> callq  0x7fc7f2974270 <uuid_utoa@plt>             │
   │0x7fc7f299a7e5 <posix_make_ancestryfromgfid+501> mov    %rax,0x18(%rsp)                            │
   │0x7fc7f299a7ea <posix_make_ancestryfromgfid+506> lea    0x7d2d(%rip),%r8        # 0x7fc7f29a251e   

The alloca calls (4k each) from the recursive posix_make_ancestryfromgfid have contributed to the stack usage.
Replacing alloca with malloc/free should sufficiently free up the stack.
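
To illustrate the idea, here is a minimal, self-contained sketch (not the actual GlusterFS code or patch; walk_level and its paths are invented for the example):

>>>>>>>>>>>>>>>>>>>>>>>>

/* Sketch only: heap-allocate the per-level scratch buffer instead of
 * alloca(), so each level of recursion adds only a small stack frame
 * rather than an extra ~4 KB of stack. */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

static int
walk_level (const char *base, int depth)
{
        char *scratch = malloc (PATH_MAX);   /* was: alloca (PATH_MAX) */
        int   ret     = 0;

        if (!scratch)
                return -1;

        snprintf (scratch, PATH_MAX, "%s/level-%d", base, depth);

        if (depth > 0)
                ret = walk_level (base, depth - 1);   /* recursion is still deep... */

        free (scratch);   /* ...but the big buffer lives on the heap */
        return ret;
}

int
main (void)
{
        /* 800 levels, mirroring the reproducer's directory depth. */
        return walk_level ("/bricks/brick1/b1", 800) == 0 ? 0 : 1;
}

>>>>>>>>>>>>>>>>>>>>>>>>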
Comment 5 Mohit Agrawal 2016-12-15 08:05:36 EST
Hi,

As per the core dump, glusterfsd crashed in an iot_worker thread. The stack in use at the time of the crash is close to 1 MB (1046528 bytes), and as per the io-threads source code the configured stack size for iot threads is 1 MB (IOT_THREAD_STACK_SIZE), so it crashed because the stack size limit was reached.


(gdb) f 0
#0  posix_make_ancestryfromgfid (this=this@entry=0x7fc7ec006cb0, path=path@entry=0x7fc77dff66d0 "", 
    pathsize=pathsize@entry=4097, head=head@entry=0x0, type=type@entry=1, 
    gfid=gfid@entry=0x7fc77defda80 "\325\366U\373@\267E\233\267h\037\203\216\016\217\350\071\071", 
    handle_size=handle_size@entry=72, 
    priv_base_path=priv_base_path@entry=0x7fc7ec0b1f40 "/bricks/brick1/b1", 
    itable=itable@entry=0x7fc7ec0ef7e0, parent=parent@entry=0x7fc77dff66c8, 
    xdata=xdata@entry=0x7fc7ec16268c, op_errno=op_errno@entry=0x7fc77dff8374) at posix-handle.c:171
171	        snprintf (dir_handle, handle_size, "%s/%s/%02x/%02x/%s",
(gdb) p $sp
$1 = (void *) 0x7fc77defb780
(gdb) f 244
#244 0x00007fc7fed4973d in clone () from /lib64/libc.so.6
(gdb) p $sp
$2 = (void *) 0x7fc77dffaf80
(gdb) p 0x7fc77dffaf80 - 0x7fc77defb780
$3 = 1046528
(gdb) 


I think we need to increase the stack size to avoid this crash.
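
For illustration only, a generic, self-contained sketch of how a larger per-thread stack can be requested with pthread_attr_setstacksize (WORKER_STACK_SIZE and the 2 MB value are assumptions for the example, not the actual io-threads change):

>>>>>>>>>>>>>>>>>>>>>>>>

/* Sketch only: create a worker thread with an explicit 2 MB stack. */
#include <pthread.h>
#include <stdio.h>

#define WORKER_STACK_SIZE (2 * 1024 * 1024)   /* assumed value for the example */

static void *
worker (void *arg)
{
        (void) arg;
        /* request processing would happen here */
        return NULL;
}

int
main (void)
{
        pthread_t      tid;
        pthread_attr_t attr;

        pthread_attr_init (&attr);
        pthread_attr_setstacksize (&attr, WORKER_STACK_SIZE);

        if (pthread_create (&tid, &attr, worker, NULL) != 0) {
                perror ("pthread_create");
                return 1;
        }

        pthread_join (tid, NULL);
        pthread_attr_destroy (&attr);
        return 0;
}

>>>>>>>>>>>>>>>>>>>>>>>>

(Build with -pthread.)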


Regards
Mohit Agrawal
Comment 6 Mohit Agrawal 2016-12-15 23:06:12 EST
Hi,

I think the correct way to resolve the problem is to call sys_lstat before sys_readlink to get information about the link.

Once the size is known from the stat buffer (sb), we can call alloca with just the required size to store the link target.

The current code blindly allocates 4k (PATH_MAX) on the stack to store linkname instead of checking how much space is actually required, which wastes stack space; to save that space we can call sys_lstat before sys_readlink (a rough sketch follows the current code quoted below).

>>>>>>>>>>>>>>>>>>>>>>>>

        dir_handle = alloca (handle_size);
        linkname   = alloca (PATH_MAX);
        snprintf (dir_handle, handle_size, "%s/%s/%02x/%02x/%s",
                  priv_base_path, GF_HIDDEN_PATH, gfid[0], gfid[1],
                  uuid_utoa (gfid));

        len = sys_readlink (dir_handle, linkname, PATH_MAX);
        if (len < 0) {
                gf_msg (this->name, (errno == ENOENT || errno == ESTALE)
                        ? GF_LOG_DEBUG:GF_LOG_ERROR, errno,
                        P_MSG_READLINK_FAILED, "could not read the link from "
                        "the gfid handle %s ", dir_handle);
                ret = -1;
                *op_errno = errno;
                goto out;
        }

>>>>>>>>>>>>>>>>>>>>>>>>
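
A minimal, self-contained sketch of that idea, using plain lstat()/readlink() rather than the gluster sys_* wrappers (illustrative only, not the proposed patch); for a symlink, st_size reports the length of the target, so the buffer can be sized exactly:

>>>>>>>>>>>>>>>>>>>>>>>>

/* Sketch only: size the readlink() buffer from lstat()'s st_size instead of
 * always reserving PATH_MAX bytes. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static char *
read_link_target (const char *path)
{
        struct stat  sb;
        char        *linkname = NULL;
        ssize_t      len;

        if (lstat (path, &sb) < 0 || !S_ISLNK (sb.st_mode))
                return NULL;

        /* For a symlink, st_size is the length of the target path. */
        linkname = malloc (sb.st_size + 1);
        if (!linkname)
                return NULL;

        /* If the link was replaced between the two calls, len may not match
         * st_size; a robust version would detect that and retry. */
        len = readlink (path, linkname, sb.st_size);
        if (len < 0) {
                free (linkname);
                return NULL;
        }
        linkname[len] = '\0';
        return linkname;
}

int
main (int argc, char **argv)
{
        char *target = (argc > 1) ? read_link_target (argv[1]) : NULL;

        if (target) {
                printf ("%s -> %s\n", argv[1], target);
                free (target);
        }
        return target ? 0 : 1;
}

>>>>>>>>>>>>>>>>>>>>>>>>

The trade-off, raised in the next comment, is one extra lstat() per ancestry level; also, some filesystems report st_size as 0 for symlinks, so a robust version would fall back to PATH_MAX in that case.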


Regards
Mohit Agrawal
Comment 7 Sanoj Unnikrishnan 2016-12-16 00:46:15 EST
Doing a sys_lstat before sys_readlink would add to the number of syscalls we make.
If we don't reallocate linkname in each frame, using a 4k buffer won't hurt.

Considering 2 alternatives here:
1) Make the function iterative and reuse the same buffer. We will have to maintain our own stack/list of gfid values.
2) Add linkname to the arguments of posix_make_ancestryfromgfid and call alloca only if linkname is NULL. Recursive calls then won't reallocate linkname; they will use the same buffer allocated in the bottom-most frame.

Sticking with (1) as it is the cleaner fix; a rough sketch of that approach follows.
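
As a minimal, self-contained illustration of option (1) (not GlusterFS code: it walks plain directory paths instead of gfids, and build_ancestry/struct entry are invented for the example):

>>>>>>>>>>>>>>>>>>>>>>>>

/* Sketch only: build the ancestry of a path iteratively, reusing one
 * PATH_MAX scratch buffer and keeping the pending entries in a
 * heap-allocated list instead of in one stack frame per level. */
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry {
        char         *name;
        struct entry *next;
};

static struct entry *
build_ancestry (const char *path)
{
        char          scratch[PATH_MAX];   /* single buffer, reused per level */
        struct entry *head = NULL;

        snprintf (scratch, sizeof (scratch), "%s", path);

        while (strcmp (scratch, "/") != 0 && strcmp (scratch, ".") != 0) {
                struct entry *e = calloc (1, sizeof (*e));
                char         *parent;

                if (!e)
                        break;
                e->name = strdup (scratch);
                if (!e->name) {
                        free (e);
                        break;
                }
                e->next = head;            /* the root ends up at the head */
                head    = e;

                /* dirname() may rewrite scratch in place or return a pointer
                 * to static storage ("." or "/"); copy only in the latter case. */
                parent = dirname (scratch);
                if (parent != scratch)
                        snprintf (scratch, sizeof (scratch), "%s", parent);
        }
        return head;   /* list is leaked by the caller below; fine for a sketch */
}

int
main (void)
{
        struct entry *e = build_ancestry ("/bricks/brick1/b1/dir1/dir2/dir3");

        for (; e != NULL; e = e->next)
                printf ("%s\n", e->name);
        return 0;
}

>>>>>>>>>>>>>>>>>>>>>>>>

The key point is that the PATH_MAX scratch buffer exists once for the whole walk, and the pending ancestry lives in a heap-allocated list rather than in hundreds of stack frames.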
Comment 10 Atin Mukherjee 2016-12-19 01:42:53 EST
upstream patch http://review.gluster.org/#/c/16192 posted for review
Comment 11 Atin Mukherjee 2016-12-22 01:59:08 EST
downstream patch : https://code.engineering.redhat.com/gerrit/93563
Comment 15 errata-xmlrpc 2017-03-23 01:56:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
