Bug 1420202

Summary: glusterd crashes when stopping a volume
Product: [Community] GlusterFS Reporter: Mohit Agrawal <moagrawa>
Component: glusterd Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: amukherj, bugs, kkeithle, moagrawa, nbalacha
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.11.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1420606 (view as bug list) Environment:
Last Closed: 2017-05-30 18:41:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1420430, 1420606    

Description Mohit Agrawal 2017-02-08 06:16:06 UTC
Description of problem:
glusterd crashes when stopping the volume

Version-Release number of selected component (if applicable):

glusterfs-3.11dev-0.56.git3cbf732.el7.x86_64
How reproducible:
Always

Steps to Reproduce:
1. Set up a 1x2 (replica 2) environment and start the volume
2. Stop the volume
3. glusterd crashes

Actual results:

glusterd crashes
Expected results:
It should not crash.

Additional info:

Comment 1 Atin Mukherjee 2017-02-08 06:20:51 UTC
(In reply to Mohit Agrawal from comment #0)
> Description of problem:
> glusterd crashes when stopping the volume
> 
> Version-Release number of selected component (if applicable):
> 
> glusterfs-3.11dev-0.56.git3cbf732.el7.x86_64

Are you using some private rpms?

> How reproducible:
> Always
> 
> Steps to Reproduce:
> 1. Set up a 1x2 (replica 2) environment and start the volume
> 2. Stop the volume
> 3. glusterd crashes
> 
> Actual results:
> 
> glusterd crashes
> Expected results:
> It should not crash.
> 
> Additional info:

Comment 2 Mohit Agrawal 2017-02-08 06:29:02 UTC
Hi,

I built the RPM from the latest upstream code to reproduce another issue.


Regards
Mohit Agrawal

Comment 3 Mohit Agrawal 2017-02-08 06:34:42 UTC
Hi,

Below is the compile-time warning thrown during the source build:

>>>>>>>>>>>>>>>>>>

In function 'snprintf',
    inlined from 'glusterd_bricks_select_stop_volume' at glusterd-op-sm.c:6182:25:
/usr/include/bits/stdio2.h:64:3: warning: call to __builtin___snprintf_chk will always overflow destination buffer [enabled by default]
   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
   ^
In function 'snprintf',
    inlined from 'glusterd_bricks_select_remove_brick' at glusterd-op-sm.c:6291:25:
/usr/include/bits/stdio2.h:64:3: warning: call to __builtin___snprintf_chk will always overflow destination buffer [enabled by default]
   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,


>>>>>>>>>>>>>>>>>>
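
This warning comes from glibc fortification: with _FORTIFY_SOURCE enabled, an snprintf whose size argument is a compile-time constant larger than the known size of the destination object is rewritten to __builtin___snprintf_chk and flagged. The following is a minimal, hypothetical sketch (not GlusterFS code) that reproduces the same class of warning when compiled against glibc with something like gcc -O2 -D_FORTIFY_SOURCE=2 -Wall -c:

/* fortify_warn_demo.c -- hypothetical sketch, not GlusterFS code.
 * Build: gcc -O2 -D_FORTIFY_SOURCE=2 -Wall -c fortify_warn_demo.c
 */
#include <stdio.h>
#include <limits.h>

#ifndef PATH_MAX
#define PATH_MAX 4096          /* fallback; <limits.h> provides it on Linux */
#endif

void
build_pidfile_path (const char *volpath, const char *hostname,
                    const char *exp_path)
{
        char pidfile[1024];    /* destination object is 1024 bytes ...      */

        /* ... but snprintf is told the destination can take PATH_MAX (4096)
         * bytes.  Under fortification the compiler compares the constant
         * 4096 with the known object size 1024 and emits the
         * "will always overflow destination buffer" warning shown above. */
        snprintf (pidfile, PATH_MAX, "%s/run/%s-%s.pid",
                  volpath, hostname, exp_path);

        printf ("%s\n", pidfile);
}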

gluster v stop dist-repl
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Connection failed. Please check if gluster daemon is operational.


Below is the backtrace (bt) for the crash:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

(gdb) bt
#0  0x00007f8d06e015f7 in raise () from /lib64/libc.so.6
#1  0x00007f8d06e02ce8 in abort () from /lib64/libc.so.6
#2  0x00007f8d06e41317 in __libc_message () from /lib64/libc.so.6
#3  0x00007f8d06ed9b37 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007f8d06ed7cf0 in __chk_fail () from /lib64/libc.so.6
#5  0x00007f8d06ed740b in __vsnprintf_chk () from /lib64/libc.so.6
#6  0x00007f8d06ed7328 in __snprintf_chk () from /lib64/libc.so.6
#7  0x00007f8cfd249e04 in snprintf (__fmt=0x7f8cfd346818 "%s/run/%s-%s.pid", 
    __n=4096, __s=0x7f8cec407310 "`t@\354\214\177") at /usr/include/bits/stdio2.h:64
#8  glusterd_bricks_select_stop_volume (dict=dict@entry=0x7f8ce4000c50, 
    op_errstr=op_errstr@entry=0x7f8cec409910, selected=selected@entry=0x7f8cec409850)
    at glusterd-op-sm.c:6182
#9  0x00007f8cfd257916 in glusterd_op_bricks_select (op=op@entry=GD_OP_STOP_VOLUME, 
    dict=dict@entry=0x7f8ce4000c50, op_errstr=op_errstr@entry=0x7f8cec409910, 
    selected=selected@entry=0x7f8cec409850, rsp_dict=rsp_dict@entry=0x7f8ce40177c0)
    at glusterd-op-sm.c:7645
#10 0x00007f8cfd2f42af in gd_brick_op_phase (op=GD_OP_STOP_VOLUME, 
    op_ctx=op_ctx@entry=0x7f8ce4005a00, req_dict=0x7f8ce4000c50, 
    op_errstr=op_errstr@entry=0x7f8cec409910) at glusterd-syncop.c:1685
#11 0x00007f8cfd2f4d33 in gd_sync_task_begin (op_ctx=op_ctx@entry=0x7f8ce4005a00, 
    req=req@entry=0x7f8cec0018b0) at glusterd-syncop.c:1937
#12 0x00007f8cfd2f5030 in glusterd_op_begin_synctask (req=req@entry=0x7f8cec0018b0, 
---Type <return> to continue, or q <return> to quit---
    op=op@entry=GD_OP_STOP_VOLUME, dict=0x7f8ce4005a00) at glusterd-syncop.c:2006
#13 0x00007f8cfd2dc47f in __glusterd_handle_cli_stop_volume (
    req=req@entry=0x7f8cec0018b0) at glusterd-volume-ops.c:628
#14 0x00007f8cfd23bfde in glusterd_big_locked_handler (req=0x7f8cec0018b0, 
    actor_fn=0x7f8cfd2dc280 <__glusterd_handle_cli_stop_volume>)
    at glusterd-handler.c:81
#15 0x00007f8d087526d0 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#16 0x00007f8d06e13110 in ?? () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
(gdb) 
(gdb) f 8
#8  glusterd_bricks_select_stop_volume (dict=dict@entry=0x7f8ce4000c50, 
    op_errstr=op_errstr@entry=0x7f8cec409910, selected=selected@entry=0x7f8cec409850)
    at glusterd-op-sm.c:6182
6182	                        GLUSTERD_GET_BRICK_PIDFILE (pidfile, volinfo,
(gdb) p sizeof(pidfile)
$1 = 1024

#define GLUSTERD_GET_BRICK_PIDFILE(pidfile,volinfo,brickinfo, priv) do {      \
                char exp_path[PATH_MAX] = {0,};                               \
                char volpath[PATH_MAX]  = {0,};                               \
                GLUSTERD_GET_VOLUME_DIR (volpath, volinfo, priv);             \
                GLUSTERD_REMOVE_SLASH_FROM_PATH (brickinfo->path, exp_path);  \
                snprintf (pidfile, PATH_MAX, "%s/run/%s-%s.pid",              \
                          volpath, brickinfo->hostname, exp_path);      \
        } while (0)
>>>>>>>>>>>>>>>>>>>>>>>
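
To tie the macro to the backtrace: the fortified wrapper aborts as soon as the size argument (PATH_MAX) exceeds the real size of the destination object, regardless of how long the formatted path actually is, which is why the crash reproduces every time. Below is a self-contained, hypothetical stand-in (simplified names, not the real glusterd types) that mimics the same macro/caller mismatch and aborts at runtime when built with fortification:

/* chk_fail_demo.c -- hypothetical stand-in for the macro/caller mismatch.
 * Build: gcc -O2 -D_FORTIFY_SOURCE=2 -o chk_fail_demo chk_fail_demo.c
 */
#include <stdio.h>
#include <limits.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Same shape as GLUSTERD_GET_BRICK_PIDFILE: the macro hard-codes PATH_MAX
 * as the destination size instead of using the caller's real buffer size. */
#define GET_PIDFILE(pidfile, volpath, host, brick) do {                 \
                snprintf (pidfile, PATH_MAX, "%s/run/%s-%s.pid",        \
                          volpath, host, brick);                        \
        } while (0)

int
main (void)
{
        char pidfile[1024] = {0,};   /* caller's buffer: only 1024 bytes */

        /* glibc's __snprintf_chk compares the claimed size (4096) with the
         * real object size (1024), calls __chk_fail() and aborts -- even
         * though the formatted string here is far shorter than 1024 bytes.
         * This mirrors frames #0-#8 of the backtrace above. */
        GET_PIDFILE (pidfile, "/var/lib/glusterd/vols/dist-repl",
                     "host1", "bricks-brick1");

        printf ("%s\n", pidfile);    /* not reached when fortified */
        return 0;
}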


RCA: glusterd crashes because the pidfile array in the caller is only 1024 bytes, while GLUSTERD_GET_BRICK_PIDFILE tells snprintf that the destination can hold PATH_MAX (4096) bytes; the fortified snprintf detects this potential overflow and aborts the process.

Increasing the size of the array in the calling function (glusterd_bricks_select_stop_volume) resolves the issue, as sketched below.
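
A minimal sketch of the fix direction, assuming the change is simply to make the caller's pidfile array as large as the PATH_MAX that the macro promises to snprintf (it reuses the simplified, hypothetical stand-in from above, not the real glusterd code):

/* fixed_demo.c -- same hypothetical stand-in, with the caller's buffer
 * sized to match what the macro hands to snprintf.
 * Build: gcc -O2 -D_FORTIFY_SOURCE=2 -o fixed_demo fixed_demo.c
 */
#include <stdio.h>
#include <limits.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

#define GET_PIDFILE(pidfile, volpath, host, brick) do {                 \
                snprintf (pidfile, PATH_MAX, "%s/run/%s-%s.pid",        \
                          volpath, host, brick);                        \
        } while (0)

int
main (void)
{
        /* The buffer now matches the size the macro passes to snprintf, so
         * the fortified check (claimed size <= real object size) passes:
         * no compile-time warning, no runtime abort. */
        char pidfile[PATH_MAX] = {0,};

        GET_PIDFILE (pidfile, "/var/lib/glusterd/vols/dist-repl",
                     "host1", "bricks-brick1");
        printf ("%s\n", pidfile);
        return 0;
}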

Regards
Mohit Agrawal

Comment 4 Worker Ant 2017-02-08 06:51:46 UTC
REVIEW: https://review.gluster.org/16560 (glusterd: glusterd is crashed at the time of stop volume) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 5 Worker Ant 2017-02-08 08:48:21 UTC
REVIEW: https://review.gluster.org/16560 (glusterd: glusterd is crashed at the time of stop volume) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 6 Worker Ant 2017-02-08 16:48:04 UTC
COMMIT: https://review.gluster.org/16560 committed in master by Atin Mukherjee (amukherj) 
------
commit 9ac193a19b0ca6d6548aeafa5c915b26396f8697
Author: Mohit Agrawal <moagrawa>
Date:   Wed Feb 8 12:20:55 2017 +0530

    glusterd: glusterd is crashed at the time of stop volume
    
    Problem: glusterd crashes at the time of volume stop due to an
             overflow of the pidfile array when the rpm is built with
             default options.
    
    Solution: To avoid the crash, update the pidfile array size.
    
    Test:    To test the patch, follow the procedure below:
             1) Set up a 1x2 environment and start the volume
             2) Stop the volume
             Before the patch is applied, glusterd crashes.
    
    Note:  The crash happens only after the rpm is built with rpmbuild -ba
           <spec> because _FORTIFY_SOURCE is enabled. This option detects
           possible overflow scenarios like the one here and aborts the
           process.
    
    BUG: 1420202
    Change-Id: I58a006bc0727843a7ed02a10b4ebd5dca39eae67
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/16560
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Atin Mukherjee <amukherj>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 7 Atin Mukherjee 2017-02-08 17:03:01 UTC
*** Bug 1420430 has been marked as a duplicate of this bug. ***

Comment 8 Atin Mukherjee 2017-02-08 17:03:44 UTC
*** Bug 1420429 has been marked as a duplicate of this bug. ***

Comment 9 Shyamsundar 2017-05-30 18:41:34 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/