Bug 1315465 - glusterfs brick process crashed
Summary: glusterfs brick process crashed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact:
bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1315552
Reported: 2016-03-07 19:59 UTC by Raghavendra Bhat
Modified: 2016-06-16 13:59 UTC (History)

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1315552 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:59:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra Bhat 2016-03-07 19:59:52 UTC
Description of problem:

While running regression tests, the glusterfsd process crashed due to a bug in the bit-rot stub.

This is the backtrace from the generated core:

#0  0x00007f8d79409829 in br_stub_cleanup_local (local=0x0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:411
411	/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c: No such file or directory.
[Current thread is 1 (LWP 3513)]
(gdb) bt
#0  0x00007f8d79409829 in br_stub_cleanup_local (local=0x0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:411
#1  0x00007f8d79414751 in br_stub_unlink_cbk (frame=0x7f8d4004403c, cookie=0x7f8d400466bc, this=0x7f8d74011500, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:2901
#2  0x00007f8d7961f05c in changelog_unlink_cbk (frame=0x7f8d400466bc, cookie=0x7f8d4000f89c, this=0x7f8d7400f8d0, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changelog/src/changelog.c:190
#3  0x00007f8d79ceb8aa in ctr_unlink_cbk (frame=0x7f8d4000f89c, cookie=0x7f8d4001fbdc, this=0x7f8d7400bc30, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changetimerecorder/src/changetimerecorder.c:1051
#4  0x00007f8d79f07544 in trash_common_unwind_cbk (frame=0x7f8d4001fbdc, cookie=0x7f8d4002471c, this=0x7f8d7400a220, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/trash/src/trash.c:565
#5  0x00007f8d7a737650 in posix_unlink (frame=0x7f8d4002471c, this=0x7f8d74007a80, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/storage/posix/src/posix.c:1915
#6  0x00007f8d79f0a489 in trash_unlink (frame=0x7f8d4001fbdc, this=0x7f8d7400a220, loc=0x7f8d740bbecc, xflags=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/trash/src/trash.c:1004
#7  0x00007f8d79cec179 in ctr_unlink (frame=0x7f8d4000f89c, this=0x7f8d7400bc30, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changetimerecorder/src/changetimerecorder.c:1140
#8  0x00007f8d7961fd1c in changelog_unlink (frame=0x7f8d400466bc, this=0x7f8d7400f8d0, loc=0x7f8d740bbecc, xflags=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changelog/src/changelog.c:319
#9  0x00007f8d79414a72 in br_stub_unlink (frame=0x7f8d4004403c, this=0x7f8d74011500, loc=0x7f8d740bbecc, flag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:2930
#10 0x00007f8d791f9d3f in posix_acl_unlink (frame=0x7f8d400343ac, this=0x7f8d74012ad0, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/system/posix-acl/src/posix-acl.c:1403
#11 0x00007f8d8746d114 in default_unlink (frame=0x7f8d400343ac, this=0x7f8d74014050, loc=0x7f8d740bbecc, flags=0, xdata=0x7f8d7408b28c) at defaults.c:2665
#12 0x00007f8d78dc4eab in up_unlink (frame=0x7f8d400428fc, this=0x7f8d74015460, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/upcall/src/upcall.c:481
#13 0x00007f8d87469ef7 in default_unlink_resume (frame=0x7f8d7409bc2c, this=0x7f8d74016a50, loc=0x7f8d740bbecc, flags=0, xdata=0x7f8d7408b28c) at defaults.c:1958
#14 0x00007f8d873f6c4c in call_resume_wind (stub=0x7f8d740bbe7c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/call-stub.c:2131
#15 0x00007f8d873ffa1a in call_resume (stub=0x7f8d740bbe7c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/call-stub.c:2628
#16 0x00007f8d78bb9727 in iot_worker (data=0x7f8d74044ec0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/performance/io-threads/src/io-threads.c:210
#17 0x00007f8d866b8aa1 in start_thread () from ./lib64/libpthread.so.0
#18 0x00007f8d8602193d in clone () from ./lib64/libc.so.6


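In br_stub_unlink_cbk, if the unlink operation fails (op_ret = -1; in this core op_errno=2, i.e. ENOENT), the callback unwinds directly without first reading frame->local, so its local variable is still NULL.

After unwinding, it calls br_stub_cleanup_local on that NULL local (frame #0 above, local=0x0) and crashes.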
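
The ordering problem described above can be sketched in isolation. This is only an illustration under assumptions, not the bit-rot-stub code: the sketch_* types and functions are hypothetical stand-ins, and the cleanup helper returns -1 on a NULL argument at the point where the real helper appears to dereference it and crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; not the GlusterFS types. */
typedef struct {
    int cleaned;              /* set to 1 once cleanup has run */
} sketch_local_t;

typedef struct {
    void *local;              /* per-call state, playing the role of frame->local */
} sketch_frame_t;

/* Stand-in for the cleanup helper. It reports a NULL argument with -1
 * instead of dereferencing it, so the sketch can be run safely. */
int sketch_cleanup_local(sketch_local_t *local)
{
    if (local == NULL)
        return -1;            /* where a NULL dereference would happen */
    local->cleaned = 1;
    return 0;
}

/* Buggy ordering: on failure, jump to the unwind label before
 * frame->local has been read, so cleanup receives NULL. */
int sketch_unlink_cbk_buggy(sketch_frame_t *frame, int op_ret)
{
    sketch_local_t *local = NULL;

    assert(frame != NULL);
    if (op_ret < 0)
        goto unwind;          /* local is still NULL here */

    local = frame->local;

unwind:
    return sketch_cleanup_local(local);
}

/* Fixed ordering: read frame->local first, then branch on op_ret. */
int sketch_unlink_cbk_fixed(sketch_frame_t *frame, int op_ret)
{
    sketch_local_t *local;

    assert(frame != NULL);
    local = frame->local;     /* fetched before any early exit */
    frame->local = NULL;

    if (op_ret < 0)
        goto unwind;

unwind:
    return sketch_cleanup_local(local);
}
```

Calling sketch_unlink_cbk_buggy with op_ret = -1 hands NULL to the cleanup helper, while sketch_unlink_cbk_fixed cleans up the real local on both the success and the failure path.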

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Vijay Bellur 2016-03-07 20:03:39 UTC
REVIEW: http://review.gluster.org/13628 (features/bit-rot-stub: get frame->local before unwinding) posted (#1) for review on master by Raghavendra Bhat (raghavendra)

Comment 2 Vijay Bellur 2016-03-08 05:13:00 UTC
REVIEW: http://review.gluster.org/13628 (features/bit-rot-stub: get frame->local before unwinding) posted (#2) for review on master by Venky Shankar (vshankar)

Comment 3 Vijay Bellur 2016-03-09 15:14:16 UTC
COMMIT: http://review.gluster.org/13628 committed in master by Venky Shankar (vshankar) 
------
commit 8fd5a8e7a3cbcc8e98ddb2ec161ef14cd5a671aa
Author: Raghavendra Bhat <raghavendra>
Date:   Mon Mar 7 15:01:39 2016 -0500

    features/bit-rot-stub: get frame->local before unwinding
    
    In bit-rot-stub, if unlink fails, then it was unwinding
    directly. Then it was trying to cleanup local. But local
    would be NULL, since it was unwinding directly without getting
    the value of frame->local. The NULL cleanup of local was
    causing the brick process to crash.
    
    Change-Id: I8544ba73b2e8dc0c50b1a53ff8027d85588d087b
    BUG: 1315465
    Signed-off-by: Raghavendra Bhat <raghavendra>
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/13628
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Kotresh HR <khiremat>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 4 Niels de Vos 2016-06-16 13:59:22 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

