Bug 1315552 - glusterfs brick process crashed
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: bitrot
Version: 3.7.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Venky Shankar
QA Contact: bugs@gluster.org
Depends On: 1315465
Blocks:
Reported: 2016-03-07 22:43 EST by Venky Shankar
Modified: 2016-04-19 03:20 EDT
CC List: 2 users

See Also:
Fixed In Version: glusterfs-3.7.9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1315465
Environment:
Last Closed: 2016-04-19 03:20:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Venky Shankar 2016-03-07 22:43:52 EST
+++ This bug was initially created as a clone of Bug #1315465 +++

Description of problem:

While running regression tests, the glusterfsd process crashed due to a bit-rot bug.

This is the backtrace from the generated core:

#0  0x00007f8d79409829 in br_stub_cleanup_local (local=0x0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:411
411	/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c: No such file or directory.
[Current thread is 1 (LWP 3513)]
(gdb) bt
#0  0x00007f8d79409829 in br_stub_cleanup_local (local=0x0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:411
#1  0x00007f8d79414751 in br_stub_unlink_cbk (frame=0x7f8d4004403c, cookie=0x7f8d400466bc, this=0x7f8d74011500, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:2901
#2  0x00007f8d7961f05c in changelog_unlink_cbk (frame=0x7f8d400466bc, cookie=0x7f8d4000f89c, this=0x7f8d7400f8d0, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changelog/src/changelog.c:190
#3  0x00007f8d79ceb8aa in ctr_unlink_cbk (frame=0x7f8d4000f89c, cookie=0x7f8d4001fbdc, this=0x7f8d7400bc30, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changetimerecorder/src/changetimerecorder.c:1051
#4  0x00007f8d79f07544 in trash_common_unwind_cbk (frame=0x7f8d4001fbdc, cookie=0x7f8d4002471c, this=0x7f8d7400a220, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/trash/src/trash.c:565
#5  0x00007f8d7a737650 in posix_unlink (frame=0x7f8d4002471c, this=0x7f8d74007a80, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/storage/posix/src/posix.c:1915
#6  0x00007f8d79f0a489 in trash_unlink (frame=0x7f8d4001fbdc, this=0x7f8d7400a220, loc=0x7f8d740bbecc, xflags=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/trash/src/trash.c:1004
#7  0x00007f8d79cec179 in ctr_unlink (frame=0x7f8d4000f89c, this=0x7f8d7400bc30, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changetimerecorder/src/changetimerecorder.c:1140
#8  0x00007f8d7961fd1c in changelog_unlink (frame=0x7f8d400466bc, this=0x7f8d7400f8d0, loc=0x7f8d740bbecc, xflags=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changelog/src/changelog.c:319
#9  0x00007f8d79414a72 in br_stub_unlink (frame=0x7f8d4004403c, this=0x7f8d74011500, loc=0x7f8d740bbecc, flag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:2930
#10 0x00007f8d791f9d3f in posix_acl_unlink (frame=0x7f8d400343ac, this=0x7f8d74012ad0, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/system/posix-acl/src/posix-acl.c:1403
#11 0x00007f8d8746d114 in default_unlink (frame=0x7f8d400343ac, this=0x7f8d74014050, loc=0x7f8d740bbecc, flags=0, xdata=0x7f8d7408b28c) at defaults.c:2665
#12 0x00007f8d78dc4eab in up_unlink (frame=0x7f8d400428fc, this=0x7f8d74015460, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/upcall/src/upcall.c:481
#13 0x00007f8d87469ef7 in default_unlink_resume (frame=0x7f8d7409bc2c, this=0x7f8d74016a50, loc=0x7f8d740bbecc, flags=0, xdata=0x7f8d7408b28c) at defaults.c:1958
#14 0x00007f8d873f6c4c in call_resume_wind (stub=0x7f8d740bbe7c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/call-stub.c:2131
#15 0x00007f8d873ffa1a in call_resume (stub=0x7f8d740bbe7c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/call-stub.c:2628
#16 0x00007f8d78bb9727 in iot_worker (data=0x7f8d74044ec0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/performance/io-threads/src/io-threads.c:210
#17 0x00007f8d866b8aa1 in start_thread () from ./lib64/libpthread.so.0
#18 0x00007f8d8602193d in clone () from ./lib64/libc.so.6


In br_stub_unlink_cbk, if the unlink operation fails (i.e. op_ret == -1), the callback unwinds directly without first fetching the value of frame->local.

After unwinding, it tries to clean up the still-NULL local and crashes.
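
The following is a minimal, self-contained sketch of that failure pattern, using stand-in types and names rather than the actual GlusterFS source. The cleanup helper dereferences its argument unconditionally, mirroring br_stub_cleanup_local (local=0x0) in frame #0 of the backtrace:

#include <stdlib.h>

typedef struct { int in_use; } stub_local_t;      /* stand-in for br_stub_local_t */
typedef struct { stub_local_t *local; } frame_t;  /* stand-in for call_frame_t */

/* Like br_stub_cleanup_local() in 3.7.8: dereferences its argument
   unconditionally, so a NULL local is fatal. */
static void cleanup_local(stub_local_t *local)
{
        local->in_use = 0;            /* SIGSEGV when local == NULL */
        free(local);
}

/* Buggy shape of br_stub_unlink_cbk(): on failure (op_ret < 0) it jumps
   straight to the unwind label, so 'local' keeps its NULL initializer. */
static void buggy_unlink_cbk(frame_t *frame, int op_ret)
{
        stub_local_t *local = NULL;

        if (op_ret < 0)
                goto unwind;          /* frame->local never fetched */

        local = frame->local;
        frame->local = NULL;

unwind:
        /* STACK_UNWIND_STRICT(...) would run here */
        cleanup_local(local);         /* NULL on the error path -> crash */
}

int main(void)
{
        frame_t frame = { .local = calloc(1, sizeof(stub_local_t)) };
        buggy_unlink_cbk(&frame, -1); /* op_ret == -1 reproduces the crash */
        return 0;
}

Compiled and run, this crashes in cleanup_local(), matching frame #0 above; note that on this path the real frame->local is also leaked, since it is never fetched.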

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Vijay Bellur on 2016-03-07 15:03:39 EST ---

REVIEW: http://review.gluster.org/13628 (features/bit-rot-stub: get frame->local before unwinding) posted (#1) for review on master by Raghavendra Bhat (raghavendra@redhat.com)
Comment 1 Vijay Bellur 2016-03-07 22:45:48 EST
REVIEW: http://review.gluster.org/13630 (features/bit-rot-stub: get frame->local before unwinding) posted (#1) for review on release-3.7 by Venky Shankar (vshankar@redhat.com)
Comment 2 Vijay Bellur 2016-03-08 03:49:57 EST
REVIEW: http://review.gluster.org/13630 (features/bit-rot-stub: get frame->local before unwinding) posted (#2) for review on release-3.7 by Venky Shankar (vshankar@redhat.com)
Comment 3 Vijay Bellur 2016-03-09 10:14:32 EST
REVIEW: http://review.gluster.org/13630 (features/bit-rot-stub: get frame->local before unwinding) posted (#3) for review on release-3.7 by Venky Shankar (vshankar@redhat.com)
Comment 4 Vijay Bellur 2016-03-09 14:29:33 EST
COMMIT: http://review.gluster.org/13630 committed in release-3.7 by Venky Shankar (vshankar@redhat.com) 
------
commit df1b06d24d5f699f397d7936dda740364c5126cd
Author: Raghavendra Bhat <raghavendra@redhat.com>
Date:   Mon Mar 7 15:01:39 2016 -0500

    features/bit-rot-stub: get frame->local before unwinding
    
    In bit-rot-stub, if unlink fails, then it was unwinding
    directly. Then it was trying to cleanup local. But local
    would be NULL, since it was unwinding directly without getting
    the value of frame->local. The NULL cleanup of local was
    causing the brick process to crash.
    
    Change-Id: I8544ba73b2e8dc0c50b1a53ff8027d85588d087b
    BUG: 1315552
    Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
    Signed-off-by: Venky Shankar <vshankar@redhat.com>
    Reviewed-on: http://review.gluster.org/13630
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
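
Read against the sketch in the description, the fix amounts to detaching frame->local before the error check, so both the success and error paths hand the real pointer to cleanup. Again a hedged sketch with stand-in types, not the verbatim patch:

#include <stdlib.h>

typedef struct { int in_use; } stub_local_t;      /* stand-ins as in the */
typedef struct { stub_local_t *local; } frame_t;  /* earlier sketch */

static void cleanup_local(stub_local_t *local)
{
        if (!local)                   /* tolerate NULL defensively */
                return;
        local->in_use = 0;
        free(local);
}

/* Fixed shape: frame->local is detached before any early exit, so the
   error path hands the real pointer to cleanup instead of NULL. */
static void fixed_unlink_cbk(frame_t *frame, int op_ret)
{
        stub_local_t *local = frame->local;
        frame->local = NULL;

        if (op_ret < 0)
                goto unwind;

        /* success-path bookkeeping would go here */

unwind:
        /* STACK_UNWIND_STRICT(...) would run here */
        cleanup_local(local);
}

Fetching the local unconditionally also means the error path now releases it instead of leaking it.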
Comment 5 Kaushal 2016-04-19 03:20:19 EDT
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed in glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
