Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1315552

Summary: glusterfs brick process crashed
Product: [Community] GlusterFS
Reporter: Venky Shankar <vshankar>
Component: bitrot
Assignee: Venky Shankar <vshankar>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact: bugs <bugs>
Priority: unspecified
Version: 3.7.8
CC: bugs, rabhat
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1315465
Environment:
Last Closed: 2016-04-19 07:20:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1315465
Bug Blocks:

Description Venky Shankar 2016-03-08 03:43:52 UTC
+++ This bug was initially created as a clone of Bug #1315465 +++

Description of problem:

While running regression tests, the glusterfsd process crashed due to a bug in the bit-rot stub.

This is the backtrace from the generated core:

#0  0x00007f8d79409829 in br_stub_cleanup_local (local=0x0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:411
411	/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c: No such file or directory.
[Current thread is 1 (LWP 3513)]
(gdb) bt
#0  0x00007f8d79409829 in br_stub_cleanup_local (local=0x0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:411
#1  0x00007f8d79414751 in br_stub_unlink_cbk (frame=0x7f8d4004403c, cookie=0x7f8d400466bc, this=0x7f8d74011500, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:2901
#2  0x00007f8d7961f05c in changelog_unlink_cbk (frame=0x7f8d400466bc, cookie=0x7f8d4000f89c, this=0x7f8d7400f8d0, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changelog/src/changelog.c:190
#3  0x00007f8d79ceb8aa in ctr_unlink_cbk (frame=0x7f8d4000f89c, cookie=0x7f8d4001fbdc, this=0x7f8d7400bc30, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changetimerecorder/src/changetimerecorder.c:1051
#4  0x00007f8d79f07544 in trash_common_unwind_cbk (frame=0x7f8d4001fbdc, cookie=0x7f8d4002471c, this=0x7f8d7400a220, op_ret=-1, op_errno=2, preparent=0x7f8d703fb2e0, postparent=0x7f8d703fb270, xdata=0x0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/trash/src/trash.c:565
#5  0x00007f8d7a737650 in posix_unlink (frame=0x7f8d4002471c, this=0x7f8d74007a80, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/storage/posix/src/posix.c:1915
#6  0x00007f8d79f0a489 in trash_unlink (frame=0x7f8d4001fbdc, this=0x7f8d7400a220, loc=0x7f8d740bbecc, xflags=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/trash/src/trash.c:1004
#7  0x00007f8d79cec179 in ctr_unlink (frame=0x7f8d4000f89c, this=0x7f8d7400bc30, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changetimerecorder/src/changetimerecorder.c:1140
#8  0x00007f8d7961fd1c in changelog_unlink (frame=0x7f8d400466bc, this=0x7f8d7400f8d0, loc=0x7f8d740bbecc, xflags=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/changelog/src/changelog.c:319
#9  0x00007f8d79414a72 in br_stub_unlink (frame=0x7f8d4004403c, this=0x7f8d74011500, loc=0x7f8d740bbecc, flag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub.c:2930
#10 0x00007f8d791f9d3f in posix_acl_unlink (frame=0x7f8d400343ac, this=0x7f8d74012ad0, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/system/posix-acl/src/posix-acl.c:1403
#11 0x00007f8d8746d114 in default_unlink (frame=0x7f8d400343ac, this=0x7f8d74014050, loc=0x7f8d740bbecc, flags=0, xdata=0x7f8d7408b28c) at defaults.c:2665
#12 0x00007f8d78dc4eab in up_unlink (frame=0x7f8d400428fc, this=0x7f8d74015460, loc=0x7f8d740bbecc, xflag=0, xdata=0x7f8d7408b28c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/upcall/src/upcall.c:481
#13 0x00007f8d87469ef7 in default_unlink_resume (frame=0x7f8d7409bc2c, this=0x7f8d74016a50, loc=0x7f8d740bbecc, flags=0, xdata=0x7f8d7408b28c) at defaults.c:1958
#14 0x00007f8d873f6c4c in call_resume_wind (stub=0x7f8d740bbe7c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/call-stub.c:2131
#15 0x00007f8d873ffa1a in call_resume (stub=0x7f8d740bbe7c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/call-stub.c:2628
#16 0x00007f8d78bb9727 in iot_worker (data=0x7f8d74044ec0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/performance/io-threads/src/io-threads.c:210
#17 0x00007f8d866b8aa1 in start_thread () from ./lib64/libpthread.so.0
#18 0x00007f8d8602193d in clone () from ./lib64/libc.so.6


In br_stub_unlink_cbk, if the unlink operation failed (i.e. op_ret == -1), the callback unwound directly, without first fetching the value of frame->local.

After unwinding, it passed the still-NULL local to br_stub_cleanup_local(), which dereferenced it and crashed the brick process.
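
The failing control flow, simplified (a sketch reconstructed from the backtrace above and the fix below; the signature, STACK_UNWIND_STRICT, br_stub_local_t and br_stub_cleanup_local() are from the GlusterFS bit-rot stub, but the body is illustrative, not the verbatim upstream code):

int32_t
br_stub_unlink_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                    int32_t op_ret, int32_t op_errno,
                    struct iatt *preparent, struct iatt *postparent,
                    dict_t *xdata)
{
        br_stub_local_t *local = NULL;  /* stays NULL on the error path */

        if (op_ret < 0)
                goto unwind;            /* bug: skips the frame->local fetch */

        local = frame->local;           /* only reached when unlink succeeded */
        frame->local = NULL;
        /* ... success-path processing ... */

unwind:
        STACK_UNWIND_STRICT (unlink, frame, op_ret, op_errno,
                             preparent, postparent, xdata);
        br_stub_cleanup_local (local);  /* local == NULL here -> the crash
                                           at bit-rot-stub.c:411 */
        return 0;
}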


--- Additional comment from Vijay Bellur on 2016-03-07 15:03:39 EST ---

REVIEW: http://review.gluster.org/13628 (features/bit-rot-stub: get frame->local before unwinding) posted (#1) for review on master by Raghavendra Bhat (raghavendra)

Comment 1 Vijay Bellur 2016-03-08 03:45:48 UTC
REVIEW: http://review.gluster.org/13630 (features/bit-rot-stub: get frame->local before unwinding) posted (#1) for review on release-3.7 by Venky Shankar (vshankar)

Comment 2 Vijay Bellur 2016-03-08 08:49:57 UTC
REVIEW: http://review.gluster.org/13630 (features/bit-rot-stub: get frame->local before unwinding) posted (#2) for review on release-3.7 by Venky Shankar (vshankar)

Comment 3 Vijay Bellur 2016-03-09 15:14:32 UTC
REVIEW: http://review.gluster.org/13630 (features/bit-rot-stub: get frame->local before unwinding) posted (#3) for review on release-3.7 by Venky Shankar (vshankar)

Comment 4 Vijay Bellur 2016-03-09 19:29:33 UTC
COMMIT: http://review.gluster.org/13630 committed in release-3.7 by Venky Shankar (vshankar) 
------
commit df1b06d24d5f699f397d7936dda740364c5126cd
Author: Raghavendra Bhat <raghavendra>
Date:   Mon Mar 7 15:01:39 2016 -0500

    features/bit-rot-stub: get frame->local before unwinding
    
    In bit-rot-stub, if an unlink failed, the callback unwound
    directly and then tried to clean up local. But local was NULL,
    because the callback unwound without first getting the value of
    frame->local. Cleaning up the NULL local caused the brick
    process to crash.
    
    Change-Id: I8544ba73b2e8dc0c50b1a53ff8027d85588d087b
    BUG: 1315552
    Signed-off-by: Raghavendra Bhat <raghavendra>
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/13630
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
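
In effect, the patch hoists the frame->local fetch above the error check, so the unwind path always hands br_stub_cleanup_local() whatever frame->local held (again a simplified sketch of the fixed flow, not the verbatim patch):

        br_stub_local_t *local = NULL;

        local = frame->local;           /* fetched before any early exit */
        frame->local = NULL;

        if (op_ret < 0)
                goto unwind;            /* local is already populated */

        /* ... success-path processing ... */

unwind:
        STACK_UNWIND_STRICT (unlink, frame, op_ret, op_errno,
                             preparent, postparent, xdata);
        br_stub_cleanup_local (local);  /* safe now, even when op_ret == -1 */
        return 0;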

Comment 5 Kaushal 2016-04-19 07:20:19 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user