Bug 1286058

Summary: Brick crashes because of race in bit-rot init
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: RamaKasturi <knarra>
Component: bitrot Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED ERRATA QA Contact: RamaKasturi <knarra>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: byarlaga, pkarampu, rcyriac, rhs-bugs, sankarshan, storage-qa-internal
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.1.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.7.5-8 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1285616 Environment:
Last Closed: 2016-03-01 06:03:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1285616    
Bug Blocks: 1260783, 1285758    

Description RamaKasturi 2015-11-27 10:25:57 UTC
+++ This bug was initially created as a clone of Bug #1285616 +++

Description of problem:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fce642eb420 in pthread_mutex_lock () from ./lib64/libpthread.so.0
(gdb) bt
#0  0x00007fce642eb420 in pthread_mutex_lock () from ./lib64/libpthread.so.0
#1  0x00007fce52ee2e13 in br_stub_worker (data=0x7fce54010f90)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c:337
#2  0x00007fce642e9a51 in start_thread () from ./lib64/libpthread.so.0
#3  0x00007fce63c5393d in clone () from ./lib64/libc.so.6
(gdb) fr 1
#1  0x00007fce52ee2e13 in br_stub_worker (data=0x7fce54010f90)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c:337
337	/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c: No such file or directory.
(gdb) info locals
priv = 0x0
this = 0x7fce54010f90
stub = 0x0
ret = 0

init may not have set this->private by the time br_stub_worker starts running, so the worker reads a NULL priv (see "info locals" above) and crashes with a NULL dereference in pthread_mutex_lock.

Version-Release number of selected component (if applicable):

How reproducible:
This is observed at https://build.gluster.org/job/rackspace-regression-2GB-triggered/16180/consoleFull

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Vijay Bellur on 2015-11-25 23:36:24 EST ---

REVIEW: http://review.gluster.org/12754 (features/bit-rot: Fix NULL dereference) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2015-11-26 04:03:33 EST ---

REVIEW: http://review.gluster.org/12754 (features/bit-rot: Fix NULL dereference) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2015-11-26 23:04:23 EST ---

COMMIT: http://review.gluster.org/12754 committed in master by Venky Shankar (vshankar) 
------
commit a1919e91279a6c691fbd3dd6c0d97e74e78ccf22
Author: Pranith Kumar K <pkarampu>
Date:   Thu Nov 26 09:58:39 2015 +0530

    features/bit-rot: Fix NULL dereference
    
    Problem:
    By the time br_stub_worker accesses this->private in its
    thread, 'init' may not have set 'this->private = priv'. This
    leads to a NULL dereference and a brick crash.
    
    Fix:
    Set this->private before launching these threads.
    
    Change-Id: Ic797eb195fdd0c70d19f28d0b97bc0181fd3dd2f
    BUG: 1285616
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/12754
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Venky Shankar <vshankar>
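
For readers unfamiliar with the pattern, the ordering described in the fix can be sketched as below. This is a minimal illustration, not the actual bit-rot stub code: demo_xlator_t, demo_priv_t, demo_worker, demo_init and demo_fini are hypothetical names, and only the idea (publish the private pointer before calling pthread_create) comes from the commit message above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the translator's private state. */
typedef struct {
    pthread_mutex_t lock;
    int             value;
} demo_priv_t;

/* Hypothetical stand-in for the translator object. */
typedef struct {
    void *private; /* plays the role of this->private in the report */
} demo_xlator_t;

/* Worker thread: reads this->private as soon as it starts running. */
static void *demo_worker(void *data)
{
    demo_xlator_t *this = data;
    demo_priv_t   *priv = this->private;

    /* With the buggy ordering, priv could still be NULL here. */
    assert(priv != NULL);

    pthread_mutex_lock(&priv->lock);
    priv->value = 42;
    pthread_mutex_unlock(&priv->lock);
    return NULL;
}

/* Returns the value written by the worker, or -1 on failure. */
int demo_init(demo_xlator_t *this)
{
    pthread_t    tid;
    demo_priv_t *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        return -1;
    if (pthread_mutex_init(&priv->lock, NULL) != 0) {
        free(priv);
        return -1;
    }

    /* The fix: set the private pointer BEFORE launching the thread. */
    this->private = priv;

    if (pthread_create(&tid, NULL, demo_worker, this) != 0) {
        this->private = NULL;
        pthread_mutex_destroy(&priv->lock);
        free(priv);
        return -1;
    }

    /* Joined here only to keep the sketch deterministic. */
    pthread_join(tid, NULL);
    return priv->value;
}

/* Releases the private state allocated by demo_init. */
void demo_fini(demo_xlator_t *this)
{
    demo_priv_t *priv = this->private;

    if (priv != NULL) {
        pthread_mutex_destroy(&priv->lock);
        free(priv);
        this->private = NULL;
    }
}
```

Calling demo_init on a zero-initialized demo_xlator_t and then demo_fini exercises the safe ordering. Moving the "this->private = priv" assignment below pthread_create would reintroduce the race seen in the backtrace, since the worker may run before the assignment. Build with -pthread.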

Comment 4 RamaKasturi 2015-12-08 07:14:54 UTC
Verified; works fine with build glusterfs-3.7.5-9.el7rhgs.x86_64. Did not see any brick crash when the volume is stopped and started.

Comment 7 errata-xmlrpc 2016-03-01 06:03:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html