Bug 1226830 - Scrubber crash upon pause
Summary: Scrubber crash upon pause
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Venky Shankar
QA Contact: RajeshReddy
bugs@gluster.org
URL:
Whiteboard:
Depends On: 1226666 1231619 1232309
Blocks: 1231617
Reported: 2015-06-01 09:05 UTC by Venky Shankar
Modified: 2016-07-13 22:34 UTC (History)

Fixed In Version: glusterfs-3.7.3
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1231617 (view as bug list)
Environment:
Last Closed: 2015-07-30 09:48:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Venky Shankar 2015-06-01 09:05:51 UTC
Description of problem:
Pausing the scrubber sometimes crashes the scrubber process.

Version-Release number of selected component (if applicable):
3.7.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Create & start a Gluster volume
2. Enable bitrot on the volume
3. Pause the scrubber for this volume:

# gluster volume bitrot <vol> scrub pause

Actual results:
The scrubber process sometimes crashes.

Expected results:
The scrubber process should keep running (but should not scrub the filesystem for the volume).

Backtrace (reported by anekkunt: http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html)

(gdb) bt
#0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, timer=0x0, expires=233889) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, pendingcheck=_gf_true)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703
#2  0x00007f89c82cc9d4 in reconfigure (this=0x7f89c4008980, options=0x7f89d3bc9558) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3  0x00007f89d62044cd in xlator_reconfigure_rec (old_xl=0x7f89c4008980, new_xl=0x7f89c409b460) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4  0x00007f89d6204414 in xlator_reconfigure_rec (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5  0x00007f89d62045df in xlator_tree_reconfigure (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6  0x00007f89d61ec7bd in glusterfs_graph_reconfigure (oldgraph=0x7f89c4001d30, newgraph=0x7f89c4098130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7  0x00007f89d61ec629 in glusterfs_volfile_reconfigure (oldvollen=932, newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,

Comment 1 Venky Shankar 2015-06-01 09:41:22 UTC
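The crash is due to a race between CHILD_UP (where ->timer is initialized for the subvolume) and reconfigure(), which accesses ->timer to reschedule the scrub time. If reconfigure() runs before CHILD_UP has been processed for a subvolume, ->timer is still NULL (note timer=0x0 in frame #0 of the backtrace) and gf_tw_mod_timer_pending() crashes on it.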
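
To illustrate the race, here is a minimal, self-contained sketch of one way to guard the reschedule path. It is not the patch posted for this bug, and every name in it (sketch_timer, sketch_child, sketch_child_up, sketch_reschedule) is hypothetical and does not exist in GlusterFS. It assumes the timer pointer is published under a per-subvolume lock and that the reconfigure path may simply skip a subvolume whose timer has not been set up yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer-wheel timer; not the real structure. */
struct sketch_timer {
    uint64_t expires;
};

/* Hypothetical per-subvolume state. */
struct sketch_child {
    pthread_mutex_t lock;
    struct sketch_timer *timer;   /* stays NULL until the CHILD_UP path runs */
    struct sketch_timer storage;  /* backing memory for the timer */
};

void sketch_child_init(struct sketch_child *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->timer = NULL;
    c->storage.expires = 0;
}

/* CHILD_UP path: set up the timer and publish the pointer under the lock. */
void sketch_child_up(struct sketch_child *c, uint64_t expires)
{
    pthread_mutex_lock(&c->lock);
    c->storage.expires = expires;
    c->timer = &c->storage;
    pthread_mutex_unlock(&c->lock);
}

/*
 * reconfigure path: reschedule only if the timer exists.
 * Returns 0 when the timer was rescheduled, -1 when the subvolume was
 * skipped because CHILD_UP has not initialized it yet.
 */
int sketch_reschedule(struct sketch_child *c, uint64_t expires)
{
    int ret = -1;

    pthread_mutex_lock(&c->lock);
    if (c->timer != NULL) {
        c->timer->expires = expires;
        ret = 0;
    }
    pthread_mutex_unlock(&c->lock);

    return ret;
}
```

With this shape, a reschedule that arrives before CHILD_UP returns -1 instead of touching a NULL timer, and the same call succeeds once CHILD_UP has run. Whether to skip, defer, or record the new schedule for later is a design choice this sketch does not settle.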

Comment 2 Anand Avati 2015-06-11 14:53:00 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#3) for review on master by Venky Shankar (vshankar)

Comment 3 Anand Avati 2015-06-15 03:35:07 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#5) for review on master by Venky Shankar (vshankar)

Comment 4 Anand Avati 2015-07-09 10:13:14 UTC
COMMIT: http://review.gluster.org/11539 committed in release-3.7 by Raghavendra Bhat (raghavendra) 
------
commit 291aba1f1ba33831569acd879e3357c1fd01a5c8
Author: Venky Shankar <vshankar>
Date:   Tue Jun 2 21:23:48 2015 +0530

    features/bitrot: cleanup, v1
    
        Backport of http://review.gluster.org/11147
    
    This is a short series of patches (with other cleanups) aimed at
    cleaning up some of the incorrect assumptions taken in reconfigure()
    leading to crashes when subvolumes are not fully initialized (as
    reported here[1] on gluster-devel@). Furthermore, there is some
    amount of code cleanup to handle disconnection and clean up data
    structures (as part of a subsequent patch).
    
    [1] http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html
    
    Change-Id: I68ac4bccfbac4bf02fcc31615bd7d2d191021132
    BUG: 1226830
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/11539
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>

Comment 5 Kaushal 2015-07-30 09:48:32 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

