Bug 1231617

Summary: Scrubber crash upon pause
Product: [Community] GlusterFS Reporter: Venky Shankar <vshankar>
Component: bitrot Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact: bugs <bugs>
Priority: unspecified    
Version: mainline CC: bugs, rmekala, smohan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8rc2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1226830
: 1232307 Environment:
Last Closed: 2016-06-16 13:11:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1226666, 1226830, 1231619, 1232309    
Bug Blocks:    

Description Venky Shankar 2015-06-15 05:50:29 UTC
+++ This bug was initially created as a clone of Bug #1226830 +++

Description of problem:
Pausing the scrubber sometimes causes the scrubber process to crash.

Version-Release number of selected component (if applicable):
3.7.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Create & start a Gluster volume
2. Enable bitrot on the volume
3. Pause the scrubber for this volume:

# gluster volume bitrot <vol> scrub pause

Actual results:
Scrubber process crashes at times

Expected results:
The scrubber process should keep running (although it should not scrub the filesystem for the volume while paused)

BT (reported by anekkunt: http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html)

(gdb) bt
#0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, timer=0x0, expires=233889) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, pendingcheck=_gf_true)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703
#2  0x00007f89c82cc9d4 in reconfigure (this=0x7f89c4008980, options=0x7f89d3bc9558) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3  0x00007f89d62044cd in xlator_reconfigure_rec (old_xl=0x7f89c4008980, new_xl=0x7f89c409b460) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4  0x00007f89d6204414 in xlator_reconfigure_rec (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5  0x00007f89d62045df in xlator_tree_reconfigure (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6  0x00007f89d61ec7bd in glusterfs_graph_reconfigure (oldgraph=0x7f89c4001d30, newgraph=0x7f89c4098130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7  0x00007f89d61ec629 in glusterfs_volfile_reconfigure (oldvollen=932, newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,

--- Additional comment from Venky Shankar on 2015-06-01 05:41:22 EDT ---

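So, the crash is due to a race between CHILD_UP (where ->timer is initialized for the subvolume) and reconfigure(), which accesses ->timer to reschedule the scrub time. If reconfigure() runs before CHILD_UP has completed, ->timer is still NULL (note timer=0x0 in frame #0 of the backtrace).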
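
To illustrate the race described above, here is a minimal sketch of one way to guard against it. This is not the code from the actual fix: the struct and function names (scrub_child, scrub_timer, child_up, reschedule) are hypothetical stand-ins, not GlusterFS or timer-wheel APIs. It assumes the timer pointer stays NULL until the CHILD_UP path has run, and that both paths take the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer-wheel entry. */
struct scrub_timer {
    unsigned long expires;
};

/* Hypothetical stand-in for per-subvolume scrubber state. */
struct scrub_child {
    pthread_mutex_t lock;      /* protects 'timer' */
    struct scrub_timer *timer; /* NULL until CHILD_UP has run */
};

/* CHILD_UP path: publish the timer under the lock. */
void child_up(struct scrub_child *child, struct scrub_timer *timer,
              unsigned long expires)
{
    assert(child != NULL && timer != NULL);
    pthread_mutex_lock(&child->lock);
    timer->expires = expires;
    child->timer = timer;
    pthread_mutex_unlock(&child->lock);
}

/*
 * reconfigure() path: reschedule only if the timer exists.
 * Returns true if the timer was rescheduled, false if the
 * subvolume has not been brought up yet (nothing to do).
 */
bool reschedule(struct scrub_child *child, unsigned long expires)
{
    bool done = false;

    assert(child != NULL);
    pthread_mutex_lock(&child->lock);
    if (child->timer != NULL) {
        child->timer->expires = expires;
        done = true;
    }
    pthread_mutex_unlock(&child->lock);
    return done;
}
```

With this shape, a reconfigure that arrives before CHILD_UP becomes a no-op instead of dereferencing a NULL timer. A caller would still need to decide whether the new schedule should be applied later, once the subvolume comes up.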

--- Additional comment from Anand Avati on 2015-06-11 10:53:00 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#3) for review on master by Venky Shankar (vshankar)

--- Additional comment from Anand Avati on 2015-06-14 23:35:07 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#5) for review on master by Venky Shankar (vshankar)

Comment 1 Anand Avati 2015-06-15 05:53:31 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#6) for review on master by Venky Shankar (vshankar)

Comment 2 Anand Avati 2015-06-16 03:53:56 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#7) for review on master by Venky Shankar (vshankar)

Comment 3 Anand Avati 2015-06-16 06:35:46 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#8) for review on master by Venky Shankar (vshankar)

Comment 4 Anand Avati 2015-06-16 08:38:27 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#9) for review on master by Venky Shankar (vshankar)

Comment 5 Anand Avati 2015-06-16 09:42:46 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#10) for review on master by Venky Shankar (vshankar)

Comment 6 Anand Avati 2015-06-16 09:42:53 UTC
REVIEW: http://review.gluster.org/11248 (tests/bitrot: remove induced delay) posted (#1) for review on master by Venky Shankar (vshankar)

Comment 7 Anand Avati 2015-06-17 04:11:51 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#11) for review on master by Venky Shankar (vshankar)

Comment 8 Anand Avati 2015-06-17 08:53:18 UTC
REVIEW: http://review.gluster.org/11263 (Revert "tests/bitrot: Induce delay before invoking bitrot subcommands") posted (#3) for review on master by Venky Shankar (vshankar)

Comment 9 Anand Avati 2015-06-17 09:06:23 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#12) for review on master by Venky Shankar (vshankar)

Comment 10 Anand Avati 2015-06-17 17:30:42 UTC
REVIEW: http://review.gluster.org/11263 (Revert "tests/bitrot: Induce delay before invoking bitrot subcommands") posted (#6) for review on master by Venky Shankar (vshankar)

Comment 11 Anand Avati 2015-06-17 17:30:44 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#14) for review on master by Venky Shankar (vshankar)

Comment 12 Anand Avati 2015-06-18 00:36:30 UTC
REVIEW: http://review.gluster.org/11263 (Revert "tests/bitrot: Induce delay before invoking bitrot subcommands") posted (#7) for review on master by Venky Shankar (vshankar)

Comment 13 Anand Avati 2015-06-18 00:36:35 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#15) for review on master by Venky Shankar (vshankar)

Comment 14 Anand Avati 2015-06-20 06:48:20 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#16) for review on master by Venky Shankar (vshankar)

Comment 15 Anand Avati 2015-06-21 13:03:46 UTC
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#18) for review on master by Venky Shankar (vshankar)

Comment 16 Anand Avati 2015-06-25 11:42:46 UTC
COMMIT: http://review.gluster.org/11147 committed in master by Raghavendra Bhat (raghavendra) 
------
commit 17b838ce18e0eb9dbfe9a540a3006023b19276e7
Author: Venky Shankar <vshankar>
Date:   Tue Jun 2 21:23:48 2015 +0530

    features/bitrot: cleanup, v1
    
    This is a short series of patches (with other cleanups) aimed at
    cleaning up some of the incorrect assumptions taken in reconfigure()
    leading to crashes when subvolumes are not fully initialized (as
    reported here[1] on gluster-devel@). Furthermore, there is some
    amount of code cleanup to handle disconnection and clean up data
    structures (as part of a subsequent patch).
    
    [1] http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html
    
    Change-Id: I68ac4bccfbac4bf02fcc31615bd7d2d191021132
    BUG: 1231617
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/11147
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>

Comment 17 Anand Avati 2015-06-25 11:44:54 UTC
COMMIT: http://review.gluster.org/11263 committed in master by Raghavendra Bhat (raghavendra) 
------
commit 367049879e149e2cd3ec3ba96de7f495a30de180
Author: Venky Shankar <vshankar>
Date:   Wed Jun 17 09:35:22 2015 +0530

    Revert "tests/bitrot: Induce delay before invoking bitrot subcommands"
    
    This reverts commit a615f6c078c76791318c2a58efcc8baef18c25db.
    
    Change-Id: I8b014a99686cd4ee07da9d26bca561b420c8bec7
    BUG: 1231617
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/11263
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>

Comment 18 Niels de Vos 2016-06-16 13:11:23 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user