Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1226830

Summary:	Scrubber crash upon pause
Product:	[Community] GlusterFS	Reporter:	Venky Shankar <vshankar>
Component:	bitrot	Assignee:	Venky Shankar <vshankar>
Status:	CLOSED CURRENTRELEASE	QA Contact:	RajeshReddy <rmekala>
Severity:	unspecified	Docs Contact:	bugs <bugs>
Priority:	unspecified
Version:	3.7.0	CC:	anekkunt, bugs, ggarg, mzywusko, nsathyan
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-3.7.3	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1231617		Environment:
Last Closed:	2015-07-30 09:48:32 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1226666, 1231619, 1232309
Bug Blocks:	1231617

Description Venky Shankar 2015-06-01 09:05:51 UTC

Description of problem:
Pausing the scrubber sometimes causes the scrubber process to crash.

Version-Release number of selected component (if applicable):
3.7.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Create & start a Gluster volume
2. Enable bitrot on the volume
3. Pause the scrubber for this volume:

# gluster volume bitrot <vol> scrub pause

Actual results:
The scrubber process sometimes crashes.

Expected results:
The scrubber process should keep running, but it should not scrub the filesystem for the paused volume.

BT (reported by anekkunt: http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html)

(gdb) bt
#0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, timer=0x0, expires=233889) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, pendingcheck=_gf_true)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703
#2  0x00007f89c82cc9d4 in reconfigure (this=0x7f89c4008980, options=0x7f89d3bc9558) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3  0x00007f89d62044cd in xlator_reconfigure_rec (old_xl=0x7f89c4008980, new_xl=0x7f89c409b460) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4  0x00007f89d6204414 in xlator_reconfigure_rec (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5  0x00007f89d62045df in xlator_tree_reconfigure (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6  0x00007f89d61ec7bd in glusterfs_graph_reconfigure (oldgraph=0x7f89c4001d30, newgraph=0x7f89c4098130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7  0x00007f89d61ec629 in glusterfs_volfile_reconfigure (oldvollen=932, newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,

Comment 1 Venky Shankar 2015-06-01 09:41:22 UTC

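The crash is caused by a race between CHILD_UP handling, where ->timer is initialized for the subvolume, and reconfigure(), which accesses ->timer to reschedule the scrub. If reconfigure() runs before CHILD_UP has been processed, ->timer is still NULL (see timer=0x0 in frame #0 of the backtrace).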
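
To illustrate the race described above, here is a minimal sketch in C. It is not the patch posted for review and it does not use the GlusterFS types: struct scrub_child, struct scrub_timer, child_up() and child_reschedule() are hypothetical stand-ins. It assumes the timer pointer stays NULL until the CHILD_UP path has run, and that one way to make the reconfigure path safe is to check the pointer under a lock and skip rescheduling when the child is not yet up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the GlusterFS structures. */
struct scrub_timer {
    unsigned long expires;
};

struct scrub_child {
    pthread_mutex_t     lock;
    struct scrub_timer *timer;  /* NULL until the CHILD_UP path has run */
};

/* Set up a child whose subvolume has not come up yet. */
void child_init(struct scrub_child *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->timer = NULL;
}

/* CHILD_UP path: allocate and publish the timer. Returns 0 on success. */
int child_up(struct scrub_child *c, unsigned long expires)
{
    struct scrub_timer *t = malloc(sizeof(*t));

    if (t == NULL)
        return -1;
    t->expires = expires;

    pthread_mutex_lock(&c->lock);
    free(c->timer);             /* tolerate a repeated CHILD_UP */
    c->timer = t;
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/*
 * reconfigure() path: reschedule only if the timer already exists.
 * Returns true if the timer was rescheduled, false if the child was
 * not yet up and the request was skipped instead of dereferencing NULL.
 */
bool child_reschedule(struct scrub_child *c, unsigned long expires)
{
    bool done = false;

    pthread_mutex_lock(&c->lock);
    if (c->timer != NULL) {
        c->timer->expires = expires;
        done = true;
    }
    pthread_mutex_unlock(&c->lock);
    return done;
}

/* Release the timer and the lock. */
void child_fini(struct scrub_child *c)
{
    free(c->timer);
    c->timer = NULL;
    pthread_mutex_destroy(&c->lock);
}
```

With this guard, a reschedule request that arrives before CHILD_UP is skipped instead of crashing. A real fix would also need to decide whether to remember the skipped request and apply it once the subvolume comes up.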

Comment 2 Anand Avati 2015-06-11 14:53:00 UTC

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#3) for review on master by Venky Shankar (vshankar)

Comment 3 Anand Avati 2015-06-15 03:35:07 UTC

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#5) for review on master by Venky Shankar (vshankar)

Comment 4 Anand Avati 2015-07-09 10:13:14 UTC

COMMIT: http://review.gluster.org/11539 committed in release-3.7 by Raghavendra Bhat (raghavendra) 
------
commit 291aba1f1ba33831569acd879e3357c1fd01a5c8
Author: Venky Shankar <vshankar>
Date:   Tue Jun 2 21:23:48 2015 +0530

    features/bitrot: cleanup, v1
    
        Backport of http://review.gluster.org/11147
    
    This is a short series of patches (with other cleanups) aimed at
    cleaning up some of the incorrect assumptions made in reconfigure()
    that lead to crashes when subvolumes are not fully initialized (as
    reported here[1] on gluster-devel@). Furthermore, there is some
    code cleanup to handle disconnection and to clean up data
    structures (as part of a subsequent patch).
    
    [1] http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html
    
    Change-Id: I68ac4bccfbac4bf02fcc31615bd7d2d191021132
    BUG: 1226830
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/11539
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>

Comment 5 Kaushal 2015-07-30 09:48:32 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user