Description of problem:
Pausing the scrubber sometimes results in the scrubber process crashing.

Version-Release number of selected component (if applicable): 3.7.0

How reproducible: Sometimes

Steps to Reproduce:
1. Create & start a Gluster volume
2. Enable bitrot on the volume
3. Pause the scrubber for this volume:
   # gluster volume bitrot <vol> scrub pause

Actual results:
Scrubber process crashes at times.

Expected results:
Scrubber process should keep running (although it should not scrub the filesystem for the volume).

BT (reported by anekkunt: http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html)

(gdb) bt
#0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, timer=0x0, expires=233889) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, pendingcheck=_gf_true) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703
#2  0x00007f89c82cc9d4 in reconfigure (this=0x7f89c4008980, options=0x7f89d3bc9558) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3  0x00007f89d62044cd in xlator_reconfigure_rec (old_xl=0x7f89c4008980, new_xl=0x7f89c409b460) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4  0x00007f89d6204414 in xlator_reconfigure_rec (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5  0x00007f89d62045df in xlator_tree_reconfigure (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6  0x00007f89d61ec7bd in glusterfs_graph_reconfigure (oldgraph=0x7f89c4001d30, newgraph=0x7f89c4098130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7  0x00007f89d61ec629 in glusterfs_volfile_reconfigure (oldvollen=932, newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,
So the crash is due to a race between CHILD_UP (where ->timer is initialized for the subvolume) and reconfigure(), which accesses ->timer to reschedule the scrub: if reconfigure() runs before the CHILD_UP notification has been handled, ->timer is still NULL when gf_tw_mod_timer_pending() is called, as seen in frame #0 above.
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#3) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#5) for review on master by Venky Shankar (vshankar)
COMMIT: http://review.gluster.org/11539 committed in release-3.7 by Raghavendra Bhat (raghavendra)
------
commit 291aba1f1ba33831569acd879e3357c1fd01a5c8
Author: Venky Shankar <vshankar>
Date:   Tue Jun 2 21:23:48 2015 +0530

    features/bitrot: cleanup, v1

    Backport of http://review.gluster.org/11147

    This is a short series of patches (with other cleanups) aimed at
    cleaning up some of the incorrect assumptions made in reconfigure()
    that lead to crashes when subvolumes are not fully initialized (as
    reported here [1] on gluster-devel@). Furthermore, there is some
    code cleanup to handle disconnection and clean up data structures
    (as part of a subsequent patch).

    [1] http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html

    Change-Id: I68ac4bccfbac4bf02fcc31615bd7d2d191021132
    BUG: 1226830
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/11539
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user