+++ This bug was initially created as a clone of Bug #1226830 +++

Description of problem:
Pausing the scrubber sometimes causes the scrubber process to crash.

Version-Release number of selected component (if applicable):
3.7.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Create and start a Gluster volume.
2. Enable bitrot on the volume.
3. Pause the scrubber for this volume:
   # gluster volume bitrot <vol> scrub pause

Actual results:
The scrubber process sometimes crashes.

Expected results:
The scrubber process should keep running (although it should not scrub the volume's filesystem while paused).

BT (reported by anekkunt: http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html):

(gdb) bt
#0 0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, timer=0x0, expires=233889) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1 0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, pendingcheck=_gf_true) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703
#2 0x00007f89c82cc9d4 in reconfigure (this=0x7f89c4008980, options=0x7f89d3bc9558) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3 0x00007f89d62044cd in xlator_reconfigure_rec (old_xl=0x7f89c4008980, new_xl=0x7f89c409b460) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4 0x00007f89d6204414 in xlator_reconfigure_rec (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5 0x00007f89d62045df in xlator_tree_reconfigure (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6 0x00007f89d61ec7bd in glusterfs_graph_reconfigure (oldgraph=0x7f89c4001d30, newgraph=0x7f89c4098130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7 0x00007f89d61ec629 in glusterfs_volfile_reconfigure (oldvollen=932, newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,

--- Additional comment from Venky Shankar on 2015-06-01 05:41:22 EDT ---

So, the crash is due to a race between CHILD_UP (where ->timer is initialized for the subvolume) and reconfigure(), which tries to access ->timer to reschedule the scrub time.

--- Additional comment from Anand Avati on 2015-06-11 10:53:00 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#3) for review on master by Venky Shankar (vshankar)

--- Additional comment from Anand Avati on 2015-06-14 23:35:07 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#5) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#6) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#7) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#8) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#9) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#10) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11248 (tests/bitrot: remove induced delay) posted (#1) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#11) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11263 (Revert "tests/bitrot: Induce delay before invoking bitrot subcommands") posted (#3) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#12) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11263 (Revert "tests/bitrot: Induce delay before invoking bitrot subcommands") posted (#6) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#14) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11263 (Revert "tests/bitrot: Induce delay before invoking bitrot subcommands") posted (#7) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#15) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#16) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted (#18) for review on master by Venky Shankar (vshankar)
COMMIT: http://review.gluster.org/11147 committed in master by Raghavendra Bhat (raghavendra)
------
commit 17b838ce18e0eb9dbfe9a540a3006023b19276e7
Author: Venky Shankar <vshankar>
Date: Tue Jun 2 21:23:48 2015 +0530

features/bitrot: cleanup, v1

This is a short series of patches (with other cleanups) aimed at cleaning up some of the incorrect assumptions made in reconfigure(), which lead to crashes when subvolumes are not fully initialized (as reported here[1] on gluster-devel@). Furthermore, there is some amount of code cleanup to handle disconnection and to clean up data structures (as part of a subsequent patch).

[1] http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html

Change-Id: I68ac4bccfbac4bf02fcc31615bd7d2d191021132
BUG: 1231617
Signed-off-by: Venky Shankar <vshankar>
Reviewed-on: http://review.gluster.org/11147
Tested-by: NetBSD Build System <jenkins.org>
Reviewed-by: Raghavendra Bhat <raghavendra>
COMMIT: http://review.gluster.org/11263 committed in master by Raghavendra Bhat (raghavendra)
------
commit 367049879e149e2cd3ec3ba96de7f495a30de180
Author: Venky Shankar <vshankar>
Date: Wed Jun 17 09:35:22 2015 +0530

Revert "tests/bitrot: Induce delay before invoking bitrot subcommands"

This reverts commit a615f6c078c76791318c2a58efcc8baef18c25db.

Change-Id: I8b014a99686cd4ee07da9d26bca561b420c8bec7
BUG: 1231617
Signed-off-by: Venky Shankar <vshankar>
Reviewed-on: http://review.gluster.org/11263
Tested-by: NetBSD Build System <jenkins.org>
Reviewed-by: Raghavendra Bhat <raghavendra>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user