+++ This bug was initially created as a clone of Bug #1461845 +++
+++ This bug was initially created as a clone of Bug #1454596 +++

Description of problem:
=======================
In a 4- or 6-node cluster, for any kind of bitrot-enabled volume, there have been times when the command 'gluster volume bitrot <volname> scrub ondemand' was executed but failed to trigger the scrubber process to start scrubbing. The command 'gluster volume bitrot <volname> scrub status', which should ideally show the progress of the scrub run per node, continues to display 'Scrubber pending to complete' for every node, with the overall state 'Active (Idle)' - indicating that 'scrub ondemand' turned out to be a no-op.

This has been hit multiple times in automation and once while testing manually. The scrub logs do show that scrub ondemand was called, followed by 'No change in volfile, continuing' messages.

Version-Release number of selected component (if applicable):
=============================================================
mainline

How reproducible:
=================
Multiple times

Steps to Reproduce:
===================
These might not be sure-shot ways to reproduce it, but these are the general steps that have been executed whenever this has been hit (an example command sequence is sketched below the steps):
1. Have a bitrot-enabled volume with data
2. Disable bitrot. Enable bitrot
3. Trigger scrub ondemand
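
For reference, a minimal command-line sketch of the above steps, assuming an existing volume named <volname> that already has data on it (the volume name is a placeholder, not taken from this report):

  # gluster volume bitrot <volname> enable
  # gluster volume bitrot <volname> disable
  # gluster volume bitrot <volname> enable
  # gluster volume bitrot <volname> scrub ondemand
  # gluster volume bitrot <volname> scrub status

When the bug is hit, the final 'scrub status' keeps reporting 'Scrubber pending to complete' for every node even after 'scrub ondemand' has been issued.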

Additional info:
================
[2017-05-23 06:10:45.513449] I [MSGID: 118038] [bit-rot-scrub.c:1085:br_fsscan_ondemand] 0-ozone-bit-rot-0: Ondemand Scrubbing scheduled to run at 2017-05-23 06:10:46
[2017-05-23 06:10:45.605562] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2017-05-23 06:10:46.161784] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2017-05-23 06:10:46.840056] I [MSGID: 118044] [bit-rot-scrub.c:615:br_scrubber_log_time] 0-ozone-bit-rot-0: Scrubbing started at 2017-05-23 06:10:46
[2017-05-23 06:10:48.083396] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2017-05-23 06:10:48.644978] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

[root@dhcp47-164 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp47-165.lab.eng.blr.redhat.com
Uuid: 834d66eb-fb65-4ea3-949a-e7cb4c198f2b
State: Peer in Cluster (Connected)

Hostname: dhcp47-162.lab.eng.blr.redhat.com
Uuid: 95491d39-d83a-4053-b1d5-682ca7290bd2
State: Peer in Cluster (Connected)

Hostname: dhcp47-157.lab.eng.blr.redhat.com
Uuid: d0955c85-94d0-41ba-aea8-1ffde3575ea5
State: Peer in Cluster (Connected)

[root@dhcp47-164 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.8.4-25.el7rhgs.x86_64
glusterfs-libs-3.8.4-25.el7rhgs.x86_64
glusterfs-fuse-3.8.4-25.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
glusterfs-events-3.8.4-25.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-25.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-rdma-3.8.4-25.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-25.el7rhgs.x86_64
glusterfs-3.8.4-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-25.el7rhgs.x86_64
glusterfs-server-3.8.4-25.el7rhgs.x86_64
python-gluster-3.8.4-25.el7rhgs.noarch
glusterfs-api-3.8.4-25.el7rhgs.x86_64

[root@dhcp47-164 ~]# gluster v list
distrep
ozone

[root@dhcp47-164 ~]# gluster v status
Status of volume: distrep
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick1/distrep_0             49152     0          Y       7697
Brick 10.70.47.164:/bricks/brick1/distrep_1             49153     0          Y       2021
Brick 10.70.47.162:/bricks/brick1/distrep_2             49153     0          Y       628
Brick 10.70.47.157:/bricks/brick1/distrep_3             49153     0          Y       31735
Self-heal Daemon on localhost                           N/A       N/A        Y       2041
Bitrot Daemon on localhost                              N/A       N/A        Y       2528
Scrubber Daemon on localhost                            N/A       N/A        Y       2538
Self-heal Daemon on dhcp47-165.lab.eng.blr.redhat.com   N/A       N/A        Y       7785
Bitrot Daemon on dhcp47-165.lab.eng.blr.redhat.com      N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.redhat.com    N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-162.lab.eng.blr.redhat.com   N/A       N/A        Y       648
Bitrot Daemon on dhcp47-162.lab.eng.blr.redhat.com      N/A       N/A        Y       1350
Scrubber Daemon on dhcp47-162.lab.eng.blr.redhat.com    N/A       N/A        Y       1360
Self-heal Daemon on dhcp47-157.lab.eng.blr.redhat.com   N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.redhat.com      N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.redhat.com    N/A       N/A        Y       32505

Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ozone
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick0/ozone_0               49153     0          Y       12918
Brick 10.70.47.164:/bricks/brick0/ozone_1               49152     0          Y       32008
Brick 10.70.47.162:/bricks/brick0/ozone_2               49152     0          Y       31242
Brick 10.70.47.157:/bricks/brick0/ozone_3               49152     0          Y       30037
Self-heal Daemon on localhost                           N/A       N/A        Y       2041
Bitrot Daemon on localhost                              N/A       N/A        Y       2528
Scrubber Daemon on localhost                            N/A       N/A        Y       2538
Self-heal Daemon on dhcp47-162.lab.eng.blr.redhat.com   N/A       N/A        Y       648
Bitrot Daemon on dhcp47-162.lab.eng.blr.redhat.com      N/A       N/A        Y       1350
Scrubber Daemon on dhcp47-162.lab.eng.blr.redhat.com    N/A       N/A        Y       1360
Self-heal Daemon on dhcp47-165.lab.eng.blr.redhat.com   N/A       N/A        Y       7785
Bitrot Daemon on dhcp47-165.lab.eng.blr.redhat.com      N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.redhat.com    N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-157.lab.eng.blr.redhat.com   N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.redhat.com      N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.redhat.com    N/A       N/A        Y       32505

Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp47-164 ~]# gluster v info

Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 71537fad-fa85-4dac-b534-dd6edceba4e9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick1/distrep_0
Brick2: 10.70.47.164:/bricks/brick1/distrep_1
Brick3: 10.70.47.162:/bricks/brick1/distrep_2
Brick4: 10.70.47.157:/bricks/brick1/distrep_3
Options Reconfigured:
features.scrub: Active
features.bitrot: on
transport.address-family: inet
nfs.disable: on

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: aba2693d-b771-4ef5-a0df-d0a2c8f77f9e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/ozone_0
Brick2: 10.70.47.164:/bricks/brick0/ozone_1
Brick3: 10.70.47.162:/bricks/brick0/ozone_2
Brick4: 10.70.47.157:/bricks/brick0/ozone_3
Options Reconfigured:
features.scrub-throttle: aggressive
features.scrub-freq: hourly
storage.batch-fsync-delay-usec: 0
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-samba-metadata: on
performance.nl-cache: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.parallel-readdir: on
features.bitrot: on
features.scrub: Active
[root@dhcp47-164 ~]#

--- Additional comment from Worker Ant on 2017-06-15 08:45:36 EDT ---

REVIEW: https://review.gluster.org/17552 (feature/bitrot: Fix ondemand scrub) posted (#1) for review on master by Kotresh HR (khiremat)

--- Additional comment from Worker Ant on 2017-06-16 02:01:53 EDT ---

COMMIT: https://review.gluster.org/17552 committed in master by Atin Mukherjee (amukherj)
------
commit f0fb166078d59cab2a33583591b6448326247c40
Author: Kotresh HR <khiremat>
Date:   Thu Jun 15 08:31:06 2017 -0400

    feature/bitrot: Fix ondemand scrub

    The flag which keeps tracks of whether the scrub
    frequency is changed from previous value should not
    be considered for on-demand scrubbing. It should be
    considered only for 'scrub-frequency' where it should
    not be re-scheduled if it is set to same value again.
    But in case ondemand scrub, it should start the scrub
    immediately no matter what the scrub-frequency.

    Reproducer:
    1. Enable bitrot
    2. Set scrub-throttle
    3. Set ondemand scrub
    Make sure glusterd is not restarted while doing below steps

    Change-Id: Ice5feaece7fff1579fb009d1a59d2b8292e23e0b
    BUG: 1461845
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: https://review.gluster.org/17552
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    NetBSD-regression: NetBSD Build System <jenkins.org>
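
To make the fix easier to follow: per the commit message above, the scrubber only re-scheduled a run when it believed the scrub frequency had been reconfigured, and the on-demand path was gated by that same flag, so 'scrub ondemand' became a no-op when nothing had changed. Below is a minimal, self-contained C sketch of that logic; all names (scrub_state_t, freq_reconfigured, handle_scrub_event_*) are invented for illustration and are not the actual identifiers used in bit-rot-scrub.c.

/* Illustrative sketch only -- hypothetical names, not the real gluster code. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool freq_reconfigured;   /* set when scrub-frequency changes */
} scrub_state_t;

static void schedule_scrub(const char *reason)
{
    printf("scrub scheduled (%s)\n", reason);
}

/* Buggy behaviour: the on-demand path honours the reconfiguration flag,
 * so an unchanged frequency turns "scrub ondemand" into a no-op. */
static void handle_scrub_event_buggy(scrub_state_t *s, bool ondemand)
{
    if (!s->freq_reconfigured)
        return;                          /* on-demand request silently dropped */
    schedule_scrub(ondemand ? "ondemand" : "frequency change");
}

/* Fixed behaviour: on-demand scrubbing ignores the flag and always runs;
 * the flag only prevents re-scheduling when scrub-frequency is re-set
 * to the same value. */
static void handle_scrub_event_fixed(scrub_state_t *s, bool ondemand)
{
    if (!ondemand && !s->freq_reconfigured)
        return;
    schedule_scrub(ondemand ? "ondemand" : "frequency change");
}

int main(void)
{
    scrub_state_t s = { .freq_reconfigured = false };

    handle_scrub_event_buggy(&s, true);  /* prints nothing: the reported symptom */
    handle_scrub_event_fixed(&s, true);  /* prints "scrub scheduled (ondemand)"  */
    return 0;
}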
REVIEW: https://review.gluster.org/17553 (feature/bitrot: Fix ondemand scrub) posted (#1) for review on release-3.10 by Kotresh HR (khiremat)
COMMIT: https://review.gluster.org/17553 committed in release-3.10 by Raghavendra Talur (rtalur)
------
commit 8f8850ae8c7fd64db01ec19bfeb6ef4bd1911bd8
Author: Kotresh HR <khiremat>
Date:   Thu Jun 15 08:31:06 2017 -0400

    feature/bitrot: Fix ondemand scrub

    The flag which keeps tracks of whether the scrub
    frequency is changed from previous value should not
    be considered for on-demand scrubbing. It should be
    considered only for 'scrub-frequency' where it should
    not be re-scheduled if it is set to same value again.
    But in case ondemand scrub, it should start the scrub
    immediately no matter what the scrub-frequency.

    Reproducer:
    1. Enable bitrot
    2. Set scrub-throttle
    3. Set ondemand scrub
    Make sure glusterd is not restarted while doing below steps

    > Change-Id: Ice5feaece7fff1579fb009d1a59d2b8292e23e0b
    > BUG: 1461845
    > Signed-off-by: Kotresh HR <khiremat>
    > Reviewed-on: https://review.gluster.org/17552
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Raghavendra Bhat <raghavendra>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    (cherry picked from commit f0fb166078d59cab2a33583591b6448326247c40)

    Change-Id: Ice5feaece7fff1579fb009d1a59d2b8292e23e0b
    BUG: 1462080
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: https://review.gluster.org/17553
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Raghavendra Talur <rtalur>
This bug was reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and that the Version field be set appropriately.