+++ This bug was initially created as a clone of Bug #1454596 +++

Description of problem:
=======================
In a 4- or 6-node cluster, for any kind of bitrot-enabled volume, there have been times when the command 'gluster volume bitrot <volname> scrub ondemand' was executed but failed to trigger the scrubber process to start scrubbing. The command 'gluster volume bitrot <volname> scrub status', which should show the per-node progress of the scrub run, continues to display 'Scrubber pending to complete' for every node, with the overall state 'Active (Idle)' - indicating that 'scrub ondemand' turned out to be a no-op. This has been hit multiple times in automation and once while testing manually. The scrub logs do show that the on-demand scrub was invoked, followed by 'No change in volfile, continuing' messages.

Version-Release number of selected component (if applicable):
=============================================================
mainline

How reproducible:
=================
Multiple times

Steps to Reproduce:
===================
These might not be sure-shot ways to reproduce the issue, but they are the general steps that were executed whenever it was hit:
1. Have a bitrot-enabled volume with data
2. Disable bitrot, then re-enable it ('gluster volume bitrot <volname> disable', then 'gluster volume bitrot <volname> enable')
3. Trigger an on-demand scrub ('gluster volume bitrot <volname> scrub ondemand')

Additional info:
================
[2017-05-23 06:10:45.513449] I [MSGID: 118038] [bit-rot-scrub.c:1085:br_fsscan_ondemand] 0-ozone-bit-rot-0: Ondemand Scrubbing scheduled to run at 2017-05-23 06:10:46
[2017-05-23 06:10:45.605562] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2017-05-23 06:10:46.161784] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2017-05-23 06:10:46.840056] I [MSGID: 118044] [bit-rot-scrub.c:615:br_scrubber_log_time] 0-ozone-bit-rot-0: Scrubbing started at 2017-05-23 06:10:46
[2017-05-23 06:10:48.083396] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2017-05-23 06:10:48.644978] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

[root@dhcp47-164 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp47-165.lab.eng.blr.redhat.com
Uuid: 834d66eb-fb65-4ea3-949a-e7cb4c198f2b
State: Peer in Cluster (Connected)

Hostname: dhcp47-162.lab.eng.blr.redhat.com
Uuid: 95491d39-d83a-4053-b1d5-682ca7290bd2
State: Peer in Cluster (Connected)

Hostname: dhcp47-157.lab.eng.blr.redhat.com
Uuid: d0955c85-94d0-41ba-aea8-1ffde3575ea5
State: Peer in Cluster (Connected)

[root@dhcp47-164 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.8.4-25.el7rhgs.x86_64
glusterfs-libs-3.8.4-25.el7rhgs.x86_64
glusterfs-fuse-3.8.4-25.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
glusterfs-events-3.8.4-25.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-25.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-rdma-3.8.4-25.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-25.el7rhgs.x86_64
glusterfs-3.8.4-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-25.el7rhgs.x86_64
glusterfs-server-3.8.4-25.el7rhgs.x86_64
python-gluster-3.8.4-25.el7rhgs.noarch
glusterfs-api-3.8.4-25.el7rhgs.x86_64

[root@dhcp47-164 ~]# gluster v list
distrep
ozone

[root@dhcp47-164 ~]# gluster v status
Status of volume: distrep
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick1/distrep_0            49152     0          Y       7697
Brick 10.70.47.164:/bricks/brick1/distrep_1            49153     0          Y       2021
Brick 10.70.47.162:/bricks/brick1/distrep_2            49153     0          Y       628
Brick 10.70.47.157:/bricks/brick1/distrep_3            49153     0          Y       31735
Self-heal Daemon on localhost                          N/A       N/A        Y       2041
Bitrot Daemon on localhost                             N/A       N/A        Y       2528
Scrubber Daemon on localhost                           N/A       N/A        Y       2538
Self-heal Daemon on dhcp47-165.lab.eng.blr.redhat.com  N/A       N/A        Y       7785
Bitrot Daemon on dhcp47-165.lab.eng.blr.redhat.com     N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.redhat.com   N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-162.lab.eng.blr.redhat.com  N/A       N/A        Y       648
Bitrot Daemon on dhcp47-162.lab.eng.blr.redhat.com     N/A       N/A        Y       1350
Scrubber Daemon on dhcp47-162.lab.eng.blr.redhat.com   N/A       N/A        Y       1360
Self-heal Daemon on dhcp47-157.lab.eng.blr.redhat.com  N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.redhat.com     N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.redhat.com   N/A       N/A        Y       32505

Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ozone
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick0/ozone_0              49153     0          Y       12918
Brick 10.70.47.164:/bricks/brick0/ozone_1              49152     0          Y       32008
Brick 10.70.47.162:/bricks/brick0/ozone_2              49152     0          Y       31242
Brick 10.70.47.157:/bricks/brick0/ozone_3              49152     0          Y       30037
Self-heal Daemon on localhost                          N/A       N/A        Y       2041
Bitrot Daemon on localhost                             N/A       N/A        Y       2528
Scrubber Daemon on localhost                           N/A       N/A        Y       2538
Self-heal Daemon on dhcp47-162.lab.eng.blr.redhat.com  N/A       N/A        Y       648
Bitrot Daemon on dhcp47-162.lab.eng.blr.redhat.com     N/A       N/A        Y       1350
Scrubber Daemon on dhcp47-162.lab.eng.blr.redhat.com   N/A       N/A        Y       1360
Self-heal Daemon on dhcp47-165.lab.eng.blr.redhat.com  N/A       N/A        Y       7785
Bitrot Daemon on dhcp47-165.lab.eng.blr.redhat.com     N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.redhat.com   N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-157.lab.eng.blr.redhat.com  N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.redhat.com     N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.redhat.com   N/A       N/A        Y       32505

Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp47-164 ~]# gluster v info

Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 71537fad-fa85-4dac-b534-dd6edceba4e9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick1/distrep_0
Brick2: 10.70.47.164:/bricks/brick1/distrep_1
Brick3: 10.70.47.162:/bricks/brick1/distrep_2
Brick4: 10.70.47.157:/bricks/brick1/distrep_3
Options Reconfigured:
features.scrub: Active
features.bitrot: on
transport.address-family: inet
nfs.disable: on

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: aba2693d-b771-4ef5-a0df-d0a2c8f77f9e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/ozone_0
Brick2: 10.70.47.164:/bricks/brick0/ozone_1
Brick3: 10.70.47.162:/bricks/brick0/ozone_2
Brick4: 10.70.47.157:/bricks/brick0/ozone_3
Options Reconfigured:
features.scrub-throttle: aggressive
features.scrub-freq: hourly
storage.batch-fsync-delay-usec: 0
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-samba-metadata: on
performance.nl-cache: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.parallel-readdir: on
features.bitrot: on
features.scrub: Active
[root@dhcp47-164 ~]#
REVIEW: https://review.gluster.org/17552 (feature/bitrot: Fix ondemand scrub) posted (#1) for review on master by Kotresh HR (khiremat)
COMMIT: https://review.gluster.org/17552 committed in master by Atin Mukherjee (amukherj)
------
commit f0fb166078d59cab2a33583591b6448326247c40
Author: Kotresh HR <khiremat>
Date:   Thu Jun 15 08:31:06 2017 -0400

feature/bitrot: Fix ondemand scrub

The flag which keeps track of whether the scrub frequency was changed from its previous value should not be considered for on-demand scrubbing. It should be considered only for 'scrub-frequency', where the scrub should not be re-scheduled if the frequency is set to the same value again. An on-demand scrub, in contrast, should start scrubbing immediately, no matter what the scrub-frequency is.

Reproducer:
1. Enable bitrot
2. Set scrub-throttle
3. Set ondemand scrub
Make sure glusterd is not restarted while doing the above steps.

Change-Id: Ice5feaece7fff1579fb009d1a59d2b8292e23e0b
BUG: 1461845
Signed-off-by: Kotresh HR <khiremat>
Reviewed-on: https://review.gluster.org/17552
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Raghavendra Bhat <raghavendra>
NetBSD-regression: NetBSD Build System <jenkins.org>
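For illustration, here is a minimal C sketch of the logic change the commit message describes. All identifiers in it (scrub_request_t, handle_scrub_request, start_scrub_now, reschedule_scrub) are hypothetical stand-ins invented for this sketch, not the actual GlusterFS names; the real change is in the bitrot scrubber code referenced by the review link above.

/* Hypothetical sketch of the fix described in the commit message.
 * Before the fix, the "frequency unchanged" early return also
 * swallowed on-demand requests, making 'scrub ondemand' a no-op. */
#include <stdio.h>

typedef enum {
        SCRUB_REQUEST_SCHEDULED, /* periodic run driven by scrub-frequency */
        SCRUB_REQUEST_ONDEMAND   /* 'gluster volume bitrot <vol> scrub ondemand' */
} scrub_request_t;

static int
start_scrub_now (void)
{
        printf ("scrubbing started immediately\n");
        return 0;
}

static int
reschedule_scrub (int freq)
{
        printf ("scrub rescheduled, frequency %d seconds\n", freq);
        return 0;
}

static int
handle_scrub_request (scrub_request_t req, int new_freq, int *cur_freq)
{
        if (req == SCRUB_REQUEST_ONDEMAND) {
                /* Fix: on-demand scrubbing starts right away,
                 * regardless of the scrub-frequency setting. */
                return start_scrub_now ();
        }

        /* Scheduled path only: re-setting scrub-frequency to its
         * current value should not cause a reschedule. */
        if (new_freq == *cur_freq)
                return 0;

        *cur_freq = new_freq;
        return reschedule_scrub (new_freq);
}

int
main (void)
{
        int freq = 3600; /* e.g. hourly */

        /* Re-setting the same frequency is correctly a no-op... */
        handle_scrub_request (SCRUB_REQUEST_SCHEDULED, 3600, &freq);

        /* ...but an on-demand request must still run. */
        handle_scrub_request (SCRUB_REQUEST_ONDEMAND, 3600, &freq);
        return 0;
}

The key point is that only the scheduled path keys off the "frequency unchanged" check; routing the on-demand request around it is what makes 'scrub ondemand' take effect immediately.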
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/