Bug 1461845 - [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to trigger scrub
Summary: [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to trigger scrub
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kotresh HR
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1454596
Blocks: 1462080 1462127
 
Reported: 2017-06-15 12:42 UTC by Kotresh HR
Modified: 2017-09-05 17:34 UTC
CC: 6 users

Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1454596
Clones: 1462080 1462127
Environment:
Last Closed: 2017-09-05 17:34:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kotresh HR 2017-06-15 12:42:40 UTC
+++ This bug was initially created as a clone of Bug #1454596 +++

Description of problem:
=======================
On a 4- or 6-node cluster, for any kind of bitrot-enabled volume, there have been times when the command 'gluster volume bitrot <volname> scrub ondemand' was executed but failed to trigger the scrubber process to start scrubbing. The command 'gluster volume bitrot <volname> scrub status', which should show the progress of the scrub run per node, continues to display 'Scrubber pending to complete' for every node, with the overall state 'Active (Idle)' - proving that 'scrub ondemand' turned out to be a no-op. This has been hit multiple times in automation and once while testing manually. The scrub logs do show that the on-demand scrub was called, followed by 'No change in volfile, continuing' messages.
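
As a minimal sketch, the commands involved look like the following, using the 'ozone' volume from the setup below (the status output is paraphrased from the report, not captured verbatim):

# gluster volume bitrot ozone scrub ondemand
# gluster volume bitrot ozone scrub status
(per-node entries stay at 'Scrubber pending to complete' and the overall state stays 'Active (Idle)', even though the scrub log reports the on-demand scrub was scheduled)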

Version-Release number of selected component (if applicable):
============================================================
mainline


How reproducible:
================
Multiple times


Steps to Reproduce:
==================
These steps are not guaranteed to reproduce the issue, but they are the general sequence that was executed whenever it has been hit (a command sketch follows the list).
1. Have a bitrot enabled volume with data
2. Disable bitrot. Enable bitrot
3. Trigger scrub ondemand
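
As a hedged sketch, the steps above map onto the following commands, assuming a bitrot-enabled volume named 'ozone' that already holds data:

# gluster volume bitrot ozone disable
# gluster volume bitrot ozone enable
# gluster volume bitrot ozone scrub ondemand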


Additional info:
===================

[2017-05-23 06:10:45.513449] I [MSGID: 118038] [bit-rot-scrub.c:1085:br_fsscan_ondemand] 0-ozone-bit-rot-0: Ondemand Scrubbing scheduled to run at 2017-05-23 06:10:46
[2017-05-23 06:10:45.605562] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2017-05-23 06:10:46.161784] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2017-05-23 06:10:46.840056] I [MSGID: 118044] [bit-rot-scrub.c:615:br_scrubber_log_time] 0-ozone-bit-rot-0: Scrubbing started at 2017-05-23 06:10:46
[2017-05-23 06:10:48.083396] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2017-05-23 06:10:48.644978] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

[root@dhcp47-164 ~]# 
[root@dhcp47-164 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp47-165.lab.eng.blr.redhat.com
Uuid: 834d66eb-fb65-4ea3-949a-e7cb4c198f2b
State: Peer in Cluster (Connected)

Hostname: dhcp47-162.lab.eng.blr.redhat.com
Uuid: 95491d39-d83a-4053-b1d5-682ca7290bd2
State: Peer in Cluster (Connected)

Hostname: dhcp47-157.lab.eng.blr.redhat.com
Uuid: d0955c85-94d0-41ba-aea8-1ffde3575ea5
State: Peer in Cluster (Connected)
[root@dhcp47-164 ~]# 
[root@dhcp47-164 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.8.4-25.el7rhgs.x86_64
glusterfs-libs-3.8.4-25.el7rhgs.x86_64
glusterfs-fuse-3.8.4-25.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
glusterfs-events-3.8.4-25.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-25.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-rdma-3.8.4-25.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-25.el7rhgs.x86_64
glusterfs-3.8.4-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-25.el7rhgs.x86_64
glusterfs-server-3.8.4-25.el7rhgs.x86_64
python-gluster-3.8.4-25.el7rhgs.noarch
glusterfs-api-3.8.4-25.el7rhgs.x86_64
[root@dhcp47-164 ~]# 
[root@dhcp47-164 ~]# gluster v list
distrep
ozone
[root@dhcp47-164 ~]# gluster v status
Status of volume: distrep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick1/distrep_0 49152     0          Y       7697 
Brick 10.70.47.164:/bricks/brick1/distrep_1 49153     0          Y       2021 
Brick 10.70.47.162:/bricks/brick1/distrep_2 49153     0          Y       628  
Brick 10.70.47.157:/bricks/brick1/distrep_3 49153     0          Y       31735
Self-heal Daemon on localhost               N/A       N/A        Y       2041 
Bitrot Daemon on localhost                  N/A       N/A        Y       2528 
Scrubber Daemon on localhost                N/A       N/A        Y       2538 
Self-heal Daemon on dhcp47-165.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       7785 
Bitrot Daemon on dhcp47-165.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-162.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       648  
Bitrot Daemon on dhcp47-162.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       1350 
Scrubber Daemon on dhcp47-162.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       1360 
Self-heal Daemon on dhcp47-157.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       32505
 
Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: ozone
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick0/ozone_0   49153     0          Y       12918
Brick 10.70.47.164:/bricks/brick0/ozone_1   49152     0          Y       32008
Brick 10.70.47.162:/bricks/brick0/ozone_2   49152     0          Y       31242
Brick 10.70.47.157:/bricks/brick0/ozone_3   49152     0          Y       30037
Self-heal Daemon on localhost               N/A       N/A        Y       2041 
Bitrot Daemon on localhost                  N/A       N/A        Y       2528 
Scrubber Daemon on localhost                N/A       N/A        Y       2538 
Self-heal Daemon on dhcp47-162.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       648  
Bitrot Daemon on dhcp47-162.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       1350 
Scrubber Daemon on dhcp47-162.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       1360 
Self-heal Daemon on dhcp47-165.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       7785 
Bitrot Daemon on dhcp47-165.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-157.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       32505
 
Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp47-164 ~]# 
[root@dhcp47-164 ~]# 
[root@dhcp47-164 ~]# gluster v info
 
Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 71537fad-fa85-4dac-b534-dd6edceba4e9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick1/distrep_0
Brick2: 10.70.47.164:/bricks/brick1/distrep_1
Brick3: 10.70.47.162:/bricks/brick1/distrep_2
Brick4: 10.70.47.157:/bricks/brick1/distrep_3
Options Reconfigured:
features.scrub: Active
features.bitrot: on
transport.address-family: inet
nfs.disable: on
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: aba2693d-b771-4ef5-a0df-d0a2c8f77f9e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/ozone_0
Brick2: 10.70.47.164:/bricks/brick0/ozone_1
Brick3: 10.70.47.162:/bricks/brick0/ozone_2
Brick4: 10.70.47.157:/bricks/brick0/ozone_3
Options Reconfigured:
features.scrub-throttle: aggressive
features.scrub-freq: hourly
storage.batch-fsync-delay-usec: 0
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-samba-metadata: on
performance.nl-cache: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.parallel-readdir: on
features.bitrot: on
features.scrub: Active
[root@dhcp47-164 ~]#

Comment 1 Worker Ant 2017-06-15 12:45:36 UTC
REVIEW: https://review.gluster.org/17552 (feature/bitrot: Fix ondemand scrub) posted (#1) for review on master by Kotresh HR (khiremat)

Comment 2 Worker Ant 2017-06-16 06:01:53 UTC
COMMIT: https://review.gluster.org/17552 committed in master by Atin Mukherjee (amukherj) 
------
commit f0fb166078d59cab2a33583591b6448326247c40
Author: Kotresh HR <khiremat>
Date:   Thu Jun 15 08:31:06 2017 -0400

    feature/bitrot: Fix ondemand scrub
    
    The flag which keeps track of whether the scrub
    frequency has changed from its previous value should
    not be considered for on-demand scrubbing. It should
    be considered only for 'scrub-frequency', where the
    scrub should not be re-scheduled if the frequency is
    set to the same value again. An on-demand scrub,
    however, should start immediately regardless of the
    scrub-frequency setting.
    
    Reproducer:
    1. Enable bitrot
    2. Set scrub-throttle
    3. Set ondemand scrub
    Make sure glusterd is not restarted while performing
    the above steps.
    
    Change-Id: Ice5feaece7fff1579fb009d1a59d2b8292e23e0b
    BUG: 1461845
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: https://review.gluster.org/17552
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    NetBSD-regression: NetBSD Build System <jenkins.org>
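
Per the commit message, the frequency-change flag no longer gates on-demand scrubbing. A hedged verification sketch against a build carrying this fix, reusing the option values seen on the 'ozone' volume above:

# gluster volume bitrot ozone scrub-frequency hourly
# gluster volume bitrot ozone scrub-throttle aggressive
# gluster volume bitrot ozone scrub ondemand
# gluster volume bitrot ozone scrub status
(the scrub log should now show 'Scrubbing started at ...' and scrub status should report per-node progress instead of remaining 'Active (Idle)')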

Comment 3 Shyamsundar 2017-09-05 17:34:18 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

