Bug 1356851 - [Bitrot+Sharding] Scrub status shows incorrect values for 'files scrubbed' and 'files skipped'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
bugs@gluster.org
URL:
Whiteboard:
Depends On: 1337450
Blocks: 1357973 1357975
 
Reported: 2016-07-15 07:13 UTC by Kotresh HR
Modified: 2017-03-08 09:32 UTC
CC List: 5 users

Fixed In Version: v3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1337450
Clones: 1357973
Environment:
Last Closed: 2017-03-08 09:32:06 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kotresh HR 2016-07-15 07:13:53 UTC
+++ This bug was initially created as a clone of Bug #1337450 +++

Description of problem:
========================

In a sharded volume, where every file is split into multiple shards, the scrubber runs and validates every file along with its shards, but it increments the scrub counters once per shard instead of once per file. The same gets reflected in the 'files scrubbed' and 'files skipped' fields of the scrub status output, which is misleading to the user because the numbers shown there are much larger than the total number of files created.
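
For illustration, the inflated numbers can be reconstructed from the data set shown further below (20 files under 1m_files/ and 5 files under 4g_files/ per the client-side listings; the 'ls -l | wc -l' output includes the 'total' line), assuming the default 4 MiB features.shard-block-size, which is not listed in the volume options:

20 x 1 MB files ->  20 base files, no extra shards (below the shard size)
 5 x 4 GB files ->   5 base files + 5 x 1023 shards under /.shard = 5120 entries
Entries per node  =  20 + 5120 = 5140

Since every node in the 3 x 3 volume holds one brick of each replica set, each scrubber walks a full copy of the data, so ~5140 lines up with the 4930-5139 'files scrubbed' reported per node even though only 25 user-visible files exist.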


Version-Release number of selected component (if applicable):
===========================================================


How reproducible:
================= 
Always


Steps to Reproduce:
=====================

1. Have a dist-rep volume, and enable sharding.
2. Create 100 1MB files and validate the scrub status output after the scrub run completes.
3. Create 5 4GB files and wait for the next scrub run.
4. Validate the scrub status output after the scrubber has finished running (a command sketch of these steps follows).
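
A minimal command sketch for the steps above, assuming the 'ozone' volume and FUSE mount used in 'Additional info' below (names, sizes and scrub frequency are placeholders, adjust as needed):

# Enable sharding and bitrot, and make the scrubber run hourly
gluster volume set ozone features.shard on
gluster volume bitrot ozone enable
gluster volume bitrot ozone scrub-frequency hourly

# Create the data set from a client mount
mount -t glusterfs 10.70.35.137:/ozone /mnt/ozone
mkdir -p /mnt/ozone/1m_files /mnt/ozone/4g_files
for i in $(seq 1 100); do dd if=/dev/urandom of=/mnt/ozone/1m_files/f$i bs=1M count=1; done
for i in $(seq 1 5); do dd if=/dev/urandom of=/mnt/ozone/4g_files/f$i bs=1M count=4096; done

# After each scrub run
gluster volume bitrot ozone scrub status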

Actual results:
================
'files scrubbed' and 'files skipped' show numbers much larger than the total number of files created.


Expected results:
=================
'files scrubbed' and 'files skipped' should be in line with the number of files actually created, not the number of shards.


Additional info:
==================

[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.9-4.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.7.9-4.el7rhgs.x86_64
glusterfs-api-3.7.9-4.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.9-4.el7rhgs.x86_64
glusterfs-cli-3.7.9-4.el7rhgs.x86_64
glusterfs-server-3.7.9-4.el7rhgs.x86_64
glusterfs-fuse-3.7.9-4.el7rhgs.x86_64
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.35.85
Uuid: c9550322-c0ef-45e6-ad20-f38658a5ce54
State: Peer in Cluster (Connected)

Hostname: 10.70.35.137
Uuid: 35426000-dad1-416f-b145-f25049f5036e
State: Peer in Cluster (Connected)

Hostname: 10.70.35.13
Uuid: a756f3da-7896-4970-a77d-4829e603f773
State: Peer in Cluster (Connected)
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster v info
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: d79e220b-acde-4d13-b9d5-f37ec741c117
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.70.35.210:/bricks/brick1/ozone
Brick2: 10.70.35.85:/bricks/brick1/ozone
Brick3: 10.70.35.137:/bricks/brick1/ozone
Brick4: 10.70.35.210:/bricks/brick2/ozone
Brick5: 10.70.35.85:/bricks/brick2/ozone
Brick6: 10.70.35.137:/bricks/brick2/ozone
Brick7: 10.70.35.210:/bricks/brick3/ozone
Brick8: 10.70.35.85:/bricks/brick3/ozone
Brick9: 10.70.35.137:/bricks/brick3/ozone
Options Reconfigured:
features.shard: on
features.scrub-throttle: normal
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster  v status
Status of volume: ozone
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.210:/bricks/brick1/ozone     49152     0          Y       3255 
Brick 10.70.35.85:/bricks/brick1/ozone      49152     0          Y       15549
Brick 10.70.35.137:/bricks/brick1/ozone     49152     0          Y       32158
Brick 10.70.35.210:/bricks/brick2/ozone     49153     0          Y       3261 
Brick 10.70.35.85:/bricks/brick2/ozone      49153     0          Y       15557
Brick 10.70.35.137:/bricks/brick2/ozone     49153     0          Y       32164
Brick 10.70.35.210:/bricks/brick3/ozone     49154     0          Y       3270 
Brick 10.70.35.85:/bricks/brick3/ozone      49154     0          Y       15564
Brick 10.70.35.137:/bricks/brick3/ozone     49154     0          Y       32171
NFS Server on localhost                     2049      0          Y       24614
Self-heal Daemon on localhost               N/A       N/A        Y       3248 
Bitrot Daemon on localhost                  N/A       N/A        Y       8545 
Scrubber Daemon on localhost                N/A       N/A        Y       8551 
NFS Server on 10.70.35.13                   2049      0          Y       6082 
Self-heal Daemon on 10.70.35.13             N/A       N/A        Y       21680
Bitrot Daemon on 10.70.35.13                N/A       N/A        N       N/A  
Scrubber Daemon on 10.70.35.13              N/A       N/A        N       N/A  
NFS Server on 10.70.35.85                   2049      0          Y       9515 
Self-heal Daemon on 10.70.35.85             N/A       N/A        Y       15542
Bitrot Daemon on 10.70.35.85                N/A       N/A        Y       18642
Scrubber Daemon on 10.70.35.85              N/A       N/A        Y       18648
NFS Server on 10.70.35.137                  2049      0          Y       26213
Self-heal Daemon on 10.70.35.137            N/A       N/A        Y       32153
Bitrot Daemon on 10.70.35.137               N/A       N/A        Y       2919 
Scrubber Daemon on 10.70.35.137             N/A       N/A        Y       2925 
 
Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster v bitrot ozone scrub status

Volume name : ozone

State of scrub: Active

Scrub impact: normal

Scrub frequency: hourly

Bitrot error log location: /var/log/glusterfs/bitd.log

Scrubber error log location: /var/log/glusterfs/scrub.log


=========================================================

Node: localhost

Number of Scrubbed files: 4930

Number of Skipped files: 0

Last completed scrub time: 2016-05-19 07:40:18

Duration of last scrub (D:M:H:M:S): 0:0:30:35

Error count: 1

Corrupted object's [GFID]:

2be8fc38-db5e-464b-b741-616377994cc8


=========================================================

Node: 10.70.35.85

Number of Scrubbed files: 5139

Number of Skipped files: 0

Last completed scrub time: 2016-05-19 08:49:49

Duration of last scrub (D:M:H:M:S): 0:0:29:39

Error count: 1

Corrupted object's [GFID]:

ce5e7a94-cba6-4e65-a7bb-82b1ec396eef


=========================================================

Node: 10.70.35.137

Number of Scrubbed files: 5138

Number of Skipped files: 0

Last completed scrub time: 2016-05-19 09:02:46

Duration of last scrub (D:M:H:M:S): 0:0:31:57

Error count: 0

=========================================================

[root@dhcp35-210 ~]# 


=============
CLIENT LOGS
==============

[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# cd /mnt/ozone
[root@dhcp35-30 ozone]# df -k .
Filesystem          1K-blocks     Used Available Use% Mounted on
10.70.35.137:/ozone  62553600 21098496  41455104  34% /mnt/ozone
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# ls -a
.  ..  1m_files  4g_files  .trashcan
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# ls -l 1m_files/ | wc -l
21
[root@dhcp35-30 ozone]# ls -l 4g_files/ | wc -l
6
[root@dhcp35-30 ozone]#

Comment 1 Vijay Bellur 2016-07-15 07:16:58 UTC
REVIEW: http://review.gluster.org/14927 (feature/bitrot: Fix scrub status with sharded volume) posted (#1) for review on master by Kotresh HR (khiremat)

Comment 2 Vijay Bellur 2016-07-19 14:41:14 UTC
COMMIT: http://review.gluster.org/14927 committed in master by Vijay Bellur (vbellur) 
------
commit 1929141da34d36f537e9798e3618e0e3bdc61eb6
Author: Kotresh HR <khiremat>
Date:   Thu Jul 14 12:30:12 2016 +0530

    feature/bitrot: Fix scrub status with sharded volume
    
    Bitrot scrubs each shard entries separately. Scrub
    statistics was counting each shard entry which is
    incorrect. This patch skips the statistics count
    for sharded entries.
    
    Change-Id: I184c315a4bc7f2cccabc506eef083ee926ec26d3
    BUG: 1356851
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/14927
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>
    Reviewed-by: Vijay Bellur <vbellur>
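
A quick way to sanity-check the fix on a setup like the one above (a rough sketch; it relies on every node holding one brick of each replica set in the 3 x 3 layout, so each scrubber walks a full copy of the data):

# User-visible file count, taken from the client mount
find /mnt/ozone -type f | wc -l

# With the fix, 'Number of Scrubbed files' per node should be close to the
# figure above rather than to the per-shard entry count (~5140 here)
gluster volume bitrot ozone scrub status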

