Bug 1357975 - [Bitrot+Sharding] Scrub status shows incorrect values for 'files scrubbed' and 'files skipped'
Summary: [Bitrot+Sharding] Scrub status shows incorrect values for 'files scrubbed' and 'files skipped'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: 3.8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
bugs@gluster.org
URL:
Whiteboard:
Depends On: 1337450 1356851 1357973
Blocks:
 
Reported: 2016-07-19 17:41 UTC by Kotresh HR
Modified: 2016-09-20 05:04 UTC

Fixed In Version: glusterfs-3.8.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1357973
Environment:
Last Closed: 2016-08-12 09:47:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kotresh HR 2016-07-19 17:41:13 UTC
+++ This bug was initially created as a clone of Bug #1357973 +++

+++ This bug was initially created as a clone of Bug #1356851 +++

+++ This bug was initially created as a clone of Bug #1337450 +++

Description of problem:
========================

In a sharded volume, where every file is split into multiple shards, the scrubber runs and validates every file (and its shards), but it increments its counters once per shard instead of once per file. This gets reflected in the scrub status output in the 'files scrubbed' and 'files skipped' fields, which is misleading to the user because the numbers reported there are much higher than the total number of files created.
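
For context, shards are stored as separate entries under the hidden .shard directory at the brick root, which is why the per-shard counting inflates the totals. The commands below are a minimal sketch for observing this, assuming a FUSE mount at /mnt/ozone and a brick path of /bricks/brick1/ozone (both illustrative):

# Regular files as seen by the client (what the user expects to be counted)
find /mnt/ozone -type f | wc -l

# Shard entries stored under the hidden .shard directory on one brick;
# the scrubber counts each of these separately
ls -A /bricks/brick1/ozone/.shard | wc -l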


Version-Release number of selected component (if applicable):
===========================================================


How reproducible:
================= 
Always


Steps to Reproduce:
=====================

1. Have a dist-rep volume with bitrot enabled, and enable sharding.
2. Create 100 1 MB files and validate the scrub status output after the scrub run.
3. Create 5 4 GB files and wait for the next scrub run.
4. Validate the scrub status output after the scrubber has finished running (example commands below).
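
A minimal command sketch of the steps above, assuming a volume named ozone mounted at /mnt/ozone (names, sizes and scrub frequency are illustrative):

# Enable sharding and bitrot, and schedule an hourly scrub
gluster volume set ozone features.shard on
gluster volume bitrot ozone enable
gluster volume bitrot ozone scrub-frequency hourly

# Create the test data from a client mount
mkdir -p /mnt/ozone/1m_files /mnt/ozone/4g_files
for i in $(seq 1 100); do dd if=/dev/urandom of=/mnt/ozone/1m_files/f$i bs=1M count=1; done
for i in $(seq 1 5); do dd if=/dev/urandom of=/mnt/ozone/4g_files/f$i bs=1M count=4096; done

# After the scrubber has run, check the reported counts
gluster volume bitrot ozone scrub status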

Actual results:
================
'files scrubbed' and 'files skipped' show numbers much higher than the total number of files created.


Expected results:
=================
All the fields should be in line with the number of files actually created.


Additional info:
==================

[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.9-4.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.7.9-4.el7rhgs.x86_64
glusterfs-api-3.7.9-4.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.9-4.el7rhgs.x86_64
glusterfs-cli-3.7.9-4.el7rhgs.x86_64
glusterfs-server-3.7.9-4.el7rhgs.x86_64
glusterfs-fuse-3.7.9-4.el7rhgs.x86_64
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.35.85
Uuid: c9550322-c0ef-45e6-ad20-f38658a5ce54
State: Peer in Cluster (Connected)

Hostname: 10.70.35.137
Uuid: 35426000-dad1-416f-b145-f25049f5036e
State: Peer in Cluster (Connected)

Hostname: 10.70.35.13
Uuid: a756f3da-7896-4970-a77d-4829e603f773
State: Peer in Cluster (Connected)
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster v info
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: d79e220b-acde-4d13-b9d5-f37ec741c117
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.70.35.210:/bricks/brick1/ozone
Brick2: 10.70.35.85:/bricks/brick1/ozone
Brick3: 10.70.35.137:/bricks/brick1/ozone
Brick4: 10.70.35.210:/bricks/brick2/ozone
Brick5: 10.70.35.85:/bricks/brick2/ozone
Brick6: 10.70.35.137:/bricks/brick2/ozone
Brick7: 10.70.35.210:/bricks/brick3/ozone
Brick8: 10.70.35.85:/bricks/brick3/ozone
Brick9: 10.70.35.137:/bricks/brick3/ozone
Options Reconfigured:
features.shard: on
features.scrub-throttle: normal
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster  v status
Status of volume: ozone
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.210:/bricks/brick1/ozone     49152     0          Y       3255 
Brick 10.70.35.85:/bricks/brick1/ozone      49152     0          Y       15549
Brick 10.70.35.137:/bricks/brick1/ozone     49152     0          Y       32158
Brick 10.70.35.210:/bricks/brick2/ozone     49153     0          Y       3261 
Brick 10.70.35.85:/bricks/brick2/ozone      49153     0          Y       15557
Brick 10.70.35.137:/bricks/brick2/ozone     49153     0          Y       32164
Brick 10.70.35.210:/bricks/brick3/ozone     49154     0          Y       3270 
Brick 10.70.35.85:/bricks/brick3/ozone      49154     0          Y       15564
Brick 10.70.35.137:/bricks/brick3/ozone     49154     0          Y       32171
NFS Server on localhost                     2049      0          Y       24614
Self-heal Daemon on localhost               N/A       N/A        Y       3248 
Bitrot Daemon on localhost                  N/A       N/A        Y       8545 
Scrubber Daemon on localhost                N/A       N/A        Y       8551 
NFS Server on 10.70.35.13                   2049      0          Y       6082 
Self-heal Daemon on 10.70.35.13             N/A       N/A        Y       21680
Bitrot Daemon on 10.70.35.13                N/A       N/A        N       N/A  
Scrubber Daemon on 10.70.35.13              N/A       N/A        N       N/A  
NFS Server on 10.70.35.85                   2049      0          Y       9515 
Self-heal Daemon on 10.70.35.85             N/A       N/A        Y       15542
Bitrot Daemon on 10.70.35.85                N/A       N/A        Y       18642
Scrubber Daemon on 10.70.35.85              N/A       N/A        Y       18648
NFS Server on 10.70.35.137                  2049      0          Y       26213
Self-heal Daemon on 10.70.35.137            N/A       N/A        Y       32153
Bitrot Daemon on 10.70.35.137               N/A       N/A        Y       2919 
Scrubber Daemon on 10.70.35.137             N/A       N/A        Y       2925 
 
Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-210 ~]# 
[root@dhcp35-210 ~]# gluster v bitrot ozone scrub status

Volume name : ozone

State of scrub: Active

Scrub impact: normal

Scrub frequency: hourly

Bitrot error log location: /var/log/glusterfs/bitd.log

Scrubber error log location: /var/log/glusterfs/scrub.log


=========================================================

Node: localhost

Number of Scrubbed files: 4930

Number of Skipped files: 0

Last completed scrub time: 2016-05-19 07:40:18

Duration of last scrub (D:M:H:M:S): 0:0:30:35

Error count: 1

Corrupted object's [GFID]:

2be8fc38-db5e-464b-b741-616377994cc8


=========================================================

Node: 10.70.35.85

Number of Scrubbed files: 5139

Number of Skipped files: 0

Last completed scrub time: 2016-05-19 08:49:49

Duration of last scrub (D:M:H:M:S): 0:0:29:39

Error count: 1

Corrupted object's [GFID]:

ce5e7a94-cba6-4e65-a7bb-82b1ec396eef


=========================================================

Node: 10.70.35.137

Number of Scrubbed files: 5138

Number of Skipped files: 0

Last completed scrub time: 2016-05-19 09:02:46

Duration of last scrub (D:M:H:M:S): 0:0:31:57

Error count: 0

=========================================================

[root@dhcp35-210 ~]# 


=============
CLIENT LOGS
==============

[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# cd /mnt/ozone
[root@dhcp35-30 ozone]# df -k .
Filesystem          1K-blocks     Used Available Use% Mounted on
10.70.35.137:/ozone  62553600 21098496  41455104  34% /mnt/ozone
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# ls -a
.  ..  1m_files  4g_files  .trashcan
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# 
[root@dhcp35-30 ozone]# ls -l 1m_files/ | wc -l
21
[root@dhcp35-30 ozone]# ls -l 4g_files/ | wc -l
6
[root@dhcp35-30 ozone]#
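
The client mount holds 20 one-megabyte files and 5 four-gigabyte files, yet each storage node reports on the order of 5000 scrubbed files. This is consistent with per-shard counting: if the effective shard block size was 4 MB, each 4 GB file is stored as roughly 1024 entries (the base file plus 1023 shards) while each 1 MB file remains a single entry, and every node here carries one brick of each replica set, i.e. a full copy of the data. A quick back-of-the-envelope check:

# 5 x 4 GB files split into 4 MB shards, plus 20 x 1 MB files with no shards
echo $(( 5 * (4096 / 4) + 20 ))    # prints 5140, in line with the 5138/5139 reported by two of the nodes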

Comment 1 Vijay Bellur 2016-07-19 17:42:44 UTC
REVIEW: http://review.gluster.org/14959 (feature/bitrot: Fix scrub status with sharded volume) posted (#1) for review on release-3.8 by Kotresh HR (khiremat)

Comment 2 Vijay Bellur 2016-07-20 15:08:52 UTC
COMMIT: http://review.gluster.org/14959 committed in release-3.8 by Jeff Darcy (jdarcy) 
------
commit f733aa3e62aa0fadbb91b34ecaf639d3e3a4338c
Author: Kotresh HR <khiremat>
Date:   Thu Jul 14 12:30:12 2016 +0530

    feature/bitrot: Fix scrub status with sharded volume
    
    Backport of: http://review.gluster.org/14927
    
    Bitrot scrubs each shard entries separately. Scrub
    statistics was counting each shard entry which is
    incorrect. This patch skips the statistics count
    for sharded entries.
    
    Change-Id: I184c315a4bc7f2cccabc506eef083ee926ec26d3
    BUG: 1357975
    Signed-off-by: Kotresh HR <khiremat>
    (cherry picked from commit 1929141da34d36f537e9798e3618e0e3bdc61eb6)
    Reviewed-on: http://review.gluster.org/14959
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>
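
With this fix (shipped in glusterfs-3.8.2), shard entries are no longer added to the scrub statistics, so 'Number of Scrubbed files' should be of the same order as the file count visible from the client. A rough cross-check after upgrading, assuming the same ozone volume and /mnt/ozone mount used above:

# Compare the client-visible file count with 'Number of Scrubbed files'
find /mnt/ozone -type f | wc -l
gluster volume bitrot ozone scrub status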

Comment 3 Niels de Vos 2016-08-12 09:47:30 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.2, please open a new bug report.

glusterfs-3.8.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000058.html
[2] https://www.gluster.org/pipermail/gluster-users/

