+++ This bug was initially created as a clone of Bug #1357973 +++
+++ This bug was initially created as a clone of Bug #1356851 +++
+++ This bug was initially created as a clone of Bug #1337450 +++

Description of problem:
========================
In a sharded volume, where every file is split into multiple shards, the scrubber runs and validates every file (and its shards), but instead of incrementing the count once per file, it increments once per shard. This is reflected in the scrub status output in the fields 'files scrubbed' and 'files skipped', which is misleading to the user because the numbers reported there are much larger than the total number of files created.

Version-Release number of selected component (if applicable):
===========================================================
glusterfs-3.7.9-4.el7rhgs (see 'rpm -qa' output under Additional info)

How reproducible:
=================
Always

Steps to Reproduce:
=====================
1. Create a dist-rep volume and enable sharding.
2. Create 100 1MB files and validate the scrub status output after its run.
3. Create 5 4GB files and wait for the next scrub run.
4. Validate the scrub status output after the scrubber has finished running.

Actual results:
================
'files scrubbed' and 'files skipped' report numbers much larger than the total number of files created.

Expected results:
=================
All the fields should be in line with the data actually created.
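The inflated counts follow directly from shard arithmetic. As a rough sketch (assuming the default features.shard-block-size of 4MB; the helper below is illustrative, not part of gluster), each file is stored as a base file plus one entry under the internal .shard directory for every additional shard block, and the scrubber was counting each of those entries:

```python
import math

def expected_shard_entries(file_size, shard_block_size=4 * 1024 * 1024):
    """On-brick entries a sharded file produces: the base file holds the
    first block; each additional block is a separate file under .shard."""
    if file_size <= shard_block_size:
        return 1  # fits in the base file; no .shard entries
    return math.ceil(file_size / shard_block_size)  # base + (n - 1) shards

# Step 2 of the reproducer: 100 x 1MB files -> one entry each
small = 100 * expected_shard_entries(1 * 1024**2)
# Step 3: 5 x 4GB files -> 1024 entries each with a 4MB shard block
large = 5 * expected_shard_entries(4 * 1024**3)

print(small + large)  # 5220 scrubbable entries vs. only 105 user-visible files
```

With those assumptions the scrubber sees roughly 5220 objects, which is in the same range as the 4930-5139 "Number of Scrubbed files" reported per node below, even though only 105 files were created.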
Additional info:
==================
[root@dhcp35-210 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.9-4.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.7.9-4.el7rhgs.x86_64
glusterfs-api-3.7.9-4.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.9-4.el7rhgs.x86_64
glusterfs-cli-3.7.9-4.el7rhgs.x86_64
glusterfs-server-3.7.9-4.el7rhgs.x86_64
glusterfs-fuse-3.7.9-4.el7rhgs.x86_64

[root@dhcp35-210 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.35.85
Uuid: c9550322-c0ef-45e6-ad20-f38658a5ce54
State: Peer in Cluster (Connected)

Hostname: 10.70.35.137
Uuid: 35426000-dad1-416f-b145-f25049f5036e
State: Peer in Cluster (Connected)

Hostname: 10.70.35.13
Uuid: a756f3da-7896-4970-a77d-4829e603f773
State: Peer in Cluster (Connected)

[root@dhcp35-210 ~]# gluster v info

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: d79e220b-acde-4d13-b9d5-f37ec741c117
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.70.35.210:/bricks/brick1/ozone
Brick2: 10.70.35.85:/bricks/brick1/ozone
Brick3: 10.70.35.137:/bricks/brick1/ozone
Brick4: 10.70.35.210:/bricks/brick2/ozone
Brick5: 10.70.35.85:/bricks/brick2/ozone
Brick6: 10.70.35.137:/bricks/brick2/ozone
Brick7: 10.70.35.210:/bricks/brick3/ozone
Brick8: 10.70.35.85:/bricks/brick3/ozone
Brick9: 10.70.35.137:/bricks/brick3/ozone
Options Reconfigured:
features.shard: on
features.scrub-throttle: normal
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on

[root@dhcp35-210 ~]# gluster v status
Status of volume: ozone
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.210:/bricks/brick1/ozone     49152     0          Y       3255
Brick 10.70.35.85:/bricks/brick1/ozone      49152     0          Y       15549
Brick 10.70.35.137:/bricks/brick1/ozone     49152     0          Y       32158
Brick 10.70.35.210:/bricks/brick2/ozone     49153     0          Y       3261
Brick 10.70.35.85:/bricks/brick2/ozone      49153     0          Y       15557
Brick 10.70.35.137:/bricks/brick2/ozone     49153     0          Y       32164
Brick 10.70.35.210:/bricks/brick3/ozone     49154     0          Y       3270
Brick 10.70.35.85:/bricks/brick3/ozone      49154     0          Y       15564
Brick 10.70.35.137:/bricks/brick3/ozone     49154     0          Y       32171
NFS Server on localhost                     2049      0          Y       24614
Self-heal Daemon on localhost               N/A       N/A        Y       3248
Bitrot Daemon on localhost                  N/A       N/A        Y       8545
Scrubber Daemon on localhost                N/A       N/A        Y       8551
NFS Server on 10.70.35.13                   2049      0          Y       6082
Self-heal Daemon on 10.70.35.13             N/A       N/A        Y       21680
Bitrot Daemon on 10.70.35.13                N/A       N/A        N       N/A
Scrubber Daemon on 10.70.35.13              N/A       N/A        N       N/A
NFS Server on 10.70.35.85                   2049      0          Y       9515
Self-heal Daemon on 10.70.35.85             N/A       N/A        Y       15542
Bitrot Daemon on 10.70.35.85                N/A       N/A        Y       18642
Scrubber Daemon on 10.70.35.85              N/A       N/A        Y       18648
NFS Server on 10.70.35.137                  2049      0          Y       26213
Self-heal Daemon on 10.70.35.137            N/A       N/A        Y       32153
Bitrot Daemon on 10.70.35.137               N/A       N/A        Y       2919
Scrubber Daemon on 10.70.35.137             N/A       N/A        Y       2925

Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-210 ~]# gluster v bitrot ozone scrub status

Volume name : ozone
State of scrub: Active
Scrub impact: normal
Scrub frequency: hourly
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log

=========================================================
Node: localhost
Number of Scrubbed files: 4930
Number of Skipped files: 0
Last completed scrub time: 2016-05-19 07:40:18
Duration of last scrub (D:M:H:M:S): 0:0:30:35
Error count: 1
Corrupted object's [GFID]:
2be8fc38-db5e-464b-b741-616377994cc8

=========================================================
Node: 10.70.35.85
Number of Scrubbed files: 5139
Number of Skipped files: 0
Last completed scrub time: 2016-05-19 08:49:49
Duration of last scrub (D:M:H:M:S): 0:0:29:39
Error count: 1
Corrupted object's [GFID]:
ce5e7a94-cba6-4e65-a7bb-82b1ec396eef

=========================================================
Node: 10.70.35.137
Number of Scrubbed files: 5138
Number of Skipped files: 0
Last completed scrub time: 2016-05-19 09:02:46
Duration of last scrub (D:M:H:M:S): 0:0:31:57
Error count: 0
=========================================================
[root@dhcp35-210 ~]#

=============
CLIENT LOGS
==============
[root@dhcp35-30 ~]# cd /mnt/ozone
[root@dhcp35-30 ozone]# df -k .
Filesystem           1K-blocks     Used Available Use% Mounted on
10.70.35.137:/ozone   62553600 21098496  41455104  34% /mnt/ozone
[root@dhcp35-30 ozone]# ls -a
.  ..  1m_files  4g_files  .trashcan
[root@dhcp35-30 ozone]# ls -l 1m_files/ | wc -l
21
[root@dhcp35-30 ozone]# ls -l 4g_files/ | wc -l
6
[root@dhcp35-30 ozone]#
REVIEW: http://review.gluster.org/14959 (feature/bitrot: Fix scrub status with sharded volume) posted (#1) for review on release-3.8 by Kotresh HR (khiremat)
COMMIT: http://review.gluster.org/14959 committed in release-3.8 by Jeff Darcy (jdarcy)
------
commit f733aa3e62aa0fadbb91b34ecaf639d3e3a4338c
Author: Kotresh HR <khiremat>
Date:   Thu Jul 14 12:30:12 2016 +0530

    feature/bitrot: Fix scrub status with sharded volume

    Backport of: http://review.gluster.org/14927

    Bitrot scrubs each shard entry separately. The scrub statistics
    were counting every shard entry, which is incorrect. This patch
    skips the statistics count for sharded entries.

    Change-Id: I184c315a4bc7f2cccabc506eef083ee926ec26d3
    BUG: 1357975
    Signed-off-by: Kotresh HR <khiremat>
    (cherry picked from commit 1929141da34d36f537e9798e3618e0e3bdc61eb6)
    Reviewed-on: http://review.gluster.org/14959
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>
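The actual change lives in the scrubber's C code; as a rough illustration only (the function names and the path-based check here are invented, and the real patch identifies shard entries internally rather than by path), the counting logic it describes amounts to: keep scrubbing every object, shard or not, but only bump the statistics for non-shard entries:

```python
# Internal directory where the shard translator stores shard files.
SHARD_DIR = "/.shard"

def verify_checksum(path):
    # Stand-in for the real bitrot signature verification; a no-op here.
    pass

def scrub_and_count(entries):
    """Scrub every entry, but count only non-shard entries in the stats,
    so 'files scrubbed' matches the number of user-visible files."""
    stats = {"scrubbed": 0}
    for path in entries:
        verify_checksum(path)                 # shards are still verified
        if path.startswith(SHARD_DIR + "/"):
            continue                          # shard entry: skip the stats bump
        stats["scrubbed"] += 1
    return stats
```

With this behavior, a 4GB file stored as a base file plus ~1023 shard entries contributes 1 to the count instead of ~1024, bringing the status output back in line with the data the user actually created.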
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.2, please open a new bug report.

glusterfs-3.8.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000058.html
[2] https://www.gluster.org/pipermail/gluster-users/