Description of problem:
========================
In scenarios where the scrubber takes a long time to finish scrubbing files, the 'scrub status' output while scrubbing is in progress displays the previous run's information in the 'last completed scrub time' and 'duration of last scrub' fields. This gives the user the incorrect impression that scrubbing has completed. Ideally, the 'State of scrub' field should be set to 'In progress' and the above-mentioned fields set to '-'. The other two fields (files scrubbed, files skipped) correctly show the present run's details.

Version-Release number of selected component (if applicable):
=============================================================
3.7.9-3

How reproducible:
================
Always

Steps to Reproduce:
====================
1. Have a 4-node cluster with a dist-rep volume and sharding enabled. Set the scrub frequency to 'hourly'.
2. Create 100 1MB files and wait for the scrubber to finish its run.
3. View the scrub status output and check the validity of the fields it shows.
4. Create 5 4GB files and wait for the next run of scrubbing to start.
5. While the scrubbing is in progress (as seen from scrub.log), issue a 'gluster volume bitrot <volname> scrub status'.

Actual results:
===============
Scrub status shows 4 fields with respect to every node. 2 fields are updated as per the current run, and 2 as per the previous run.

Expected results:
=================
Either all 4 fields must reflect the current run, or all 4 fields must reflect the previous run. Preferably, the user should be told that scrubbing is in progress, with the fields updated accordingly.
The two fields 'last completed scrub time' and 'duration of last scrub' actually relate to the last scrub run, as their names indicate. To avoid confusion, the following patch makes the status show whether the scrub is "In progress" or "Idle":
http://review.gluster.org/14864 (upstream)
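With the patch applied, a monitoring script can key off the 'State of scrub' line instead of the timestamp fields. A minimal sketch, assuming the fixed output format where the state line reads either 'State of scrub: Active (In Progress)' or 'State of scrub: Active (Idle)' (the helper itself is hypothetical, not part of Gluster):

```python
def scrub_in_progress(status_output):
    """Return True if a scrub run is ongoing, judging from the text
    printed by 'gluster volume bitrot <volname> scrub status'."""
    for line in status_output.splitlines():
        if line.strip().startswith("State of scrub:"):
            return "In Progress" in line
    raise ValueError("no 'State of scrub' line found in output")
```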
Release branch patches:
http://review.gluster.org/#/c/14900/ (release-3.7)
http://review.gluster.org/#/c/14901/ (release-3.8)
Upstream mainline : http://review.gluster.org/14864
Upstream 3.8 : http://review.gluster.org/14901

And the fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Tested and verified this on build 3.8.4-2. While the scrubber is in the middle of a run, it appropriately shows the 'State of scrub' as 'Active (In Progress)'. The other fields, 'last completed scrub time' and 'duration of last scrub', are left as is and continue to show the previous run's details, as expected. Moving this BZ to verified in 3.2.0. Pasted below is the command output.

[root@dhcp35-104 ozone]# gluster peer status
Number of Peers: 3

Hostname: dhcp35-115.lab.eng.blr.redhat.com
Uuid: 6ac165c0-317f-42ad-8262-953995171dbb
State: Peer in Cluster (Connected)

Hostname: dhcp35-101.lab.eng.blr.redhat.com
Uuid: a3bd23b9-f70a-47f5-9c95-7a271f5f1e18
State: Peer in Cluster (Connected)

Hostname: 10.70.35.100
Uuid: fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5
State: Peer in Cluster (Connected)

[root@dhcp35-104 ozone]# rpm -qa | grep gluster
glusterfs-server-3.8.4-2.el6rhs.x86_64
python-gluster-3.8.4-2.el6rhs.noarch
glusterfs-events-3.8.4-2.el6rhs.x86_64
glusterfs-3.8.4-2.el6rhs.x86_64
glusterfs-fuse-3.8.4-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64
gluster-nagios-common-0.2.4-1.el6rhs.noarch
glusterfs-debuginfo-3.8.4-1.el6rhs.x86_64
glusterfs-client-xlators-3.8.4-2.el6rhs.x86_64
glusterfs-cli-3.8.4-2.el6rhs.x86_64
glusterfs-geo-replication-3.8.4-2.el6rhs.x86_64
gluster-nagios-addons-0.2.8-1.el6rhs.x86_64
vdsm-gluster-4.16.30-1.5.el6rhs.noarch
glusterfs-api-3.8.4-2.el6rhs.x86_64
glusterfs-api-devel-3.8.4-2.el6rhs.x86_64
nfs-ganesha-gluster-2.3.1-8.el6rhs.x86_64
glusterfs-libs-3.8.4-2.el6rhs.x86_64
glusterfs-devel-3.8.4-2.el6rhs.x86_64
glusterfs-rdma-3.8.4-2.el6rhs.x86_64

[root@dhcp35-104 ozone]# gluster v info

Volume Name: nash
Type: Disperse
Volume ID: 5bcd289c-9d38-4f36-9228-b3ec00d869e8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.115:/bricks/brick1/nash
Brick2: 10.70.35.100:/bricks/brick1/nash
Brick3: 10.70.35.101:/bricks/brick1/nash
Brick4: 10.70.35.104:/bricks/brick1/nash
Brick5: 10.70.35.101:/bricks/brick2/nash
Brick6: 10.70.35.104:/bricks/brick2/nash
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: a9540783-ab5d-4b58-9c69-e81dbe986559
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.115:/bricks/brick1/ozone
Brick2: 10.70.35.100:/bricks/brick1/ozone
Brick3: 10.70.35.101:/bricks/brick1/ozone
Brick4: 10.70.35.115:/bricks/brick2/ozone
Brick5: 10.70.35.100:/bricks/brick2/ozone
Brick6: 10.70.35.104:/bricks/brick2/ozone
Options Reconfigured:
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: minute

[root@dhcp35-104 ozone]# gluster v bitrot ozone scrub ondemand
volume bitrot: success
[root@dhcp35-104 ozone]# gluster v bitrot ozone scrub status

Volume name : ozone
State of scrub: Active (In Progress)
Scrub impact: lazy
Scrub frequency: minute
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=========================================================
Node: localhost
Number of Scrubbed files: 1
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:07
Duration of last scrub (D:M:H:M:S): 0:0:0:2
Error count: 0
=========================================================
Node: dhcp35-101.lab.eng.blr.redhat.com
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:02:30
Duration of last scrub (D:M:H:M:S): 0:0:0:4
Error count: 0
=========================================================
Node: 10.70.35.100
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 10:55:56
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 1
Corrupted object's [GFID]:
5793c862-a24c-41ab-b708-64de6da40ba0
=========================================================
Node: dhcp35-115.lab.eng.blr.redhat.com
Number of Scrubbed files: 2
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 10:55:57
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 0
=========================================================

[root@dhcp35-104 ozone]# gluster v bitrot ozone scrub status

Volume name : ozone
State of scrub: Active (Idle)
Scrub impact: lazy
Scrub frequency: minute
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=========================================================
Node: localhost
Number of Scrubbed files: 1
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:07
Duration of last scrub (D:M:H:M:S): 0:0:0:2
Error count: 0
=========================================================
Node: dhcp35-101.lab.eng.blr.redhat.com
Number of Scrubbed files: 3
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:10
Duration of last scrub (D:M:H:M:S): 0:0:0:4
Error count: 0
=========================================================
Node: 10.70.35.100
Number of Scrubbed files: 3
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:08
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 2
Corrupted object's [GFID]:
a9384c25-93fd-412d-a8b1-442d0f3c16c4
5793c862-a24c-41ab-b708-64de6da40ba0
=========================================================
Node: dhcp35-115.lab.eng.blr.redhat.com
Number of Scrubbed files: 4
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:08
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 0
=========================================================
[root@dhcp35-104 ozone]#
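The per-node blocks in the status output above are separated by rules of '=' characters, with one 'Field: value' pair per line. For automated verification, output in that format can be split into per-node dictionaries; the helper below is a hypothetical sketch that assumes the 3.8.4-era layout shown above:

```python
import re


def parse_scrub_nodes(status_output):
    """Split 'scrub status' output into {node: {field: value}} dicts.

    Assumes the format shown above: node sections delimited by lines
    of '=' characters, each holding 'Field: value' pairs."""
    nodes = {}
    # Split on the '=====' rule lines that separate node sections.
    for section in re.split(r"\n=+\n?", status_output):
        fields = {}
        for line in section.splitlines():
            if ": " in line:
                key, _, value = line.partition(": ")
                fields[key.strip()] = value.strip()
        if "Node" in fields:
            nodes[fields["Node"]] = fields
    return nodes
```

Note that splitting each line on the first ': ' (colon plus space) keeps headers like 'Duration of last scrub (D:M:H:M:S)' intact, since the colons inside the parentheses are not followed by a space.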
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html