Description of problem:
========================
In scenarios where the scrubber takes a long time to finish scrubbing files, the 'scrub status' output while scrubbing is in progress displays the previous run's information in the 'last completed scrub time' and 'duration of last scrub' fields. This gives the user the incorrect impression that scrubbing has completed. Ideally, the 'State of scrub' field should be set to 'In progress' and the above-mentioned fields set to '-'. The other two fields (files scrubbed, files skipped) correctly show the present run's details.

Version-Release number of selected component (if applicable):
=============================================================
3.7.9-3

How reproducible:
================
Always

Steps to Reproduce:
====================
1. Have a 4-node cluster with a dist-rep volume and sharding enabled. Set the scrub frequency to 'hourly'.
2. Create 100 1MB files and wait for the scrubber to finish its run.
3. View the scrub status output and check the validity of the fields it shows.
4. Create 5 4GB files and wait for the next run of scrubbing to start.
5. While the scrubbing is in progress (as seen from scrub.log), issue a 'gluster volume bitrot <volname> scrub status'.

Actual results:
===============
Scrub status shows 4 fields with respect to every node. 2 fields are updated as per the current run, and 2 as per the previous run.

Expected results:
=================
Either all 4 fields must reflect the current run, or all 4 fields must reflect the previous run. Preferably, the user should be told that scrubbing is in progress, with the fields updated accordingly.
The two fields 'last completed scrub time' and 'duration of last scrub' actually relate to the last scrub run, as their names indicate. To avoid confusion, the following patch makes the status show whether the scrub is "In progress" or "Idle":
http://review.gluster.org/14864 (upstream)
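With the patch applied, a monitoring script can key off the 'State of scrub' line instead of the timestamp fields. A minimal sketch, assuming the fixed output format where the state line reads either 'State of scrub: Active (In Progress)' or 'State of scrub: Active (Idle)' (the helper itself is hypothetical, not part of Gluster):

```python
def scrub_in_progress(status_output):
    """Return True if a scrub run is ongoing, judging from the text
    printed by 'gluster volume bitrot <volname> scrub status'."""
    for line in status_output.splitlines():
        if line.strip().startswith("State of scrub:"):
            return "In Progress" in line
    raise ValueError("no 'State of scrub' line found in output")
```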
Release branch patches:
http://review.gluster.org/#/c/14900/ (release-3.7)
http://review.gluster.org/#/c/14901/ (release-3.8)
Upstream mainline : http://review.gluster.org/14864
Upstream 3.8 : http://review.gluster.org/14901

And the fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Tested and verified this on build 3.8.4-2. While the scrubber is in the middle of a run, it appropriately shows the 'State of scrub' as 'Active (In Progress)'. The other fields, 'last completed scrub time' and 'duration of last scrub', are left as is and continue to show the previous run's details, as expected. Moving this BZ to verified in 3.2.0. Pasted below is the command output.

[root@dhcp35-104 ozone]# gluster peer status
Number of Peers: 3

Hostname: dhcp35-115.lab.eng.blr.redhat.com
Uuid: 6ac165c0-317f-42ad-8262-953995171dbb
State: Peer in Cluster (Connected)

Hostname: dhcp35-101.lab.eng.blr.redhat.com
Uuid: a3bd23b9-f70a-47f5-9c95-7a271f5f1e18
State: Peer in Cluster (Connected)

Hostname: 10.70.35.100
Uuid: fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5
State: Peer in Cluster (Connected)

[root@dhcp35-104 ozone]# rpm -qa | grep gluster
glusterfs-server-3.8.4-2.el6rhs.x86_64
python-gluster-3.8.4-2.el6rhs.noarch
glusterfs-events-3.8.4-2.el6rhs.x86_64
glusterfs-3.8.4-2.el6rhs.x86_64
glusterfs-fuse-3.8.4-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64
gluster-nagios-common-0.2.4-1.el6rhs.noarch
glusterfs-debuginfo-3.8.4-1.el6rhs.x86_64
glusterfs-client-xlators-3.8.4-2.el6rhs.x86_64
glusterfs-cli-3.8.4-2.el6rhs.x86_64
glusterfs-geo-replication-3.8.4-2.el6rhs.x86_64
gluster-nagios-addons-0.2.8-1.el6rhs.x86_64
vdsm-gluster-4.16.30-1.5.el6rhs.noarch
glusterfs-api-3.8.4-2.el6rhs.x86_64
glusterfs-api-devel-3.8.4-2.el6rhs.x86_64
nfs-ganesha-gluster-2.3.1-8.el6rhs.x86_64
glusterfs-libs-3.8.4-2.el6rhs.x86_64
glusterfs-devel-3.8.4-2.el6rhs.x86_64
glusterfs-rdma-3.8.4-2.el6rhs.x86_64

[root@dhcp35-104 ozone]# gluster v info

Volume Name: nash
Type: Disperse
Volume ID: 5bcd289c-9d38-4f36-9228-b3ec00d869e8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.115:/bricks/brick1/nash
Brick2: 10.70.35.100:/bricks/brick1/nash
Brick3: 10.70.35.101:/bricks/brick1/nash
Brick4: 10.70.35.104:/bricks/brick1/nash
Brick5: 10.70.35.101:/bricks/brick2/nash
Brick6: 10.70.35.104:/bricks/brick2/nash
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: a9540783-ab5d-4b58-9c69-e81dbe986559
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.115:/bricks/brick1/ozone
Brick2: 10.70.35.100:/bricks/brick1/ozone
Brick3: 10.70.35.101:/bricks/brick1/ozone
Brick4: 10.70.35.115:/bricks/brick2/ozone
Brick5: 10.70.35.100:/bricks/brick2/ozone
Brick6: 10.70.35.104:/bricks/brick2/ozone
Options Reconfigured:
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: minute

[root@dhcp35-104 ozone]# gluster v bitrot ozone scrub ondemand
volume bitrot: success
[root@dhcp35-104 ozone]# gluster v bitrot ozone scrub status

Volume name : ozone
State of scrub: Active (In Progress)
Scrub impact: lazy
Scrub frequency: minute
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=========================================================
Node: localhost
Number of Scrubbed files: 1
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:07
Duration of last scrub (D:M:H:M:S): 0:0:0:2
Error count: 0
=========================================================
Node: dhcp35-101.lab.eng.blr.redhat.com
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:02:30
Duration of last scrub (D:M:H:M:S): 0:0:0:4
Error count: 0
=========================================================
Node: 10.70.35.100
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 10:55:56
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 1
Corrupted object's [GFID]:
5793c862-a24c-41ab-b708-64de6da40ba0
=========================================================
Node: dhcp35-115.lab.eng.blr.redhat.com
Number of Scrubbed files: 2
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 10:55:57
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 0
=========================================================

[root@dhcp35-104 ozone]# gluster v bitrot ozone scrub status

Volume name : ozone
State of scrub: Active (Idle)
Scrub impact: lazy
Scrub frequency: minute
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=========================================================
Node: localhost
Number of Scrubbed files: 1
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:07
Duration of last scrub (D:M:H:M:S): 0:0:0:2
Error count: 0
=========================================================
Node: dhcp35-101.lab.eng.blr.redhat.com
Number of Scrubbed files: 3
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:10
Duration of last scrub (D:M:H:M:S): 0:0:0:4
Error count: 0
=========================================================
Node: 10.70.35.100
Number of Scrubbed files: 3
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:08
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 2
Corrupted object's [GFID]:
a9384c25-93fd-412d-a8b1-442d0f3c16c4
5793c862-a24c-41ab-b708-64de6da40ba0
=========================================================
Node: dhcp35-115.lab.eng.blr.redhat.com
Number of Scrubbed files: 4
Number of Skipped files: 0
Last completed scrub time: 2016-10-03 11:03:08
Duration of last scrub (D:M:H:M:S): 0:0:0:3
Error count: 0
=========================================================
[root@dhcp35-104 ozone]#
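The per-node blocks in the status output above are separated by rules of '=' characters, with one 'Field: value' pair per line. For automated verification, output in that format can be split into per-node dictionaries; the helper below is a hypothetical sketch that assumes the 3.8.4-era layout shown above:

```python
import re


def parse_scrub_nodes(status_output):
    """Split 'scrub status' output into {node: {field: value}} dicts.

    Assumes the format shown above: node sections delimited by lines
    of '=' characters, each holding 'Field: value' pairs."""
    nodes = {}
    # Split on the '=====' rule lines that separate node sections.
    for section in re.split(r"\n=+\n?", status_output):
        fields = {}
        for line in section.splitlines():
            if ": " in line:
                key, _, value = line.partition(": ")
                fields[key.strip()] = value.strip()
        if "Node" in fields:
            nodes[fields["Node"]] = fields
    return nodes
```

Note that splitting each line on the first ': ' (colon plus space) keeps headers like 'Duration of last scrub (D:M:H:M:S)' intact, since the colons inside the parentheses are not followed by a space.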
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html