Bug 1342459

Summary: [Bitrot]: Sticky bit files considered and skipped by the scrubber, instead of getting ignored.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sweta Anandpara <sanandpa>
Component: bitrot Assignee: Kotresh HR <khiremat>
Status: CLOSED ERRATA QA Contact: Sweta Anandpara <sanandpa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: amukherj, khiremat, rcyriac, rhinduja, rhs-bugs
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1355706 Environment:
Last Closed: 2017-03-23 05:34:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351522, 1355706, 1359017, 1359020    
Attachments:
Description Flags
Server and client logs none

Description Sweta Anandpara 2016-06-03 10:23:11 UTC
Description of problem:
=======================
If sticky bit files (T files) are present in a distribute-replicate volume, they are reported as 'skipped' in the scrub status output. Since the actual data file is considered by the scrubber anyway, the scrubber should ideally ignore all T files rather than count them as skipped.
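
As a rough illustration of what "ignoring T files" could look like, the Python sketch below classifies an entry by its mode bits: a regular file whose mode consists of the sticky bit alone (displayed as ---------T by ls -l) is treated as a T file. This is not the scrubber's actual check. The function name is hypothetical, and a real implementation would probably also consult the DHT link extended attribute, which is not shown here.

```python
import stat

def is_sticky_only_file(st_mode: int) -> bool:
    """Return True for a regular file whose only mode bit is the sticky bit."""
    # S_IMODE keeps the permission bits plus setuid/setgid/sticky.
    return stat.S_ISREG(st_mode) and stat.S_IMODE(st_mode) == stat.S_ISVTX

# Modes matching the two kinds of entries seen in a brick directory listing.
link_file_mode = stat.S_IFREG | stat.S_ISVTX   # ---------T
data_file_mode = stat.S_IFREG | 0o644          # -rw-r--r--

print(is_sticky_only_file(link_file_mode))  # True
print(is_sticky_only_file(data_file_mode))  # False
```

In practice the mode would come from os.lstat(path).st_mode on the brick backend; the constants above only stand in for that call.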


Version-Release number of selected component (if applicable):
==========================================================
3.7.9-7


How reproducible:
================
Reporting the first occurrence.


Steps to Reproduce:
===================
1. Set up a 4-node cluster with an n*2 distribute-replicate volume that has bitrot enabled.
2. Create files from the mount point.
3. Create a scenario in which sticky bit files get created (for example, kill a brick process and then create files).
4. Check the scrub status output.

Actual results:
================
Step 4 shows a number greater than 0 in the 'Number of Skipped files' field of the scrub status output.

Expected results:
==================
Step 4 should not show any files as skipped, as there is no open FD while the scrubber is doing its run.
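
The check in step 4 can be scripted. The Python sketch below parses saved scrub status text and reports which nodes show a non-zero skipped count. The field labels ('Node:', 'Number of Skipped files:') are taken from the output pasted in this report and may differ in other versions. The function name and the shortened sample text are illustrative only.

```python
import re

def skipped_per_node(status_text: str) -> dict:
    """Map each 'Node:' name to its 'Number of Skipped files' count."""
    counts = {}
    node = None
    for line in status_text.splitlines():
        line = line.strip()
        m = re.match(r"Node:\s*(\S+)", line)
        if m:
            node = m.group(1)
            continue
        m = re.match(r"Number of Skipped files:\s*(\d+)", line)
        if m and node is not None:
            counts[node] = int(m.group(1))
    return counts

# Shortened excerpt in the same shape as the scrub status output.
sample = """\
Node: localhost
Number of Scrubbed files: 9
Number of Skipped files: 1

Node: 10.70.46.215
Number of Scrubbed files: 13
Number of Skipped files: 0
"""

counts = skipped_per_node(sample)
nodes_with_skips = [n for n, c in counts.items() if c > 0]
print(nodes_with_skips)  # ['localhost']
```

With the expected behaviour, the list of nodes with skipped files would be empty when no file has an open FD during the scrub run.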



Additional info:
=================

[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# rpm -qa | grep gluster
glusterfs-cli-3.7.9-7.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-7.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
glusterfs-fuse-3.7.9-7.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-7.el7rhgs.x86_64
glusterfs-libs-3.7.9-7.el7rhgs.x86_64
glusterfs-api-3.7.9-7.el7rhgs.x86_64
glusterfs-3.7.9-7.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-server-3.7.9-7.el7rhgs.x86_64
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# gluster pool list
UUID					Hostname                         	State
d8339859-b7e5-4683-9e53-00e34a3d090d	dhcp47-188.lab.eng.blr.redhat.com	Connected 
1bb3d70d-dbb0-4dd7-9a4d-ae33564ef226	10.70.46.215                     	Connected 
34a7a230-1513-4244-92b6-47fd17cd7f37	10.70.46.193                     	Connected 
60b85677-44a0-413f-9200-7516c9b88006	localhost                        	Connected 
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# gluster v list
disp
distrep2
distrep3
gluster_shared_storage
mm
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# gluster v info distrep2
 
Volume Name: distrep2
Type: Distributed-Replicate
Volume ID: a40e89f0-02dd-4fa7-8687-afe0f092ae80
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.188:/brick/brick1/distrep2
Brick2: 10.70.46.215:/brick/brick1/distrep2
Brick3: 10.70.46.187:/brick/brick1/distrep2
Brick4: 10.70.46.193:/brick/brick1/distrep2
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.readdir-ahead: on
features.bitrot: on
features.scrub: Active
features.scrub-freq: hourly
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# gluster v bitrot distrep2 scrub status

Volume name : distrep2

State of scrub: Active

Scrub impact: lazy

Scrub frequency: hourly

Bitrot error log location: /var/log/glusterfs/bitd.log

Scrubber error log location: /var/log/glusterfs/scrub.log


=========================================================

Node: localhost

Number of Scrubbed files: 9

Number of Skipped files: 1

Last completed scrub time: 2016-06-03 10:21:46

Duration of last scrub (D:M:H:M:S): 0:0:0:43

Error count: 0


=========================================================

Node: 10.70.46.193

Number of Scrubbed files: 9

Number of Skipped files: 1

Last completed scrub time: 2016-06-03 10:21:45

Duration of last scrub (D:M:H:M:S): 0:0:0:42

Error count: 0


=========================================================

Node: dhcp47-188.lab.eng.blr.redhat.com

Number of Scrubbed files: 13

Number of Skipped files: 0

Last completed scrub time: 2016-06-03 10:21:50

Duration of last scrub (D:M:H:M:S): 0:0:0:46

Error count: 0


=========================================================

Node: 10.70.46.215

Number of Scrubbed files: 13

Number of Skipped files: 0

Last completed scrub time: 2016-06-03 10:21:49

Duration of last scrub (D:M:H:M:S): 0:0:0:46

Error count: 0

=========================================================

[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# 
[root@dhcp46-187 ~]# cd /brick/brick1/distrep2/dir1/dir2/dir3/dir4/dir5/
[root@dhcp46-187 dir5]# ls -l
total 32
---------T. 2 root root 20 Jun  3 11:59 file1_ln
-rw-r--r--. 2 root root 22 Jun  3 11:25 test1
-rw-r--r--. 2 root root 22 Jun  3 11:25 test2
-rw-r--r--. 2 root root 22 Jun  3 11:25 test4
[root@dhcp46-187 dir5]# 
[root@dhcp46-187 dir5]#

Comment 2 Kotresh HR 2016-07-12 10:03:02 UTC
Upstream Patch:

http://review.gluster.org/14903 (master)

Comment 4 Kotresh HR 2016-08-11 12:04:53 UTC
Upstream Patches:

http://review.gluster.org/14903   (master)
http://review.gluster.org/14982   (3.7)
http://review.gluster.org/14983   (3.8)

Comment 5 Atin Mukherjee 2016-09-17 13:43:54 UTC
As mentioned in comment 4, the fix is already available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Comment 8 Sweta Anandpara 2016-11-07 09:04:56 UTC
Tested and verified this on the build glusterfs-3.8.4-3.el7rhgs.x86_64

Had a 4-node setup with a 14*2 distribute-replicate volume and enabled bitrot on it. Mounted the volume via FUSE and created files. Killed bricks and moved files around so that link files got created.
 
Monitored the scrub output across various intervals and it correctly showed the count with respect to 'files scrubbed' and 'files skipped'. Also, corrupted one of the files from the backend (for which a link file was present) and validated that it correctly got detected as corrupted and also got healed as expected. 

Moving this BZ to verified in 3.2. Detailed logs are attached.

Comment 9 Sweta Anandpara 2016-11-07 09:05:18 UTC
Created attachment 1217926
Server and client logs

Comment 11 errata-xmlrpc 2017-03-23 05:34:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html