Description of problem:
=======================
If sticky-bit (DHT link-to, i.e. 'T') files are present in a distributed-replicate volume, they get reported as 'skipped' in the scrub status output. Since the actual data file is scrubbed anyway, the scrubber should ideally be ignoring all T files.

Version-Release number of selected component (if applicable):
==========================================================
3.7.9-7

How reproducible:
================
Reporting the first occurrence.

Steps to Reproduce:
===================
1. Have a 4-node cluster with an n*2 distributed-replicate volume.
2. Create files from the mountpoint.
3. Create a scenario in which sticky-bit files get created (say, kill a brick process and do creates).
4. Validate the scrub status output.

Actual results:
================
Step 4 shows a non-zero count in the 'Number of Skipped files' field of the scrub status output.

Expected results:
==================
Step 4 should not report any files as skipped, since there is no open fd while the scrubber is doing its run.

Additional info:
=================
[root@dhcp46-187 ~]# rpm -qa | grep gluster
glusterfs-cli-3.7.9-7.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-7.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
glusterfs-fuse-3.7.9-7.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-7.el7rhgs.x86_64
glusterfs-libs-3.7.9-7.el7rhgs.x86_64
glusterfs-api-3.7.9-7.el7rhgs.x86_64
glusterfs-3.7.9-7.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-server-3.7.9-7.el7rhgs.x86_64

[root@dhcp46-187 ~]# gluster pool list
UUID                                    Hostname                                State
d8339859-b7e5-4683-9e53-00e34a3d090d    dhcp47-188.lab.eng.blr.redhat.com       Connected
1bb3d70d-dbb0-4dd7-9a4d-ae33564ef226    10.70.46.215                            Connected
34a7a230-1513-4244-92b6-47fd17cd7f37    10.70.46.193                            Connected
60b85677-44a0-413f-9200-7516c9b88006    localhost                               Connected

[root@dhcp46-187 ~]# gluster v list
disp
distrep2
distrep3
gluster_shared_storage
mm

[root@dhcp46-187 ~]# gluster v info distrep2

Volume Name: distrep2
Type: Distributed-Replicate
Volume ID: a40e89f0-02dd-4fa7-8687-afe0f092ae80
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.188:/brick/brick1/distrep2
Brick2: 10.70.46.215:/brick/brick1/distrep2
Brick3: 10.70.46.187:/brick/brick1/distrep2
Brick4: 10.70.46.193:/brick/brick1/distrep2
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.readdir-ahead: on
features.bitrot: on
features.scrub: Active
features.scrub-freq: hourly

[root@dhcp46-187 ~]# gluster v bitrot distrep2 scrub status

Volume name : distrep2
State of scrub: Active
Scrub impact: lazy
Scrub frequency: hourly
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log

=========================================================
Node: localhost
Number of Scrubbed files: 9
Number of Skipped files: 1
Last completed scrub time: 2016-06-03 10:21:46
Duration of last scrub (D:M:H:M:S): 0:0:0:43
Error count: 0

=========================================================
Node: 10.70.46.193
Number of Scrubbed files: 9
Number of Skipped files: 1
Last completed scrub time: 2016-06-03 10:21:45
Duration of last scrub (D:M:H:M:S): 0:0:0:42
Error count: 0

=========================================================
Node: dhcp47-188.lab.eng.blr.redhat.com
Number of Scrubbed files: 13
Number of Skipped files: 0
Last completed scrub time: 2016-06-03 10:21:50
Duration of last scrub (D:M:H:M:S): 0:0:0:46
Error count: 0

=========================================================
Node: 10.70.46.215
Number of Scrubbed files: 13
Number of Skipped files: 0
Last completed scrub time: 2016-06-03 10:21:49
Duration of last scrub (D:M:H:M:S): 0:0:0:46
Error count: 0
=========================================================

[root@dhcp46-187 ~]# cd /brick/brick1/distrep2/dir1/dir2/dir3/dir4/dir5/
[root@dhcp46-187 dir5]# ls -l
total 32
---------T. 2 root root 20 Jun  3 11:59 file1_ln
-rw-r--r--. 2 root root 22 Jun  3 11:25 test1
-rw-r--r--. 2 root root 22 Jun  3 11:25 test2
-rw-r--r--. 2 root root 22 Jun  3 11:25 test4
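For reference, a quick way to confirm from the brick backend that a file like file1_ln above is a DHT link-to entry (and not a regular file with an open fd) is to check its mode and its trusted.glusterfs.dht.linkto xattr. A minimal sketch, assuming the brick path shown in the ls output above:

# Sketch only: confirm that the sticky-bit file seen above is a DHT link-to file.
# The path and file name are taken from the ls output above; adjust as needed.
BRICK_FILE=/brick/brick1/distrep2/dir1/dir2/dir3/dir4/dir5/file1_ln

# A link-to file has mode ---------T (sticky bit set, no access bits) and
# carries the trusted.glusterfs.dht.linkto xattr naming the subvolume that
# holds the actual data; these are the files the scrubber should ignore.
stat -c '%A %n' "$BRICK_FILE"
getfattr --absolute-names -e text -n trusted.glusterfs.dht.linkto "$BRICK_FILE"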
Upstream Patch: http://review.gluster.org/14903 (master)
Upstream Patches:
http://review.gluster.org/14903 (master)
http://review.gluster.org/14982 (3.7)
http://review.gluster.org/14983 (3.8)
As mentioned in comment 4, the fix is already available in RHGS 3.2.0 as part of the rebase to GlusterFS 3.8.4.
Tested and verified this on the build glusterfs-3.8.4-3.el7rhgs.x86_64.

Had a 4-node setup with a 14*2 distributed-replicate volume and bitrot enabled on it. Mounted the volume via FUSE and created files, then killed bricks and moved files around so that link (T) files got created. Monitored the scrub status output at various intervals, and it correctly reported the 'files scrubbed' and 'files skipped' counts. Also corrupted one of the files from the backend (one for which a link file was present) and validated that it was correctly detected as corrupted and healed as expected.

Moving this BZ to Verified in 3.2. Detailed logs are attached.
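A rough outline of the kind of checks described above, using only commands already shown in this report; the volume name matches the earlier output, but the brick path and file name below are placeholders, not the exact ones used in the test run:

# Sketch only: outline of the verification flow described above.
VOL=distrep2

# 1. Monitor the scrubbed/skipped counters across scrub runs.
gluster volume bitrot $VOL scrub status

# 2. Corrupt a data file directly on a brick (never through the mount),
#    picking one that also has a link-to (T) file in the volume.
#    The path below is a placeholder.
echo "garbage" >> /brick/brick1/$VOL/dir1/somefile

# 3. After the next scrub run, the corruption should be reflected in the
#    scrub status output (error count), and the file should be recoverable
#    from the good replica.
gluster volume bitrot $VOL scrub status
gluster volume heal $VOL info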
Created attachment 1217926 [details] Server and client logs
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html