1356976 – seeing dataheal pending bits bump up when source databrick is down and arbiter does metadata and entry heal

Bug 1356976 - seeing dataheal pending bits bump up when source databrick is down and arbiter does metadata and entry heal

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	arbiter
Sub Component:
Version:	3.7.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Ravishankar N
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-07-15 12:16 UTC by Nag Pavan Chilakam
Modified:	2016-07-20 12:38 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-07-20 12:38:06 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Nag Pavan Chilakam 2016-07-15 12:16:27 UTC

Description of problem:
==========================
The data-heal pending bits in the trusted.afr xattr get bumped up when the arbiter brick heals a destination data brick (metadata and entry heal) while the source data brick is down.

Version-Release number of selected component (if applicable):
glusterfs 3.9dev built on Jul 11 2016 10:04:54

How reproducible:
always


Steps to Reproduce:
================
1. Create a 1x(2+1) replicate arbiter volume.
2. Mount the volume using FUSE.
3. Create a directory, say dir1.
4. Bring down the first data brick (db1).
5. Create a file, say f1, under dir1 with some contents.
6. Note down the getfattr details from both db2 and ab1.
For example:
===>from db2:
[root@dhcp43-153 ~]# getfattr -d -m . -e hex /bricks/brick1/arbit/db1_Down/datafile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/arbit/db1_Down/datafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.arbit-client-0=0x000000020000000200000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005788cb300009088d
trusted.gfid=0x091d29ddf4e149da83531686e59818de

===>from ab1:
[root@dhcp43-157 ~]#  getfattr -d -m . -e hex /bricks/brick2/arbit/db1_Down/datafile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/arbit/db1_Down/datafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.arbit-client-0=0x000000020000000200000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005788cb3000085f14
trusted.gfid=0x091d29ddf4e149da83531686e59818de


7. Bring down the other data brick too, i.e. db2.
8. Bring up db1, which was down, while keeping db2 down.
9. Check heal info and trigger a manual heal.
10. Once the entry and metadata heal is over, check the xattr info on ab1 for the file. The data pending count has been bumped up:

[root@dhcp43-157 ~]#  getfattr -d -m . -e hex /bricks/brick2/arbit/db1_Down/datafile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/arbit/db1_Down/datafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.arbit-client-0=0x000000030000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005788cb3000085f14
trusted.gfid=0x091d29ddf4e149da83531686e59818de



This shouldn't be happening
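
To make the change easier to read, the two trusted.afr.arbit-client-0 values captured on ab1 can be decoded. This is a minimal sketch, assuming the usual AFR changelog layout of three 32-bit big-endian counters in the order data, metadata, entry. The helper name decode_afr_changelog is hypothetical and is not part of any GlusterFS tool.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into data/metadata/entry pending counts.

    Assumes the value is 12 bytes: three unsigned 32-bit big-endian counters.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output on ab1 in this report.
before = decode_afr_changelog("0x000000020000000200000000")
after = decode_afr_changelog("0x000000030000000000000000")

print("before heal:", before)
print("after heal: ", after)
```

Under that assumed layout, the data counter goes from 2 to 3 while the metadata counter drops from 2 to 0.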

Comment 1 Ravishankar N 2016-07-20 12:38:06 UTC

This is expected behaviour. When an entry self-heal results in a new entry (file) being created on the sink bricks, the data and metadata changelogs of that file are incremented on the source brick(s) to indicate that they need to be healed too. This is mostly useful in replace-brick scenarios.

You see only the data-heal pending bits (and not the metadata bit) because the metadata bit, though incremented, was cleared after the metadata heal completed. The data bit remained because the arbiter cannot be used as a source for data heal.
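
The sequence described above can be replayed as a toy model. This is only a sketch of the bookkeeping, not AFR's actual code: the class and function names are hypothetical, and the increment of exactly one per counter is an assumption chosen to match the values observed on ab1.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    """Pending counts one brick holds against another brick for a file."""
    data: int = 0
    metadata: int = 0
    entry: int = 0


def entry_heal_creates_file(p):
    # The file is newly created on the sink, so the source marks both
    # data and metadata as still pending for that sink.
    return Pending(p.data + 1, p.metadata + 1, p.entry)


def metadata_heal_completes(p):
    # A successful metadata heal clears the metadata counter.
    return Pending(p.data, 0, p.entry)


def data_heal(p, source_is_arbiter):
    # With the arbiter as the only available source, data heal cannot
    # proceed, so the data counter stays as it is.
    if source_is_arbiter:
        return p
    return Pending(0, p.metadata, p.entry)


# Start from the counts seen on ab1 before db1 came back up.
ab1 = Pending(data=2, metadata=2, entry=0)
ab1 = entry_heal_creates_file(ab1)
ab1 = metadata_heal_completes(ab1)
ab1 = data_heal(ab1, source_is_arbiter=True)
print(ab1)
```

The final state has a data count of 3 with metadata and entry at 0, which corresponds to the value 0x000000030000000000000000 reported on ab1 after the heal.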
