1360331 – default timeout of 5min not honored for analyzing split-brain files post setfattr replica.split-brain-heal-finalize

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	replicate
Version:	rhgs-3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	Karthik U S
QA Contact:	Vijay Avuthu
Depends On:	1503519
Blocks:	1503134

Reported:	2016-07-26 12:35 UTC by Nag Pavan Chilakam
Modified:	2019-04-03 09:28 UTC
CC List:	5 users
Fixed In Version:	glusterfs-3.12.2-2
Doc Type:	If docs needed, set a value
Clones:	1503519
Last Closed:	2018-09-04 06:29:40 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:2607	0	None	None	None	2018-09-04 06:31:22 UTC

Description Nag Pavan Chilakam 2016-07-26 12:35:05 UTC

Description of problem:
=======================
A file in split-brain can still be read from a FUSE mount by pointing reads at the copy on a chosen brick, using the following steps:
1) First, identify the split-brain file.
2) Confirm the split-brain state from the FUSE mount using getfattr -n replica.split-brain-status <split-brain file>
Because the file is in split-brain, reading it returns an I/O error.
However, the user can still analyze the split-brain file by running setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>
where the -v value ("choiceX") selects the brick whose copy should be served.

The split-brain choice is expected to expire after a default timeout of 5 minutes, after which the file should become inaccessible again.
However, this does not happen: the user can keep accessing the file indefinitely.
See also the administration guide, section 10.11.2.1 "Recovering File Split-brain from the Mount Point" -> "Setting the split-brain-choice on the file".
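The access workflow described above can be sketched as a small shell function. This is a minimal, hypothetical helper (the name inspect_split_brain is not part of GlusterFS); it assumes FILE is a path on a FUSE mount of the replicate volume and CHOICE is a client name such as "12-client-1" (the value used in Comment 9), and it only uses the replica.split-brain-status and replica.split-brain-choice attributes named in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): walks through the steps above.
# Usage: inspect_split_brain FILE CHOICE
#   FILE   - path to the split-brain file on the FUSE mount
#   CHOICE - client whose copy should be served, e.g. "12-client-1"
inspect_split_brain() {
    file=$1
    choice=$2

    # Confirm the split-brain state from the mount.
    getfattr -n replica.split-brain-status "$file" || return 1

    # While in split-brain with no choice set, reads are expected to fail with EIO.
    if cat "$file" >/dev/null 2>&1; then
        echo "note: $file is readable without a split-brain choice" >&2
    fi

    # Point reads at one replica's copy of the file.
    setfattr -n replica.split-brain-choice -v "$choice" "$file" || return 1

    # Read the chosen copy; per this report, this should only work until
    # the choice times out (5 minutes by default).
    cat "$file"
}
```

Example invocation (the mount path is illustrative): inspect_split_brain /mnt/glusterfs/file_2 12-client-1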



Version-Release number of selected component (if applicable):
==================
3.7.9.10

How reproducible:
===============
Always


Steps to Reproduce:
1. Create a data or metadata split-brain.
2. On the FUSE mount, run "setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>" to access the split-brain file.
3. The split-brain file should be accessible for only 5 minutes after running the above command.

Actual results:
=================
The split-brain file remains accessible with no time limit.

Expected results:
=====================
The split-brain file should be accessible only for the default duration of 5 minutes.

Additional info:

Comment 2 Nag Pavan Chilakam 2016-07-27 07:42:06 UTC

FYI,
If I clear the client cache using "free && sync && echo 3 > /proc/sys/vm/drop_caches && free", the 5-minute timeout is honoured. The behavior is the same if the timeout is changed.
So the question is: shouldn't the cache be invalidated once the timeout expires, instead of the user needing to clear it manually?

Comment 4 Ravishankar N 2017-10-09 09:06:36 UTC

Hi Karthik, could you take a look at this bug? Check whether the problem is in the AFR timer-expiry logic (unlikely) or is due to caching in the performance xlators or in the FUSE kernel module.

Comment 5 Karthik U S 2017-10-09 09:09:40 UTC

Sure Ravi. Will check.

Comment 6 Karthik U S 2017-10-18 11:31:33 UTC

Upstream patch: https://review.gluster.org/18546

Comment 9 Vijay Avuthu 2018-04-19 06:45:38 UTC

Update:
========

Build Used : glusterfs-3.12.2-7.el7rhgs.x86_64

1) Create data split-brain files.
2) From the mount point, set replica.split-brain-choice to access the file for 5 minutes (the default).
3) Validate that the file is accessible ONLY for 5 minutes; after the 5th minute, reads should fail with an I/O error.


# date;setfattr -n replica.split-brain-choice -v "12-client-1" file_2
Thu Apr 19 02:36:39 EDT 2018
# 

# date;cat file_2
Thu Apr 19 02:41:39 EDT 2018
Initial contnet
Appending contnet while b0 is down
# date;cat file_2
Thu Apr 19 02:41:40 EDT 2018
cat: file_2: Input/output error
#

> Also tried accessing the file continuously in a loop; after 5 minutes it throws an I/O error, as expected.
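The continuous-access check described above can be approximated with a polling loop that reports how long reads keep succeeding after the choice is set. This is a hypothetical helper (watch_access is not a GlusterFS tool); it assumes a shell whose date supports %s (GNU and BSD date do), and it treats any failing read as the end of the access window without distinguishing EIO from other errors.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS).
# Usage: watch_access FILE [INTERVAL] [MAX]
# Reads FILE every INTERVAL seconds (default 1) until a read fails or MAX
# seconds (default 600) have elapsed, then prints the elapsed time.
# Returns 0 if a read failed (access window closed), 1 if FILE stayed readable.
watch_access() {
    file=$1
    interval=${2:-1}
    max=${3:-600}
    start=$(date +%s)   # assumes date supports %s

    while :; do
        elapsed=$(( $(date +%s) - start ))
        if ! cat "$file" >/dev/null 2>&1; then
            # Any read error counts; on the FUSE mount this is expected to be EIO.
            echo "read failed after ${elapsed}s"
            return 0
        fi
        if [ "$elapsed" -ge "$max" ]; then
            echo "still readable after ${elapsed}s"
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, after setfattr -n replica.split-brain-choice -v "12-client-1" file_2, running watch_access file_2 1 600 on a fixed build would be expected to report a failure at roughly the 300-second mark, while on an affected build it would be expected to report the file as still readable.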

Changing status to Verified.

Comment 11 errata-xmlrpc 2018-09-04 06:29:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
