Bug 1552425 - Make afr_fsync a transaction
Summary: Make afr_fsync a transaction
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Karthik U S
QA Contact: Vijay Avuthu
Depends On: 1548361
Blocks: 1503137
Reported: 2018-03-07 06:19 UTC by Karthik U S
Modified: 2018-09-19 05:34 UTC
CC: 5 users

Fixed In Version: glusterfs-3.12.2-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-09-04 06:44:11 UTC
Target Upstream Version:


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:45:12 UTC

Description Karthik U S 2018-03-07 06:19:01 UTC
Description of problem:
Currently afr_fsync() is not a transaction, which can lead to problems. If data has not yet been synced to the bricks, the application issues an fsync, and some bricks crash, the data on those bricks must later be healed from the surviving bricks.
Because fsync is not a transaction, it does not set any pending markers to indicate that data needs to be copied to the bricks on which the fsync failed. Making it a transaction takes care of setting the pending markers, so that when the crashed bricks come back up, self-heal syncs them, guaranteeing the data persists on disk.
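The pending-marker mechanism described above can be sketched as follows. This is a simplified illustrative model in Python, not the actual afr C code; the names `fsync_transaction` and the brick/pending data structures are made up for the example:

```python
# Simplified model of AFR transaction pending markers around fsync.
# Illustrative only, not the actual glusterfs C implementation.

def fsync_transaction(bricks):
    """bricks: dict name -> {"up": bool, "pending": {peer: count}}.

    pre-op:  every live brick raises a pending count against every
             peer, recording "that peer may not have this data".
    op:      fsync runs on each brick; a down brick fails.
    post-op: each live brick lowers the pending count for peers whose
             fsync succeeded; counts against failed peers remain.
    """
    live = [n for n, b in bricks.items() if b["up"]]
    # pre-op: raise pending counts on all live bricks
    for n in live:
        for peer in bricks:
            if peer != n:
                bricks[n]["pending"][peer] = bricks[n]["pending"].get(peer, 0) + 1
    # op: a brick that is down fails its fsync
    ok = {n: bricks[n]["up"] for n in bricks}
    # post-op: lower counts only for peers whose fsync succeeded
    for n in live:
        for peer in bricks:
            if peer != n and ok[peer]:
                bricks[n]["pending"][peer] -= 1

bricks = {
    "brick0": {"up": True, "pending": {}},
    "brick1": {"up": False, "pending": {}},  # killed before fsync
}
fsync_transaction(bricks)
# brick0 now holds pending["brick1"] == 1: self-heal knows brick1
# must be healed from brick0 once it comes back. Without the
# transaction (no pre-op), nothing would be marked and brick1
# would silently diverge.
```

If both bricks are up, the pre-op count is fully undone by the post-op, leaving no pending markers, which matches the expectation that a clean fsync leaves the xattrs untouched.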

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 2 Karthik U S 2018-03-07 08:46:09 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/131942/

Comment 3 Karthik U S 2018-03-09 07:19:26 UTC
Upstream patch: https://review.gluster.org/#/c/19621/

Comment 8 Karthik U S 2018-05-21 09:45:34 UTC
Steps to validate this bug:

- Create a replica volume, start it, and mount it
- Create a file and write some data to it
- Kill one of the bricks and do an fsync on the file
- Check the xattrs on the file to see whether the data-pending marker is set (it will be set only if fsync is a transaction)
- The entry should be present in the heal info output
- Bring the brick back up and wait for the heal to complete
- Now the pending marker should be reset, and the heal info entry count should be 0
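The whole validation flow above (marker set while a brick is down, entry visible in heal info, then reset after heal) can be walked through with a toy model. Illustrative Python only; `heal_info` merely mimics the entry count printed by `gluster vol heal <vol> info`:

```python
# Toy walk-through of the validation steps: pending marker raised
# while one brick is down, then cleared after heal. Not gluster
# internals; the functions and dicts here are invented for the sketch.

def fsync_while_down(pending, up):
    # transaction post-op: live bricks blame peers whose fsync failed
    for n in up:
        if up[n]:
            for peer in up:
                if peer != n and not up[peer]:
                    pending[n][peer] = pending[n].get(peer, 0) + 1

def heal(pending, up):
    # brick back up: data is copied over and all blame counters reset
    for n in up:
        up[n] = True
        pending[n].clear()

def heal_info(pending):
    # mimics `gluster vol heal <vol> info`: count of blamed entries
    return sum(len(p) for p in pending.values())

up = {"brick0": True, "brick1": False}       # brick1 killed
pending = {"brick0": {}, "brick1": {}}
fsync_while_down(pending, up)
assert heal_info(pending) == 1   # entry shows up in heal info
heal(pending, up)
assert heal_info(pending) == 0   # heal info back to 0 after heal
```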

Comment 9 Vijay Avuthu 2018-05-22 08:51:52 UTC

1) Create a 2 x 3 distributed-replicate volume and start it
2) Create files with data from the mount point
3) Kill a brick (b0) from a replica set
4) Do an fsync on a file
5) Check the heal info: it should show a pending heal for the file on which fsync was done in step 4
6) Check the xattrs on the file: the data-pending counter should be set

# gluster vol heal testvol_distributed-replicated info
Status: Transport endpoint is not connected
Number of entries: -

Status: Connected
Number of entries: 1

Status: Connected
Number of entries: 1

Status: Connected
Number of entries: 0

Status: Connected
Number of entries: 0

Status: Connected
Number of entries: 0


# getfattr -d -m . -e hex /bricks/brick3/testvol_distributed-replicated_brick1/file_1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/testvol_distributed-replicated_brick1/file_1
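The xattr list in the getfattr output above was truncated in this record. For reference, the data-pending check in step 6 can also be done programmatically: a `trusted.afr.<vol>-client-N` value is three big-endian 32-bit counters (data, metadata, entry), and a nonzero first counter means a data heal is pending. A sketch, using a made-up hex value since the real one is not preserved above:

```python
import struct

def decode_afr_xattr(hex_value):
    """Decode a trusted.afr.* xattr value (as printed by
    getfattr -e hex) into (data, metadata, entry) pending counters.
    AFR stores three big-endian 32-bit counts."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    return struct.unpack(">III", raw[:12])

# Example value in getfattr -e hex form (invented for illustration,
# since the real output above is truncated):
data, metadata, entry = decode_afr_xattr("0x000000010000000000000000")
assert data == 1                  # data-pending counter set
assert (metadata, entry) == (0, 0)
```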


Changing status to Verified.

Comment 11 errata-xmlrpc 2018-09-04 06:44:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

