Description of problem:
Have a 4-node cluster with a 1 x (4+2) disperse volume named 'ozone'. Enable bitrot and set the scrubber frequency to hourly. Create files/directories via FUSE/NFS and create a couple of hardlinks as well. Corrupt one of the hardlinks from the backend brick path and wait for the scrubber to mark it as corrupted. Now follow the standard procedure for recovering a corrupted file: delete it on the backend and access it from the mountpoint. After recovery, we see that the recovered file has the same contents as it had when it was corrupted.
Version-Release number of selected component (if applicable):
Hit it in my setup. Recreated the issue on another setup shared by the development team. That setup is still in the same state, in case it needs to be looked at.
Steps to Reproduce:
1. Have a 4-node cluster. Create a 4+2 disperse volume on node2, node3 and node4, using 2 bricks from each node.
2. Enable bitrot and mount the volume via FUSE. Create 5 files and 2 hardlinks.
3. Go to the backend brick path on node2 and append a line to one of the hardlinks.
4. Verify using 'cat' that both the hardlink and the original file show the corruption at the backend (hardlinks share the same inode, so both names see the modified data).
5. Wait for the scrubber to finish its run, and verify that /var/log/glusterfs/scrub.log detects the corruption.
6. Delete the hardlink (and the original file) from the backend brick path of node2 and access the file from the mountpoint, expecting self-heal to recover the file on node2.
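The steps above can be sketched as a shell session. This is a hedged sketch, not a transcript from the setup: hostnames, brick paths, mount point, and file names are illustrative assumptions, and each part must run on the node indicated in the comments.

```shell
# On any peer: create a 4+2 disperse volume from 2 bricks each on
# node2..node4 (brick paths are illustrative), then enable bitrot.
gluster volume create ozone disperse 6 redundancy 2 \
  node2:/bricks/b1 node2:/bricks/b2 \
  node3:/bricks/b1 node3:/bricks/b2 \
  node4:/bricks/b1 node4:/bricks/b2
gluster volume start ozone
gluster volume bitrot ozone enable
gluster volume bitrot ozone scrub-frequency hourly

# On a client: mount via FUSE, create 5 files and 2 hardlinks.
mount -t glusterfs node2:/ozone /mnt/ozone
for i in 1 2 3 4 5; do echo "data $i" > "/mnt/ozone/file$i"; done
ln /mnt/ozone/file1 /mnt/ozone/hlink1
ln /mnt/ozone/file2 /mnt/ozone/hlink2

# On node2: corrupt one hardlink directly on the brick backend.
echo "corruption" >> /bricks/b1/hlink1

# After the next scrub run: confirm detection, delete the bad copy on
# the backend, and access the file from the mountpoint to trigger heal.
grep -i corrupt /var/log/glusterfs/scrub.log
rm -f /bricks/b1/hlink1 /bricks/b1/file1
cat /mnt/ozone/hlink1
```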
After step 6, the file and the hardlink do get recovered, but they continue to have the corrupted data.
A good copy of the file should get recovered.
Proposing this as a blocker, since data, once corrupted, remains corrupted.
This reduces the redundancy that comes along with a disperse volume, without the user's knowledge.
A few updates on what happened during the day while trying to debug this issue:
1. Tried the same steps without bitrot, on a plain disperse volume. If there is no scrubber involved to mark the file as bad, then recovery of the file works as expected at the outset. (However, further testing would be required to confidently claim this.)
2. In the setup that was shared by Kotresh, this behaviour was consistently reproduced not just for hardlinks/softlinks but even for regular files.
3. Had missed deleting the file entry from the .glusterfs folder. Redid the steps mentioned in the description. This time the file gets recovered not with the corrupted data, but with NO data: it is an empty file, which continues to remain empty. Multiple attempts to manually heal the file using 'gluster volume heal <volname>' have no effect.
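For completeness, deleting a file "fully" on the backend means removing both the named entry and its gfid hardlink under .glusterfs. A minimal sketch, assuming an illustrative brick path and file name (the gfid comes from the file's trusted.gfid xattr, and the backend keeps a hardlink at .glusterfs/<first 2 hex>/<next 2 hex>/<gfid-as-uuid>):

```shell
# Run on the node holding the bad copy; paths are assumptions.
BRICK=/bricks/b1
F="$BRICK/hlink1"

# Read the gfid from the trusted.gfid xattr (hex-encoded).
g=$(getfattr -n trusted.gfid -e hex "$F" 2>/dev/null \
      | awk -F'0x' '/trusted.gfid/ {print $2}')

# Reformat the 32 hex chars as the uuid used in the .glusterfs path.
uuid="${g:0:8}-${g:8:4}-${g:12:4}-${g:16:4}-${g:20:12}"

# Remove both the named entry and the gfid hardlink, then stat/cat the
# file from the mountpoint to trigger a heal.
rm -f "$F" "$BRICK/.glusterfs/${g:0:2}/${g:2:2}/$uuid"
```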
To sum it up, recovery of a corrupted file is not working as expected on a disperse volume. Data corruption, with no way to recover, silently leaves the system in a -1 redundancy state.
The lookup on the deleted file should have cleaned the inode context in which bitrot had marked the file bad in memory. For some reason this is not happening with EC volumes; it needs further investigation.
Well, we have a workaround here: if the brick is restarted, healing happens successfully.
Recreated the issue on build 3.7.9-7 and tested the workaround of restarting the brick process. The file does get healed successfully.
Will execute a few cases in and around this workaround, to ensure there is no unexpected impact on the rest of the functionality.
Follow-up to comment 10, validating the workaround:
Killing the brick process with 'kill -15' and restarting it with 'gluster volume start <volname> force' does help in recovering the file. We no longer see the recovered file empty.
Impact of the workaround:
The only known/recommended way of restarting a brick process is to start the volume with force, which in turn restarts the scrubber as well. All of the volume's scrub status (number of files scrubbed, number of files skipped, duration, last completed scrub time) is reset.
Had the information about (other) corruptions also been lost, it would have been a concern, as the user would have had to wait for another scrub run. But that information remains, and shows up correctly in the scrub status output.
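The scrub bookkeeping discussed above can be compared before and after the restart with the scrub status command (the volume name 'ozone' is the illustrative one from this report):

```shell
# Shows per-node scrub counters (files scrubbed/skipped, duration,
# last completed scrub time) and the corrupted objects detected so far.
gluster volume bitrot ozone scrub status
```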
To sum it up, (1) 'kill -15 <brick pid>' followed by (2) 'gluster volume start <volname> force' can be accepted as a workaround to recover a corrupted file on a disperse volume.
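The two workaround steps as a shell sketch; the pid lookup pattern and volume name are assumptions for illustration, since the actual brick path determines the glusterfsd command line to match:

```shell
# (1) Find and gracefully stop the brick process serving the bad brick
# (the 'bricks/b1' pattern is illustrative).
BRICK_PID=$(pgrep -f 'glusterfsd.*bricks/b1')
kill -15 "$BRICK_PID"

# (2) Respawn the dead brick process; 'start force' starts only the
# bricks that are not already running. Note this restarts the scrubber
# and resets the scrub counters, as described above.
gluster volume start ozone force

# Trigger and verify the heal.
gluster volume heal ozone
gluster volume heal ozone info
```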
Atin/Kotresh, please do write back if you see any other concern with 'volume start force'.
Fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4
Tested and verified this on the build glusterfs-3.8.4-3.el7rhgs.x86_64
Followed the steps mentioned in the description multiple times, with hardlinks created at various directory levels, and was able to recover successfully every time.
Did see an issue with the scrubbed/skipped file counts, but that is not related to the issue for which this BZ was raised.
Moving this BZ to verified in 3.2. The console logs are attached.
Created attachment 1217863 [details]
Server and client logs
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.