Bug 1700078 - disable + reenable of bitrot leads to files marked as bad
Summary: disable + reenable of bitrot leads to files marked as bad
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: mainline
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-15 18:47 UTC by Raghavendra Bhat
Modified: 2019-04-25 05:20 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-04-25 05:20:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gluster.org Gerrit | 22360 | 0 | None | Merged | features/bit-rot: Unconditionally sign the files during oneshot crawl | 2019-04-25 05:20:19 UTC
Gluster.org Gerrit | 22572 | 0 | None | Open | features/bit-rot-stub: clean the mutex after cancelling the signer thread | 2019-04-17 03:28:00 UTC

Description Raghavendra Bhat 2019-04-15 18:47:34 UTC
Description of problem:

Disable and reenable of bit-rot feature on a gluster volume can lead to a situation where some files are marked as bad (even though they are not corrupted).

Consider a gluster volume with the bit-rot feature enabled, containing files that have already been signed with their checksums. Now disable the feature. The files still carry the version and signature extended attributes. If some files are modified at this stage and the bit-rot feature is later reenabled, the files that were modified while the feature was off will be marked as bad by the scrubber.

This happens for the following reason.

Modifying a file while the feature is off does not trigger calculation of the file's checksum, so no updated checksum is saved as part of the signature xattr.

Whenever the bit-rot daemon is spawned (either on a restart or on a regular start when the feature is enabled), it does a one-shot crawl of the entire volume. During that crawl it skips calculating the checksum of a file (and saving that checksum as part of the signature) if the file already contains those xattrs, on the assumption that their values are correct.

So when the scrubber does its job, it finds that the on-disk checksum (the stale signature) differs from the checksum it calculates. This makes the scrubber mark such a file as bad.
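
The sequence described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not GlusterFS code: the `File` class, the `oneshot_crawl` and `scrub` functions, the xattr key names and the use of SHA-256 are all hypothetical stand-ins. The `skip_signed` flag models the crawl behaviour described in this report (True) versus signing every file unconditionally, as the title of the linked Gerrit change 22360 suggests (False).

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical stand-ins for the signature and bad-file extended attributes.
SIGNATURE = "signature"
BAD_FILE = "bad-file"


@dataclass
class File:
    data: bytes
    xattrs: dict = field(default_factory=dict)


def checksum(data: bytes) -> str:
    # SHA-256 is used here purely as an illustrative checksum.
    return hashlib.sha256(data).hexdigest()


def oneshot_crawl(files: dict, skip_signed: bool) -> None:
    """Model of the crawl done when the bit-rot daemon is spawned."""
    for f in files.values():
        if skip_signed and SIGNATURE in f.xattrs:
            # Behaviour described in this report: trust the existing xattr.
            continue
        f.xattrs[SIGNATURE] = checksum(f.data)


def scrub(files: dict) -> list:
    """Mark files whose stored signature differs from the computed checksum."""
    bad = []
    for name, f in files.items():
        if f.xattrs.get(SIGNATURE) != checksum(f.data):
            f.xattrs[BAD_FILE] = True
            bad.append(name)
    return bad


def scenario(skip_signed: bool) -> list:
    files = {"a.txt": File(b"original data")}
    oneshot_crawl(files, skip_signed)            # feature enabled, file signed
    # Feature disabled: the write below is not followed by any re-signing.
    files["a.txt"].data = b"modified while bit-rot was off"
    oneshot_crawl(files, skip_signed)            # feature reenabled
    return scrub(files)                          # on-demand scrub


print(scenario(skip_signed=True))   # file wrongly reported as bad
print(scenario(skip_signed=False))  # no bad files
```

With `skip_signed=True` the stale signature survives the second crawl and the scrubber flags the file; with `skip_signed=False` the second crawl refreshes the signature and nothing is flagged.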



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a gluster volume, start it and mount it.
2. Enable the bit-rot feature.
3. Create a file with some data.
4. Wait until the file is properly signed (it takes 2 minutes for the proper
   signature to be saved as an xattr).
5. Disable bit-rot.
6. Modify the contents of the file.
7. Reenable the bit-rot feature.
8. Start on-demand scrubbing.
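
The steps above could be scripted roughly as follows. This is a sketch, not a tested reproducer: the volume name, host, brick path, mount point, file name and the 130-second wait are assumptions chosen for illustration, and the final `scrub status` call is an extra step added only to observe the result. The function just builds the command lines; nothing is executed unless the caller runs them.

```python
def reproduction_commands(volume, host, brick, mountpoint, wait_seconds=130):
    """Return the reproduction steps as a list of argv lists.

    All argument values are placeholders supplied by the caller; the wait
    is an assumed margin over the 2 minutes mentioned in step 4.
    """
    target = f"{mountpoint}/testfile"  # hypothetical file name
    bitrot = ["gluster", "volume", "bitrot", volume]
    return [
        # 1. Create, start and mount the volume.
        ["gluster", "volume", "create", volume, f"{host}:{brick}", "force"],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{host}:/{volume}", mountpoint],
        # 2. Enable bit-rot.
        bitrot + ["enable"],
        # 3. Create a file with some data.
        ["sh", "-c", f"echo 'original data' > {target}"],
        # 4. Wait for the signature xattr to be saved.
        ["sleep", str(wait_seconds)],
        # 5. Disable bit-rot.
        bitrot + ["disable"],
        # 6. Modify the file while the feature is off.
        ["sh", "-c", f"echo 'modified data' >> {target}"],
        # 7. Reenable bit-rot.
        bitrot + ["enable"],
        # 8. Start on-demand scrubbing.
        bitrot + ["scrub", "ondemand"],
        # Extra step (not in the list above): inspect the scrub results.
        bitrot + ["scrub", "status"],
    ]


for argv in reproduction_commands("testvol", "server1", "/bricks/b1", "/mnt/testvol"):
    print(" ".join(argv))
```

On a disposable test node, each entry could be passed to `subprocess.run(argv, check=True)` as root; the loop above only prints the commands for review.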


Actual results:

File is marked as bad even though no corruption has happened due to bit-rot.

Expected results:


Additional info:

Comment 1 Worker Ant 2019-04-15 18:50:02 UTC
REVIEW: https://review.gluster.org/22572 (features/bit-rot-stub: clean the mutex after cancelling the signer thread) posted (#1) for review on master by Raghavendra Bhat

Comment 2 Worker Ant 2019-04-17 03:28:01 UTC
REVIEW: https://review.gluster.org/22572 (features/bit-rot-stub: clean the mutex after cancelling the signer thread) merged (#3) on master by Amar Tumballi

Comment 3 Worker Ant 2019-04-17 18:33:21 UTC
REVIEW: https://review.gluster.org/22360 (features/bit-rot: Unconditionally sign the files during oneshot crawl) posted (#2) for review on master by Raghavendra Bhat

Comment 4 Worker Ant 2019-04-25 05:20:20 UTC
REVIEW: https://review.gluster.org/22360 (features/bit-rot: Unconditionally sign the files during oneshot crawl) merged (#7) on master by Kotresh HR

