Bug 1298258 - [RFE] Bit-rot lgetxattrs calls on volume with bit-rot disabled
Summary: [RFE] Bit-rot lgetxattrs calls on volume with bit-rot disabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: bitrot
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: RHGS 3.3.0
Assignee: Kotresh HR
QA Contact: Sweta Anandpara
URL:
Whiteboard:
: 1224216 (view as bug list)
Depends On: 1359599
Blocks: 1223636 1406743 1417138 1417145
TreeView+ depends on / blocked
 
Reported: 2016-01-13 15:55 UTC by Krzysztof Pawłowski
Modified: 2020-04-15 14:21 UTC (History)
12 users (show)

Fixed In Version: glusterfs-3.8.4-23
Doc Type: Enhancement
Doc Text:
Object versioning is now enabled only when bitrot detection is in use, improving performance when bitrot detection is disabled.
Clone Of:
Environment:
Last Closed: 2017-09-21 04:25:52 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Krzysztof Pawłowski 2016-01-13 15:55:07 UTC
Description of problem:

I found a high cpu usage when find command is executed on volume. After quick investigation it looks that lgetxattr calls are responsible for that.

Version-Release number of selected component (if applicable): glusterfs-3.7.1-16.el6rhs.x86_64


How reproducible:
Always. Just execute find command on volume with large number of files.

Steps to Reproduce:
1. use find on volume with large number of files

Actual results:
High cpu usage

Expected results:
Lower cpu usage


Additional info:
Strace shows that a lot of the work of glusterfsd is done by:
[pid 16975] lgetxattr("/srv/sda/test/", "trusted.bit-rot.bad-file", 0x0, 0) = -1 ENODATA (No data available)
[pid 16975] lgetxattr("/srv/sda/test/", "trusted.bit-rot.signature", 0x0, 0) = -1 ENODATA (No data available)
[pid 16975] lgetxattr("/srv/sda/test/", "trusted.bit-rot.version", 0x0, 0) = -1 ENODATA (No data available)

In my opinion this calls should be executed only if bit-rot feature is enabled for volume.

Comment 2 Bhaskarakiran 2016-01-14 09:20:50 UTC
Steps to Reproduce:
1. use find on volume with large number of files

--> Could you let us know the exact command you run.

Comment 3 Krzysztof Pawłowski 2016-01-14 09:24:03 UTC
Ofcourse, in example: find /mnt/mounted_volume -group root

Comment 8 Kotresh HR 2016-08-08 07:43:18 UTC
The patch posted is just the initial patch. The entire bitrot was written
with the assumption that file versions are available from the beginning.
Now, in order to make it optional and enable only during bitrot enable
is like touching entire stable code base and involves changes in lot of
places. Though the changes look easy, it might result in few races which we don't know yet. Hence taken it as a stretch goal to get it merged in master first
to have soak time and take it downstream later.

Comment 9 Raghavendra Bhat 2016-08-11 19:59:41 UTC
Are there any known problems identified with the changes? When I ran some bitrot related tests from the repo (i.e. tests/basic/bitrot) it passed (well I had to make some changes in the test itself to remove the expectation of bitrot xattrs when bitrot is not enabled).

Comment 10 Kotresh HR 2016-08-12 05:34:03 UTC
If you see the patch uploaded by me, I have already modified the bitrot related test cases and it passes locally. I have tested that. The bitrot-stub is modified only in certain fops in the patch. And in other fops it is not. Certain fops expect the bitrot context to be present and is failing in that code path (You can see the AFR regression failure). We need to modify those paths as well. And my other concern is, even if you modify that and as soon as you enable the bitrot, if there is race between fops and if bitrot had not been set the context and other fops expects it and fails with EINVAL, how do we handle? Should we consider bitrot context not being present as not a failure scenario? It yes, then we are good.

Comment 11 Raghavendra Bhat 2016-08-16 14:26:39 UTC
If you are referring to the situations where bitrot is not enabled for wind path, but enabled while unwinding (where it expects the context to be present), then we can have a work around by saving priv->br_enabled in frame->local and checking that in unwind path to see if bitrot was enabled for that fop or not.

If its set of fops initiated by other xlators such as AFR (while doing transactions) and bitrot being enabled for some of the fops of the transaction and disabled for the remaining fops of the transactions (or vice versa) causes problem, then yes, I agree it might be a problem and can cause some problems.

But IIUC, AFR does data modification in op phase of the transaction. Other fops of the transactions (lock, pre-op, post-op, unlock) are not data modification fops. So IIUC they should not create any problems (irrespective of whether bitrot is enabled or not).

Comment 12 Kotresh HR 2016-08-17 06:31:14 UTC
Yes, I think that could be done and see if it unveils any further issues.

Comment 13 Raghavendra Bhat 2016-08-25 13:48:20 UTC
A Gentle reminder

Comment 19 Kotresh HR 2017-04-19 09:44:50 UTC
Downstream patch:

https://code.engineering.redhat.com/gerrit/#/c/103743/

Comment 21 Kotresh HR 2017-04-25 09:04:45 UTC
*** Bug 1224216 has been marked as a duplicate of this bug. ***

Comment 22 Sweta Anandpara 2017-04-25 09:41:42 UTC
Have done the first round of basic sanity related to the xattrs present in the file. 

The two xattrs in question:
trusted.bit-rot.signature
trusted.bit-rot.version

If bitrot is not enabled in the volume, new files do not have the xattrs. Once bitrot is enabled, new files created as well as the existing files, get these xattrs. Disabling bitrot does not do any change to the bitrot related xattrs present on the existing files, but new files created do not get the bitrot-related-xattrs.

At the outset, the code seems to be doing okay with enabling and disabling bitrot on the most-basic-functional-unit-test front. 

Further testing is required from the sanity front, with more number of files and volume-types in question. Will keep this space updated on the progress, when I make some.

Comment 23 Sweta Anandpara 2017-04-25 13:56:02 UTC
Moving the RFE to verified with the observations seen in comment 22. Will be raising separate BZs for any specific issues if I encounter.

Comment 25 Kotresh HR 2017-08-16 03:54:38 UTC
Doc Text looks good.

Comment 27 errata-xmlrpc 2017-09-21 04:25:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

Comment 28 errata-xmlrpc 2017-09-21 04:53:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774


Note You need to log in before you can comment on or make changes to this bug.