Description of problem:
I found high CPU usage when the find command is executed on a volume. After a quick investigation, it looks like lgetxattr calls are responsible for that.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-16.el6rhs.x86_64

How reproducible:
Always. Just execute the find command on a volume with a large number of files.

Steps to Reproduce:
1. Use find on a volume with a large number of files.

Actual results:
High CPU usage.

Expected results:
Lower CPU usage.

Additional info:
Strace shows that a lot of the work of glusterfsd is done by:
[pid 16975] lgetxattr("/srv/sda/test/", "trusted.bit-rot.bad-file", 0x0, 0) = -1 ENODATA (No data available)
[pid 16975] lgetxattr("/srv/sda/test/", "trusted.bit-rot.signature", 0x0, 0) = -1 ENODATA (No data available)
[pid 16975] lgetxattr("/srv/sda/test/", "trusted.bit-rot.version", 0x0, 0) = -1 ENODATA (No data available)
In my opinion, these calls should be executed only if the bit-rot feature is enabled for the volume.
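A minimal sketch of the behaviour requested above, assuming a simplified, hypothetical brick-side helper: `struct stub_priv`, `stub_lookup_xattrs`, `stub_lookup_xattrs_real` and the injected `getxattr_fn` are illustrative names, not GlusterFS APIs. The three bitrot xattrs from the strace are probed only when bitrot is enabled for the volume; the fetch function has the lgetxattr(2) signature so the real call can be passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h> /* lgetxattr(2), the real fetch function */

/* Hypothetical per-volume private state; only the flag matters here. */
struct stub_priv {
    bool br_enabled; /* true once bitrot is enabled on the volume */
};

/* The xattrs probed in the strace output of this report. */
static const char *const bitrot_xattrs[] = {
    "trusted.bit-rot.bad-file",
    "trusted.bit-rot.signature",
    "trusted.bit-rot.version",
};

/* Same signature as lgetxattr(2), so the real call can be injected. */
typedef ssize_t (*getxattr_fn)(const char *path, const char *name,
                               void *value, size_t size);

/* Probes the bitrot xattrs of `path` and returns how many probes were
 * issued. With bitrot disabled, no xattr calls are made at all. */
size_t stub_lookup_xattrs(const struct stub_priv *priv, const char *path,
                          getxattr_fn fetch)
{
    size_t calls = 0;

    assert(priv != NULL && path != NULL && fetch != NULL);
    if (!priv->br_enabled)
        return 0; /* skip the per-lookup probes entirely */

    for (size_t i = 0; i < sizeof(bitrot_xattrs) / sizeof(bitrot_xattrs[0]); i++) {
        /* size 0 asks only for the value length, as in the strace */
        (void)fetch(path, bitrot_xattrs[i], NULL, 0);
        calls++;
    }
    return calls;
}

/* Convenience wrapper using the real lgetxattr(2). */
size_t stub_lookup_xattrs_real(const struct stub_priv *priv, const char *path)
{
    return stub_lookup_xattrs(priv, path, lgetxattr);
}
```

Under this sketch's assumption of one probe set per lookup, a find over N entries on a volume without bitrot issues no bitrot xattr calls instead of 3N.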
> Steps to Reproduce: 1. Use find on a volume with a large number of files
Could you let us know the exact command you ran?
Of course, for example: find /mnt/mounted_volume -group root
The patch posted is just the initial patch. The entire bitrot feature was written with the assumption that file versions are available from the beginning. Making versioning optional, so that it is active only when bitrot is enabled, means touching the entire stable code base and changing code in many places. Though the changes look easy, they might introduce races that we don't know of yet. Hence we have taken it as a stretch goal: get it merged in master first so it gets soak time, and take it downstream later.
Have any problems been identified with the changes? When I ran the bitrot-related tests from the repo (tests/basic/bitrot), they passed, although I had to modify the tests themselves to remove the expectation of bitrot xattrs when bitrot is not enabled.
If you look at the patch I uploaded, I have already modified the bitrot-related test cases, and they pass locally. In the patch, bitrot-stub is modified only in certain fops, not in others. Certain fops expect the bitrot context to be present and fail in that code path (you can see the AFR regression failure). We need to modify those paths as well. My other concern: even once that is fixed, as soon as bitrot is enabled there can be a race between fops. If bitrot has not yet set the context and another fop expects it, that fop fails with EINVAL. How do we handle that? Should we treat a missing bitrot context as a non-failure scenario? If yes, then we are good.
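The question at the end of the comment can be made concrete with a hypothetical sketch (`struct br_ctx`, `struct br_inode`, `mark_dirty_strict` and `mark_dirty_lenient` are illustrative names, not the actual bitrot-stub code). The strict variant returns EINVAL when the context is absent, which is the failure mode described above; the lenient variant treats the absence as "bitrot is not tracking this object yet" and lets the fop continue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bitrot inode context. */
struct br_ctx {
    unsigned long version; /* ongoing version of the object */
    bool dirty;            /* set when the object is modified */
};

/* Hypothetical inode carrying an optional bitrot context. */
struct br_inode {
    struct br_ctx *br; /* NULL if bitrot never set a context here */
};

/* Strict variant: a missing context is an error (fop fails with EINVAL). */
int mark_dirty_strict(struct br_inode *ino)
{
    assert(ino != NULL);
    if (ino->br == NULL)
        return -EINVAL;
    ino->br->dirty = true;
    return 0;
}

/* Lenient variant: a missing context is not a failure; the bitrot
 * bookkeeping is skipped and the fop proceeds normally. */
int mark_dirty_lenient(struct br_inode *ino)
{
    assert(ino != NULL);
    if (ino->br == NULL)
        return 0; /* nothing to update */
    ino->br->dirty = true;
    return 0;
}
```

With the lenient choice, an object modified during the enable race is simply not marked dirty by that fop; whether that is acceptable depends on how versions are initialized afterwards, which this sketch does not model.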
If you are referring to the situation where bitrot is not enabled on the wind path but is enabled by the time the fop unwinds (where the context is expected to be present), then we can work around it by saving priv->br_enabled in frame->local and checking that in the unwind path to see whether bitrot was enabled for that fop. If the concern is a set of fops initiated by other xlators such as AFR (during transactions), where bitrot is enabled for some fops of a transaction and disabled for the rest (or vice versa), then yes, I agree that could cause problems. But IIUC, AFR modifies data only in the op phase of the transaction. The other fops of the transaction (lock, pre-op, post-op, unlock) do not modify data, so IIUC they should not create any problems, irrespective of whether bitrot is enabled or not.
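The workaround proposed above can be sketched with simplified, hypothetical types (`struct stub_priv`, `struct stub_local`, `struct call_frame`, `stub_wind` and `stub_unwind` are illustrative; the real GlusterFS `priv`, `frame->local` and fop/cbk pair differ). The unwind path checks the value of `br_enabled` captured at wind time rather than the live flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical xlator private data; the flag may change at any time. */
struct stub_priv {
    bool br_enabled;
};

/* Hypothetical per-fop local: snapshot of priv->br_enabled at wind time. */
struct stub_local {
    bool br_enabled;
};

/* Hypothetical call frame; only the local pointer is modelled. */
struct call_frame {
    struct stub_local *local;
};

/* Wind path: record whether bitrot was enabled for this fop. */
int stub_wind(struct call_frame *frame, const struct stub_priv *priv)
{
    frame->local = malloc(sizeof(*frame->local));
    if (frame->local == NULL)
        return -ENOMEM;
    frame->local->br_enabled = priv->br_enabled;
    return 0;
}

/* Unwind path: consult the snapshot, not the live flag, so a fop wound
 * with bitrot disabled never expects a context it never set up.
 * `have_ctx` stands for "the bitrot context exists on the inode".
 * Returns 0, or -EINVAL if a context was required but missing. */
int stub_unwind(struct call_frame *frame, bool have_ctx)
{
    assert(frame->local != NULL);
    bool needs_ctx = frame->local->br_enabled;

    free(frame->local);
    frame->local = NULL;

    if (needs_ctx && !have_ctx)
        return -EINVAL;
    return 0;
}
```

The snapshot only makes a single wind/unwind pair consistent; it does not give a multi-fop AFR transaction one consistent view of the flag, which, per the reasoning in the comment, matters only for the data-modifying op phase.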
Yes, I think that could be done; let's see if it unveils any further issues.
A gentle reminder.
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/103743/
*** Bug 1224216 has been marked as a duplicate of this bug. ***
I have done the first round of basic sanity testing of the bitrot xattrs present on files. The two xattrs in question:
trusted.bit-rot.signature
trusted.bit-rot.version
If bitrot is not enabled on the volume, new files do not have these xattrs. Once bitrot is enabled, both newly created files and existing files get them. Disabling bitrot does not change the bitrot-related xattrs already present on existing files, but new files created afterwards do not get them. At the outset, the code seems to handle enabling and disabling bitrot correctly on the most basic functional/unit-test front. Further sanity testing is required with a larger number of files and more volume types. I will keep this space updated as I make progress.
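For regression tests along the lines of those in tests/basic/bitrot, the expectations observed above can be captured as a small model. This is a hypothetical sketch of the observed behaviour (`struct model_volume`, `model_create`, `model_enable`, `model_disable` and `MAX_FILES` are illustrative names), not of how bitrot-stub implements it; a real test would read the xattrs from the brick instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FILES 16 /* arbitrary capacity for the model */

/* Each file either carries the bitrot xattrs (signature/version) or not. */
struct model_volume {
    bool bitrot_enabled;
    size_t nfiles;
    bool has_br_xattrs[MAX_FILES];
};

/* New file: gets the xattrs only while bitrot is enabled.
 * Returns the file index, or -1 if the model is full. */
int model_create(struct model_volume *v)
{
    assert(v != NULL);
    if (v->nfiles == MAX_FILES)
        return -1;
    v->has_br_xattrs[v->nfiles] = v->bitrot_enabled;
    return (int)v->nfiles++;
}

/* Enabling bitrot: existing files also end up with the xattrs
 * (as observed; the model does not capture when this happens). */
void model_enable(struct model_volume *v)
{
    assert(v != NULL);
    v->bitrot_enabled = true;
    for (size_t i = 0; i < v->nfiles; i++)
        v->has_br_xattrs[i] = true;
}

/* Disabling bitrot: xattrs already present are left untouched. */
void model_disable(struct model_volume *v)
{
    assert(v != NULL);
    v->bitrot_enabled = false;
}
```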
Moving the RFE to VERIFIED based on the observations in comment 22. I will raise separate BZs for any specific issues I encounter.
Doc Text looks good.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774