Bug 978335
Summary: afr: Self-heal of directories is unsuccessful with error: "Non Blocking entrylks failed"

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: replicate
Status: CLOSED EOL
Severity: high
Priority: medium
Version: 2.1
Hardware: x86_64
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-12-03 17:12:25 UTC

Reporter: Rahul Hinduja <rhinduja>
Assignee: Ravishankar N <ravishankar>
QA Contact: Rahul Hinduja <rhinduja>
CC: nsathyan, rhs-bugs, sdharane, spandura, storage-qa-internal, surs, vagarwal, vbellur
Description
Rahul Hinduja
2013-06-26 11:52:35 UTC
Discussed this with Rahul and Pranith. Following is the summary and the action items:

1) Rahul has seen cases where an entry just lies in the xattrop directory and does not get healed even after a couple of hours. A lookup on the original file then causes the heal to kick in. This is on the 3.4.0.12 branch of RHS. I was not able to reproduce this issue, though. He will try to reproduce it, capture the self-heal state dumps, and turn on a higher logging level for self-heal for further debugging.

2) The present self-heal process is fairly non-deterministic, in the sense that gfids in the xattrop directory are picked up in FIFO order. Dependencies between entries are not taken into account, which is why multiple crawls are required. This makes it hard to estimate the time taken by self-heal and complicates progress reporting. We need a mechanism to build structure among the entries to be healed. This will be taken up as a feature extension to self-heal, and the discussion will be raised on the gluster-devel mailing list.

Created attachment 774161 [details]
Script1.sh
Created attachment 774162 [details]
Script2.sh
Created attachment 774652 [details]
self_heal_all_file_types_script1.sh
Created attachment 774653 [details]
self_heal_all_file_types_script2.sh
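The second action item above (FIFO processing of xattrop entries forcing multiple crawls) can be illustrated with a small sketch. This is not GlusterFS code; the heal model below is a hypothetical simplification in which an entry can only be healed once its parent directory has been healed:

```python
# Illustrative sketch (not GlusterFS code): FIFO processing of index
# entries can require multiple crawls, while ordering entries so that a
# parent directory is healed before its children finishes in one pass.

def heal_pass(entries, healed):
    """Try to heal each entry once; in this model an entry heals only if
    its parent directory has already been healed (or it is top-level)."""
    remaining = []
    for path in entries:
        parent = path.rsplit("/", 1)[0] if "/" in path else None
        if parent is None or parent in healed:
            healed.add(path)
        else:
            remaining.append(path)
    return remaining

def crawls_needed(entries):
    """Count how many full crawls are needed to drain the index."""
    healed, passes = set(), 0
    while entries:
        passes += 1
        entries = heal_pass(entries, healed)
    return passes

# Index entries in FIFO (arrival) order: children before their parents.
fifo = ["top/a/b", "top/a", "top"]
# Same entries ordered parent-first, e.g. by path depth.
ordered = sorted(fifo, key=lambda p: p.count("/"))

print(crawls_needed(fifo))     # 3 crawls
print(crawls_needed(ordered))  # 1 crawl
```

Sorting by path depth is only one way to impose the parent-before-child structure; the action item leaves the actual mechanism open for discussion on gluster-devel.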
It looks like AFR has a problem removing the xattrop entry when any of the directories does not have the dht-related xattr key-value. When heal happens, two things are seen, both of which I have been able to reproduce locally: 1) the dht key-value does not get restored; 2) the index gfid file is not removed from the indices directory. Triaging the issue.

This issue can be bypassed by making sure that, before we power down the machines in the test setup, we do a sync so that everything gets written to disk. If that workaround is acceptable, do we still need to make this a blocker for Big Bend? One more thing: this entry in the xattrop directory is not malignant. It will be cleared on the next heal. Also, the heal on the directory has happened; only this index entry remains. We need to take this into account before deciding whether this is a blocker.

I am able to reproduce the exact issue as quoted by spandura. These are the steps:
1) Have a 2*1 replicate cluster.
2) Create a directory "top_dir".
3) Create a new directory under "top_dir", say "test_dir".
4) Bring down the brick process on one of the bricks.
5) Remove the soft link created for "test_dir" (under the .glusterfs directory) on the backend directory of the brick whose process is down. (This is what seems to be happening when a brick volume is shut down improperly.)
6) Create a new directory under "test_dir" from the client.
7) Bring the brick process back up.

You will see that on the brick which did not go down, the xattrop directory has entries that will never get healed by the self-heal daemon. The reason is that self-heal of directories requires the .glusterfs soft link to be present on all the bricks; otherwise the self-heal daemon fails. Since we are not doing an entry heal here, the gfid never gets healed; hence this issue. Fix for the same is under discussion, as this is a design-related bug. Targeting the 2.1.z U2 (Corbett) release.

Thank you for submitting this issue for consideration in Red Hat Gluster Storage.
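The precondition described above (directory self-heal requires the gfid soft link under .glusterfs on every brick) can be modeled with a small sketch. This is not GlusterFS code; the on-disk layout, the gfid value, and the heal check below are simplified assumptions for illustration only:

```python
# Minimal model (not GlusterFS code) of the stuck-index condition:
# directory self-heal proceeds only if the gfid soft link under
# .glusterfs exists on every brick. Layout and names are hypothetical.
import os
import tempfile

GFID = "d0c74b9e-0000-4000-8000-000000000001"  # hypothetical gfid

def make_brick(root, name, with_gfid_link=True):
    """Create a toy brick with top_dir/test_dir and, optionally, the
    gfid soft link under .glusterfs pointing at test_dir."""
    brick = os.path.join(root, name)
    target = os.path.join(brick, "top_dir", "test_dir")
    os.makedirs(target)
    gfid_dir = os.path.join(brick, ".glusterfs", GFID[:2])
    os.makedirs(gfid_dir)
    if with_gfid_link:
        os.symlink(target, os.path.join(gfid_dir, GFID))
    return brick

def gfid_link_present(brick):
    return os.path.islink(os.path.join(brick, ".glusterfs", GFID[:2], GFID))

def can_heal_directory(bricks):
    # The daemon's precondition: the gfid link must exist on all bricks.
    return all(gfid_link_present(b) for b in bricks)

with tempfile.TemporaryDirectory() as root:
    good = make_brick(root, "brick0", with_gfid_link=True)
    bad = make_brick(root, "brick1", with_gfid_link=False)  # step 5 above
    print(can_heal_directory([good, bad]))   # False: the index entry stays
    # Restoring the link (which a lookup on the directory effectively
    # triggers) unblocks the heal.
    os.symlink(os.path.join(bad, "top_dir", "test_dir"),
               os.path.join(bad, ".glusterfs", GFID[:2], GFID))
    print(can_heal_directory([good, bad]))   # True
```

In this model, removing the soft link on one brick (step 5 of the reproducer) leaves the heal precondition permanently unsatisfied, which matches the observation that the xattrop entries never drain until a lookup recreates the link.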
The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/ If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.