Description of problem:
"gluster volume heal <vol-name> info" does not respond until self-heal is complete.

Consequences:
1. To a lay user, the command appears to have hung, though in reality it has not.
2. The progress of self-heal can neither be tracked nor known exactly.

Once self-heal is complete, "heal info" responds with "Number of entries: 0" for all bricks, confirming that the self-heal is complete.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.51rhs.el6rhs

How reproducible:
Happened all 5 times I tried

Steps to Reproduce:
I hit this bug in a "virt rhev" environment, so I am providing the same steps.
1. Created a Trusted Storage Pool with 4 RHSS Nodes
   (i.e) gluster peer probe <RHSS-Node>
2. Created a distribute-replicate volume with 8 bricks (2 bricks per RHSS Node)
   (i.e) gluster volume create <vol-name> replica 2 <brick1> .. <brick8>
3. Optimized the volume for virt store
   (i.e) gluster volume set <vol-name> group virt
   NOTE: Ownership for this volume is also set to 36:36, just for the RHEV environment
4. Started the volume
   (i.e) gluster volume start <vol-name>
5. Used this volume for the Storage Domain (Data Domain) in the datacenter
   (i.e) the domain used to store Data Images of VMs
6. Created 2 VMs and installed RHEL 6.5 on them
7. Brought down server2 (RHSS2) and server4 (RHSS4), so that at least one brick of each replica pair is UP
8. Created 2 more VMs and installed RHEL 6.5 on them
9. Powered up the RHSS Nodes that were brought down in step 7
10. Once the nodes were up, triggered self-heal (though background self-heal is on)
    (i.e) gluster volume heal <vol-name>
11. Checked the heal info
    (i.e) gluster volume heal <vol-name> info

Actual results:
"gluster volume heal <vol-name> info" did not respond for more than 15 minutes.
Initially I thought that the command had hung or died.
After some time (about 20 minutes), I got the output with "Number of entries: 0" for all bricks.

Expected results:
1. "gluster volume heal <vol-name> info" should respond immediately
2. Progress of self-heal must be available to the user, or there should be some indication that self-heal is going on in the volume

Additional info:
As a way to fix false positives in heal info, we started taking locks to figure out whether files need self-heal or not. If the self-heal daemon wins the lock ahead of heal info for every file, then this can happen. The bug description is a bit inaccurate: the locks are taken per file. Let's say files a, b and c need self-heal. Both heal info (to find out whether a file needs self-heal) and the self-heal daemon (to do the actual heal) want to take locks. If, for each file, the self-heal daemon always gets the lock before heal info does, it seems as though heal info doesn't respond until heal on the volume is complete. There are still false positives for metadata and entry self-heal.
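The per-file race described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not GlusterFS's actual locking code: the function name `heal_info_response_time`, the idea that the daemon heals files strictly one after another, and the heal durations are all hypothetical.

```python
def heal_info_response_time(heal_seconds, check_seconds=0.0, daemon_wins=True):
    """Toy model: how long heal info takes to walk all files that need heal.

    heal_seconds  -- hypothetical per-file heal durations, in file order
    check_seconds -- hypothetical time heal info needs per file once it has the lock
    daemon_wins   -- True if the self-heal daemon gets every per-file lock first
    """
    daemon_clock = 0.0  # time at which the daemon releases its current lock
    info_clock = 0.0    # time at which heal info finishes the current file
    for heal in heal_seconds:
        if daemon_wins:
            # The daemon holds the lock for the whole heal of this file.
            daemon_clock += heal
            # Heal info cannot inspect the file before the lock is released.
            info_clock = max(info_clock, daemon_clock)
        info_clock += check_seconds
    return info_clock


# Files a, b, c with made-up heal times of 10, 20 and 20 minutes.
files = [600, 1200, 1200]
print(heal_info_response_time(files, daemon_wins=True))
print(heal_info_response_time(files, daemon_wins=False))
```

In this model, heal info returns only after the last heal finishes whenever the daemon wins every lock, even though no single lock covers the whole volume.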
Please add DocText for this Known Issue.
(In reply to Pranith Kumar K from comment #1)
> As a way to fix false +ves in heal info, we started taking locks to figure
> out whether files need self-heal or not. If for all the files
> self-heal-daemon wins taking lock before self-heal-info, then this can
> happen. The bug description is a bit inaccurate. The locks are taken per
> file. lets say we have file a, b, c which need self-heal, both heal info (to
> find whether it needs self-heal) and self-heal-daemon (to do the actual
> heal) want to take locks. Now for each file if self-heal-daemon always gets
> the lock on the files before heal info. It seems like heal info doesn't
> respond until heal on the volume is complete. There are still false +ves for
> metadata and entry self-heal.

Pranith,

I hit a scenario where "gluster volume heal <vol-name> info" takes more than 50 minutes to respond, and I think this is too long.
Check the timestamps shown with the command:

<< Note the timestamp here, when the command was triggered
[Wed Jan 8 20:08:57 UTC 2014 root.37.187:~ ] # gluster volume heal dr-imgstore info
Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/c33c0d51-e8f5-409d-9a52-fea048db0645/images/94b388b5-2906-43e8-b372-bd6bfce099f6/ff2fffbf-a14f-4727-9bea-8afa672e9bc8
Number of entries: 1

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/c33c0d51-e8f5-409d-9a52-fea048db0645/images/94b388b5-2906-43e8-b372-bd6bfce099f6/ff2fffbf-a14f-4727-9bea-8afa672e9bc8
Number of entries: 1

Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick2/drdir2/
Number of entries: 0

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick2/drdir2/
Number of entries: 0

Brick rhss3.lab.eng.blr.redhat.com:/rhs/brick1/addbrick1/
Number of entries: 0

Brick rhss4.lab.eng.blr.redhat.com:/rhs/brick1/addbrick1/
Number of entries: 0

Brick rhss3.lab.eng.blr.redhat.com:/rhs/brick1/addbrick2/
Number of entries: 0

Brick rhss4.lab.eng.blr.redhat.com:/rhs/brick1/addbrick2/
Number of entries: 0

Brick rhss3.lab.eng.blr.redhat.com:/rhs/brick1/addbrick3/
Number of entries: 0

Brick rhss4.lab.eng.blr.redhat.com:/rhs/brick1/addbrick3/
Number of entries: 0

Brick rhss3.lab.eng.blr.redhat.com:/rhs/brick1/addbrick4/
Number of entries: 0

Brick rhss4.lab.eng.blr.redhat.com:/rhs/brick1/addbrick4/
Number of entries: 0

Brick rhss3.lab.eng.blr.redhat.com:/rhs/brick1/addbrick5/
Number of entries: 0

Brick rhss4.lab.eng.blr.redhat.com:/rhs/brick1/addbrick5/
Number of entries: 0
<<<<############### long hang
[Wed Jan 8 20:59:50 UTC 2014 root.37.187:~ ] # <<< Note timestamp
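For reference, the delay between the two prompt timestamps above can be computed rather than read off by eye. A minimal Python sketch, assuming the prompt always prints the date in this fixed layout and that an English (C) locale is in effect for the day and month names; the name `PROMPT_FORMAT` and the helper `elapsed` are mine:

```python
from datetime import datetime

# Layout of the date part of the shell prompt, e.g. "Wed Jan 8 20:08:57 UTC 2014".
PROMPT_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def elapsed(start, end):
    """Return the time between two prompt timestamps as a timedelta."""
    started = datetime.strptime(start, PROMPT_FORMAT)
    finished = datetime.strptime(end, PROMPT_FORMAT)
    return finished - started


delay = elapsed("Wed Jan 8 20:08:57 UTC 2014", "Wed Jan 8 20:59:50 UTC 2014")
print(delay)                  # 0:50:53
print(delay.total_seconds())  # 3053.0
```

So the command took 50 minutes 53 seconds to return in this run.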
This may have an impact on documentation. Please check the relevant sections in the administration guide.
This bug was introduced after bigbend and fixed before corbett, so there is no need to add any doctext. Please set the doc-text flag to '-'.
Tested with glusterfs-3.4.0.57rhs-1.el6rhs

"gluster volume heal <vol-name>" doesn't hang for a long time but returns immediately.

[Tue Jan 14 15:49:30 UTC 2014 root.37.187:~ ] # gluster volume heal drvol
Launching heal operation to perform index self heal on volume drvol has been successful
Use heal info commands to check status

[Tue Jan 14 15:50:51 UTC 2014 root.37.187:~ ] # gluster volume heal drvol info
Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/7e4d8003-9248-4e82-8c41-9c4093de1623/b2dc01a7-4833-41c8-9e0f-84102f97b80d - Possibly undergoing heal
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/1217280d-e8d5-4f79-826f-64514e6f5c56
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/e1503573-d342-442b-902d-f5cb55e48edc
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/fe34200e-3614-4fbf-ab46-62ba6e39b20e
Number of entries: 4

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/7e4d8003-9248-4e82-8c41-9c4093de1623/b2dc01a7-4833-41c8-9e0f-84102f97b80d - Possibly undergoing heal
Number of entries: 1

Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick2/drdir2/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/6d845676-0267-4b44-9856-712feda16035/27f64b50-b1c1-4ce7-a3a6-08523efa1dfc - Possibly undergoing heal
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/1217280d-e8d5-4f79-826f-64514e6f5c56
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/e1503573-d342-442b-902d-f5cb55e48edc
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/fe34200e-3614-4fbf-ab46-62ba6e39b20e
Number of entries: 4

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick2/drdir2/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/6d845676-0267-4b44-9856-712feda16035/27f64b50-b1c1-4ce7-a3a6-08523efa1dfc - Possibly undergoing heal
Number of entries: 1

Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick3/drdir3/
Number of entries: 0

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick3/drdir3/
Number of entries: 0

Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick4/drdir4/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/de44188d-1ed1-40cc-9373-cca801b23d6d/2f8fafc7-d755-4b5a-9cfe-fb0ce83b54d8 - Possibly undergoing heal
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/1217280d-e8d5-4f79-826f-64514e6f5c56
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/e1503573-d342-442b-902d-f5cb55e48edc
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/fe34200e-3614-4fbf-ab46-62ba6e39b20e
Number of entries: 4

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick4/drdir4/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/de44188d-1ed1-40cc-9373-cca801b23d6d/2f8fafc7-d755-4b5a-9cfe-fb0ce83b54d8 - Possibly undergoing heal
Number of entries: 1

Brick rhss3.lab.eng.blr.redhat.com:/rhs/brick1/add-dir1/
Number of entries: 0

Brick rhss4.lab.eng.blr.redhat.com:/rhs/brick1/add-dir1/
Number of entries: 0

[Tue Jan 14 15:51:12 UTC 2014 root.37.187:~ ] # gluster volume heal drvol info
Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/7e4d8003-9248-4e82-8c41-9c4093de1623/b2dc01a7-4833-41c8-9e0f-84102f97b80d - Possibly undergoing heal
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/1217280d-e8d5-4f79-826f-64514e6f5c56
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/e1503573-d342-442b-902d-f5cb55e48edc
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/fe34200e-3614-4fbf-ab46-62ba6e39b20e
Number of entries: 4

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/7e4d8003-9248-4e82-8c41-9c4093de1623/b2dc01a7-4833-41c8-9e0f-84102f97b80d - Possibly undergoing heal
Number of entries: 1

Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick2/drdir2/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/6d845676-0267-4b44-9856-712feda16035/27f64b50-b1c1-4ce7-a3a6-08523efa1dfc - Possibly undergoing heal
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/1217280d-e8d5-4f79-826f-64514e6f5c56
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/e1503573-d342-442b-902d-f5cb55e48edc
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/fe34200e-3614-4fbf-ab46-62ba6e39b20e
Number of entries: 4

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick2/drdir2/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/6d845676-0267-4b44-9856-712feda16035/27f64b50-b1c1-4ce7-a3a6-08523efa1dfc - Possibly undergoing heal
Number of entries: 1

Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick3/drdir3/
Number of entries: 0

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick3/drdir3/
Number of entries: 0

Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick4/drdir4/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/de44188d-1ed1-40cc-9373-cca801b23d6d/2f8fafc7-d755-4b5a-9cfe-fb0ce83b54d8 - Possibly undergoing heal
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/1217280d-e8d5-4f79-826f-64514e6f5c56
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/e1503573-d342-442b-902d-f5cb55e48edc
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/fe34200e-3614-4fbf-ab46-62ba6e39b20e
Number of entries: 4

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick4/drdir4/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/de44188d-1ed1-40cc-9373-cca801b23d6d/2f8fafc7-d755-4b5a-9cfe-fb0ce83b54d8 - Possibly undergoing heal
Number of entries: 1

Brick rhss3.lab.eng.blr.redhat.com:/rhs/brick1/add-dir1/
Number of entries: 0

Brick rhss4.lab.eng.blr.redhat.com:/rhs/brick1/add-dir1/
Number of entries: 0

As the problem related to this bug is solved, this bug could be closed.

But again, "gluster volume heal <vol-name> info" lists a few entries with the message "Possibly undergoing heal", and there are entries without this message.
What is the significance of the entries that carry this message?
If it is significant, this behavior has to be documented.
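To compare entries with and without the "Possibly undergoing heal" message across bricks, the heal info output can be tallied with a small script. This is a rough Python sketch that assumes the layout seen in this report (a "Brick ..." line, then one path per line starting with "/"); the function name `summarize_heal_info` is hypothetical, and entries in any other form are not counted.

```python
def summarize_heal_info(output):
    """Count listed entries, and entries flagged as possibly healing, per brick."""
    summary = {}
    brick = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            summary[brick] = {"entries": 0, "possibly_healing": 0}
        elif line.startswith("/") and brick is not None:
            # Assumption: every entry is a path beginning with "/".
            summary[brick]["entries"] += 1
            if line.endswith("- Possibly undergoing heal"):
                summary[brick]["possibly_healing"] += 1
    return summary


# Excerpt of the output above for the first replica pair.
SAMPLE = """\
Brick rhss1.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/7e4d8003-9248-4e82-8c41-9c4093de1623/b2dc01a7-4833-41c8-9e0f-84102f97b80d - Possibly undergoing heal
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/1217280d-e8d5-4f79-826f-64514e6f5c56
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/e1503573-d342-442b-902d-f5cb55e48edc
/0218725d-3846-4c6d-b9d7-c05bd55c031b/master/vms/fe34200e-3614-4fbf-ab46-62ba6e39b20e
Number of entries: 4

Brick rhss2.lab.eng.blr.redhat.com:/rhs/brick1/drdir1/
/0218725d-3846-4c6d-b9d7-c05bd55c031b/images/7e4d8003-9248-4e82-8c41-9c4093de1623/b2dc01a7-4833-41c8-9e0f-84102f97b80d - Possibly undergoing heal
Number of entries: 1
"""

for brick_name, counts in summarize_heal_info(SAMPLE).items():
    print(brick_name, counts)
```

The script only counts what the command prints; it does not say what the message means for a given entry.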
Cancelling need_info as requires_doc_text flag is set to '-' based on comment 6.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html