Bug 221152
| Summary: | GFS2: Clean up of glock code | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Steve Whitehouse <swhiteho> | ||||||||||||||
| Component: | GFS-kernel | Assignee: | Steve Whitehouse <swhiteho> | ||||||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||||||||||||
| Severity: | medium | Docs Contact: | |||||||||||||||
| Priority: | medium | ||||||||||||||||
| Version: | rawhide | CC: | adas, bmarzins, rpeterso, triage | ||||||||||||||
| Target Milestone: | --- | Keywords: | FutureFeature | ||||||||||||||
| Target Release: | --- | ||||||||||||||||
| Hardware: | All | ||||||||||||||||
| OS: | Linux | ||||||||||||||||
| Whiteboard: | |||||||||||||||||
| Fixed In Version: | Doc Type: | Enhancement | |||||||||||||||
| Doc Text: | Story Points: | --- | |||||||||||||||
| Clone Of: | Environment: | ||||||||||||||||
| Last Closed: | 2010-08-19 13:50:00 UTC | Type: | --- | ||||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||||
| Documentation: | --- | CRM: | |||||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||
| Embargoed: | |||||||||||||||||
| Bug Depends On: | 224480 | ||||||||||||||||
| Bug Blocks: | 235349 | ||||||||||||||||
| Attachments: |
|
||||||||||||||||
|
Description
Steve Whitehouse
2007-01-02 14:33:05 UTC
Further ideas: 10. Tidy up the glock operations (reduce in number). In particular go_callback is never actually used anywhere. 11. Remove local exclusive lock mode and use a combination of a mutex or rwsem and a glock where local locking is required as well. 12. We really need a proper state machine document for the glock layer. I have a patch for #1 on the list ready and I have some patches in preparation for some of the other items too. The story so far: #1 is done and upstream #5 is partly done (greedy code removed) upstream #7 I've looked into briefly and some changes to lockdep will be required, though not too many I think #10 is done and upstream #11 is done and upstream Async locking and in particular how the lock_dlm module interacts with GFS2 needs looking at again. Another glock task: 13. Make glock acquisition truely async by not doing sync reads in the inode/rgrp callbacks. We probably need submit_bh() along with our own end-I/O callback and a workqueue or similar to run the final bit of the lock state-machine. In bz #224480, as a side effect of fixing the bug. #6 is also done. This bug has been around a while, and its an important one, so adding to the CC list. Created attachment 213371 [details]
First of five glock patches
The patches I'm attaching here are ones which were written a little while ago
and not actually applied to the tree. I'm attaching them to this bz so that
they don't get lost in case we need them in future, they probably won't apply
without changes to the current tree, nor do I have any immediate plans at the
moment to apply them.
Created attachment 213381 [details]
The second of five patches
Created attachment 213391 [details]
The third of five patches
Created attachment 213401 [details]
The fourth of five patches
Created attachment 213411 [details]
The fifth of five patches
Created attachment 241931 [details]
An updated version of patch 5
This applys against the current upstream kernel. I think it might be worth
pushing this patch in sooner rather than later as its basically doing useless
work right in the fast path of the glock locking code.
Any objections?
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. Please do not close this bug. Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping With the latest upstream code, a lot of the originally proposed changes are done. Here is a list of what is left: 1. Documentation - how does it work? :-) 2. Remove one (or both?!) of ->go_xmote_bh and ->go_lock. The latter is more important since it sits in the fast path. 3. Remove ->go_unlock if possible 4. lockdep support (task #7 from the above list) 5. The GL_NOCACHE flag could be replaced by a call to ->go_demote_ok 6. The GL_ASYNC flag could be replaced by a "wait" flag to gfs_glock_nq 7. The GLF_STICKY flag coule be replaced by a call to ->go_demote_ok 8. Remove gfs2_glockd and replace with the workqueue 9. Remove gfs2_scand and replace with the workqueue 10. Investigate why the workqueue seems to be the cause of poor performance wrt lock completions and consider whether its possible to have a per-glock tasklet instead. This issue here is whether we can submit I/O from that context or not. If we were able to eliminate all I/O from the glops callbacks, then that would cease to be a problem, but I'm not sure thats very practical. 11. Make the I/O async wrt the workqueue to avoid blocking I/O requests by waiting for other I/O requests. Has implications for task #10 Time to add some more to the list: 12. Make the glock cache respond to memory pressure by implementing an LRU list of demotable/freeable locks and having a cache shrinker which scans it. See also tasks 8 & 9 above. 13. Remove the "drop locks" feature. Its the wrong thing to do in that situation anyway. 14. Remove the lock modules (merge into main fs). Lock_nolock is already done upstream, lock_dlm is more tricky. With the latest patch for eliminating glockd and scand, I have had another idea to help try and improve the performance of glocks. Ideally we'd like to be able to solve the writeback problem and also that of efficient scheduling of inode reads (via inode_go_lock at the moment). It occurred to me that if we introduced a new concept, that of an elevator based workqueue, we could solve both problems at the same time. Each entry in the workqueue would have an index, which in this case would be determined by the inode number in question. This, in conjunction with an elevator algorithm when the workqueue is being run, should result in I/O being issued in a more optimal order. It also means that we can then deal with both reads and writes with a single solution since both writeback and inode reads are started under the workqueue. From comment #16, items 7-9 are done in the latest patch. From comment #17, items 12 & 13 are done and item 14 is high on the list. There is still a lot of work to do in this area though. This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping I think that just about all of the tasks which I'd intended that we do under this bug have now been done. We have docs on the glock code, we have tracepoints. We don't have the lockdep support for glocks, but that is still tricky to do and we can always look at that as a separate item. We do have the hangalyzer though which goes some way towards providing the same functionality. So I think we can close this now. |