Red Hat Bugzilla – Bug 221152
GFS2: Clean up of glock code
Last modified: 2010-08-19 09:50:00 EDT
This is a place holder bug to cover a variety of issues in the glock code for
consideration in upstream development. There are a number of issues which can be
addressed in this area:
1. Remove fields from struct gfs2_holder to shrink it (and thus the struct
gfs2_inode). At the top of the hit list are the completion and the gh_error
fields. The completion is the most important one as it's huge.
2. Convert hash rwlocks to spinlocks and use RCU for the hash table
3. Introduce some kind of LRU list for potentially demotable glocks rather than
searching the hashes (and maybe make lock demotion depend on memory pressure)
4. Merge the gl->gl_spin lock into the hash spinlocks, which would be much less
frequently used once RCU is in place. This avoids extra locking and reduces the
size of a glock.
5. Remove the "greedy" code in favour of a solution which doesn't require memory
allocations and which works for _all_ callers rather than for just page faults.
6. Change the state machine to not require queued structures for demote requests
so that memory allocation isn't required for demotion (as a side effect, this
will also slim down struct gfs2_glock).
7. Add support for Ingo's lockdep for glocks.
8. Any other debugging features required?
9. Work out how to embed a struct address_space in a glock and thus avoid a
second gfs2_inode per "real" inode. Do we then perhaps need two kinds of glock,
one with an address space and one without to avoid wasting too much memory?
Can anybody think of any more?
10. Tidy up the glock operations (reduce in number). In particular go_callback
is never actually used anywhere.
11. Remove local exclusive lock mode and use a combination of a mutex or rwsem
and a glock where local locking is required as well.
12. We really need a proper state machine document for the glock layer.
I have a patch for #1 on the list ready and I have some patches in preparation
for some of the other items too.
The story so far:
#1 is done and upstream
#5 is partly done upstream (the greedy code has been removed)
#7 I've looked into briefly; some changes to lockdep will be required, though
not too many, I think
#10 is done and upstream
#11 is done and upstream
Async locking and in particular how the lock_dlm module interacts with GFS2
needs looking at again.
Another glock task:
13. Make glock acquisition truly async by not doing sync reads in the
inode/rgrp callbacks. We probably need submit_bh() along with our own end-I/O
callback and a workqueue or similar to run the final bit of the lock state machine.
In bz #224480, as a side effect of fixing that bug, #6 was also done.
This bug has been around a while, and it's an important one, so adding to the CC
list.
Created attachment 213371 [details]
First of five glock patches
The patches I'm attaching here are ones which were written a little while ago
and not actually applied to the tree. I'm attaching them to this bz so that
they don't get lost in case we need them in future. They probably won't apply
to the current tree without changes, nor do I have any immediate plans to
apply them.
Created attachment 213381 [details]
The second of five patches
Created attachment 213391 [details]
The third of five patches
Created attachment 213401 [details]
The fourth of five patches
Created attachment 213411 [details]
The fifth of five patches
Created attachment 241931 [details]
An updated version of patch 5
This applies against the current upstream kernel. I think it might be worth
pushing this patch in sooner rather than later, as the current code is basically
doing useless work right in the fast path of the glock locking code.
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.
If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)
Thanks for your help, and we apologize again that we haven't handled
these issues to this point.
The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
Please do not close this bug.
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
With the latest upstream code, a lot of the originally proposed changes are
done. Here is a list of what is left:
1. Documentation - how does it work? :-)
2. Remove one (or both?!) of ->go_xmote_bh and ->go_lock. The latter is more
important since it sits in the fast path.
3. Remove ->go_unlock if possible
4. lockdep support (task #7 from the above list)
5. The GL_NOCACHE flag could be replaced by a call to ->go_demote_ok
6. The GL_ASYNC flag could be replaced by a "wait" flag to gfs_glock_nq
7. The GLF_STICKY flag could be replaced by a call to ->go_demote_ok
8. Remove gfs2_glockd and replace with the workqueue
9. Remove gfs2_scand and replace with the workqueue
10. Investigate why the workqueue seems to be the cause of poor performance wrt
lock completions and consider whether it's possible to have a per-glock tasklet
instead. The issue here is whether we can submit I/O from that context or not.
If we were able to eliminate all I/O from the glops callbacks, then that would
cease to be a problem, but I'm not sure that's very practical.
11. Make the I/O async wrt the workqueue to avoid blocking I/O requests by
waiting for other I/O requests. Has implications for task #10
Time to add some more to the list:
12. Make the glock cache respond to memory pressure by implementing an LRU list
of demotable/freeable locks and having a cache shrinker which scans it. See also
tasks 8 & 9 above.
13. Remove the "drop locks" feature. It's the wrong thing to do in that
14. Remove the lock modules (merge into main fs). Lock_nolock is already done
upstream, lock_dlm is more tricky.
With the latest patch for eliminating glockd and scand, I have had another idea to help try and improve the performance of glocks. Ideally we'd like to be able to solve the writeback problem and also that of efficient scheduling of inode reads (via inode_go_lock at the moment).
It occurred to me that if we introduced a new concept, that of an elevator based workqueue, we could solve both problems at the same time. Each entry in the workqueue would have an index, which in this case would be determined by the inode number in question. This, in conjunction with an elevator algorithm when the workqueue is being run, should result in I/O being issued in a more optimal order. It also means that we can then deal with both reads and writes with a single solution since both writeback and inode reads are started under the workqueue.
From comment #16, items 7-9 are done in the latest patch. From comment #17, items 12 & 13 are done and item 14 is high on the list. There is still a lot of work to do in this area though.
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
More information and reason for this action is here:
I think that just about all of the tasks which I'd intended that we do under this bug have now been done.
We have docs on the glock code, we have tracepoints. We don't have the lockdep support for glocks, but that is still tricky to do and we can always look at that as a separate item. We do have the hangalyzer though which goes some way towards providing the same functionality.
So I think we can close this now.