Bug 221152 - GFS2: Clean up of glock code
Summary: GFS2: Clean up of glock code
Alias: None
Product: Fedora
Classification: Fedora
Component: GFS-kernel
Version: rawhide
Hardware: All Linux
Target Milestone: ---
Assignee: Steve Whitehouse
QA Contact:
Keywords: FutureFeature
Depends On: 224480
Blocks: 235349
Reported: 2007-01-02 14:33 UTC by Steve Whitehouse
Modified: 2010-08-19 13:50 UTC (History)
4 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2010-08-19 13:50:00 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
First of five glock patches (3.52 KB, patch)
2007-10-02 13:37 UTC, Steve Whitehouse
The second of five patches (1.99 KB, patch)
2007-10-02 13:37 UTC, Steve Whitehouse
The third of five patches (6.01 KB, patch)
2007-10-02 13:38 UTC, Steve Whitehouse
The fourth of five patches (6.63 KB, patch)
2007-10-02 13:38 UTC, Steve Whitehouse
The fifth of five patches (3.82 KB, patch)
2007-10-02 13:38 UTC, Steve Whitehouse
An updated version of patch 5 (2.20 KB, patch)
2007-10-29 14:37 UTC, Steve Whitehouse

Description Steve Whitehouse 2007-01-02 14:33:05 UTC
This is a placeholder bug to cover a variety of issues in the glock code for
consideration in upstream development. There are a number of issues which can be
addressed in this area:

1. Remove fields from struct gfs2_holder to shrink it (and thus the struct
gfs2_inode). At the top of the hit list are the completion and the gh_error
fields. The completion is the most important one as it's huge.
2. Convert hash rwlocks to spinlocks and use RCU for the hash table
3. Introduce some kind of LRU list for potentially demotable glocks rather than
searching the hashes (and maybe make lock demotion depend on memory pressure
more directly?).
4. Merge gl->gl_spin lock into the much less frequently used hash (since RCU)
spinlocks. This avoids extra locking and reduces the size of a glock.
5. Remove the "greedy" code in favour of a solution which doesn't require memory
allocations and which works for _all_ callers rather than for just page faults.
6. Change the state machine to not require queued structures for demote requests
so that memory allocation isn't required for demotion (as a side effect, this
will also slim down the struct gfs2_glock).
7. Add support for Ingo's lockdep for glocks.
8. Any other debugging features required?
9. Work out how to embed a struct address_space in a glock and thus avoid a
second gfs2_inode per "real" inode. Do we then perhaps need two kinds of glock,
one with an address space and one without to avoid wasting too much memory?

Can anybody think of any more?

Comment 1 Steve Whitehouse 2007-01-18 14:30:54 UTC
Further ideas:

10. Tidy up the glock operations (reduce in number). In particular go_callback
is never actually used anywhere.
11. Remove local exclusive lock mode and use a combination of a mutex or rwsem
and a glock where local locking is required as well.
12. We really need a proper state machine document for the glock layer.

I have a patch for #1 on the list ready and I have some patches in preparation
for some of the other items too.

Comment 2 Steve Whitehouse 2007-01-29 14:02:51 UTC
The story so far:

#1 is done and upstream
#5 is partly done (greedy code removed) upstream
#7 I've looked into briefly; some changes to lockdep will be required, though
not too many, I think
#10 is done and upstream
#11 is done and upstream

Async locking and in particular how the lock_dlm module interacts with GFS2
needs looking at again.

Comment 3 Steve Whitehouse 2007-02-28 14:42:09 UTC
Another glock task:

13. Make glock acquisition truly async by not doing sync reads in the
inode/rgrp callbacks. We probably need submit_bh() along with our own end-I/O
callback and a workqueue or similar to run the final bit of the lock state machine.
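
The shape of that would be roughly as follows. This is a userspace simulation
of the pattern only (the work list, function names and glock_state structure
are all invented for illustration; in the kernel the block layer would invoke
the end-I/O callback asynchronously):

```c
#include <stdlib.h>

/* Minimal deferred-work list, standing in for a kernel workqueue. */
struct work {
    void (*fn)(void *data);
    void *data;
    struct work *next;
};

static struct work *work_head, *work_tail;

static void queue_work(void (*fn)(void *), void *data)
{
    struct work *w = malloc(sizeof(*w));
    w->fn = fn;
    w->data = data;
    w->next = NULL;
    if (work_tail)
        work_tail->next = w;
    else
        work_head = w;
    work_tail = w;
}

/* Stand-in for one pass of the workqueue thread. */
static void run_workqueue(void)
{
    while (work_head) {
        struct work *w = work_head;
        work_head = w->next;
        if (!work_head)
            work_tail = NULL;
        w->fn(w->data);
        free(w);
    }
}

struct glock_state {
    int io_done;            /* set by the end-I/O callback */
    int state_machine_ran;  /* set by the deferred final step */
};

/* Final bit of the lock state machine, run from the workqueue. */
static void finish_xmote(void *data)
{
    struct glock_state *gl = data;
    gl->state_machine_ran = 1;
}

/* End-I/O callback: runs when the (simulated) buffer read finishes,
 * and defers the rest of the state machine rather than doing it here. */
static void end_io(void *data)
{
    struct glock_state *gl = data;
    gl->io_done = 1;
    queue_work(finish_xmote, gl);
}

/* Stand-in for submit_bh(): starts the read and returns immediately.
 * Here we call end_io directly to model completion. */
static void submit_read(struct glock_state *gl)
{
    end_io(gl);
}
```

The point is that the inode/rgrp callback only ever calls submit_read() and
returns; nothing in the lock path sleeps waiting for the read.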

Comment 4 Steve Whitehouse 2007-03-12 13:16:35 UTC
In bz #224480, as a side effect of fixing that bug, #6 is also done.

Comment 5 Steve Whitehouse 2007-10-02 13:33:54 UTC
This bug has been around a while, and it's an important one, so I'm adding to the CC

Comment 6 Steve Whitehouse 2007-10-02 13:37:06 UTC
Created attachment 213371 [details]
First of five glock patches

The patches I'm attaching here are ones which were written a little while ago
and not actually applied to the tree. I'm attaching them to this bz so that
they don't get lost in case we need them in future. They probably won't apply
to the current tree without changes, nor do I have any immediate plans to
apply them.

Comment 7 Steve Whitehouse 2007-10-02 13:37:35 UTC
Created attachment 213381 [details]
The second of five patches

Comment 8 Steve Whitehouse 2007-10-02 13:38:03 UTC
Created attachment 213391 [details]
The third of five patches

Comment 9 Steve Whitehouse 2007-10-02 13:38:33 UTC
Created attachment 213401 [details]
The fourth of five patches

Comment 10 Steve Whitehouse 2007-10-02 13:38:59 UTC
Created attachment 213411 [details]
The fifth of five patches

Comment 11 Steve Whitehouse 2007-10-29 14:37:04 UTC
Created attachment 241931 [details]
An updated version of patch 5

This applies against the current upstream kernel. I think it might be worth
pushing this patch in sooner rather than later, as it's basically doing useless
work right in the fast path of the glock locking code.

Any objections?

Comment 13 Bug Zapper 2008-04-03 18:50:33 UTC
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 14 Steve Whitehouse 2008-04-04 07:27:54 UTC
Please do not close this bug.

Comment 15 Bug Zapper 2008-05-14 02:32:02 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:

Comment 16 Steve Whitehouse 2008-06-02 09:39:37 UTC
With the latest upstream code, a lot of the originally proposed changes are
done. Here is a list of what is left:

 1. Documentation - how does it work? :-)
 2. Remove one (or both?!) of ->go_xmote_bh and ->go_lock. The latter is more
important since it sits in the fast path.
 3. Remove ->go_unlock if possible
 4. lockdep support (task #7 from the above list)
 5. The GL_NOCACHE flag could be replaced by a call to ->go_demote_ok
 6. The GL_ASYNC flag could be replaced by a "wait" flag to gfs_glock_nq
 7. The GLF_STICKY flag could be replaced by a call to ->go_demote_ok
 8. Remove gfs2_glockd and replace with the workqueue
 9. Remove gfs2_scand and replace with the workqueue
 10. Investigate why the workqueue seems to be the cause of poor performance wrt
lock completions and consider whether it's possible to have a per-glock tasklet
instead. The issue here is whether we can submit I/O from that context or not.
If we were able to eliminate all I/O from the glops callbacks, then that would
cease to be a problem, but I'm not sure that's very practical.
 11. Make the I/O async wrt the workqueue to avoid blocking I/O requests by
waiting for other I/O requests. This has implications for task #10.
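
Items 5 and 7 in the list above both replace a flag check with a per-type
->go_demote_ok() decision. As a rough userspace sketch of what that looks like
(the structure layouts and the default policy here are illustrative, not the
actual GFS2 code):

```c
#include <stdbool.h>
#include <stddef.h>

struct glock;

/* Per-type glock operations; modelled on struct gfs2_glock_operations. */
struct glock_operations {
    /* Return false to keep the glock cached, true to allow demotion. */
    bool (*go_demote_ok)(const struct glock *gl);
};

struct glock {
    const struct glock_operations *gl_ops;
    int gl_holders;   /* active holders pin the glock */
};

/* One decision point replacing scattered GL_NOCACHE/GLF_STICKY checks. */
static bool demote_ok(const struct glock *gl)
{
    if (gl->gl_holders)
        return false;
    if (gl->gl_ops && gl->gl_ops->go_demote_ok)
        return gl->gl_ops->go_demote_ok(gl);
    return true;      /* default: idle cached glocks may be demoted */
}

/* Example type: a glock type that prefers to stay cached, taking the
 * role the GLF_STICKY flag used to play. */
static bool sticky_demote_ok(const struct glock *gl)
{
    (void)gl;
    return false;
}

static const struct glock_operations sticky_ops = {
    .go_demote_ok = sticky_demote_ok,
};
```

The win is that the policy lives with the glock type rather than being smeared
across flag setters at every call site.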

Comment 17 Steve Whitehouse 2008-06-03 08:33:29 UTC
Time to add some more to the list:

 12. Make the glock cache respond to memory pressure by implementing an LRU list
of demotable/freeable locks and having a cache shrinker which scans it. See also
tasks 8 & 9 above.
 13. Remove the "drop locks" feature. It's the wrong thing to do in that
situation anyway.
 14. Remove the lock modules (merge into the main fs). lock_nolock is already
done upstream; lock_dlm is trickier.
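
For task 12, the shrinker side is essentially: keep demotable glocks on an LRU
list and free from the cold end when asked. A userspace sketch of just that
mechanism (plain doubly linked list instead of the kernel's list_head, and all
names invented for illustration):

```c
#include <stddef.h>

/* A glock on the LRU is demotable/freeable; the shrinker scans from
 * the cold end (the head). */

struct lru_glock {
    struct lru_glock *prev, *next;
    int freed;                 /* marker for this sketch only */
};

static struct lru_glock *lru_head, *lru_tail;  /* head = coldest */
static unsigned long lru_count;

/* Add a newly-idle glock at the hot end of the LRU. */
void glock_lru_add(struct lru_glock *gl)
{
    gl->next = NULL;
    gl->prev = lru_tail;
    if (lru_tail)
        lru_tail->next = gl;
    else
        lru_head = gl;
    lru_tail = gl;
    lru_count++;
}

/* Shrinker scan callback: try to demote/free up to nr glocks from the
 * cold end; returns how many were actually dealt with. */
unsigned long glock_shrink_scan(unsigned long nr)
{
    unsigned long freed = 0;

    while (nr-- && lru_head) {
        struct lru_glock *gl = lru_head;
        lru_head = gl->next;
        if (lru_head)
            lru_head->prev = NULL;
        else
            lru_tail = NULL;
        lru_count--;
        gl->freed = 1;         /* would demote + free the glock here */
        freed++;
    }
    return freed;
}
```

In the kernel this would hang off a registered shrinker so the scan is driven
directly by memory pressure, which also addresses tasks 8 and 9 above since no
dedicated daemon is needed to walk the hashes.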

Comment 18 Steve Whitehouse 2008-11-20 15:11:09 UTC
With the latest patch for eliminating glockd and scand, I have had another idea to help improve the performance of glocks. Ideally we'd like to be able to solve the writeback problem and also that of efficient scheduling of inode reads (via inode_go_lock at the moment).

It occurred to me that if we introduced a new concept, an elevator-based workqueue, we could solve both problems at the same time. Each entry in the workqueue would have an index, which in this case would be determined by the inode number in question. This, in conjunction with an elevator algorithm when the workqueue is being run, should result in I/O being issued in a more optimal order. It also means that we can then deal with both reads and writes with a single solution, since both writeback and inode reads are started under the workqueue.
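
To make the idea concrete, here is a minimal userspace sketch of such a queue,
assuming a sorted-insert list for the elevator (a real version might use an
rbtree; the names are all invented for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

/* Each work item carries an index (here, the inode number).  Items are
 * kept sorted so the queue drains in ascending order, approximating a
 * single elevator pass over the disk. */

struct ework {
    uint64_t index;            /* inode number -> roughly disk order */
    void (*fn)(void *data);
    void *data;
    struct ework *next;
};

static struct ework *equeue;

void equeue_add(uint64_t index, void (*fn)(void *), void *data)
{
    struct ework *w = malloc(sizeof(*w));
    struct ework **pp = &equeue;

    w->index = index;
    w->fn = fn;
    w->data = data;
    /* Sorted insert; equal indices stay in FIFO order. */
    while (*pp && (*pp)->index <= index)
        pp = &(*pp)->next;
    w->next = *pp;
    *pp = w;
}

/* Run the queue: work runs in ascending index order regardless of the
 * order in which it was queued. */
void equeue_run(void)
{
    while (equeue) {
        struct ework *w = equeue;
        equeue = w->next;
        w->fn(w->data);
        free(w);
    }
}

/* Demo callback: records the order in which items run. */
uint64_t eorder[8];
int ecount;
void record_index(void *data)
{
    eorder[ecount++] = *(uint64_t *)data;
}
```

Both writeback and inode-read work would be queued with the owning inode's
number as the index, so a single drain issues their I/O in roughly disk order.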

From comment #16, items 7-9 are done in the latest patch. From comment #17, items 12 & 13 are done and item 14 is high on the list. There is still a lot of work to do in this area though.

Comment 19 Bug Zapper 2008-11-26 01:52:02 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:

Comment 20 Steve Whitehouse 2010-08-19 13:50:00 UTC
I think that just about all of the tasks which I'd intended that we do under this bug have now been done.

We have docs on the glock code, and we have tracepoints. We don't have the lockdep support for glocks, but that is still tricky to do and we can always look at it as a separate item. We do have the hangalyzer though, which goes some way towards providing the same functionality.

So I think we can close this now.
