Red Hat Bugzilla – Bug 408631
GFS2: [RFE] Improve libgfs2
Last modified: 2015-05-18 11:25:04 EDT
There are a number of improvements whuch ought to be made to libgfs2 in order to
turn it into a proper library. Probably we'd want to do them as seperate bugs,
but for now I'm adding all the ideas here just to keep them together until there
is time to implement them.
o We should #include <features.h> into the libgfs2 header file and make use of
the various macros it provides
o We should rename all the externally accessible functions to be of the form
o We should use libtool to create a proper shared library, not just a static
o We should audit the code for any GPL code thats sneeked in. I presume that
we'd want an LGPL license for the library?
o Also we should have just _one_ list implemtation not the two we seem to
expose at the moment. Perhaps we should use the glib one? Or better still just
change the interfaces so that we don't need to expose the list implementation.
o Remove the #include "copyright.cf" since the message is out of date and
applies to the programs and not the library itself.
o Audit so that only one set of types is exposed to library users. Currently we
have both the Linux kernel set and the stdint set. This also means that we
probably don't want to expose the on-disk structures directly to the library
users.... need to think about that a bit to ensure we do the right thing.
o Get rid of die & zalloc which don't belong in the library (replace the latter
with calloc for example)
o Add bindings for a scripting language (Python?)
Obviously thats a big list and a lot of its longer term stuff, but I just wanted
to have somewhere to keep this TOOD list so that it doesn't get forgotten.
> o We should use libtool to create a proper shared library, not just a static
This is not required. It's a one line change in the actual Makefile to turn the
build into a shared library.
> o We should audit the code for any GPL code thats sneeked in. I presume that
> we'd want an LGPL license for the library?
Code review is of course good, but all our libraries are already LGPL by
> o Also we should have just _one_ list implemtation not the two we seem to
> expose at the moment. Perhaps we should use the glib one? Or better still just
> change the interfaces so that we don't need to expose the list implementation.
We have several different list implementations around the whole stack. It's a
good start if each subproject would turn to one only. After that we can start
collecting common includes in a common location for the sake of avoiding:
a) fixing the same bug N times around
b) avoiding code duplication
> o Remove the #include "copyright.cf" since the message is out of date and
> applies to the programs and not the library itself.
You can remove this already but the file is uptodate now. It's just not used in
> o Add bindings for a scripting language (Python?)
Maybe after the library is turned shared. Not before.
> Obviously thats a big list and a lot of its longer term stuff, but I just wanted
> to have somewhere to keep this TOOD list so that it doesn't get forgotten.
Some tools will still require to link statically with libgfs2 such as mount.
IIRC there are some tools that still duplicate some code from libgfs2 (and
gfs1). You really want to get rid of this code now that we will need to send
master/cluster3 to full QE process again.
PS One last bit.. it is now possible to link gfs2 tools against libgfs1. this
was problematic at the time. You could consider cleaning libgfs1 as well and
kill dup code from gfs2 tools, then start killing obsolete gfs1 tools around
that could be made common.
Of course the original comment in this bz is quite old, which is why some of the
items don't apply any more, since they've already been done.
To follow the Fedora guidelines, all libraries should be shared and static
libraries should only be used in special cases (obviously mount being one of those).
I think we need to take a long and careful look at the interface which the
library presents and tidy that up a bit. A good time to do that would be when we
come to look at the libgfs2 stuff too. I had always assumed that we'd have a
single library which could deal with both versions of the fs eventually, but
maybe two libraries is ok too... not sure about the best way to go on that one.
(In reply to comment #2)
> To follow the Fedora guidelines, all libraries should be shared and static
> libraries should only be used in special cases (obviously mount being one of
Careful not to get confused on this point. Fedora guidelines strongly recommends
to avoid a package to ship static libraries (it is still possible with proper
exception). Nothing says that you cannot build a static library and use it
within your source. What matters is to not distribute it (and we never did as
> I think we need to take a long and careful look at the interface which the
> library presents and tidy that up a bit. A good time to do that would be when we
> come to look at the libgfs2 stuff too. I had always assumed that we'd have a
> single library which could deal with both versions of the fs eventually, but
> maybe two libraries is ok too... not sure about the best way to go on that one.
It really depends on a lot things IME but do we actually need to turn this
library shared? for what purpose? who is going to benefit from it?
So far it has been used only internally to gfs2-tools afaict.
(In reply to comment #3)
> So far it has been used only internally to gfs2-tools afaict.
I use bits of libgfs2 in askant and it makes things fairly tricky - I have to
'install' the static lib myself in order to link against it, hence askant isn't
packaged yet. It's sort of blocked on libgfs2 becoming shared or askant being
added to the cluster tree.
RE: comment #3
Although it has only been used internally, the idea behind this bug was to get
it into a state where it could be exposed externally too. There are a number of
thoughts behind this:
i) We should tidy up the code to the same standard whether its used internally
or externally. The reason for keeping code tidy and sticking to certain rules is
to make it less prone to bugs, and easier to understand and update. Whether its
internal or external I think we still want to have those features
ii) It would make building tools to parse & manipulate gfs2 filesystems much
quicker and easier. The idea of the python interface is so that if we need a
tool to do some specific job, we can rapidly produce it. I see this side of
things becoming increasinly important as we move towards doing more work on
verification and performance analysis of the kernel code.
iii) If the code is of a better standard it will help to encourge outside help.
Nobody likes working on code that they struggle to understand.
iv) It would be useful to have an interface into which writers of graphical
tools (and here I'm assuming probably not written by us) might plug in order to
provide some more user friendly tools in the future.
Btw, I do understand the point made in the Fedora guidelines, but it seems that
the point where we differ is that I don't really beleive in "internal"
interfaces, and would prefer to see everything done as if it is "external".
Obviously this is a long term plan, and not something I expect to see happen
overnight. Having said that I'd view any moves towards this end in the mean time
as a positive step.
Some thoughts regarding libgfs:
I originally recreate libgfs for the purpose of being able to link
a bunch of gfs functions into the gfs2_convert tool so that it could
read both gfs and gfs2 formats. That turned out to be a bad idea.
I changed it so that gfs2_convert had everything it needed so there
were no dependencies between the gfs-utils and gfs2-utils packages.
At the time, I had done some work to make some of the tools, particularly
gfs_fsck, use libgfs as well. I decided to ship the original gfs_fsck
without using libgfs at all because I couldn't justify the added risk to
customers when there was a perfectly good working version without it.
Then I got too caught up in gfs2 concerns and never focused on migrating
any other gfs tools to use libgfs.
So afaik, there aren't any "real" users of libgfs, and I'm not sure
we need or want to encourage community tools using it either. I'd
personally rather see us focus our efforts into libgfs2 and abandon
As for libgfs2:
I'd like to see it shipped as a dynamic library. It would make projects
like Andy's easier, I think, and since gfs2 is in the upstream kernel,
I could see more community use of it.
(In reply to comment #6)
> Some thoughts regarding libgfs:
> I originally recreate libgfs for the purpose of being able to link
> a bunch of gfs functions into the gfs2_convert tool so that it could
> read both gfs and gfs2 formats. That turned out to be a bad idea.
> I changed it so that gfs2_convert had everything it needed so there
> were no dependencies between the gfs-utils and gfs2-utils packages.
Right, this is a problem only in rhel* branches and F9. The problem doesn't
exist anymore for master and stable2 branches that can still receive some heavy
dust cleaning. Specially master as it needs to go through QE in full.
> At the time, I had done some work to make some of the tools, particularly
> gfs_fsck, use libgfs as well. I decided to ship the original gfs_fsck
> without using libgfs at all because I couldn't justify the added risk to
> customers when there was a perfectly good working version without it.
> Then I got too caught up in gfs2 concerns and never focused on migrating
> any other gfs tools to use libgfs.
Some tools use libgfs tho i can't say "how much they could use" without doing
some deep code investigation.
> So afaik, there aren't any "real" users of libgfs, and I'm not sure
> we need or want to encourage community tools using it either. I'd
> personally rather see us focus our efforts into libgfs2 and abandon
> libgfs altogether.
As far as I can tell, gfs1 is not going away anytime soon. We can either just
leave it as it is, or at least remove all the redundant code around. Less to
look at even for simpler bug fixes and maintanance operations.
There was no documentation/comments as to what info was needed, so
I'm cancelling the NEEDINFO flag.
Steve has been doing a great deal of work here; he might as well get
the credit for it. :7)
I'm looking at the libext2fs code and I wonder if we could take inspiration/code from it. For example, the ext2_types.h file seems to take care of the kernel types vs. stdint types problem.
 Generated from http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=lib/ext2fs/ext2_types.h.in
Ideally I'd like something a bit different to that. It would be nice to be able to run sparse over the userland code, so that should be taken into account in any revision of the type system.
The other big question is how to expose the various on-disk structures to the applications. I has hoped to avoid doing that, so that only libgfs2 would depend upon the kernel header. One way around it is to duplicate the on disk structures in libgfs2.h using our own type system.
We also need to ensure that types used by libgfs2 either all start with a common prefix so that we won't have any issues with clashing names, or that they start with one or more underscores (meaning "reserved by the system") if we use them only internally within the library.
We really ought to try and agree on a common prefix for the library so that we can start moving things over to that naming convention. I think that probably lgfs2_ would be a good plan, as its reasonably short, but long enough to be unlikely to clash with another lib. That would give us lgfs2_u8, lgfs2_u16 etc as well as lgfs2_be8, lgfs2_be16 and so forth.
I'm not sure how we could teach sparse about those types, so we need to look into that first.
(In reply to comment #12)
> It would be nice to be able to run sparse over the userland code
Seems to do the trick.
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Some more thoughts on libgfs2:
1. We should get all the user interface code out of the library
- There isn't a lot of it anyway
- We don't want strings in the lib which need translation, so that we can
just translate strings from the programs
- I've yet to see a case where we can't just return an error code
2. I'd like to have a documented list of library entry points
- Plan is to use this later when we start writing tests
- I'd like to have a (simple) test for every library function
- The tests would be run on every library build before linking to the apps
- We can do this with a macro, #define ENTRY_POINT() etc.
- The tests would also declare which entry points they test, so we can
automatically check that all are covered
- Said list could also be used to automatically figure out whether there are
unused library functions
3. There are a number of macros in the header file which could/should be functions
There are a number of functions in the header file which should be in .c files
4. Some functions are returning error codes as if they are kernel code, we should
use the user space standard of return -1 and set errno (assuming that this
hasn't been done anyway via a failed syscall).
5. Audit everything to ensure that its thread safe (not generally true at the moment)
Probably there are more things, but those are the ones which spring to mind from my recent look at the code.
Created attachment 498749 [details]
configure.ac patch to use sparse by default
Just a little proof-of-concept patch to show how we could use sparse (cgcc) to compile gfs2-utils by default. If cgcc isn't found it falls back to gcc.
That sounds like a good plan. The only issue is that we currently lack the annotations required in order to get the best from using sparse, and adding those is a much bigger job.
Recently some more updates were pushed to libgfs2 and there are one or two more pending. So we are moving slowly in the right direction.