Red Hat Bugzilla – Bug 480002
GFS2: Document issues relating to initscripts and umount
Last modified: 2010-03-14 17:29:47 EDT
We need to document a couple of issues (preferably as a section of one of the existing documents) which we are not likely to be able to fix before RHEL 5.4. These issues also apply to RHEL 5.3, and indeed to GFS, not just GFS2.
Issue 1. Initscripts & mount order
Ordering of filesystem mounts is determined by a number of factors:
1. For filesystems without the _netdev flag and without their own initscripts, the ordering is defined by the order in which the filesystems appear in fstab.
2. For filesystems with their own initscripts or with the _netdev flag, the mount happens later in the boot sequence, and filesystems of the same fstype are mounted together.
Note that the man page for fstab doesn't really explain any of this. The issue with GFS2 is that people will want to use bind mounts with GFS2, and that might mean either (a) creating a bind mount to provide the directory on which a GFS2 filesystem is then mounted, or (b) bind mounting a directory that lives on GFS2. Case (a) needs the bind mount to happen before the GFS2 mount and case (b) needs it to happen after, so there is no single correct ordering of bind mounts relative to GFS2 mounts.
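To make the two cases concrete, here is a rough fstab fragment. All device names and paths are made up for illustration, and the comments describe the ordering as outlined above rather than anything verified on a particular release.

```
# Local filesystem: mounted early, in fstab order
/dev/sda3            /srv            ext3   defaults   1 2

# Case (a): bind mount that provides the directory GFS2 will be mounted on.
# It has to come after /srv and before the GFS2 entry.
/srv/cluster         /cluster        none   bind       0 0

# GFS2 filesystem: mounted later, by the gfs2 initscript
/dev/vg01/lv_gfs2    /cluster/data   gfs2   defaults   0 0

# Case (b): bind mount of a directory that lives on GFS2.
# Without _netdev this is processed with the early mounts, at which point
# /cluster/data has not been mounted yet.
/cluster/data/www    /var/www        none   bind       0 0
```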
It is possible to solve (a) by putting the bind mounts before any GFS2 filesystems in fstab and after any local filesystems which might be required. To solve (b), some people have resorted to tagging bind mounts with _netdev; however, this is not really a good solution and I'm not sure that we should encourage it at all. It might be better to just explain how to create a custom init script to do the bind mounting.
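As a starting point for that explanation, a custom init script for case (b) might look roughly like the sketch below. This is not the script that ended up in the manual. The chkconfig priorities (29 and 73) and all paths are assumptions; the only real requirement is that the script starts after the gfs2 initscript and stops before it, so check the chkconfig header of the installed gfs2 script. The RUN and BIND_MOUNTS variables are hypothetical names introduced here so that the script can be dry-run.

```shell
#!/bin/sh
#
# chkconfig: 345 29 73
# description: bind mounts directories from a GFS2 filesystem (sketch only)
#
# The priorities above are assumptions: start after the gfs2 initscript
# has mounted its filesystems, stop before it unmounts them.

# One "source target" pair per line. Hypothetical paths.
BIND_MOUNTS="${BIND_MOUNTS:-/mnt/gfs2/www /var/www}"

# Set RUN=echo to print the commands instead of running them.
RUN="${RUN:-}"

start() {
    # The here-document keeps the loop in the current shell,
    # so "return 1" leaves the function on the first failure.
    while read -r src dst; do
        [ -n "$src" ] || continue
        $RUN mount --bind "$src" "$dst" || return 1
    done <<EOF
$BIND_MOUNTS
EOF
}

stop() {
    while read -r src dst; do
        [ -n "$dst" ] || continue
        $RUN umount "$dst" || return 1
    done <<EOF
$BIND_MOUNTS
EOF
}

main() {
    case "$1" in
        start) start ;;
        stop)  stop ;;
        *)     echo "Usage: $0 {start|stop}" >&2; return 1 ;;
    esac
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Running it as `RUN=echo sh ./gfs2-bind start` (the file name is arbitrary) prints the mount commands without executing them, which is a cheap way to check the list before registering the script with chkconfig.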
Issue 2. Umount hangs relating to GFS/GFS2 filesystems
Filesystems which have been mounted manually, and not via the GFS2 initscripts, will not be known to the system at umount time. As a result, the GFS2 script will not umount the filesystem. The standard initscripts then kill off all remaining user processes (including gfs_controld, etc.) and try to umount the filesystem. This fails due to the lack of gfs_controld and results in a hung system.
The workarounds are:
a) Always do a hardware reboot if this occurs or
b) Always use the initscripts to mount the filesystem or
c) Remember to always manually umount any filesystems which have been mounted manually
In the event of such a hang, it is very unlikely that any data will be lost since it will have been synced earlier in the shutdown process.
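For workaround (c), a small check run before shutdown could help spot filesystems that still need a manual umount. The sketch below assumes that a GFS or GFS2 filesystem listed in fstab was mounted via the initscripts and that anything else was mounted by hand; that is a simplification, not how the initscripts themselves track mounts. The function name list_manual_gfs is made up for this example.

```shell
#!/bin/sh
# list_manual_gfs [MOUNTS_FILE] [FSTAB_FILE]
# Print the mount point of every mounted gfs/gfs2 filesystem that has
# no gfs/gfs2 entry in fstab (assumed here to mean "mounted by hand").
list_manual_gfs() {
    mounts="${1:-/proc/mounts}"
    fstab="${2:-/etc/fstab}"
    awk '
        # First file (fstab): remember mount points of gfs/gfs2 entries,
        # ignoring commented-out lines.
        FILENAME == ARGV[1] {
            if ($0 !~ /^[ \t]*#/ && ($3 == "gfs" || $3 == "gfs2"))
                known[$2] = 1
            next
        }
        # Second file (mounts): report gfs/gfs2 mount points not in fstab.
        ($3 == "gfs" || $3 == "gfs2") && !($2 in known) { print $2 }
    ' "$fstab" "$mounts"
}
```

Calling `list_manual_gfs` with no arguments reads /proc/mounts and /etc/fstab; each mount point it prints is a candidate for a manual umount before shutting down.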
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Updating PM score.
I have this in my list of 5.4 work but I haven't spent much time evaluating it yet. (My deadline is much later than development's deadline). I'm currently looking at my 4.8 work, in fact.
Do these issues apply to RHEL 4.8 as well? I'm assuming so. If so, then I'll clone this bug for myself to RHEL 4.8 and work on it over the next couple of weeks. Most likely I'll have a question or two, but even if not I'll need to pass it by you for review.
I've changed the component here from kernel to documentation-cluster -- having finally realized that I was getting pinged on this as a kernel code deadline because of how it is categorized. I believe that by changing this component to documentation-cluster it will appear on the correct ping lists.
I have updated the relevant new sections of the GFS2 manual as per Bob Peterson's review comments and I have tested the sample initscript (finding a problem). I sent the updated GFS2 document (with these new sections highlighted) out for general review on June 25 with a two-week review period. So we're still on track for having this information in the RHEL 5.4 release of the GFS2 manual.
Steven, please can you give an update on progress? This is the last bug on my 5.4 list.
(In reply to comment #13)
> Steven, please can you give an update on progress? This is the last bug on my
> 5.4 list.
The status in Comment #11 still holds.
I added new sections with new info in June to the GFS2 doc, and sent this note on June 12:
I've made a first pass at writing up a couple of new sections for the GFS2 manual to address Bug 480002: Document issues relating to initscripts and umount. I'm hoping you can give this a review.
To address the issue of GFS2 file systems hanging at umount if they have been manually mounted, I added a section to the document called "Special Considerations when Mounting GFS2 File Systems". I think that's a somewhat awkward title, so if you have a more elegant suggestion I'm all ears. This is pretty much what you wrote in the bug report. My concern here is that it calls out the gfs_controld process -- in a document where we don't elsewhere get into that level of explanation of the actual GFS2 processes. I was tempted to rewrite this slightly so that it doesn't mention the specific process -- but that seemed too vague.
In any case: Do you think it's ok to talk about the underlying processes like this in what is otherwise a manual of basic administrator commands?
Here's that new section:
The bigger section to work on for me -- because it involved learning a lot more about initialization process and order than I previously knew -- was the section about gfs2 bind mounts and mount order consideration. Bob Peterson helped me a great deal with this, and he even wrote up an example custom init script to perform bind mounts onto a GFS2 file system.
Here's that new section. Please note that I haven't tested Bob's script yet -- I wanted to run it by you to be sure it shows what we want to show before doing that, as testing this script will be time-consuming for me (although a great learning process). Also: I don't really say much about init scripts in general -- this is certainly not a general piece of documentation about writing init scripts and all that entails -- but I do provide some of the basic info explaining this one in particular. Any additional information or clarification you think belongs in this section would be appreciated.
This is the new section, called "Bind Mounts and File System Mount Order". Again, if you have a more elegant suggestion for a title I'd definitely consider it.
Once we're satisfied with this new information, I can see about putting it in the GFS manual as well, if you think that's necessary.
Coming up next: Expanding the documentation on gfs2_jadd, as per Bug 498292.
I didn't hear back from you, but Bob reviewed my stuff and we fixed the custom init script. I put it into the GFS2 manual draft, which I sent out for general review on June 25 (I see that I used filesystem-dept-list but did not send this to you explicitly):
The GFS2 Manual for RHEL 5.4 is available for review. For this release I have added the following new sections, which I will add to the GFS document as well after incorporating review comments:
Section 3.4: Special Considerations when Mounting GFS2 File Systems. This section addresses Bug 480002 and notes that if you mount a GFS2 file system manually you must unmount it manually.
Section 3.13: Bind Mounts and File System Mount Order. This section also addresses Bug 480002 and describes how to configure a system to use bind mounts of non-GFS2 file systems onto GFS2 file systems. This section includes information on writing custom init scripts, and includes an example script.
Section 3.14: The GFS2 Withdraw Function. This section addresses longtime Bug 458604. We had not previously documented GFS/GFS2 withdraw.
I have also updated Section 3.7, Adding Journals to a File System, to address Bug 498292. The section now calls out the fact that you cannot add journals to a GFS2 file system that is full, even if you extend the underlying volume. This note will obviously not apply to the GFS document.
In addition to the new sections, it might be time to re-visit the introductory information in chapter one, which was written early in the GFS2 development process. In particular, there may be areas in which we can improve Section 1.2. Differences between GFS and GFS2.
Thank you for any help or advice you can offer. Please return comments to me by Wednesday, July 8.
The document is available for review in html format at the following location:
From that link, you can also find a PDF version of the document by clicking on the pdf link on the menu on the left side of the screen, under the Global File System entry for RHEL 5.
Abhi sent me some general review comments, but that's all I got -- Bob had already reviewed the new stuff he helped me with.
So the status is that these new sections are done for GFS2 and have been reviewed by Bob (and Abhi). I haven't added them to the GFS manual but the sections that address these bugs should be the same.
I think that looks ok to me. The only possible issue is that withdraw isn't very reliable as there are many errors which will cause the withdraw to fail (node hangs) and requires power fencing to bring it back into the cluster. It may just be that we need to improve the error handling though, rather than any change to the manual at this stage.
(In reply to comment #15)
> I think that looks ok to me. The only possible issue is that withdraw isn't
> very reliable as there are many errors which will cause the withdraw to fail
> (node hangs) and requires power fencing to bring it back into the cluster. It
> may just be that we need to improve the error handling though, rather than any
> change to the manual at this stage.
OK, I'll add the relevant sections to the GFS manual as well (that is, not the new section on adding journals to a file system, which is GFS2 specific), test the document build, and check it in in preparation for RHEL 5.4 (which should then allow me to close this Bug).
ok, sounds good
(In reply to comment #17)
> ok, sounds good
I've added the new sections on mounting GFS file systems and on the withdraw function to the GFS draft for RHEL 5.4 which I'll be building today on the draft staging area.
I did not add the new section on adding journals to a file system, and I also did not add the new section on bind mounts and file system mount order to the GFS manual. In looking at the manuals I realized that the GFS manual documents context-dependent path names (and does not document bind mounts), while the GFS2 manual documents bind mounts (and notes that context-dependent path names are not supported). The new section on Bind Mounts and File System Mount Order is really an extension to the section on Bind Mounts, so it didn't belong in the existing GFS manual.
With the release of RHEL 5.4, I am closing this bug.