Bug 228540
Summary: | Need to add gfs2_tool lockdump support to gfs2 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Robert Peterson <rpeterso> |
Component: | kernel | Assignee: | Robert Peterson <rpeterso> |
Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5.1 | CC: | dzickus, jbacik, kanderso, lwang, rkenna, swhiteho |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2007-0959 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-11-07 19:40:17 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 224480, 235349 | ||
Bug Blocks: | 221300 | ||
Attachments: |
Description
Robert Peterson
2007-02-13 17:40:03 UTC
Created attachment 148091 [details]
First go at a fix
This patch adds the functionality back using debugfs. To get the
glock dump, do something like this:
mkdir /debug
mount -t debugfs none /debug
mount -tgfs2 /dev/trin_vg/trin_lv /mnt/gfs2
cat /debug/gfs2/bob_cluster2\:bobs_gfs
We can easily make gfs2_tool do a cat of the appropriate file to
emulate its previous behavior.
Steve Whitehouse: Please review the patch and see if you have any
suggestions or improvements. I have three points of concern:
1. There are changes in the patch that aren't in the git tree I
pulled. This might be due to the fact that I pulled gfs2 from
Linus' git tree earlier and ported it back to your git tree
because I had problems with the qla2xxx tree in the Linus version.
I need to know what to keep and what to throw out.
2. I introduced a somewhat ugly kludge around the problem of not having
a vseq_printf function. I used an intermediate vsprintf to a string.
Ugly, I'll admit, but how will it be perceived by the upstream
community? I deemed this less ugly than duplicating code between the
printk and seq_printf versions of the functions.
Perhaps you can suggest an alternative.
3. The dump_holder function has a print_symbol function that I don't
understand its purpose. Perhaps you can let me know. It doesn't
seem to affect the output any. My fear is that we might need to
implement a special seq_printf version.
Added Steve Whitehouse to the cc list. Should have done that before I added comment #3, which I wanted him to see. To try and address each point: 1. I'm not quite sure that I follow what you did, but possibly you pulled some changes without doing a checkout -f to ensure that your copy of the source was clean before you started. Either way it doesn't look to tricky to cut out the unwanted bits by hand. So far as I can see, there are no overlaps in the patches. 2. Yes, its ugly :-) How do you know 256 bytes is (a) enough and (b) ok for the stack? 3. print_symbol prints the value of a kernel symbol, it takes a printf style argument with a single %s which will be replaced by the symbol. If the value given doesn't match any entry in the kernel symbol table, its value alone is printed. I'm not yet completely convinced of the merits of this approach. The format of the various fields of the glock and holder structures was really designed to be human readable. Here the aim seems to be to pass it to the gfs2_tool, so I think we could perhaps invent something a bit more succinct? Perhaps one line per glock or holder with each glock starting with its number and each holder starting with 'H' or something in the first field? It would result in a fair amount of memory saved for large dumps, as well as being easier to parse. 1. I'll resync with the latest nwm tree today. 2. I'll rewrite gfs2_print_dbg so that it doesn't rely on the stack; good catch. 3. The current refs to print_symbol are all used for printing an inode pointer, as in gl->gl_ip or gh->gh_ip. Can that really be a kernel symbol or is it alright to just print it as a decimal or hex value? Here's the thing: I wanted to have a separate debugfs.c, but the code need to reference symbols that are internal to glock.c, so I had to put the code there. In the current glock.c, there is already a bunch of code to dump a human-readable version of the glocks (dump_glock). It is called from gfs2_dump_lockstate, which is triggered if a deadlock is detected in unmount. Today it dumps human-readable glocks using printk. This is exactly the same format that gfs_tool lockdump uses in GFS1. The patch merely transforms the existing kernel code so that it can print either to the kernel messages (printk) or to a debugfs sequence file. Plus there's some extra code needed either way for managing debugfs. (I believe using debugfs is the way to go, rather than trying to reintroduce ioctl calls into the kernel and push them upstream.) I can restructure the whole thing so that it dumps the glock information to less friendly output that can then be reinterpreted by gfs2_tool and translated back into the same output, but that involves (1) Making a second set of dump routines (leaving alone the unmount deadlock code described above) and creating a second (redundant) set of routines to dump the glock data in a more succinct format. (2) Changing gfs2_tool to interpret the succinct data and print it out in the same format that gfs2 would, should a unmount deadlock occur. Seems pointless when we can have one function that serves both purposes. Of course, a succinct format would be more practical if we were planning to write a glock monitoring tool that would download the data several times a second. Ok, that sounds like a reasonable plan. The symbols that print_symbool is printing are not inode pointers, but instruction pointers. It shows the function which actually locked the glock in question, so is rather useful. Given a kernel and a hex address, its possible to match the two up, but its rather easier to do automatically than manually after the fact. If you can reliably open the System.map which matches the running kernel at the time of the dump, then you should be able to do the matching in userspace, if that helps. Created attachment 148880 [details]
Second go at a fix
I'm not sure if your "reasonable plan" was referring to the code that
doesn't have redundant functions, or the one that does.
At any rate, this attachment is my latest and greatest. I synched it
with the latest gfs2 nmw tree, so there shouldn't be anything extraneous.
I agree that the print_symbol information is very useful when it
comes to glocks, and it would be nice to have it in the debugfs.
Unfortunately, I only see three choices:
(1) Leave it a hex value in the debugfs version, which affects the
least amount of code, but it's also not very useful.
(2) Copy a great deal of code from kallsyms.c into glock.c, which
I don't like, and neither will the upstream community, or
(3) Change the kernel so that it exports the function in kallsyms
needed to do the symbol lookup, which is what I did in this patch.
That may also be frowned upon by the upstream community. However,
that's the most functional and has minimal impact on the underlying
kernel, since it just adds "extern" to existing declarations and such.
I got around the stack issue by keeping the temporary string inside
the debugfs iterator. However, having done that, I now see that the
print_symbol function in kallsyms.c does what I was doing before:
it uses a temp string on the stack. At any rate, since the variable
is used on a per-line basis, I would think 256 bytes should be ample.
I increased it to 1K just to be safe.
Waiting for some feedback on this before I try to post something like
it on rhkernel-list or lkml.
*** Bug 221300 has been marked as a duplicate of this bug. *** Created attachment 149026 [details]
third go at a fix
I had some IRC discussions with Steve Whitehouse and Dave Teigland.
Together, we decided to separate the print_symbol changes from the
basic functionality. Hopefully, this will make the whole process more
palatable to lkml. This third version of the patch is the basic
functionality, without any changes to print_symbol. If the code
is called from debugfs, the symbol will be printed as a hex number.
If the code is called from an unmount lockup, the symbol will be
printed with print_symbol. Later, I'll post a kernel patch to lkml
to introduce a more versatile version of print_symbol, and then
incorporate it into glock.c. This version also fixes some issues
raised by Steve concerning the kmalloc and a "} else {" statement.
Progress report: Looks like patch 3 was integrated into Steve Whitehouse's upstream git tree. I've made great progress is getting a patch into the upstream kernel that allows the exporting of a new kernel function, sprint_symbol, in kernel/kallsyms.c, that does a kernel symbol lookup. I patched the gfs2 kernel so that it uses the new function so we get the symbol in the debugfs log instead of a hex number, as in comment #8. The kallsyms patch was incorporated into Andrew Morton's latest (-mm2) patch set after some revisions based on lkml feedback. I tested 2.6.21-rc3-mm2 and the new function worked perfectly with gfs2. I ran into two problems with mm2 that were unrelated: (1) The qla2xxx device driver was hosed. I got around that by using the previous version of the driver, which works. (2) GFS2 makes a call to filemap_nopage which is gone for some reason in mm2. I got around that by patching around it. Having tested the mm2 patch, I asked Andrew Morton to pull the fix into Linus' upstream git tree. As of this morning, it's not there and I haven't gotten a response from him. Meanwhile, I made a patch to the userland gfs2_tool so that it simulates old behavior by using the debugfs file system. I imagine this bugzilla record will be used for POSTing the RHEL5.1 kernel changes and that the userland changes will require another bz. I'll post the userland patch here anyway. Created attachment 149854 [details]
Patch for gfs2_tool to use the new kernel feature
This patch adds back the functionality of "gfs2_tool lockdump", provided
the kernel patch for debugfs is in place. I'm not sure if it's really
necessary to do this, but if necessary, I'll probably need a
separate bugzilla record for the userland portion, and this bug record
will be for the kernel patch to be POSTed for the RHEL5.1 kernel.
Created attachment 150918 [details]
Additional patch to use pid number rather than process pointer
In Testing the previously posted and accepted patch
I uncovered some gfs2 badness. It turns out that the current
gfs2 code saves off a process pointer when glocks is taken
in both the glock and glock holder structures. Those
structures will persist in memory long after the process has
ended; pointers to poisoned memory.
This problem isn't caused by the 228540 fix; the new capability
introduced by the fix just uncovered the problem.
I wrote this patch that avoids saving process pointers
and instead saves off the process pid. Rather than
referencing the bad pointers, it now does process lookups.
There is special code that makes the output nicer for
printing holder information for processes that have ended.
This patch also adds a stub for the new "sprint_symbol"
function that exists in Andrew Morton's -mm patch set, but
won't go into the base kernel until 2.6.22, since it adds
functionality but doesn't fix a bug.
The patch was tested on system trin-10.
The patch is against 23 March 2007 nwm git tree.
It was pulled into the tree by Steve Whitehouse on 26 March 2007.
Created attachment 150925 [details]
The kallsyms patch for RHEL5 2.6.18-8.EL5 kernel
This base kernel patch adds the ability for kernel symbols to be
printed out to a buffer rather than just to dmesg.
This enables gfs2's debugfs new lockdump implementation to print the
kernel symbols for "initialized at" values of a glock (very important
to glock debugging). Without it, the value will be printed as a hex
number, which is one step above useless.
This patch was reviewed and accepted upstream in lkml. It now appears
in Andrew Morton's "mm" (i.e. 2.6.21-rcX-mmX) patch set for the latest
kernel. However, since this is considered a kernel enhancement and
not a bug, I was told the fix won't go into the base upstream kernel
until 2.6.22.
However, Steve and I both want this to go into the base RHEL5.1 kernel
for gfs2 glock debugging purposes.
I discussed the gameplan for this patch with Steve Whitehouse this morning, and this is the status and current plan as I understand it: 1. The "Third go at a fix" and "additional patch to use pid number" patches are in Steve's nwm git tree for 2.6.21 as of today. 2. Steve is going to try to request a bunch of patches be pulled into the RHEL5 kernel for 5.1 by Don Zickus which will include them both. 3. The additional change, that allows the kernel symbols to print properly has been attached as the "kallsyms patch". Steve asked me to post it here so that Don Z can pull it all in one place rather than trying to keep track of multiples. 4. Once those patches are in place, the stub for glock.c may be patched to call the new sprint_symbol function, as described in the comments of glock.c, function gfs2_print_symbol. This also should go into the RHEL5.1 kernel otherwise the kallsyms patch will be pointless. Created attachment 150936 [details] Patch to glock.c to use new sprint_symbol For the sake of completeness, this is the patch needed so that gfs2 can take advantage of the previous kallsyms patch, function sprint_symbol. This corresponds to comment #13, point #4. The patch is against the 2.6.18-8.EL5 kernel. Created attachment 152978 [details] Master patch against 2.6.18-15.el5 This patch is against the 2.6.18-15.el5 kernel. It consolidates the following fixes that all relate to gfs2_tool lockdump: 1. The original fix for this bug, 228540. 2. The fix to use pid number and do pid lookups so we don't reference poisoned memory. See comment #11. 3. The kallsyms patch that was accepted by the upstream kernel for 2.6.22, currently appearing in Andrew Morton's -mm patchset. 4. The fix for 236008, where lockdump was causing a kernel gpf. 5. The fix I sent to cluster-devel today regarding the fact that the debugfs files were not deleted upon failed mount attempts. 6. The implementation of Dave Teigland's suggestion to move the debugfs glock file into a separate directory (renamed) leaving room for future debugfs enhancements. 7. Reintroduce glock demote stuff (GLF_DEMOTE constant, gl_demote_state and so on) that were apparently dropped off the previous patch. The gfs2 module wouldn't compile without it. Can you leave out step 7 in your comment #15 above? This is introduced by a later patch and the idea here was to try and keep the same patch ordering in RHEL as in upstream so that we'd need only minimal changes to get the patches to apply. You ought to be able to just leave out anything refering to the GLF_DEMOTE flag since it shouldn't be defined or used until that later patch is applied. Otherwise it looks good I think. Revised patch (without #7 from comment #15) sent to rhkernel-list. Changing status of the bz to POST and adding Don Zickus to the cc list. Changing the component to kernel and putting lwang on the cc list. BTW, This is now in the upstream nmw git tree. It was tested on RHEL5 on trin-10 and found to be correct. *** Bug 231369 has been marked as a duplicate of this bug. *** *** Bug 236008 has been marked as a duplicate of this bug. *** in 2.6.18-17.el5 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html |