Bug 183119
Summary: | Assertion failure in journal_next_log_block | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Charles Lopes <tjarls> |
Component: | kernel | Assignee: | Josef Bacik <jbacik> |
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.0 | CC: | bigmach, bugproxy, bugzilla.redhat, esandeen, jbaron, sct, sfolkwil, staubach, tao, vgoyal |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHSA-2008-0665 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-07-24 19:11:08 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 422551, 430698, 439193 | ||
Attachments: |
Description
Charles Lopes
2006-02-26 15:12:13 UTC
We are experiencing the exact same problem. Any solution possible yet? No solution as of yet, but there are a couple of leads. One possible workaround is to increase the size of the journal on the fs. Can anyone who can hit this reliably give it a whirl with: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/jbd-journal-overflow-fix-2.patch to see if that helps? if so I'll see if I can find a clever way to deal with the KABI issues... Created attachment 143420 [details]
patch file for linux kernel RPM build 2.6.9-42.0.3.EL
Hello,
We have had the same type of crash on our system (with a more recent kernel)
I have tried to implement the patch that was suggested by Eric Sandeen. I had
to patch a bit further to make it compile with a our kernel (2.6.9-42.0.3.EL)
This is the patch file for the xattr file. If you try this, please keep in
mind that I am not a kernel programmer and mostly just did the same kind of
thing that Eric did in his patch.
However, our kernel still crashes in the same manner as before. Has anybody
had luck solving this issue? Any suggestions?
Created attachment 144402 [details] patch -- don't call journal_release_buffer from xattr calls As Leif stated, this patch applies cleanly, but I think it depends on some other patches as well. In looking for those other patches I ran across this one: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=47930089e101bdf5bd4dce5c41ab64d74d204bb1 This doesn't apply cleanly at all, however. The attached patch is a backport of it. With this, the one recommended by Eric seems to compile cleanly. I've posted some kernels containing both of these patches to my people page: http://people.redhat.com/jlayton/bz183119/ Anyone seeing this bug should test them and see if this resolves it. These kernels are test kernels and should not be allowed to touch any data which you care about. I've done a little testing with them and don't see any obvious problems, but they may eat your data for lunch. In fact, it may be safest to wait until Eric has had a look at them since he has more familiarity with this code than I. Also, I've only posted SMP kernels for i686, x86_64 and ia64. Please post here if you need other flavors. I've been able to reliably reproduce a panic on a test box with this patch: Unable to handle kernel NULL pointer dereference at virtual address 0000000c printing eip: e0048107 *pde = 15d03001 Oops: 0000 [#1] SMP Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core rpcsec_gss_krb5 auth_rpcgss des sunrpc dm_multipath button battery ac uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore e1000 floppy ata_piix libata scsi_mod ipw2100 ieee80211 ieee80211_crypt dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod CPU: 1 EIP: 0060:[<e0048107>] Not tainted VLI EFLAGS: 00010246 (2.6.9-42.36.EL.TEST.bz183119.1smp) EIP is at journal_dirty_metadata+0x40/0x1a0 [jbd] eax: 00000000 ebx: df01bf80 ecx: d317bd18 edx: df02c6c8 esi: 00000000 edi: dd5c3908 ebp: df02c6c8 esp: d68ccd44 ds: 007b es: 007b ss: 0068 Process up2date (pid: 4421, threadinfo=d68cc000 task=d61d4330) Stack: df65ca00 df65c800 00000000 00000000 df02c6c8 e010fe1b dd30ed14 df65c800 d317bde8 dd5c3908 00000000 dd9bce9c df02c6c8 d4c15000 d4c16000 d4c15020 e010f95b d4c15000 00000fe8 00000000 00000007 00000000 00000fe8 d4c15038 Call Trace: [<e010fe1b>] ext3_xattr_set_handle2+0x473/0x4bd [ext3] [<e010f95b>] ext3_xattr_set_handle+0x6fb/0x748 [ext3] [<e010feaf>] ext3_xattr_set+0x4a/0x83 [ext3] [<e01113e2>] ext3_xattr_security_set+0x3c/0x83 [ext3] [<c0176acb>] generic_setxattr+0x48/0x50 [<c0176307>] setxattr+0x1c3/0x21c [<c0166baa>] do_lookup+0x1f/0x8f [<c016f6ac>] dput+0x34/0x1a7 [<c0167717>] __link_path_walk+0xafd/0xbb5 [<c016f6ac>] dput+0x34/0x1a7 [<c0167863>] link_path_walk+0x94/0xbe [<c011ae59>] do_page_fault+0x0/0x5c6 [<c0167ba7>] path_lookup+0x14b/0x17f [<c01763da>] sys_lsetxattr+0x32/0x45 [<c011ae59>] do_page_fault+0x0/0x5c6 [<c02d58d3>] syscall_call+0x7/0xb [<c02d007b>] unix_stream_recvmsg+0x362/0x3a5 Code: 72 28 0f 85 7b 01 00 00 f6 00 02 0f 85 72 01 00 00 eb 0c f3 90 8b 45 00 a9 00 00 08 00 75 f4 f0 0f ba 6d 00 13 19 c0 85 c0 75 ea <83> 7e 0c 00 75 39 c7 46 0c 01 00 00 00 83 7f 04 00 7f 29 68 ec <0>Fatal exception: panic in 5 seconds Kernel panic - not syncing: Fatal exception ...so I'm thinking that these two patches by themselves won't cut it. Crash occurred here: 0xe0048101 <journal_dirty_metadata+58>: sbb %eax,%eax 0xe0048103 <journal_dirty_metadata+60>: test %eax,%eax 0xe0048105 <journal_dirty_metadata+62>: jne 0xe00480f1 0xe0048107 <journal_dirty_metadata+64>: cmpl $0x0,0xc(%esi) <<<<< crash here ...but I'm having a real problem trying to match this up with the C code. Jeff, what's your recipe for reproducing this one? Thanks, -Eric I had it occurring when trying to install a RPM package on my test box. Here's how it happened: I needed to install a package on my test box to build something and I was running this kernel there. I installed it and the machine paniced, I rebooted to the same kernel and tried installing it again. It paniced at exactly the same spot. I then booted to a different kernel (without that patch) and the package installed just fine. I removed the package and then booted to the patched kernel again, and tried to reinstall the package and it didn't panic that time around. Later on though, I needed to install a different package and got another panic. So it seems like once it panics, it will continue to panic consistently in the same spot until enough activity occurs to make things different enough on disk I think if you do a lot of rpm activity on the box that will probably make it happen. I'm not sure if that means that my backport of the patch was bad (I did have to make some changes though I thought I did it right), or if one or both of these patches depends on a different one. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. I have recently encountered this issue and have a simple way to reproduce it. The 'postmark' benchmark seems to reliably cause this assertion failure after running for less than an hour. I've seen this on both 32-bit and 64-bit versions of RHEL4 update 4. I also tested 32-bit and 64-bit versions of RHEL 5, where the benchmark ran to completion in about 3 hours with no problems. You can get postmark here: http://ftp.debian.org/debian/pool/main/p/postmark/postmark_1.51.orig.tar.gz After compiling postmark, run it and issue these commands: set number 1000000 set transactions 1000000 set location /tmp run Hm, tried the postmark tests, haven't recreated it yet. -Eric Another candidate: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=96761507cc998cb9c1c06f9d74bfa47cb11ae254 discussion at: http://lkml.org/lkml/2005/1/19/144 The thread started out with Alex claiming it addressed the assert, but Stephen disagreed. It went in, for other reasons (mostly stack savings) but it appears that it did remedy the assert for Alex, for apparently unknown reasons. Looks like this went into 2.6.12 This change would also affect jbd KABI... I'm setting up a box to try and reproduce this, though I suspect this has probably been resolved in 4.6 with the journal checkpoint overhaul. I've done everything I could possibly think of to reproduce this issue and I'm getting no where. I've reduced the journal size, changed the journal commit interval to > 5 minutes, everything to try and make me run out of blocks and I can't do it. Could somebody who can reproduce this easily try on the newest RHEL4 kernel. I backported Jan's checkpoint overhaul into 4.6 which I believe should resolve this issue, and if you can still get it to reproduce please give me a detailed view of your system and what you did to reproduce it so I can try and mirror your setup as much as possible. Created attachment 291901 [details]
console log
Shows kernel boot messages and output from these commands, followed by the
assertion failure and resulting stack trace:
lspci
lspci -v
lspci -v -v
dmidecode
cat /proc/scsi/scsi
cat /proc/scsi/mptsas/0
cat /proc/scsi/mptsas/1
cat /proc/interrupts
cat /proc/ioports
ls -l /sys/bus/scsi/drivers/sd
mount
dumpe2fs -h /dev/mapper/VolGroup01-LogVol00
Comment on attachment 291901 [details]
console log
I was able to reproduce this issue on a RHEL4u4 fresh install running the stock
kernel (2.6.9-42.ELsmp). I then installed the latest u6 kernel
(kernel-smp-2.6.9-67.EL.x86_64.rpm) and was also able to reproduce the issue.
This attachment contains my console output
Would you be willing to upload a core from the crash on the 67 kernel to dropbox.redhat.com so I can take a look? Just give me the filename if you uploaded it so I can pull it down. (In reply to comment #39) > Would you be willing to upload a core from the crash on the 67 kernel to > dropbox.redhat.com so I can take a look? Sure, that's no problem. I didn't have netdump configured when I reproduced the issue yesterday, and since then I have done a fresh RHEL4u6 install on the machine, so the core I uploaded is from that environment and doesn't match up with yesterday's kernel log - I hope that's OK. The filename is dump_183119.tar.bz2 and its md5sum is 5e9e0a89457ce6079fe702c13e6d254b awesome thanks I will pull it down tomorrow and start going through it. Created attachment 292185 [details]
debug patch to watch transaction credits
well the committing transaction has a negative number for
t_outstanding_credits, which would wreak all kinds of havoc. The only way for
this to happen is if somebody runs journal_stop twice on the same handle, which
isn't good. Could you try this patch and when it panics upload the vmcore and
the kernel-debuginfo from the kernel that you built with this patch? (or just
the vmlinux/jbd.ko.debug/ext3.ko.debug).
Created attachment 292225 [details]
console log
Unfortunately the kernel I built with this patch paniced my system on boot.
Attached is the console log. I verified that I could build a kernel without
this patch that booted cleanly so I therefore conclude that there is a problem
with the patch.
hmm well thats odd, maybe its because of the GENKSYSM trick. I'll test this out on my box and when I have an actual booting kernel I'll give you the patch. Sorry about that. Created attachment 292398 [details]
new debug patch.
Well my box boots fine with this patch, and there isn't anything obviously
wrong with it, unless its freaking out b/c of the genksyms thing. If this
patch doesn't fix it, one thing to try would be to go into the patch and where
I have __builtin_return_address(2), change that to __builtin_return_address(1)
and see if it starts playing nicely.
This latest patch did not apply cleanly to the kernel sources, but your suggestion of using the previous patch with __builtin_return_address(1) resulted in a kernel that does boot. So, I have uploaded dump2_183119.tar.bz2 to dropbox; the md5sum is eb9e968d9843fbc9dd5dbe6aedaa48b9. could you also upload the kernel-debuginfo rpm so I can open this core? I have uploaded kernel-debuginfo-2.6.9-67.EL.jwhiter.patch.x86_64.rpm; the md5sum is 67f218d594675128d6947744cbdb6d12. ooh ok i see you already gave me the vmlinux/ext3.ko.debug/jbd.ko.debug in the original tarball. Sorry about that, thats perfect thank you. So looking at this my little trap didn't get tripped, but we're still negative, so we must be going negative in journal_restart_handle. I assume you already have a kernel tree in place so you don't have to keep rebuilding so just add J_ASSERT(transaction->t_outstanding_credits >= 0); transaction->t_ip = (unsigned long)__builtin_return_address(1); to journal_restart_handle after the transaction->t_updates--; line and retry and see if it trips that assert. Thanks much. Well, the new assert didn't trip, instead I just got the original issue again. In case it reveals any new information I've uploaded the vmcore, etc. in file dump3_183119.tar.bz2 with md5sum b5a41d34bbbc8d1bbf4fe5a08f65686f. Created attachment 292657 [details]
watch everytime we touch t_outstanding_credits
Ok well this is just weird. Apply this to a clean source tree and see if it
will trip one of these asserts. There is something that worries me in
journal_commit_transaction, we are doing a t_outstanding_credits--, and the
comments say that journal_next_log_block() modifies this counter, but I can't
find evidence of that anywhere. So hopefully with this we'll be able to catch
the culprit.
With this patch, my kernel paniced while booting just like the first patched kernel did. Changing the __builtin_return_address(2) to __builtin_return_address(1) fixed that. This time when I ran the postmark benchmark the assert was tripped almost immediately during the file creation phase, while it usually doesn't happen until well into the transaction phase. As an experiment, I tried just doing a recursive copy of /usr to another location and was able to hit the issue within a couple minutes. I've uploaded dump4_183119.tar.bz2 with md5sum 60050479d703512682c19fc688fcdb45. Created attachment 292682 [details]
debug patch with potential fix.
Ok so what I suspected was going wrong is going wrong. Try this patch,
hopefully it will fix the problem and if not it still has the other debug stuff
in place so it will at least point me in the next direction we need to go.
Thanks sooooo much for helping me, I would have never figured this out without
your help.
I'm glad to be of assistance. This latest patch is interesting; when I started the postmark benchmark after building with the patch, the performance was much slower than ever before. Normally the file creation phase takes 10-15 minutes, but it was still running after an hour! So, based on the theory that the asserts were injecting a lot of extra delay, I rebuilt a kernel with just the "fix" part of your latest patch, and no asserts, hoping that it would run faster. Well, it's been running for about 3 hours so far and still hasn't finished creating files. I'm going to let it run overnight and see what happens, but I'm concerned that if things run this slowly we might not hit the same race condition or combination of code paths as when it ran more quickly. I'll check on it late tonight and report if anything interesting has happened. I am a bit concerned that even though this change may fix the issue, the resulting performance hit may not be acceptable. Hmm thats odd. I agree if its going to be that severe of a performance hit then something else must be going on. I wonder if whats happening is we are in fact supposed to go into the negative to account for the amount of space in the journal we are freeing up, and with my "fix" we appear to be running out of journal space more quickly and therefore are waiting for commits to occur in order to free up space to continue doing IO. I will look into this more tomorrow when I actually have the code sitting in front of me. If thats the case I'm definitely going to be adding some more specific comments as there is no information anywhere that would indicate that. Well, postmark finally got through the file creation phase, and currently is 70% through the transaction phase with no panics, so that is the good news. Bad news of course is that it may end up taking 12 hours to complete a run that should take around 3 hours. The benchmark did eventually complete without error. Well looking through it jbd does weird things in order to track how much of the journal it needs, and of course it tracks how much is needed by the committing transaction by making t_outstanding_credits go negative in order to account for the blocks that its using. Why we don't just return j_max_blocks - j_free i don't know, so I'm going to do that and take out this t_outstanding_credits hackery and see how that goes, and if it works well enough with performance and doesn't cause anything to blow up I'll attach it here and see if it will make your problem go away. rrr i lost my big long comment about what was going on, so short version is I understand whats going wrong, I just need to find where it is, hopefully the next patch i give you will actually work. Created attachment 293429 [details]
potential fix
Ok after going through this and staring at everything for entirely too long I
think I've figured out an acceptable solution. Originally I thought somebody
may be adding something to the transaction without first doing a
journal_get_write_access(), which would be incorrect as it screws up the
accounting. What I found is that the committing transaction will sometimes
punt some of its buffers onto the next running transaction, which doesn't take
into account any of the credit handling stuff which is why you get such a
discrepancy between t_nr_buffers and t_outstanding_credits. So the solution I
think is to not use t_outstanding_credits at all but rather use t_nr_buffers,
since at this point in the game nobody should be modifying the committing
transaction. This patch should apply to a clean source tree. Let me know how
it works for you.
I had to make one slight change to the patch before the kernel would compile, changing 'transaction_t trans = ...' to 'transaction_t *trans = ...' but after that, I was able to finish a postmark run of 1000000 files and 1000000 transactions without any issues. Looks like a good fix, I'm happy to say. awesome, thank you for testing this, now to send it upstream. Created attachment 293746 [details]
final version of the patch.
This is the final version of the patch that was accepted upstream and will be
going into RHEL4.
I just tried building a kernel with this patch and hit a compilation error: include/linux/jbd.h: In function `jbd_space_needed': include/linux/jbd.h:1091: error: `trans' undeclared (first use in this function) Created attachment 293757 [details]
final version of the patch that actually compiles.
Doh sorry about that, didn't update my local copy with the copy that I tested.
Heres the right one.
bah, Jan just looked at my patch and looks like this isn't the correct fix, so I'm asking that they pull it from the ext4 patch queue. Back to the drawing board... Changing the status back to Assigned as the right fix is yet to be posted. Testing a new fix, finally narrowed down what was going wrong. Provided nothing goes horribly wrong with my testing I'll post the patch so those who can actually hit the assertion can make sure that it goes away with my new patch. ------- Comment From jwhiter 2008-01-08 10:11 EST------- ------- Comment From jwhiter 2008-01-14 14:48 EST------- ------- Comment From jwhiter 2008-01-17 09:08 EST------- ------- Comment From jwhiter 2008-01-17 17:25 EST------- ------- Comment From jwhiter 2008-01-21 14:01 EST------- ------- Comment From jwhiter 2008-01-22 10:22 EST------- ------- Comment From jwhiter 2008-01-22 13:54 EST------- ------- Comment From jwhiter 2008-01-23 21:37 EST------- ------- Comment From jwhiter 2008-01-24 12:18 EST------- ------- Comment From jwhiter 2008-01-28 10:41 EST------- ------- Comment From jwhiter 2008-01-31 11:00 EST------- ------- Comment From jwhiter 2008-02-05 14:08 EST------- Created attachment 295041 [details]
next attempt at a fix
Ok this patch looks to be working and hasn't broken anything. Could you please
test it and make sure it keeps the assertion from occurring.
Running with this patch I hit the following assert fairly quickly: Assertion failure in do_get_write_access() at fs/jbd/transaction.c:693: "handle->h_buffer_credits > 0" I have uploaded dump5_183119.tar.bz2 with md5sum c1d3068e76794f595e101b837c8e3a95. lovely, fix one bug and another pops up. This is where jlayton was going before with that xattr thing, we really shouldn't use journal_release_buffer at all. Created attachment 295315 [details]
yet another patch.
Ok so my original fix uncovered this nice little bug where we leak credits
because the xattr code _loves_ to use journal_release_buffer instead of only
getting write access when it absolutely needs to. So this patch includes my
original fix and the xattr fix which should keep us from running out of handle
credits. Let me know how this works for you.
I ran the postmark benchmark with a million files and two million transactions, and after completing about 80% of the transactions it hit the same assert: Assertion failure in do_get_write_access() at fs/jbd/transaction.c:693: "handle->h_buffer_credits > 0" I've uploaded dump6_183119.tar.bz2 with md5sum 515e6f09df0da1e95571b1c54459e0d5 Ok after looking into this more than i would have ever wanted to it turns out upstream does in fact have the same sort of problem, which makes me feel less bad about why the upstream solutions weren't fixing the problem. So now to figure out a way to fix upstream, and then backport. FYI we've found an acceptable solution to this for upstream, so after my testing is finished I will be posting those patches, and as soon as they are accepted somewhere I will post RHEL patches for testing here, and if all goes well will post them and we'll be all set. Created attachment 296621 [details]
upstream backport.
Please test this patch, it is a backport of the patches that just went upstream
that should fix this problem. Please test it to make sure I didn't miss
anything, as this requires changing quite a bit of the core jbd code for RHEL4.
I am happy to report that my system running with this patch successfully completed a run of the postmark benchmark with one million files and two million transactions. All of the failures I encountered except for the one mentioned in comment #76 occurred well within a run of one million files and one million transactions. Out of an abundance of caution I have started a run with four million files and four million transactions; if it encounters any issues I'll report back. Awesome thank you for testing this, it has been a tremendous help. Just for future reference; the posted patch consists of essentially these 3 patches backported: original addition of b_modified: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=5025daf1ae75ab4d91f48cebb9af3d4a519e611b josef's 2 patches now in -mm: http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc3/2.6.25-rc3-mm1/broken-out/jbd-fix-possible-journal-overflow-issues.patch http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc3/2.6.25-rc3-mm1/broken-out/jbd-fix-the-way-the-b_modified-flag-is-cleared.patch FYI, the postmark run with four million files and four million transactions did complete without error. Nice to see this issue resolved. Sweet thanks for testing all of this for me. Wouldn't you know as soon as I got it fixed I can now hit the panic pretty regularly :). Committed in 68.20.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/ ------- Comment From mranweil.com 2008-06-11 17:26 EDT------- Closing on this side with the testing from a few comments (and couple months ;) ago. Patch is in 4.7. Thank you. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0665.html |