Bug 183119

Summary:

Assertion failure in journal_next_log_block

Product:

Red Hat Enterprise Linux 4

Reporter:

Charles Lopes <tjarls>

Component:

kernel

Assignee:

Josef Bacik <jbacik>

Status:

CLOSED ERRATA

QA Contact:

Brian Brock <bbrock>

Severity:

high

Docs Contact:

Priority:

high

Version:

4.0

CC:

bigmach, bugproxy, bugzilla.redhat, esandeen, jbaron, sct, sfolkwil, staubach, tao, vgoyal

Target Milestone:

---

Target Release:

---

Hardware:

i386

OS:

Linux

Whiteboard:

Fixed In Version:

RHSA-2008-0665

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2008-07-24 19:11:08 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

422551, 430698, 439193

Attachments:

Description	Flags
patch file for linux kernel RPM build 2.6.9-42.0.3.EL	none
patch -- don't call journal_release_buffer from xattr calls	none
console log	none
debug patch to watch transaction credits	none
console log	none
new debug patch.	none
watch everytime we touch t_outstanding_credits	none
debug patch with potential fix.	none
potential fix	none
final version of the patch.	none
final version of the patch that actually compiles.	none
next attempt at a fix	none
yet another patch.	none
upstream backport.	none

Description Charles Lopes 2006-02-26 15:12:13 UTC

Sometimes (three times so far) when the system is under stress (recreating
indexes on a very large DB2 database), the system panics:

Feb 24 16:12:33 marvin kernel: Assertion failure in journal_next_log_block() at
fs/jbd/journal.c:576: "journal->j_free > 1"
Feb 24 16:12:33 marvin kernel: ------------[ cut here ]------------
Feb 24 16:12:33 marvin kernel: kernel BUG at fs/jbd/journal.c:576!
Feb 24 16:12:33 marvin kernel: invalid operand: 0000 [#1]
Feb 24 16:12:33 marvin kernel: SMP
Feb 24 16:12:33 marvin kernel: Modules linked in: sg md5 ipv6 i2c_dev i2c_core
st sunrpc microcode dm_mod button battery ac ohci_hcd cpqphp tg3 floppy ext3 jbd
cciss sd_mod scsi_mod
Feb 24 16:12:33 marvin kernel: CPU:    2
Feb 24 16:12:33 marvin kernel: EIP:    0060:[<f88716b9>]    Not tainted VLI
Feb 24 16:12:33 marvin kernel: EFLAGS: 00010216   (2.6.9-22.ELsmp)
Feb 24 16:12:33 marvin kernel: EIP is at journal_next_log_block+0x39/0x83 [jbd]
Feb 24 16:12:33 marvin kernel: eax: 00000060   ebx: f7f4ae00   ecx: f721bdcc  
edx: f88741cb
Feb 24 16:12:33 marvin kernel: esi: c96ae174   edi: f7f4ae14   ebp: f721be14  
esp: f721bdc8
Feb 24 16:12:33 marvin kernel: ds: 007b   es: 007b   ss: 0068
Feb 24 16:12:33 marvin kernel: Process kjournald (pid: 1209, threadinfo=f721b000
task=f7117430)
Feb 24 16:12:33 marvin kernel: Stack: f88741cb f8872fcf f88741ba 00000240
f8874255 f7f4ae00 c96ae174 c7a7e7dc
Feb 24 16:12:33 marvin kernel:        f62d4f80 f886e8eb 00000000 00000000
00000e84 c96ae17c 00000000 0000002d
Feb 24 16:12:33 marvin kernel:        ec0da6bc f7f4ae00 ed3bf9bc 00000a19
00000000 f7117430 c011ffb1 f721be44
Feb 24 16:12:33 marvin kernel: Call Trace:
Feb 24 16:12:33 marvin kernel:  [<f886e8eb>]
journal_commit_transaction+0x6d0/0xfc1 [jbd]
Feb 24 16:12:33 marvin kernel:  [<c011ffb1>] autoremove_wake_function+0x0/0x2d
Feb 24 16:12:33 marvin kernel:  [<c011ffb1>] autoremove_wake_function+0x0/0x2d
Feb 24 16:12:33 marvin kernel:  [<f8870e8d>] kjournald+0xc7/0x219 [jbd]
Feb 24 16:12:33 marvin kernel:  [<c011ffb1>] autoremove_wake_function+0x0/0x2d
Feb 24 16:12:33 marvin kernel:  [<c011ffb1>] autoremove_wake_function+0x0/0x2d
Feb 24 16:12:33 marvin kernel:  [<c011d41d>] schedule_tail+0x12/0x55
Feb 24 16:12:33 marvin kernel:  [<f8870dc0>] commit_timeout+0x0/0x5 [jbd]
Feb 24 16:12:33 marvin kernel:  [<f8870dc6>] kjournald+0x0/0x219 [jbd]
Feb 24 16:12:33 marvin kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Feb 24 16:12:33 marvin kernel: Code: b3 e5 a5 c7 83 bb c4 00 00 00 01 77 29 68
55 42 87 f8 68 40 02
00 00 68 ba 41 87 f8 68 cf 2f 87 f8 68 cb 41 87 f8 e8 59 0c 8b c7 <0f> 0b 40 02
ba 41 87 f8 83 c4 14 ff 8b c4 00 00 00 8b b3 bc 00
Feb 24 16:12:33 marvin kernel:  <0>Fatal exception: panic in 5 seconds




It happened twice in a row in the last couple of days, when running the same SQL
command. The kernel used was kernel-smp-2.6.9-22.EL. About a month ago it
happened on kernel-smp-2.6.9-22.0.2.EL, in a similar situation. We were then
recommanded by IBM support to downgrade to kernel-smp-2.6.9-22.EL

I'm wondering if we are hiting the bug fixed in 2.6.12
jbd-journal-overflow-fix-2.patch

As we are pressed by time, we will probably end up patching
kernel-smp-2.6.9-22.0.2.EL to see that works.

Comment 3 Need Real Name 2006-06-19 23:05:22 UTC

We are experiencing the exact same problem.  Any solution possible yet?

Comment 5 Jeff Layton 2006-09-29 00:34:47 UTC

No solution as of yet, but there are a couple of leads. One possible workaround
is to increase the size of the journal on the fs.

Comment 10 Eric Sandeen 2006-11-09 21:34:31 UTC

Can anyone who can hit this reliably give it a whirl with:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/jbd-journal-overflow-fix-2.patch

to see if that helps?  if so I'll see if I can find a clever way to deal with
the KABI issues...

Comment 11 Leif Bergman 2006-12-12 18:20:12 UTC

Created attachment 143420 [details]
patch file for linux kernel RPM build 2.6.9-42.0.3.EL

Hello,

We have had the same type of crash on our system (with a more recent kernel)

 I have tried to implement the patch that was suggested by Eric Sandeen.  I had
to patch a bit further to make it compile with a our kernel (2.6.9-42.0.3.EL) 
This is the patch file for the xattr file.  If you try this, please keep in
mind that I am not a kernel programmer and mostly just did the same kind of
thing that Eric did in his patch.

However, our kernel still crashes in the same manner as before.  Has anybody
had luck solving this issue?  Any suggestions?

Comment 12 Jeff Layton 2006-12-27 15:09:07 UTC

Created attachment 144402 [details]
patch -- don't call journal_release_buffer from xattr calls

As Leif stated, this patch applies cleanly, but I think it depends on some
other patches as well. In looking for those other patches I ran across this
one:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=47930089e101bdf5bd4dce5c41ab64d74d204bb1


This doesn't apply cleanly at all, however. The attached patch is a backport of
it. With this, the one recommended by Eric seems to compile cleanly.

Comment 13 Jeff Layton 2006-12-27 17:49:25 UTC

I've posted some kernels containing both of these patches to my people page:

http://people.redhat.com/jlayton/bz183119/

Anyone seeing this bug should test them and see if this resolves it. These
kernels are test kernels and should not be allowed to touch any data which you
care about. I've done a little testing with them and don't see any obvious
problems, but they may eat your data for lunch.

In fact, it may be safest to wait until Eric has had a look at them since he has
more familiarity with this code than I.

Comment 14 Jeff Layton 2006-12-27 17:53:39 UTC

Also, I've only posted SMP kernels for i686, x86_64 and ia64. Please post here
if you need other flavors.

Comment 15 Jeff Layton 2007-01-04 16:44:27 UTC

I've been able to reliably reproduce a panic on a test box with this patch:

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
e0048107
*pde = 15d03001
Oops: 0000 [#1]
SMP
Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport
autofs4 i2c_dev i2c_core rpcsec_gss_krb5 auth_rpcgss des sunrpc dm_multipath
button battery ac uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi
snd_seq_device snd soundcore e1000 floppy ata_piix libata scsi_mod ipw2100
ieee80211 ieee80211_crypt dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
CPU:    1
EIP:    0060:[<e0048107>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-42.36.EL.TEST.bz183119.1smp)
EIP is at journal_dirty_metadata+0x40/0x1a0 [jbd]
eax: 00000000   ebx: df01bf80   ecx: d317bd18   edx: df02c6c8
esi: 00000000   edi: dd5c3908   ebp: df02c6c8   esp: d68ccd44
ds: 007b   es: 007b   ss: 0068
Process up2date (pid: 4421, threadinfo=d68cc000 task=d61d4330)
Stack: df65ca00 df65c800 00000000 00000000 df02c6c8 e010fe1b dd30ed14 df65c800
       d317bde8 dd5c3908 00000000 dd9bce9c df02c6c8 d4c15000 d4c16000 d4c15020
       e010f95b d4c15000 00000fe8 00000000 00000007 00000000 00000fe8 d4c15038
Call Trace:
 [<e010fe1b>] ext3_xattr_set_handle2+0x473/0x4bd [ext3]
 [<e010f95b>] ext3_xattr_set_handle+0x6fb/0x748 [ext3]
 [<e010feaf>] ext3_xattr_set+0x4a/0x83 [ext3]
 [<e01113e2>] ext3_xattr_security_set+0x3c/0x83 [ext3]
 [<c0176acb>] generic_setxattr+0x48/0x50
 [<c0176307>] setxattr+0x1c3/0x21c
 [<c0166baa>] do_lookup+0x1f/0x8f
 [<c016f6ac>] dput+0x34/0x1a7
 [<c0167717>] __link_path_walk+0xafd/0xbb5
 [<c016f6ac>] dput+0x34/0x1a7
 [<c0167863>] link_path_walk+0x94/0xbe
 [<c011ae59>] do_page_fault+0x0/0x5c6
 [<c0167ba7>] path_lookup+0x14b/0x17f
 [<c01763da>] sys_lsetxattr+0x32/0x45
 [<c011ae59>] do_page_fault+0x0/0x5c6
 [<c02d58d3>] syscall_call+0x7/0xb
 [<c02d007b>] unix_stream_recvmsg+0x362/0x3a5
Code: 72 28 0f 85 7b 01 00 00 f6 00 02 0f 85 72 01 00 00 eb 0c f3 90 8b 45 00 a9
00 00 08 00 75 f4 f0 0f ba 6d 00 13 19 c0 85 c0 75 ea <83> 7e 0c 00 75 39 c7 46
0c 01 00 00 00 83 7f 04 00 7f 29 68 ec
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

...so I'm thinking that these two patches by themselves won't cut it.

Comment 16 Jeff Layton 2007-01-04 20:04:12 UTC

Crash occurred here:

0xe0048101 <journal_dirty_metadata+58>: sbb    %eax,%eax
0xe0048103 <journal_dirty_metadata+60>: test   %eax,%eax
0xe0048105 <journal_dirty_metadata+62>: jne    0xe00480f1
0xe0048107 <journal_dirty_metadata+64>: cmpl   $0x0,0xc(%esi) <<<<< crash here

...but I'm having a real problem trying to match this up with the C code.

Comment 17 Eric Sandeen 2007-01-04 23:05:46 UTC

Jeff, what's your recipe for reproducing this one?

Thanks,

-Eric

Comment 18 Jeff Layton 2007-01-05 13:33:26 UTC

I had it occurring when trying to install a RPM package on my test box. Here's
how it happened:

I needed to install a package on my test box to build something and I was
running this kernel there. I installed it and the machine paniced, I rebooted to
the same kernel and tried installing it again. It paniced at exactly the same spot.

I then booted to a different kernel (without that patch) and the package
installed just fine. I removed the package and then booted to the patched kernel
again, and tried to reinstall the package and it didn't panic that time around.
Later on though, I needed to install a different package and got another panic.

So it seems like once it panics, it will continue to panic consistently in the
same spot until enough activity occurs to make things different enough on disk

I think if you do a lot of rpm activity on the box that will probably make it
happen.

I'm not sure if that means that my backport of the patch was bad (I did have to
make some changes though I thought I did it right), or if one or both of these
patches depends on a different one.

Comment 19 RHEL Program Management 2007-05-09 10:46:42 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Roger Mach 2007-07-02 17:34:39 UTC

I have recently encountered this issue and have a simple way to reproduce it. 
The 'postmark' benchmark seems to reliably cause this assertion failure after
running for less than an hour.  I've seen this on both 32-bit and 64-bit
versions of RHEL4 update 4.  I also tested 32-bit and 64-bit versions of RHEL 5,
where the benchmark ran to completion in about 3 hours with no problems.

You can get postmark here:

http://ftp.debian.org/debian/pool/main/p/postmark/postmark_1.51.orig.tar.gz

After compiling postmark, run it and issue these commands:

set number 1000000
set transactions 1000000
set location /tmp
run

Comment 22 Eric Sandeen 2007-08-31 21:51:54 UTC

Hm, tried the postmark tests, haven't recreated it yet.

-Eric

Comment 24 Eric Sandeen 2007-11-01 18:04:15 UTC

Another candidate:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=96761507cc998cb9c1c06f9d74bfa47cb11ae254
discussion at:
http://lkml.org/lkml/2005/1/19/144

The thread started out with Alex claiming it addressed the assert, but Stephen
disagreed.  It went in, for other reasons (mostly stack savings) but it appears
that it did remedy the assert for Alex, for apparently unknown reasons.

Looks like this went into 2.6.12

This change would also affect jbd KABI...

Comment 35 Josef Bacik 2008-01-08 15:11:21 UTC

I'm setting up a box to try and reproduce this, though I suspect this has probably been 
resolved in 4.6 with the journal checkpoint overhaul.

Comment 36 Josef Bacik 2008-01-14 19:48:26 UTC

I've done everything I could possibly think of to reproduce this issue and I'm getting no 
where.  I've reduced the journal size, changed the journal commit interval to > 5 minutes, 
everything to try and make me run out of blocks and I can't do it.  Could somebody who 
can reproduce this easily try on the newest RHEL4 kernel.  I backported Jan's checkpoint 
overhaul into 4.6 which I believe should resolve this issue, and if you can still get it to 
reproduce please give me a detailed view of your system and what you did to reproduce it 
so I can try and mirror your setup as much as possible.

Comment 37 Roger Mach 2008-01-16 22:19:32 UTC

Created attachment 291901 [details]
console log

Shows kernel boot messages and output from these commands, followed by the
assertion failure and resulting stack trace:

lspci
lspci -v
lspci -v -v
dmidecode
cat /proc/scsi/scsi
cat /proc/scsi/mptsas/0
cat /proc/scsi/mptsas/1
cat /proc/interrupts
cat /proc/ioports
ls -l /sys/bus/scsi/drivers/sd
mount
dumpe2fs -h /dev/mapper/VolGroup01-LogVol00

Comment 38 Roger Mach 2008-01-16 22:23:13 UTC

Comment on attachment 291901 [details]
console log

I was able to reproduce this issue on a RHEL4u4 fresh install running the stock
kernel (2.6.9-42.ELsmp).  I then installed the latest u6 kernel
(kernel-smp-2.6.9-67.EL.x86_64.rpm) and was also able to reproduce the issue. 
This attachment contains my console output

Comment 39 Josef Bacik 2008-01-17 14:08:50 UTC

Would you be willing to upload a core from the crash on the 67 kernel to 
dropbox.redhat.com so I can take a look?  Just give me the filename if you uploaded it so I 
can pull it down.

Comment 40 Roger Mach 2008-01-17 21:37:16 UTC

(In reply to comment #39)
> Would you be willing to upload a core from the crash on the 67 kernel to 
> dropbox.redhat.com so I can take a look?

Sure, that's no problem.  I didn't have netdump configured when I reproduced the
issue yesterday, and since then I have done a fresh RHEL4u6 install on the
machine, so the core I uploaded is from that environment and doesn't match up
with yesterday's kernel log - I hope that's OK.  The filename is
dump_183119.tar.bz2 and its md5sum is 5e9e0a89457ce6079fe702c13e6d254b

Comment 41 Josef Bacik 2008-01-17 22:25:10 UTC

awesome thanks I will pull it down tomorrow and start going through it.

Comment 42 Josef Bacik 2008-01-18 18:42:26 UTC

Created attachment 292185 [details]
debug patch to watch transaction credits

well the committing transaction has a negative number for
t_outstanding_credits, which would wreak all kinds of havoc.  The only way for
this to happen is if somebody runs journal_stop twice on the same handle, which
isn't good.  Could you try this patch and when it panics upload the vmcore and
the kernel-debuginfo from the kernel that you built with this patch?  (or just
the vmlinux/jbd.ko.debug/ext3.ko.debug).

Comment 43 Roger Mach 2008-01-19 00:01:19 UTC

Created attachment 292225 [details]
console log

Unfortunately the kernel I built with this patch paniced my system on boot. 
Attached is the console log.  I verified that I could build a kernel without
this patch that booted cleanly so I therefore conclude that there is a problem
with the patch.

Comment 44 Josef Bacik 2008-01-21 19:01:42 UTC

hmm well thats odd, maybe its because of the GENKSYSM trick.  I'll test this out on my 
box and when I have an actual booting kernel I'll give you the patch.  Sorry about that.

Comment 45 Josef Bacik 2008-01-21 19:23:53 UTC

Created attachment 292398 [details]
new debug patch.

Well my box boots fine with this patch, and there isn't anything obviously
wrong with it, unless its freaking out b/c of the genksyms thing.  If this
patch doesn't fix it, one thing to try would be to go into the patch and where
I have __builtin_return_address(2), change that to __builtin_return_address(1)
and see if it starts playing nicely.

Comment 46 Roger Mach 2008-01-21 22:57:40 UTC

This latest patch did not apply cleanly to the kernel sources, but your
suggestion of using the previous patch with __builtin_return_address(1) resulted
in a kernel that does boot.  So, I have uploaded dump2_183119.tar.bz2 to
dropbox; the md5sum is eb9e968d9843fbc9dd5dbe6aedaa48b9.

Comment 47 Josef Bacik 2008-01-22 15:22:18 UTC

could you also upload the kernel-debuginfo rpm so I can open this core?

Comment 48 Roger Mach 2008-01-22 16:24:50 UTC

I have uploaded kernel-debuginfo-2.6.9-67.EL.jwhiter.patch.x86_64.rpm; the
md5sum is 67f218d594675128d6947744cbdb6d12.

Comment 49 Josef Bacik 2008-01-22 18:54:21 UTC

ooh ok i see you already gave me the vmlinux/ext3.ko.debug/jbd.ko.debug in the original 
tarball.  Sorry about that, thats perfect thank you.  So looking at this my little trap didn't 
get tripped, but we're still negative, so we must be going negative in 
journal_restart_handle.  I assume you already have a kernel tree in place so you don't have 
to keep rebuilding so just add

J_ASSERT(transaction->t_outstanding_credits >= 0);
transaction->t_ip = (unsigned long)__builtin_return_address(1);

to journal_restart_handle after the 

transaction->t_updates--;

line and retry and see if it trips that assert.  Thanks much.

Comment 50 Roger Mach 2008-01-23 04:28:31 UTC

Well, the new assert didn't trip, instead I just got the original issue again. 
In case it reveals any new information I've uploaded the vmcore, etc. in file
dump3_183119.tar.bz2 with md5sum b5a41d34bbbc8d1bbf4fe5a08f65686f.

Comment 51 Josef Bacik 2008-01-23 15:48:48 UTC

Created attachment 292657 [details]
watch everytime we touch t_outstanding_credits

Ok well this is just weird.  Apply this to a clean source tree and see if it
will trip one of these asserts.  There is something that worries me in
journal_commit_transaction, we are doing a t_outstanding_credits--, and the
comments say that journal_next_log_block() modifies this counter, but I can't
find evidence of that anywhere.  So hopefully with this we'll be able to catch
the culprit.

Comment 52 Roger Mach 2008-01-23 18:48:26 UTC

With this patch, my kernel paniced while booting just like the first patched
kernel did.  Changing the __builtin_return_address(2) to
__builtin_return_address(1) fixed that.  This time when I ran the postmark
benchmark the assert was tripped almost immediately during the file creation
phase, while it usually doesn't happen until well into the transaction phase. 
As an experiment, I tried just doing a recursive copy of /usr to another
location and was able to hit the issue within a couple minutes.  I've uploaded
dump4_183119.tar.bz2 with md5sum 60050479d703512682c19fc688fcdb45.

Comment 53 Josef Bacik 2008-01-23 19:10:46 UTC

Created attachment 292682 [details]
debug patch with potential fix.

Ok so what I suspected was going wrong is going wrong.	Try this patch,
hopefully it will fix the problem and if not it still has the other debug stuff
in place so it will at least point me in the next direction we need to go. 
Thanks sooooo much for helping me, I would have never figured this out without
your help.

Comment 54 Roger Mach 2008-01-24 00:17:54 UTC

I'm glad to be of assistance.  This latest patch is interesting; when I started
the postmark benchmark after building with the patch, the performance was much
slower than ever before.  Normally the file creation phase takes 10-15 minutes,
but it was still running after an hour!  So, based on the theory that the
asserts were injecting a lot of extra delay, I rebuilt a kernel with just the
"fix" part of your latest patch, and no asserts, hoping that it would run
faster.  Well, it's been running for about 3 hours so far and still hasn't
finished creating files.  I'm going to let it run overnight and see what
happens, but I'm concerned that if things run this slowly we might not hit the
same race condition or combination of code paths as when it ran more quickly. 
I'll check on it late tonight and report if anything interesting has happened.

I am a bit concerned that even though this change may fix the issue, the
resulting performance hit may not be acceptable.

Comment 55 Josef Bacik 2008-01-24 02:37:39 UTC

Hmm thats odd.  I agree if its going to be that severe of a performance hit then
something else must be going on.  I wonder if whats happening is we are in fact
supposed to go into the negative to account for the amount of space in the
journal we are freeing up, and with my "fix" we appear to be running out of
journal space more quickly and therefore are waiting for commits to occur in
order to free up space to continue doing IO.  I will look into this more
tomorrow when I actually have the code sitting in front of me.  If thats the
case I'm definitely going to be adding some more specific comments as there is
no information anywhere that would indicate that.

Comment 56 Roger Mach 2008-01-24 08:05:45 UTC

Well, postmark finally got through the file creation phase, and currently is 70%
through the transaction phase with no panics, so that is the good news.  Bad
news of course is that it may end up taking 12 hours to complete a run that
should take around 3 hours.

Comment 57 Roger Mach 2008-01-24 16:58:48 UTC

The benchmark did eventually complete without error.

Comment 58 Josef Bacik 2008-01-24 17:18:59 UTC

Well looking through it jbd does weird things in order to track how much of the
journal it needs, and of course it tracks how much is needed by the committing
transaction by making t_outstanding_credits go negative in order to account for
the blocks that its using.  Why we don't just return j_max_blocks - j_free i
don't know, so I'm going to do that and take out this t_outstanding_credits
hackery and see how that goes, and if it works well enough with performance and
doesn't cause anything to blow up I'll attach it here and see if it will make
your problem go away.

Comment 60 Josef Bacik 2008-01-28 15:41:05 UTC

rrr i lost my big long comment about what was going on, so short version is I understand 
whats going wrong, I just need to find where it is, hopefully the next patch i give you will 
actually work.

Comment 61 Josef Bacik 2008-01-30 16:23:21 UTC

Created attachment 293429 [details]
potential fix

Ok after going through this and staring at everything for entirely too long I
think I've figured out an acceptable solution.	Originally I thought somebody
may be adding something to the transaction without first doing a
journal_get_write_access(), which would be incorrect as it screws up the
accounting.  What I found is that the committing transaction will sometimes
punt some of its buffers onto the next running transaction, which doesn't take
into account any of the credit handling stuff which is why you get such a
discrepancy between t_nr_buffers and t_outstanding_credits.  So the solution I
think is to not use t_outstanding_credits at all but rather use t_nr_buffers,
since at this point in the game nobody should be modifying the committing
transaction.  This patch should apply to a clean source tree.  Let me know how
it works for you.

Comment 62 Roger Mach 2008-01-31 15:50:53 UTC

I had to make one slight change to the patch before the kernel would compile,
changing 'transaction_t trans = ...' to 'transaction_t *trans = ...' but after
that, I was able to finish a postmark run of 1000000 files and 1000000
transactions without any issues.  Looks like a good fix, I'm happy to say.

Comment 63 Josef Bacik 2008-01-31 16:00:49 UTC

awesome, thank you for testing this, now to send it upstream.

Comment 64 Josef Bacik 2008-02-01 16:29:25 UTC

Created attachment 293746 [details]
final version of the patch.

This is the final version of the patch that was accepted upstream and will be
going into RHEL4.

Comment 66 Roger Mach 2008-02-01 18:15:27 UTC

I just tried building a kernel with this patch and hit a compilation error:

include/linux/jbd.h: In function `jbd_space_needed':
include/linux/jbd.h:1091: error: `trans' undeclared (first use in this function)

Comment 67 Josef Bacik 2008-02-01 18:56:22 UTC

Created attachment 293757 [details]
final version of the patch that actually compiles.

Doh sorry about that, didn't update my local copy with the copy that I tested. 
Heres the right one.

Comment 68 Josef Bacik 2008-02-05 19:08:08 UTC

bah, Jan just looked at my patch and looks like this isn't the correct fix, so I'm asking that 
they pull it from the ext4 patch queue.  Back to the drawing board...

Comment 69 Vivek Goyal 2008-02-15 16:03:02 UTC

Changing the status back to Assigned as the right fix is yet to be posted.

Comment 70 Josef Bacik 2008-02-15 16:06:05 UTC

Testing a new fix, finally narrowed down what was going wrong.  Provided nothing goes 
horribly wrong with my testing I'll post the patch so those who can actually hit the 
assertion can make sure that it goes away with my new patch.

Comment 71 IBM Bug Proxy 2008-02-15 16:12:29 UTC

------- Comment From jwhiter 2008-01-08 10:11 EST-------

------- Comment From jwhiter 2008-01-14 14:48 EST-------


------- Comment From jwhiter 2008-01-17 09:08 EST-------




------- Comment From jwhiter 2008-01-17 17:25 EST-------
------- Comment From jwhiter 2008-01-21 14:01 EST-------
------- Comment From jwhiter 2008-01-22 10:22 EST-------
------- Comment From jwhiter 2008-01-22 13:54 EST-------





------- Comment From jwhiter 2008-01-23 21:37 EST-------
------- Comment From jwhiter 2008-01-24 12:18 EST-------
------- Comment From jwhiter 2008-01-28 10:41 EST-------
------- Comment From jwhiter 2008-01-31 11:00 EST-------

------- Comment From jwhiter 2008-02-05 14:08 EST-------

Comment 72 Josef Bacik 2008-02-15 21:47:07 UTC

Created attachment 295041 [details]
next attempt at a fix

Ok this patch looks to be working and hasn't broken anything.  Could you please
test it and make sure it keeps the assertion from occurring.

Comment 73 Roger Mach 2008-02-15 23:45:30 UTC

Running with this patch I hit the following assert fairly quickly:

Assertion failure in do_get_write_access() at fs/jbd/transaction.c:693:
"handle->h_buffer_credits > 0"

I have uploaded dump5_183119.tar.bz2 with md5sum c1d3068e76794f595e101b837c8e3a95.

Comment 74 Josef Bacik 2008-02-18 16:23:23 UTC

lovely, fix one bug and another pops up.  This is where jlayton was going before with that 
xattr thing, we really shouldn't use journal_release_buffer at all.

Comment 75 Josef Bacik 2008-02-19 18:32:08 UTC

Created attachment 295315 [details]
yet another patch.

Ok so my original fix uncovered this nice little bug where we leak credits
because the xattr code _loves_ to use journal_release_buffer instead of only
getting write access when it absolutely needs to.  So this patch includes my
original fix and the xattr fix which should keep us from running out of handle
credits.  Let me know how this works for you.

Comment 76 Roger Mach 2008-02-20 18:08:46 UTC

I ran the postmark benchmark with a million files and two million transactions,
and after completing about 80% of the transactions it hit the same assert:

Assertion failure in do_get_write_access() at fs/jbd/transaction.c:693:
"handle->h_buffer_credits > 0"

I've uploaded dump6_183119.tar.bz2 with md5sum 515e6f09df0da1e95571b1c54459e0d5

Comment 77 Josef Bacik 2008-02-20 22:59:48 UTC

Ok after looking into this more than i would have ever wanted to it turns out upstream 
does in fact have the same sort of problem, which makes me feel less bad about why the 
upstream solutions weren't fixing the problem.  So now to figure out a way to fix 
upstream, and then backport.

Comment 78 Josef Bacik 2008-02-26 18:55:43 UTC

FYI we've found an acceptable solution to this for upstream, so after my testing is finished 
I will be posting those patches, and as soon as they are accepted somewhere I will post 
RHEL patches for testing here, and if all goes well will post them and we'll be all set.

Comment 79 Josef Bacik 2008-03-03 16:02:09 UTC

Created attachment 296621 [details]
upstream backport.

Please test this patch, it is a backport of the patches that just went upstream
that should fix this problem.  Please test it to make sure I didn't miss
anything, as this requires changing quite a bit of the core jbd code for RHEL4.

Comment 80 Roger Mach 2008-03-04 06:23:43 UTC

I am happy to report that my system running with this patch successfully
completed a run of the postmark benchmark with one million files and two million
transactions.  All of the failures I encountered except for the one mentioned in
comment #76 occurred well within a run of one million files and one million
transactions.  Out of an abundance of caution I have started a run with four
million files and four million transactions; if it encounters any issues I'll
report back.

Comment 81 Josef Bacik 2008-03-04 15:47:57 UTC

Awesome thank you for testing this, it has been a tremendous help.

Comment 83 Eric Sandeen 2008-03-04 17:09:55 UTC

Just for future reference; the posted patch consists of essentially these 3
patches backported:

original addition of b_modified:

http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=5025daf1ae75ab4d91f48cebb9af3d4a519e611b

josef's 2 patches now in -mm:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc3/2.6.25-rc3-mm1/broken-out/jbd-fix-possible-journal-overflow-issues.patch

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc3/2.6.25-rc3-mm1/broken-out/jbd-fix-the-way-the-b_modified-flag-is-cleared.patch

Comment 84 Roger Mach 2008-03-05 16:26:30 UTC

FYI, the postmark run with four million files and four million transactions did
complete without error.  Nice to see this issue resolved.

Comment 85 Josef Bacik 2008-03-07 20:18:53 UTC

Sweet thanks for testing all of this for me.  Wouldn't you know as soon as I got it fixed I 
can now hit the panic pretty regularly :).

Comment 86 Vivek Goyal 2008-03-11 23:51:09 UTC

Committed in 68.20.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 89 IBM Bug Proxy 2008-06-11 21:34:02 UTC

------- Comment From mranweil.com 2008-06-11 17:26 EDT-------
Closing on this side with the testing from a few comments (and couple months ;)
ago.  Patch is in 4.7.  Thank you.

Comment 92 errata-xmlrpc 2008-07-24 19:11:08 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html