Bug 146344

Summary: kernel oops in kjournald on 2.6.10 SMP kernels
Product: [Fedora] Fedora
Reporter: Vincent Schonau <rhbugzilla>
Component: kernel
Assignee: Stephen Tweedie <sct>
Status: CLOSED ERRATA
Severity: high
Priority: medium
Version: 3
CC: davej, sundaram, wtogami, zing
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-09-05 07:33:59 UTC
Attachments:
Description                                      Flags
kernel oops output                               none
similar oops from kernel-smp-2.6.9-1.667         none
same oops on 2.6.10-1.766_FC3smp                 none
dmesg for the system on 2.6.10-1.766_FC3smp      none
Patch to fix race in journal_unmap_buffer()      none

Description Vincent Schonau 2005-01-27 09:03:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.5.6 (KHTML, like Gecko) Safari/125.12

Description of problem:
My FC3 system has been suffering apparently random crashes since the upgrade to the 2.6.10
FC3 kernels. I was finally able to capture an oops by logging the serial console.

This is for kernel-smp-2.6.10-1.741_FC3. The crashes have been occurring since FC3 
went from 2.6.9 to 2.6.10.

The system is a Supermicro 6012P-i:
http://www.supermicro.nl/products/system/1U/6012/SYS-6012P-i.cfm
with an Intel E7501 chipset and 2x P4 at 2.4GHz.

There's no indication in logs or anywhere else about what triggers this oops, but perhaps 
I'm overlooking something.

The system appears to continue responding to pings, but as far as I can see, all other
processes hang or die.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.10-1.741_FC3

How reproducible:
Always

Steps to Reproduce:
No specific steps to reproduce. Occurs at apparently random times. 

Actual Results:  System crash.

Additional info:

Comment 1 Vincent Schonau 2005-01-27 09:05:21 UTC
Created attachment 110281 [details]
kernel oops output

Comment 2 jjaakkol 2005-02-01 14:43:18 UTC
I have the same problem on an IBM xserver 435 (dual Xeon system with
hyperthreading on both processors). My stack trace looks exactly the
same, with the same symptoms. The server is a heavily loaded mail server,
but does not use iptables modules. It seems to be ext3 and kjournald
related. I was using data=writeback on one of our filesystems.

Comment 3 Vincent Schonau 2005-02-07 10:41:02 UTC
Created attachment 110713 [details]
similar oops from kernel-smp-2.6.9-1.667

It appears this problem predates 2.6.10; I went back to
kernel-smp-2.6.9-1.667, which resulted in the attached oops.

Comment 4 Vincent Schonau 2005-02-23 07:52:11 UTC
Created attachment 111322 [details]
same oops on 2.6.10-1.766_FC3smp

Comment 5 Vincent Schonau 2005-02-23 07:53:05 UTC
Created attachment 111323 [details]
dmesg for the system on 2.6.10-1.766_FC3smp

Comment 6 Stephen Tweedie 2005-03-18 15:07:12 UTC
Created attachment 112127 [details]
Patch to fix race in journal_unmap_buffer()

This patch fixes a race condition between journal_unmap_buffer() and
journal_commit_transaction(). It involves journal_put_journal_head() being
called without any locking, and thus hitting a small window in kjournald where
the buffer's b_transaction can be temporarily NULL. If that triggers,
journal_unmap_buffer() ends up throwing away the journal_head that is still in
use by journal_commit_transaction().
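The interleaving described above can be illustrated with a small user-space C model. This is a hypothetical sketch, not the kernel code and not the attached patch: struct jh_model, commit_window_open(), commit_window_close(), unmap_buffer_model() and replay() are invented names, and a boolean flag stands in for whatever lock the real fix takes. The model replays the bad ordering deterministically instead of using threads, so the outcome is reproducible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a journal_head; NOT the kernel's struct.
 * Only the fields needed to replay the race are kept. */
struct jh_model {
    const void *b_transaction; /* owning transaction; NULL means "unowned" */
    bool locked;               /* models a lock kjournald holds in the window */
    bool freed;                /* set when the unmap path drops the head */
};

static const int running_transaction = 1; /* stand-in transaction object */

/* kjournald side: open the short window where b_transaction is NULL. */
static void commit_window_open(struct jh_model *jh)
{
    jh->locked = true;
    jh->b_transaction = NULL;
}

/* kjournald side: close the window; the head is still in use. */
static void commit_window_close(struct jh_model *jh)
{
    assert(jh->locked);
    jh->b_transaction = &running_transaction;
    jh->locked = false;
}

/* Unmap path. With use_lock == false it inspects b_transaction with no
 * locking (the racy pattern). With use_lock == true it refuses to run
 * while the lock is held and returns false; a real lock would block. */
static bool unmap_buffer_model(struct jh_model *jh, bool use_lock)
{
    if (use_lock && jh->locked)
        return false;
    if (jh->b_transaction == NULL)
        jh->freed = true; /* drops the head: only safe if truly unowned */
    return true;
}

/* Replay the ordering: the unmap path runs while kjournald's window is
 * open. Returns true if the head was dropped while still in use. */
bool replay(bool use_lock)
{
    struct jh_model jh = { &running_transaction, false, false };

    commit_window_open(&jh);
    bool ran = unmap_buffer_model(&jh, use_lock); /* inside the window */
    commit_window_close(&jh);
    if (!ran)
        unmap_buffer_model(&jh, use_lock); /* retry once the lock is free */

    return jh.freed; /* commit still owns the head, so any free is a bug */
}
```

In this model, replay(false) returns true (the head is dropped while kjournald still owns it), and replay(true) returns false, because the unmap path cannot inspect b_transaction until the window has closed. The report does not say which lock the actual patch uses.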

Comment 7 Stephen Tweedie 2005-03-18 21:07:41 UTC
This patch has been committed to CVS and will be in the next update release.