Bug 132610 - Kernel panic (Oops) in kjournald
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 2
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Stephen Tweedie
Reported: 2004-09-14 21:50 EDT by Guy Albertelli
Modified: 2007-11-30 17:10 EST

Doc Type: Bug Fix
Last Closed: 2005-04-16 01:13:20 EDT


Attachments
Server Configuration information (21.36 KB, text/plain)
2004-09-14 21:56 EDT, Guy Albertelli
System info for Taner Halicioglu (13.22 KB, text/plain)
2004-11-05 15:09 EST, Taner Halicioglu
kjournald OOPS in 2.6.10-1.9_FC2smp (1.70 KB, text/plain)
2005-01-25 00:23 EST, Taner Halicioglu

Description Guy Albertelli 2004-09-14 21:50:44 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115

Description of problem:
The server hung after 19 days of medium to heavy web server application
load, during a period of particularly high load. After the reboot, an Oops
was found in the logs.
   - The main filesystem is ext3 on a Dell PowerEdge RAID Controller
     3/DC using the megaraid driver.

Oops message:

Sep 13 20:55:00 s10 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000004
Sep 13 20:55:00 s10 kernel:  printing eip:
Sep 13 20:55:00 s10 kernel: f8861407
Sep 13 20:55:00 s10 kernel: *pde = 00003001
Sep 13 20:55:00 s10 kernel: Oops: 0002 [#1]
Sep 13 20:55:00 s10 kernel: SMP
Sep 13 20:55:00 s10 kernel: Modules linked in: ipt_REJECT ipt_state
ip_conntrack iptable_filter ip_tables md5 ipv6 parport_pc lp parport
autofs4 sunrpc tg3 floppy sg microcode st dm_mod ohci_hcd button
battery asus_acpi ac ext3 jbd megaraid aic7xxx sd_mod scsi_mod
Sep 13 20:55:00 s10 kernel: CPU:    2
Sep 13 20:55:00 s10 kernel: EIP:    0060:[<f8861407>]    Not tainted
Sep 13 20:55:00 s10 kernel: EFLAGS: 00010202   (2.6.8-1.521smp)
Sep 13 20:55:00 s10 kernel: EIP is at
journal_commit_transaction+0x70c/0x13f7 [jbd]
Sep 13 20:55:00 s10 kernel: eax: 211da6ec   ebx: 00000000   ecx:
0196a980   edx: 00000000
Sep 13 20:55:00 s10 kernel: esi: 211da6ec   edi: e1f03a00   ebp:
e1e38780   esp: e0aadda0
Sep 13 20:55:00 s10 kernel: ds: 007b   es: 007b   ss: 0068
Sep 13 20:55:00 s10 kernel: Process kjournald (pid: 1138,
threadinfo=e0aad000 task=e0839370)
Sep 13 20:55:00 s10 kernel: Stack: 00000000 00000000 00000000 00000000
00000000 2c5d1a7c e1f03a00 5511ddac
Sep 13 20:55:00 s10 kernel:        00000359 00000002 e0aade14 0211c589
000001f6 00000000 00000002 00000230
Sep 13 20:55:00 s10 kernel:        0000015f 00000280 00000000 00000075
0240d1b8 0240d1d0 00000000 e0839370
Sep 13 20:55:00 s10 kernel: Call Trace:
Sep 13 20:55:00 s10 kernel:  [<0211c589>]find_busiest_group+0xe6/0x2b7
Sep 13 20:55:00 s10 kernel:  [<0211ebdc>]
autoremove_wake_function+0x0/0x2d
Sep 13 20:55:00 s10 kernel:  [<0211ebdc>]
autoremove_wake_function+0x0/0x2d
Sep 13 20:55:00 s10 kernel:  [<f886418f>] kjournald+0x10d/0x326 [jbd]
Sep 13 20:55:00 s10 kernel:  [<0211ebdc>]
autoremove_wake_function+0x0/0x2d
Sep 13 20:55:00 s10 kernel:  [<0211ebdc>]
autoremove_wake_function+0x0/0x2d


Version-Release number of selected component (if applicable):
kernel-2.6.8-1.521smp

How reproducible:
Didn't try


Additional info:
Comment 1 Guy Albertelli 2004-09-14 21:56:54 EDT
Created attachment 103853
Server Configuration information

Full system information in the form linux-kernel asks for it.
Comment 2 Taner Halicioglu 2004-11-05 14:54:11 EST
I have the same (or very similar) bug.

This was on the console of my machine that had locked up:

Unable to handle kernel NULL pointer dereference at virtual address
00000004
 printing eip:
82860407
*pde = 00003001
Oops: 0002 [#1]
SMP
Modules linked in: autofs4 e1000 e100 mii ipt_REJECT ipt_LOG
iptable_filter ip_tables microcode dm_mod ehci_hcd uhci_hcd button
battery asus_acpi ac ext3 jbd 3w_xxxx sd_mod scsi_mod
CPU:    2
EIP:    0060:[<82860407>]    Not tainted
EFLAGS: 00010206   (2.6.8-1.521smp)
EIP is at journal_commit_transaction+0x70c/0x13f7 [jbd]
eax: 02f5029c   ebx: 00000000   ecx: 3e531458   edx: 00000000
esi: 02f5029c   edi: 39fa0800   ebp: 04398780   esp: 809afda0
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1231, threadinfo=809af000 task=81c93270)
Stack: 00000000 00000000 00000000 00000000 00000000 3b56d9bc 39fa0800
6b0217dc
       0000127b 00000002 00000118 00000075 000000fe 00000073 00000073
0240d1b8
       0240d1a0 809afe14 00000003 040310a0 00000003 040310a0 00000000
81c93270
Call Trace:
 [<0211ebdc>] autoremove_wake_function+0x0/0x2d
 [<0211ebdc>] autoremove_wake_function+0x0/0x2d
 [<0211cc95>] rebalance_tick+0x90/0xb0
 [<0211c589>] find_busiest_group+0xe6/0x2b7
 [<0223b063>] ata_probe+0x50/0xb4
 [<022d661a>] schedule+0x65e/0x6e9
 [<02128521>] del_timer_sync+0x6f/0x87
 [<8286318f>] kjournald+0x10d/0x326 [jbd]
 [<0211ebdc>] autoremove_wake_function+0x0/0x2d
 [<0211ebdc>] autoremove_wake_function+0x0/0x2d
 [<8286307c>] commit_timeout+0x0/0x5 [jbd]
 [<82863082>] kjournald+0x0/0x326 [jbd]
 [<021041f1>] kernel_thread_helper+0x5/0xb
Code: f0 ff 43 04 8b 03 a8 04 0f 84 b1 00 00 00 8b 4c 24 18 b2 01


Also, I have other machines randomly locking up (heavily-used
webservers).  None had OOPSes in syslog - but on one I noticed
kjournald taking 99% CPU, and shortly thereafter the machine froze.

This is kernel-2.6.8-1.521smp

Dual Xeon machines.

-Taner
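
A note for readers following the oops above: the first bytes of the Code: line can be decoded by hand, and they line up with the register dump. The sketch below is only an illustration, not a general disassembler. It handles just the `lock inc dword [reg + disp8]` encoding seen here, the helper name `decode_lock_inc` is made up for this example, and it assumes the Code: line starts at the faulting instruction (which is consistent with the reported fault address).

```python
# Register numbering used by the x86 ModRM r/m field in 32-bit mode.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_lock_inc(code_bytes):
    """Decode `lock inc dword [reg + disp8]` (f0 ff /0 with ModRM mod=01).

    Returns (base_register_name, displacement). Any other encoding raises
    ValueError; this is deliberately not a general disassembler.
    """
    if len(code_bytes) < 4:
        raise ValueError("need at least 4 bytes")
    prefix, opcode, modrm, disp = code_bytes[:4]
    mod = modrm >> 6          # addressing mode
    reg = (modrm >> 3) & 7    # opcode extension (/0 selects INC)
    rm = modrm & 7            # base register
    if prefix != 0xF0 or opcode != 0xFF or reg != 0:
        raise ValueError("not a lock inc instruction")
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [reg + disp8] form is handled")
    if disp >= 0x80:          # disp8 is sign-extended
        disp -= 0x100
    return REG32[rm], disp


# Values copied from the oops in comment 2.
code = bytes.fromhex("f0 ff 43 04")
registers = {"ebx": 0x00000000}

base, disp = decode_lock_inc(code)
fault_address = (registers[base] + disp) & 0xFFFFFFFF
print(f"lock incl {disp:#x}(%{base}) -> fault at {fault_address:08x}")
# prints: lock incl 0x4(%ebx) -> fault at 00000004
```

In other words, the faulting instruction is an atomic increment of a field at offset 4 from a pointer held in %ebx, and %ebx is zero. The later reports in this bug show `<f0> ff 43 0c` with %ebx also zero, which decodes the same way with offset 0xc and matches their fault address of 0000000c.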
Comment 3 Taner Halicioglu 2004-11-05 15:09:13 EST
Created attachment 106236
System info for Taner Halicioglu
Comment 4 Dave Jones 2004-11-27 16:46:27 EST
Is this still a problem with the 2.6.9-based kernel update?
Comment 5 Mukund 2004-11-29 09:15:43 EST
Dave: I get a similar issue on my SMP P3 machine when using
Subversion under heavy load. The latest kernel update (2.6.9)
crashes in kjournald, so yes, the bug exists in 2.6.9. This has
happened several times with recent kernel updates.

We are still using the box, so the next time it crashes I'll supply a
log of what it displays.

Bug #140653 seems to be related too.
Comment 6 Guy Albertelli 2004-11-29 17:43:50 EST
So far I've seen this crash only once, under 2.6.8. We probably
won't install the 2.6.9 update until December.
Comment 7 David Mansfield 2005-01-11 15:12:35 EST
I just got what I think is the same bug. I also have a Dell PowerEdge (2850)
dual Xeon with the megaraid controller (Dell PERC RAID) and an ext3 filesystem,
with heavy I/O going on at the time.

Kernel was kernel-smp-2.6.9-1.681_FC3

I have upgraded to the 2.6.10-1.737fc3smp kernel... we'll see...

Here's the oops:

Jan  9 19:05:09 ccidb-10g-sec kernel: Unable to handle kernel NULL pointer
dereference at virtual address 0000000c
Jan  9 19:05:09 ccidb-10g-sec kernel:  printing eip:
Jan  9 19:05:09 ccidb-10g-sec kernel: 8282e63f
Jan  9 19:05:09 ccidb-10g-sec kernel: *pde = 00004001
Jan  9 19:05:09 ccidb-10g-sec kernel: Oops: 0002 [#1]
Jan  9 19:05:09 ccidb-10g-sec kernel: SMP
Jan  9 19:05:09 ccidb-10g-sec kernel: Modules linked in: md5 ipv6 parport_pc lp
parport autofs4 i2c_dev i2c_core sunrpc ipt_REJECT ipt_state ip_conntrack
iptable_filter ip_tables button battery ac uhci_hcd ehci_hcd e1000 floppy sg
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod megaraid_mbox megaraid_mm sd_mod
scsi_mod
Jan  9 19:05:09 ccidb-10g-sec kernel: CPU:    1
Jan  9 19:05:09 ccidb-10g-sec kernel: EIP:    0060:[<8282e63f>]    Not tainted VLI
Jan  9 19:05:09 ccidb-10g-sec kernel: EFLAGS: 00010206   (2.6.9-1.681_FC3smp)
Jan  9 19:05:09 ccidb-10g-sec kernel: EIP is at
journal_commit_transaction+0x442/0xfb3 [jbd]
Jan  9 19:05:09 ccidb-10g-sec kernel: eax: 47865b6c   ebx: 00000000   ecx:
046a4b80   edx: 3487614c
Jan  9 19:05:09 ccidb-10g-sec kernel: esi: 39e82200   edi: 47865b6c   ebp:
7ef27d00   esp: 802fedf0
Jan  9 19:05:09 ccidb-10g-sec kernel: ds: 007b   es: 007b   ss: 0068
Jan  9 19:05:09 ccidb-10g-sec kernel: Process kjournald (pid: 1529,
threadinfo=802fe000 task=802eaf10)
Jan  9 19:05:09 ccidb-10g-sec kernel: Stack: 00000000 00000000 00000000 00000000
00000000 00000000 685e4cec 39e82200
Jan  9 19:05:09 ccidb-10g-sec kernel:        276e39bc 00001744 00000000 802eaf10
0211e256 802fee44 802fee44 046ba054
Jan  9 19:05:09 ccidb-10g-sec kernel:        8285cf24 802fee44 00000000 802eaf10
0211e256 802fee44 802fee44 01000209
Jan  9 19:05:09 ccidb-10g-sec kernel: Call Trace:
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<0211e256>]
autoremove_wake_function+0x0/0x2d
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<8285cf24>] megaraid_isr+0x1ad/0x1bf
[megaraid_mbox]
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<0211e256>]
autoremove_wake_function+0x0/0x2d
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<82830e59>] kjournald+0xc7/0x215 [jbd]
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<0211e256>]
autoremove_wake_function+0x0/0x2d
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<0211e256>]
autoremove_wake_function+0x0/0x2d
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<82830d8c>] commit_timeout+0x0/0x5 [jbd]
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<82830d92>] kjournald+0x0/0x215 [jbd]
Jan  9 19:05:09 ccidb-10g-sec kernel:  [<021041f1>] kernel_thread_helper+0x5/0xb
Jan  9 19:05:09 ccidb-10g-sec kernel: Code: 00 00 e8 58 98 92 7f 8b 54 24 14 89
d8 e8 8a 0b 00 00 89 f0 e8 78 d8 a8 7f 83 7d 18 00 0f 84 1f 01 00 00 8b 45 18 8b
78 20 8b 1f <f0> ff 43 0c 8b 03 a8 04 74 5a 8b 4c 24 1c 8d 81 e4 00 00 00 e8
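
Since several oopses are being compared in this bug, a small parser can pull out the fields that matter for matching them up. The following is a rough sketch that assumes the i386 oops layout pasted in these comments, including the line wrapping that splits some fields across two lines. The function name `summarize_oops` and the returned dictionary keys are invented for this example.

```python
import re

FAULT_RE = re.compile(
    r"NULL pointer\s+dereference at virtual address\s+([0-9a-f]{8})")
EIP_RE = re.compile(
    r"EIP is at\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[(\w+)\])?")
KERNEL_RE = re.compile(r"EFLAGS:\s+[0-9a-f]{8}\s+\(([^)]+)\)")


def summarize_oops(text):
    """Extract the fault address, EIP symbol and kernel version from an oops.

    Line breaks are collapsed first so that fields wrapped across two
    lines still match. Fields that cannot be found are returned as None.
    """
    flat = " ".join(text.split())
    fault = FAULT_RE.search(flat)
    eip = EIP_RE.search(flat)
    kernel = KERNEL_RE.search(flat)
    return {
        "fault_address": int(fault.group(1), 16) if fault else None,
        "symbol": eip.group(1) if eip else None,
        "offset": int(eip.group(2), 16) if eip else None,
        "size": int(eip.group(3), 16) if eip else None,
        "module": eip.group(4) if eip else None,
        "kernel": kernel.group(1) if kernel else None,
    }


# Lines copied from the oops in comment 7, wrapping preserved.
SAMPLE = """\
Jan  9 19:05:09 ccidb-10g-sec kernel: Unable to handle kernel NULL pointer
dereference at virtual address 0000000c
Jan  9 19:05:09 ccidb-10g-sec kernel: EFLAGS: 00010206   (2.6.9-1.681_FC3smp)
Jan  9 19:05:09 ccidb-10g-sec kernel: EIP is at
journal_commit_transaction+0x442/0xfb3 [jbd]
"""

summary = summarize_oops(SAMPLE)
print(summary)
```

Run against the reports in this bug, the symbol and module are the same every time (journal_commit_transaction in jbd), while the offset changes with the kernel build: +0x70c/0x13f7 on 2.6.8-1.521smp, +0x442/0xfb3 on 2.6.9-1.681_FC3smp, and +0x458/0xe83 on 2.6.10-1.9_FC2smp.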
Comment 8 Taner Halicioglu 2005-01-25 00:21:26 EST
Just got a similar oops again, this time on 2.6.10-1.9_FC2smp...  Same
hardware as before.

-Taner


Unable to handle kernel NULL pointer dereference at virtual address
0000000c
 printing eip:
f8873669
*pde = 371ec001
Oops: 0002 [#1]
SMP
Modules linked in: autofs4 e1000 e100 mii ipt_REJECT ipt_LOG
iptable_filter ip_tables microcode dm_mod ehci_hcd uhci_hcd video
button battery ac ext3 jbd 3w_xxxx sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f8873669>]    Not tainted VLI
EFLAGS: 00010286   (2.6.10-1.9_FC2smp)
EIP is at journal_commit_transaction+0x458/0xe83 [jbd]
eax: f04d950c   ebx: 00000000   ecx: 00000008   edx: 00000000
esi: f04d950c   edi: f5472fa4   ebp: c8691880   esp: f5472df0
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1556, threadinfo=f5472000 task=f5b55530)
Stack: 00000000 00000000 00000000 00000000 00000000 d83d60bc f5413200
ec611a7c
       000017f9 00000000 f5b55530 c012e826 f5472e40 f5472e40 00000000
00000000
       00000000 00000000 f5b55530 c012e826 f5472e40 f5472e40 00000000
00000000
Call Trace:
 [<c012e826>] autoremove_wake_function+0x0/0x2d
 [<c012e826>] autoremove_wake_function+0x0/0x2d
 [<c01199a1>] find_busiest_group+0xf1/0x2d2
 [<c0119da9>] load_balance_newidle+0x54/0x6c
 [<c0119204>] finish_task_switch+0x30/0x66
 [<c02bd1da>] schedule+0x906/0x94d
 [<c0124fb8>] del_timer_sync+0x7a/0xa3
 [<f8875b63>] kjournald+0xd1/0x222 [jbd]
 [<c012e826>] autoremove_wake_function+0x0/0x2d
 [<c012e826>] autoremove_wake_function+0x0/0x2d
 [<c011924c>] schedule_tail+0x12/0x4e
 [<f8875a8c>] commit_timeout+0x0/0x5 [jbd]
 [<f8875a92>] kjournald+0x0/0x222 [jbd]
 [<c01021f5>] kernel_thread_helper+0x5/0xb
Code: 89 d9 e8 d7 26 8e c7 8b 54 24 10 89 d8 e8 44 0a 00 00 89 f0 e8
62 a5 a4 c7 83 7d 18 00 0f 84 0d 01 00 00 8b 45 18 8b 70 20 8b 1e <f0>
ff 43 0c 8b 03 a8 04 74 53 8b 4c 24 18 8d 81 e4 00 00 00 e8
Comment 9 Taner Halicioglu 2005-01-25 00:23:50 EST
Created attachment 110180
kjournald OOPS in 2.6.10-1.9_FC2smp
Comment 10 Dave Brown 2005-01-31 23:17:19 EST
Unfortunately, we have the same issue here. We are using an IBM ServeRAID 6M
with an EXP400 external RAID enclosure. I will post the oops once I've
transcribed it from a photo. Does anyone know whether FC3 or the latest FC2
kernel fixes this? (We are currently using 2.6.9-1.6_FC2smp.)
Comment 11 Stephen Tweedie 2005-02-01 06:10:40 EST
There was one possible cause of a related problem fixed in the latest
kernel: a bug in the extended attribute code could trigger a kjournald
oops, by allowing one thread to free a buffer while another was trying
to use it.

In theory, it might still be possible to trigger that if there are
lingering inconsistencies on-disk.  Forcing a fsck of all filesystems
would eliminate that possibility.
Comment 12 tony 2005-02-21 11:22:00 EST
With regard to Stephen's comment (#11): which version is "latest"? I have seen
the same problem with 2.6.10-1.760_FC3smp (released 2 Feb). I have just upgraded
to 766. An fsck says the file systems are clean. There was no such issue with
the 2.4 kernel.
Comment 13 Stephen Tweedie 2005-02-21 12:09:58 EST
"An fsck says the file systems are clean."

Yes, but did you *force* the fsck?  fsck *always* says the fs is clean for a
journaled filesystem --- it won't actually go looking for on-disk problems
unless either you force it to with fsck -f, or the fsck expiry counts are
exceeded, or the fs is already marked as having errors.
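
The conditions listed above can be written out as a small decision function. This is a simplified model of the behaviour described in this comment, not the actual e2fsck logic: the `FsState` fields are hypothetical stand-ins for the relevant superblock values, and the exact thresholds (for example, treating a limit of zero as "disabled") are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FsState:
    """Hypothetical summary of the filesystem state relevant to fsck."""
    marked_with_errors: bool        # fs already flagged as having errors
    mount_count: int                # mounts since the last full check
    max_mount_count: int            # assumed: 0 or less disables this limit
    seconds_since_last_check: int
    check_interval: int             # assumed: 0 disables the time limit


def full_check_runs(state, forced=False):
    """Return True if fsck would really examine the on-disk structures."""
    if forced:                      # fsck -f
        return True
    if state.marked_with_errors:
        return True
    if 0 < state.max_mount_count <= state.mount_count:
        return True                 # mount-count expiry
    if 0 < state.check_interval <= state.seconds_since_last_check:
        return True                 # time-based expiry
    return False                    # journaled fs is just reported as clean


# A filesystem with no error flag and no expired counters.
healthy = FsState(
    marked_with_errors=False,
    mount_count=3,
    max_mount_count=30,
    seconds_since_last_check=86_400,
    check_interval=15_552_000,
)

print(full_check_runs(healthy))               # False: reported clean, unchecked
print(full_check_runs(healthy, forced=True))  # True: fsck -f forces the check
```

The first result is the case under discussion: an unforced fsck reporting "clean" says nothing about whether the on-disk structures were actually examined.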
Comment 14 tony 2005-02-21 21:05:41 EST
Stephen, thanks for the correction. I then ran fsck -f on all but the root
partition. Some of the file systems required two tries to come up clean, so you
were right. Then I rebooted with a forced fsck. My impression is that this is a
"real" fsck, equivalent to fsck -f. Please correct me if I'm wrong.
So I think I have cleaned the root partition as well (although this was
reformatted recently).

So now we will wait and see. There are 16 identical nodes so it should show up soon.

One crazy question. Could this be related to bug 132584 (dma_timer_expiry)? I
ask because both of these issues showed up when I switched from RH9 to FC3.
Comment 15 Stephen Tweedie 2005-02-22 05:01:09 EST
"Could this be related to bug 132584 (dma_timer_expiry)"

It's hard to know for sure, but I doubt it.  The DMA problem would only be
causing other difficulties to the fs if it was resulting in corrupt data on
disk, and there's no definite sign of that.

Did you by any chance record what the fsck logs showed as being wrong with the
filesystems?  Thanks.
Comment 16 tony 2005-02-22 13:15:15 EST
Replying to #15.

There was no error message except to say that the file system had been modified.
Nodes have been up for 24 hours now, although usage has not been as heavy as usual.
Comment 17 Dave Jones 2005-04-16 01:13:20 EDT
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.
