Bug 75556 - ext3 crash (kjournald) on 2.4.18-3smp (dual Xeon, Hyper-Threading)
Summary: ext3 crash (kjournald) on 2.4.18-3smp (dual Xeon, Hyper-Threading)
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Mike McLean
Reported: 2002-10-09 21:19 UTC by Stephane TRIMOULINARD
Modified: 2005-10-31 22:00 UTC

Doc Type: Bug Fix
Last Closed: 2003-12-17 02:41:01 UTC



Description Stephane TRIMOULINARD 2002-10-09 21:19:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Description of problem:
The system crashes approximately every 20 hours.
It still responds to ping, but the console and network services (telnet,
ftp, ...) no longer respond.
The console output shows that the kjournald process is the one that
crashes, with the message: "EIP is at journal_commit_transaction [kernel]"
Only a reboot solves the problem (for another 20 hours...).


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Reboot and wait one day; the system crashes.

Actual Results: I have already tried to:
- update to 2.4.19 and 2.4-pre20 => same bug
- disable Hyper-Threading so that only two CPUs are visible (my two 1.8 GHz
Xeons on a DELL P2650) => no improvement, same bug

Since this morning I have been running a uniprocessor kernel; by tomorrow I
will know whether that seems to solve the problem.


Additional info:

Here is the output on the console :
kernel BUG at commit.c:535!
invalid operand: 0000
ide-cd cdrom nls-iso8859-1 nls-cp437 vfat fat eepro100 usb_ohci usbcore ide-disk
CPU:    2
EIP:    0010:[<c01753f4>]    Not tainted
EFLAGS: 00010286
EIP is at journal_commit_transaction [kernel] 0xcf4 (2.4.19-SG)
eax: 0000001c   ebx: 0000000a   ecx: c029aec0   edx: 00004c04
esi: f7312960   edi: f7b34a00   ebp: f7b4e0000   esp: f7b34fe78
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 212, stackpage=f7b4f000)
Stack : c0213464  00000217  000dfef0   00000000  00000fdc  f52dd024  00000000  
f777c200  f5acc2d0
00000df4  37363524  42413938  46454443  4a494847  f5286700 f5286680 f5286700 
f5931380  f5931780
f5931700 f5931680  f5931600  f5931580  f5738a80 
Call Trace: [<c0122a95>] update_process_times [kernel] 0x25 
[<c01148d9>]smp_apic_timer_interrupt [kernel] 0xa9
[<c010a756>] do_IRQ [kernel] 0xc6
[<c01174e8>] schedule [kernel] 0x348
[<c0178053>] kjournald [kernel] 0x1a3 
[<c0177e90>] do_IRQ [kernel] 0xa5 
[<c0107a48>] commit_timeout [kernel] 0x0 
[<c0107286>] kernel_thread [kernel] 0x26
[<c0177eb0>] kjournald [kernel] 0x0
 
Code: 0f 0b 5a 59 6a 04 8b 44 24 18 50 56 e8 4b ef ff ff 8d 47 48

Here is the lsmod output:

[root@proxy]# lsmod
Module                  Size  Used by    Not tainted
eepro100               19504   2
usb_ohci               19040   0  (unused)
usbcore                71840   1  [usb_ohci]
ide-disk               11968   0
ide-probe-mod          10476   0
ide-mod                67584   0  [ide-disk ide-probe-mod]
aacraid                26536   7
sd_mod                 12736  14
scsi_mod              106784   2  [aacraid sd_mod]
 
Any help is appreciated; this system has to be stable by Friday!

Comment 1 Bill Nottingham 2002-10-09 21:35:06 UTC
FWIW, SMP problems with ext3 have been fixed in the 7.3 errata kernels. You
really should update.

Comment 2 Stephane TRIMOULINARD 2002-10-09 21:51:20 UTC
Do you mean that upgrading to 2.4.18-10 (RHBA-2002:085-11 and RHSA-2002:158-09)
can solve this problem?
If so, I don't understand why 2.4.19 and 2.4-pre20 did not solve it. Is that
expected (in other words, is there a real chance that 2.4.18-10 fixes it)?

Best regards,


Comment 3 Stephane TRIMOULINARD 2002-10-11 17:04:36 UTC
The uniprocessor kernel seems to be stable (2 days 9 hours of uptime without a crash).
But since I need the performance, I would really like to get SMP stable.
Can you tell me:
- whether the 2.4.18-10 kernel really fixes it?
- if so, how is it possible that 2.4.19 and 2.4-pre20 don't fix it?

Thanks very much for your help; this is becoming very urgent!



Comment 4 Stephane TRIMOULINARD 2002-10-11 17:07:24 UTC
Sorry, this seems to be the correct procedure for requesting more
information. Please disregard my previous comment; the same questions apply.


Comment 5 Michael Ivanov 2003-02-20 14:55:02 UTC
I have the same problem on a dual-processor server (2 x 1.4 GHz PIII, SDRAM133,
Tyan Thunder HEsl-T motherboard, internal RAID-10 array of 4 x 18 GB disks). It
doesn't happen as often, but it occurs once every few days. I also have another
identical server running the same Linux but with RAID-5 instead of RAID-10, and
it does not have this problem. After kjournald crashed, my Oracle server was
still alive (judging from the logs) but had no network services. I think
hardware Hyper-Threading is not relevant to this problem; the kernel's SMP
support and the disk subsystem code seem more important.

