Bug 176809

Summary: journal_start crashed when doing long time test
Product: Red Hat Enterprise Linux 4 Reporter: liubin <liub>
Component: kernel Assignee: Eric Sandeen <esandeen>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: jbaron, rwheeler
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
URL: Unable to handle kernel NULL pointer dereference at virtual address 000000b0
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-16 18:47:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description liubin 2006-01-03 06:34:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
We are running a long-duration, high-load test of our server program
on Red Hat AS 4.0. After about 20 hours of running, our server program
went down and the following messages were written to /var/log/messages.

Dec 27 21:00:52 ruixj kernel: Unable to handle kernel NULL pointer dereference 
at virtual address 000000b0
Dec 27 21:00:52 ruixj kernel:  printing eip:
Dec 27 21:00:52 ruixj kernel: e00326e0
Dec 27 21:00:52 ruixj kernel: *pde = 04e84067
Dec 27 21:00:52 ruixj kernel: Oops: 0000 [#1]
Dec 27 21:00:52 ruixj kernel: Modules linked in: i915 parport_pc lp parport 
autofs4 sunrpc dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd snd_intel8x0 
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc 
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore e100 mii floppy ext3 
jbd
Dec 27 21:00:52 ruixj kernel: CPU:    0
Dec 27 21:00:52 ruixj kernel: EIP:    0060:[<e00326e0>]    Not tainted VLI
Dec 27 21:00:52 ruixj kernel: EFLAGS: 00010202   (2.6.9-5.EL) 
Dec 27 21:00:52 ruixj kernel: EIP is at journal_start+0x21/0x9e [jbd]
Dec 27 21:00:52 ruixj kernel: eax: ffffffe2   ebx: 000000b0   ecx: df57a400   
edx: 0000005d
Dec 27 21:00:52 ruixj kernel: esi: df762800   edi: c5ed5000   ebp: 000081b6   
esp: c5ed5edc
Dec 27 21:00:52 ruixj kernel: ds: 007b   es: 007b   ss: 0068
Dec 27 21:00:52 ruixj kernel: Process edldapd (pid: 17944, threadinfo=c5ed5000 
task=c842c700)
Dec 27 21:00:52 ruixj kernel: Stack: e011d120 ceb101a4 ceb101a4 e01098e9 
ddb85ac8 00000000 e011d120 ceb101a4 
Dec 27 21:00:52 ruixj kernel:        ceb101a4 000081b6 c0172887 c5ed5f58 
ddb85ac8 ddb85ac8 ceb101a4 ceb82e10 
Dec 27 21:00:52 ruixj kernel:        c5ed5f58 c0172c58 c5ed5f58 00000000 
00000000 00000006 000001b6 00008243 
Dec 27 21:00:52 ruixj kernel: Call Trace:
Dec 27 21:00:52 ruixj kernel:  [<e01098e9>] ext3_create+0x25/0xb3 [ext3]
Dec 27 21:00:52 ruixj kernel:  [<c0172887>] vfs_create+0xb8/0xef
Dec 27 21:00:52 ruixj kernel:  [<c0172c58>] open_namei+0x181/0x57e
Dec 27 21:00:52 ruixj kernel:  [<c0161412>] filp_open+0x23/0x3c
Dec 27 21:00:52 ruixj kernel:  [<c03003b2>] __cond_resched+0x14/0x3b
Dec 27 21:00:52 ruixj kernel:  [<c01d8e46>] direct_strncpy_from_user+0x3e/0x5d
Dec 27 21:00:52 ruixj kernel:  [<c01618e9>] sys_open+0x31/0x7d
Dec 27 21:00:52 ruixj kernel:  [<c0301bfb>] syscall_call+0x7/0xb
Dec 27 21:00:52 ruixj kernel: Code: 42 10 89 42 14 5b 89 f8 5f c3 57 bf 00 f0 
ff ff 56 89 c6 53 21 e7 8b 07 85 f6 8b 98 a8 05 00 00 b8 e2 ff ff ff 74 7d 85 
db 74 34 <8b> 03 39 30 74 29 68 d0 ce 03 e0 68 12 01 00 00 68 ae cd 03 e0 



Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux AS 4 2.6.9-5.EL #1

How reproducible:
Didn't try

Steps to Reproduce:
1. Run a long-duration, high-load test of our server program.
2. After about 20 hours of running, the server program goes down and
   the above messages are written to /var/log/messages.


Actual Results:  Our server program went down and the kernel oops quoted under "Description of problem" above was written to /var/log/messages.



Expected Results:  No error occurs.

Additional info:

Comment 1 Eric Sandeen 2006-10-26 17:22:16 UTC
000006bf <journal_start>:
     6bf:       57                      push   %edi
     6c0:       bf 00 f0 ff ff          mov    $0xfffff000,%edi
     6c5:       56                      push   %esi
     6c6:       89 c6                   mov    %eax,%esi
     6c8:       53                      push   %ebx
     6c9:       21 e7                   and    %esp,%edi
     6cb:       8b 07                   mov    (%edi),%eax
     6cd:       85 f6                   test   %esi,%esi                 if (!journal)
     6cf:       8b 98 a8 05 00 00       mov    0x5a8(%eax),%ebx          journal_current_handle() ??
     6d5:       b8 e2 ff ff ff          mov    $0xffffffe2,%eax          -EROFS (-30)
     6da:       74 7d                   je     759 <journal_start+0x9a>  if no journal return -EROFS
     6dc:       85 db                   test   %ebx,%ebx                 if (handle) BUT handle/%ebx is 0xb0?!
     6de:       74 34                   je     714 <journal_start+0x55>  if no handle jump to new_handle
     6e0:       8b 03                   mov    (%ebx),%eax               <-- died here (try to use %ebx/handle)
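
The listing can be cross-checked against the "Code:" line of the oops, where the byte at EIP is marked with angle brackets. The following is only a sketch in Python: the helper name faulting_offset and the choice of the first six bytes of the listing as a search pattern are assumptions made for illustration, not part of any kernel tooling.

```python
# Bytes copied from the "Code:" line of the oops; <8b> marks the byte at EIP.
CODE_LINE = (
    "42 10 89 42 14 5b 89 f8 5f c3 57 bf 00 f0 ff ff 56 89 c6 53 21 e7 "
    "8b 07 85 f6 8b 98 a8 05 00 00 b8 e2 ff ff ff 74 7d 85 db 74 34 "
    "<8b> 03 39 30 74 29 68 d0 ce 03 e0 68 12 01 00 00 68 ae cd 03 e0"
)

# First two instructions of journal_start in the listing:
# push %edi; mov $0xfffff000,%edi
PROLOGUE = ["57", "bf", "00", "f0", "ff", "ff"]


def faulting_offset(code_line, prologue):
    """Distance in bytes from the start of `prologue` to the <..> marked byte."""
    tokens = code_line.split()
    # Index of the byte the kernel marked as the one at EIP.
    fault_idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    # Index where the function's first bytes appear in the dump.
    start_idx = next(
        i for i in range(len(tokens)) if tokens[i:i + len(prologue)] == prologue
    )
    return fault_idx - start_idx


offset = faulting_offset(CODE_LINE, PROLOGUE)
print(hex(offset))          # offset of the faulting byte inside journal_start
print(hex(0x6BF + offset))  # corresponding address in the listing
```

If the bytes in the dump really are the start of journal_start, the computed offset should agree with the "journal_start+0x21" reported by the oops and with the 6e0 line of the listing.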

Unable to handle kernel NULL pointer dereference at virtual address 000000b0
kernel: eax: ffffffe2   ebx: 000000b0   ecx: df57a400   edx: 0000005d
kernel: esi: df762800   edi: c5ed5000   ebp: 000081b6   esp: c5ed5edc
kernel: ds: 007b   es: 007b   ss: 0068
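
The register values are consistent with the annotated listing, which can be checked with a little arithmetic. This is a rough sketch in Python; the constant names and the helper to_signed32 are made up for this illustration, and the values are copied from the oops and the listing in this report.

```python
# Values copied from the oops in this report.
ESP = 0xC5ED5EDC
EDI = 0xC5ED5000
EBX = 0x000000B0
EAX = 0xFFFFFFE2
EIP = 0xE00326E0
FAULT_ADDR = 0x000000B0   # "virtual address 000000b0"
THREADINFO = 0xC5ED5000   # "threadinfo=c5ed5000"

# Addresses from the listing of journal_start above.
LISTING_START = 0x6BF
LISTING_FAULT = 0x6E0


def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# mov $0xfffff000,%edi ; and %esp,%edi  -> base of the current kernel stack
thread_info = ESP & 0xFFFFF000
# Offset of the faulting instruction inside journal_start.
fault_offset = LISTING_FAULT - LISTING_START
# Where journal_start begins in the loaded module, derived from EIP.
loaded_start = EIP - fault_offset

print(hex(thread_info), thread_info == EDI == THREADINFO)
print(hex(fault_offset))
print(hex(loaded_start))
print(to_signed32(EAX))    # value left in %eax by mov $0xffffffe2,%eax
print(FAULT_ADDR == EBX)   # mov (%ebx),%eax faults at the address held in %ebx
```

The checks only confirm that the oops and the listing describe the same instruction; they say nothing about how %ebx came to hold 0xb0.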

It looks like current->journal_info is corrupt, which could be due to any number
of reasons, all impossible to tell from the info here, I'm afraid.  Has this
been seen since?
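
To illustrate why a small non-zero value such as 0xb0 gets past the checks and then faults, here is a toy model of the branch structure annotated in the listing. It is a sketch in Python, not the kernel source: the function journal_start_model, the BadPointer exception, and the use of a set to stand in for mapped memory are all assumptions made for illustration.

```python
EROFS = 30  # errno value matching the -30 annotated in the listing


class BadPointer(Exception):
    """Stands in for the oops raised when an unmapped address is read."""


def journal_start_model(journal, journal_info, mapped):
    """Toy model of the first checks in journal_start.

    journal      -- integer "address" of the journal (0 means NULL)
    journal_info -- integer "address" read from current->journal_info
    mapped       -- set of addresses that may be dereferenced (hypothetical)
    """
    if journal == 0:
        return -EROFS            # je 759: no journal
    if journal_info == 0:
        return "new_handle"      # je 714: no handle yet, allocate one
    if journal_info not in mapped:
        # mov (%ebx),%eax: the value is non-zero, so it is treated as a
        # valid handle pointer and dereferenced.
        raise BadPointer(hex(journal_info))
    return "reuse_handle"


valid_handle = 0xDDB85AC8  # arbitrary address chosen for the example
print(journal_start_model(0, 0, set()))
print(journal_start_model(0xDF762800, 0, set()))
print(journal_start_model(0xDF762800, valid_handle, {valid_handle}))
try:
    journal_start_model(0xDF762800, 0xB0, {valid_handle})
except BadPointer as exc:
    print("fault at", exc)
```

In this model a NULL journal_info takes the new_handle path, while 0xb0 is neither NULL nor a usable address, which matches the behaviour seen at journal_start+0x21.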

Comment 2 Ric Wheeler 2010-03-16 18:47:13 UTC
Please reopen if you still have this issue.