Bug 161823 - Kernel Oops while using CAE application medina (T-Systems)
Summary: Kernel Oops while using CAE application medina (T-Systems)
Keywords:
Status: CLOSED DUPLICATE of bug 73733
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-06-27 16:08 UTC by Udo Seidel
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-07-07 20:22:52 UTC
Target Upstream Version:
Embargoed:


Attachments
sysreport after the reboot (273.02 KB, application/x-bzip)
2005-06-27 16:08 UTC, Udo Seidel

Description Udo Seidel 2005-06-27 16:08:18 UTC
Description of problem:
While post-processing with the CAE application Medina, the machine stops working
with a kernel Oops on Xeon EM64T. The only way to get the machine working again
is a hard reset.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.EL

How reproducible:
Every time on Xeon EM64T systems, never on AMD Opteron systems

Steps to Reproduce:
1. Start the Medina postprocessor.
2. Load a protocol file.



  
Actual results:


Expected results:


Additional info:

Comment 1 Udo Seidel 2005-06-27 16:08:19 UTC
Created attachment 116024
sysreport after the reboot

Comment 2 Udo Seidel 2005-06-27 16:09:21 UTC
The binary Nvidia driver for the graphics adapter is loaded.
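The oops output below corroborates this in two places: the `Tainted: P` field (in 2.6 kernels, `P` in the first taint position means a module with a non-GPL-compatible license has been loaded) and the `nvidia(U)` entry in the module list. The sketch below is a hypothetical helper, not a Red Hat tool, that pulls both facts out of the log lines. Reading `(U)` as RHEL's "unsupported module" marker is an assumption; the code only reports the marker, it does not interpret it.

```python
import re

# Log lines copied from the oops below (wrapped lines rejoined; module list truncated).
PID_LINE = ("Jun 27 17:33:25 ibm1 kernel: Pid: 4131, comm: medpost74 "
            "Tainted: P      2.6.9-5.ELsmp")
MODULES_LINE = ("Jun 27 17:33:25 ibm1 kernel: Modules linked in: parport_pc lp "
                "parport autofs4 i2c_dev i2c_core nfs lockd sunrpc ds yenta_socket "
                "pcmcia_core button battery ac nvidia(U) md5 ipv6 uhci_hcd ehci_hcd")


def proprietary_taint(pid_line: str) -> bool:
    """True if the first taint character is 'P' (proprietary module loaded)."""
    match = re.search(r"Tainted: (\S)", pid_line)
    return match is not None and match.group(1) == "P"


def flagged_modules(modules_line: str) -> list[tuple[str, str]]:
    """Return (module, marker) pairs for entries printed as name(MARKER)."""
    return re.findall(r"(\w+)\((\w+)\)", modules_line)


if __name__ == "__main__":
    print("proprietary module loaded:", proprietary_taint(PID_LINE))
    print("flagged modules:", flagged_modules(MODULES_LINE))
```

A line reading `Not tainted` does not match the pattern, so the helper returns `False` for an untainted kernel.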


Here is the output from the kernel Oops:


Jun 27 17:33:25 ibm1 kernel: medpost74: Corrupted page table at address 2aad17e000
Jun 27 17:33:25 ibm1 kernel: PML4 203c71067 PGD 203c80067 PMD 1f2f78067 PTE 7ffffe000000002f
Jun 27 17:33:25 ibm1 kernel: Bad pagetable: 000f [1] SMP
Jun 27 17:33:25 ibm1 kernel: CPU 2
Jun 27 17:33:25 ibm1 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc ds yenta_socket pcmcia_core button battery ac nvidia(U) md5 ipv6 uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aic79xx sd_mod scsi_mod
Jun 27 17:33:25 ibm1 kernel: Pid: 4131, comm: medpost74 Tainted: P      2.6.9-5.ELsmp
Jun 27 17:33:25 ibm1 kernel: RIP: 0033:[<0000002a96c17f10>] [<0000002a96c17f10>]
Jun 27 17:33:25 ibm1 kernel: RSP: 002b:0000007fbfffbfa8  EFLAGS: 00010202
Jun 27 17:33:25 ibm1 kernel: RAX: 0000000000002480 RBX: 0000000004669a60 RCX: 0000002aad17e000
Jun 27 17:33:25 ibm1 kernel: RDX: 0000000000400000 RSI: 0000000000000000 RDI: 0000002aace10000
Jun 27 17:33:25 ibm1 kernel: RBP: 0000007fbfffc060 R08: 0000000000000000 R09: 0000000000000200
Jun 27 17:33:25 ibm1 kernel: R10: 0000000000000041 R11: 0000000000000003 R12: 0000000000000000
Jun 27 17:33:25 ibm1 kernel: R13: 0000000000400000 R14: 0000000000002010 R15: 0000000004669a70
Jun 27 17:33:25 ibm1 kernel: FS:  0000002a98009140(0000) GS:ffffffff804bf400(0000) knlGS:0000000000000000
Jun 27 17:33:25 ibm1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 17:33:25 ibm1 kernel: CR2: 0000002aad17e000 CR3: 0000000037e2e000 CR4: 00000000000006e0
Jun 27 17:33:25 ibm1 kernel: Process medpost74 (pid: 4131, threadinfo 0000010203438000, task 00000102038a2030)
Jun 27 17:33:25 ibm1 kernel:
Jun 27 17:33:25 ibm1 kernel: RIP [<0000002a96c17f10>] RSP <0000007fbfffbfa8>
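The first oops is a reserved-bit page fault. On x86-64 the low bits of the page-fault error code mean: bit 0, the page was present; bit 1, a write; bit 2, a user-mode access; bit 3, a reserved bit set in a paging-structure entry. So `Bad pagetable: 000f` reports a user-mode write that hit a page-table entry with reserved bits set. The sketch below decodes that code and the logged PTE `7ffffe000000002f`. It assumes a 4 KiB page and a physical-address width (MAXPHYADDR) of 36 bits; that width is an illustrative assumption, not taken from the report, but any MAXPHYADDR below 52 still leaves some of this PTE's set bits 41 to 51 in the must-be-zero range.

```python
# Page-fault error code bits pushed by the CPU on x86-64 (bit 4 is only
# meaningful when NX is enabled).
PF_ERROR_BITS = {
    0: "page present (protection or reserved-bit violation)",
    1: "write access",
    2: "user-mode access",
    3: "reserved bit set in a paging entry",
    4: "instruction fetch",
}

# Flag bits of a 4 KiB page-table entry.
PTE_FLAG_BITS = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD",
                 5: "A", 6: "D", 7: "PAT", 8: "G", 63: "NX"}


def decode_error_code(code: int) -> list[str]:
    """List the meaning of every set bit in a page-fault error code."""
    return [desc for bit, desc in PF_ERROR_BITS.items() if code & (1 << bit)]


def decode_pte(pte: int, maxphyaddr: int = 36) -> tuple[list[str], int]:
    """Return (flag names, reserved bits that are set).

    Bits maxphyaddr..51 of the address field must be zero; if any is set,
    a page walk through this entry faults with the reserved-bit flag.
    maxphyaddr=36 is an assumed value for illustration.
    """
    flags = [name for bit, name in PTE_FLAG_BITS.items() if pte & (1 << bit)]
    reserved_mask = ((1 << 52) - 1) & ~((1 << maxphyaddr) - 1)
    return flags, pte & reserved_mask


if __name__ == "__main__":
    print(decode_error_code(0x000F))                  # "Bad pagetable: 000f"
    flags, reserved = decode_pte(0x7FFFFE000000002F)  # the logged PTE
    print(flags, hex(reserved))
    print(decode_pte(0x1F2F78067))                    # the logged PMD entry, for contrast
```

The PMD entry `1f2f78067` decodes to ordinary present/writable/user flags with no reserved bits, which is why the corruption points at the PTE itself.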
Jun 27 17:33:25 ibm1 kernel:  <1>Unable to handle kernel paging request at 000000fe0e63e3b0 RIP:
Jun 27 17:33:25 ibm1 kernel: <ffffffff80120224>{unmap_single+50}
Jun 27 17:33:25 ibm1 kernel: PML4 0
Jun 27 17:33:25 ibm1 kernel: Oops: 0000 [2] SMP
Jun 27 17:33:25 ibm1 kernel: CPU 2
Jun 27 17:33:25 ibm1 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc ds yenta_socket pcmcia_core button battery ac nvidia(U) md5 ipv6 uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aic79xx sd_mod scsi_mod
Jun 27 17:33:25 ibm1 kernel: Pid: 4131, comm: medpost74 Tainted: P      2.6.9-5.ELsmp
Jun 27 17:33:25 ibm1 kernel: RIP: 0010:[<ffffffff80120224>] <ffffffff80120224>{unmap_single+50}
Jun 27 17:33:25 ibm1 kernel: RSP: 0000:0000010203439ac8  EFLAGS: 00010293
Jun 27 17:33:25 ibm1 kernel: RAX: 000001000e6e5000 RBX: 001fffffbffeb276 RCX: 0000000000000000
Jun 27 17:33:25 ibm1 kernel: RDX: ffffffffbffeb276 RSI: ffffff0000000000 RDI: 000001023fe90f30
Jun 27 17:33:25 ibm1 kernel: RBP: 0000000000000002 R08: 0000000000001000 R09: 00000101f2ee4000
Jun 27 17:33:25 ibm1 kernel: R10: 0000010000000000 R11: 0000000000000246 R12: 0000000000000000
Jun 27 17:33:25 ibm1 kernel: R13: 000001023fe90f30 R14: 000001023fe90ec0 R15: 0000010000000000
Jun 27 17:33:25 ibm1 kernel: FS:  0000000000000000(0000) GS:ffffffff804bf400(0000) knlGS:0000000000000000
Jun 27 17:33:25 ibm1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 17:33:25 ibm1 kernel: CR2: 000000fe0e63e3b0 CR3: 0000000037e2e000 CR4: 00000000000006e0
Jun 27 17:33:25 ibm1 kernel: Process medpost74 (pid: 4131, threadinfo 0000010203438000, task 00000102038a2030)
Jun 27 17:33:26 ibm1 kernel: Stack: 0000000000000001 00000101f3371968 0000000000000001 ffffffff80120943
Jun 27 17:33:26 ibm1 kernel:        0000000000000000 000001023e710fc0 00000101f3371950 00000101f3371950
Jun 27 17:33:26 ibm1 kernel:        000000000000036e ffffffffa03d5432
Jun 27 17:33:26 ibm1 kernel: Call Trace:<ffffffff80120943>{swiotlb_unmap_sg+191} <ffffffffa03d5432>{:nvidia:nv_vm_free_pages+283}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa03d3410>{:nvidia:nv_free_pages+734} <ffffffffa01ca083>{:nvidia:_nv001716rm+89}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa01b2acc>{:nvidia:_nv001228rm+150} <ffffffffa01b1aea>{:nvidia:_nv001241rm+154}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa01b1846>{:nvidia:_nv001246rm+60} <ffffffffa02e5713>{:nvidia:_nv004331rm+33}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa02e4eaf>{:nvidia:_nv004179rm+121} <ffffffffa01cf9d2>{:nvidia:_nv001226rm+96}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa01d0d50>{:nvidia:rm_free_unused_clients+128}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa03d1144>{:nvidia:nv_kern_ctl_close+175} <ffffffffa03d1282>{:nvidia:nv_kern_close+252}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80172bcf>{__fput+99} <ffffffff80171810>{filp_close+103}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80136dec>{put_files_struct+101} <ffffffff801375b8>{do_exit+665}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80226ba4>{do_unblank_screen+97} <ffffffff80121e04>{do_page_fault+0}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80122393>{do_page_fault+1423} <ffffffff80165be8>{do_mmap_pgoff+1593}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff801dccf8>{__up_write+19} <ffffffff80110a6d>{error_exit+0}
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel: Code: 4c 8b 0c d0 0f 94 c2 85 c9 0f 94 c0 09 d0 a8 01 74 09 fc 4c
Jun 27 17:33:26 ibm1 kernel: RIP <ffffffff80120224>{unmap_single+50} RSP <0000010203439ac8>
Jun 27 17:33:26 ibm1 kernel: CR2: 000000fe0e63e3b0
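The faulting instruction in the second oops can be checked against its register dump. Assuming the `Code:` bytes start at the faulting RIP, the first four bytes `4c 8b 0c d0` decode as `mov r9, qword [rax + rdx*8]`. Recomputing that address from RAX and RDX reproduces CR2 exactly, and RDX read as a signed 64-bit integer is a large negative number, which looks like an out-of-range table index rather than a valid one. The decoder below is a deliberately narrow sketch with hypothetical helper names: it handles only REX-prefixed `8B /r` loads with ModRM mod 00 or 01, which is enough for the instructions in this report and nothing more.

```python
from typing import Optional

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
MASK64 = (1 << 64) - 1


def decode_load(code: bytes) -> tuple[str, str, Optional[str], int, int]:
    """Decode 'REX 8B /r' (MOV reg, [mem]) with ModRM mod=00 or mod=01.

    Returns (dest, base, index or None, scale, disp). The destination is
    named by its 64-bit register even when REX.W=0 (a 32-bit load).
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("only REX-prefixed MOV r, r/m (8B) is handled")
    rex_r, rex_x, rex_b = (rex >> 2) & 1, (rex >> 1) & 1, rex & 1
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod not in (0, 1):
        raise ValueError("only ModRM mod=00 and mod=01 are handled")
    pos, index, scale = 3, None, 1
    if rm == 4:                                   # a SIB byte follows ModRM
        sib = code[pos]
        pos += 1
        if mod == 0 and sib & 7 == 5:
            raise ValueError("SIB with disp32 and no base is not handled")
        scale = 1 << (sib >> 6)
        idx = ((sib >> 3) & 7) | (rex_x << 3)
        index = None if idx == 4 else REG64[idx]  # index 100 without REX.X: none
        base = REG64[(sib & 7) | (rex_b << 3)]
    else:
        if mod == 0 and rm == 5:
            raise ValueError("RIP-relative addressing is not handled")
        base = REG64[rm | (rex_b << 3)]
    disp = int.from_bytes(code[pos:pos + 1], "little", signed=True) if mod == 1 else 0
    return REG64[reg | (rex_r << 3)], base, index, scale, disp


def effective_address(regs: dict[str, int], base: str, index: Optional[str],
                      scale: int, disp: int) -> int:
    """base + index*scale + disp, wrapped to 64 bits as the CPU does."""
    addr = regs[base] + (regs[index] * scale if index else 0) + disp
    return addr & MASK64


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit register value as signed."""
    return value - (1 << 64) if value & (1 << 63) else value


if __name__ == "__main__":
    regs = {"rax": 0x000001000E6E5000, "rdx": 0xFFFFFFFFBFFEB276}  # oops [2]
    cr2 = 0x000000FE0E63E3B0
    dest, base, index, scale, disp = decode_load(bytes.fromhex("4c8b0cd0"))
    addr = effective_address(regs, base, index, scale, disp)
    print(f"mov {dest}, [{base} + {index}*{scale} + {disp}] -> {addr:#x}")
    print("matches CR2:", addr == cr2, "| RDX as signed:", to_signed64(regs["rdx"]))
```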
Jun 27 17:33:26 ibm1 kernel:  <1>Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP:
Jun 27 17:33:26 ibm1 kernel: <ffffffff8013331d>{mm_release+70}
Jun 27 17:33:26 ibm1 kernel: PML4 203c71067 PGD 0
Jun 27 17:33:26 ibm1 kernel: Oops: 0000 [3] SMP
Jun 27 17:33:26 ibm1 kernel: CPU 2
Jun 27 17:33:26 ibm1 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc ds yenta_socket pcmcia_core button battery ac nvidia(U) md5 ipv6 uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aic79xx sd_mod scsi_mod
Jun 27 17:33:26 ibm1 kernel: Pid: 4131, comm: medpost74 Tainted: P      2.6.9-5.ELsmp
Jun 27 17:33:26 ibm1 kernel: RIP: 0010:[<ffffffff8013331d>] <ffffffff8013331d>{mm_release+70}
Jun 27 17:33:26 ibm1 kernel: RSP: 0000:00000102034398c8  EFLAGS: 00010202
Jun 27 17:33:26 ibm1 kernel: RAX: 00000102038a2030 RBX: 00000102038a2030 RCX: 0000000000000004
Jun 27 17:33:26 ibm1 kernel: RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000002a980091d0
Jun 27 17:33:26 ibm1 kernel: RBP: 0000000000000000 R08: 000000000000000f R09: 0000000000000001
Jun 27 17:33:26 ibm1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jun 27 17:33:26 ibm1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jun 27 17:33:26 ibm1 kernel: FS:  0000000000000000(0000) GS:ffffffff804bf400(0000) knlGS:0000000000000000
Jun 27 17:33:26 ibm1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 17:33:26 ibm1 kernel: CR2: 0000000000000048 CR3: 0000000037e2e000 CR4: 00000000000006e0
Jun 27 17:33:26 ibm1 kernel: Process medpost74 (pid: 4131, threadinfo 0000010203438000, task 00000102038a2030)
Jun 27 17:33:26 ibm1 kernel: Stack: 0000000000000000 0000000000000000 000000fe0e63e3b0 00000102038a2030
Jun 27 17:33:26 ibm1 kernel:        0000000000000009 ffffffff80137467 ffffffff803c7508 0000000000000046
Jun 27 17:33:26 ibm1 kernel:        ffffffff803bc66c ffffffffffffffef
Jun 27 17:33:26 ibm1 kernel: Call Trace:<ffffffff80137467>{do_exit+328} <ffffffff80111796>{oops_end+38}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80122286>{do_page_fault+1154} <ffffffff80157130>{free_pages_bulk+682}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80157130>{free_pages_bulk+682} <ffffffff80110a6d>{error_exit+0}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80120224>{unmap_single+50} <ffffffff80120943>{swiotlb_unmap_sg+191}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa03d5432>{:nvidia:nv_vm_free_pages+283} <ffffffffa03d3410>{:nvidia:nv_free_pages+734}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa01ca083>{:nvidia:_nv001716rm+89} <ffffffffa01b2acc>{:nvidia:_nv001228rm+150}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa01b1aea>{:nvidia:_nv001241rm+154} <ffffffffa01b1846>{:nvidia:_nv001246rm+60}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa02e5713>{:nvidia:_nv004331rm+33} <ffffffffa02e4eaf>{:nvidia:_nv004179rm+121}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa01cf9d2>{:nvidia:_nv001226rm+96} <ffffffffa01d0d50>{:nvidia:rm_free_unused_clients+128}
Jun 27 17:33:26 ibm1 kernel:        <ffffffffa03d1144>{:nvidia:nv_kern_ctl_close+175} <ffffffffa03d1282>{:nvidia:nv_kern_close+252}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80172bcf>{__fput+99} <ffffffff80171810>{filp_close+103}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80136dec>{put_files_struct+101} <ffffffff801375b8>{do_exit+665}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80226ba4>{do_unblank_screen+97} <ffffffff80121e04>{do_page_fault+0}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff80122393>{do_page_fault+1423} <ffffffff80165be8>{do_mmap_pgoff+1593}
Jun 27 17:33:26 ibm1 kernel:        <ffffffff801dccf8>{__up_write+19} <ffffffff80110a6d>{error_exit+0}
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel: Code: 41 8b 45 48 ff c8 7e 53 48 c7 83 08 02 00 00 00 00 00 00 65
Jun 27 17:33:26 ibm1 kernel: RIP <ffffffff8013331d>{mm_release+70} RSP <00000102034398c8>
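The third oops is a NULL-pointer dereference in `mm_release`; its call trace shows it was reached from `oops_end` via `do_exit`, apparently while the kernel was cleaning up after the second oops. Assuming the `Code:` bytes start at RIP, the first four bytes `41 8b 45 48` decode as a 32-bit load `mov eax, dword [r13 + 0x48]`, and R13 is zero in the register dump, so the reported address 0x48 is a field at offset 0x48 read through a NULL pointer. The sketch below captures that rule of thumb; the helper name is hypothetical, and treating any fault inside the lowest 4 KiB page as "NULL plus field offset" is a heuristic, not a guarantee.

```python
PAGE_SIZE = 4096  # the lowest page is normally left unmapped
MASK64 = (1 << 64) - 1


def classify_fault(cr2: int, base_value: int, disp: int) -> str:
    """Classify a fault from CR2 and the base register/displacement of the load."""
    if ((base_value + disp) & MASK64) != cr2:
        return "CR2 does not match base+disp; decoding assumption is wrong"
    if base_value == 0 and cr2 < PAGE_SIZE:
        return f"NULL pointer dereference of the field at offset {disp:#x}"
    return "bad pointer (non-NULL base)"


if __name__ == "__main__":
    # Values from oops [3]: R13 = 0, displacement 0x48, CR2 = 0x48.
    print(classify_fault(cr2=0x48, base_value=0, disp=0x48))
```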




Comment 3 Dave Jones 2005-07-07 20:22:52 UTC
This looks to me like a bug in the NVidia driver, judging from the call trace.
We've heard no other reports of page table corruption, which lends more
credibility to this.
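The call-trace argument can be made mechanical: walk the frames outward from the function that faulted and report the first one that lives in a loadable module, i.e. the code that called into the core kernel path. The sketch below does this for the second oops, whose trace shows `swiotlb_unmap_sg` being called from `nvidia:nv_vm_free_pages`. It is a hypothetical helper, not a kernel tool. Note that x86-64 kernels of this vintage build call traces by scanning the stack for kernel text addresses, so some entries (for example `do_page_fault+0`) may be stale values rather than real callers.

```python
import re
from collections import Counter
from typing import Optional

# A frame looks like <ffffffffa03d5432>{:nvidia:nv_vm_free_pages+283};
# frames without a ":module:" prefix belong to the core kernel.
FRAME_RE = re.compile(r"<[0-9a-f]{16}>\{(?::(\w+):)?(\w+)\+(\d+)\}")

# First 15 frames of the call trace from the second oops (log prefixes removed).
TRACE = """
Call Trace:<ffffffff80120943>{swiotlb_unmap_sg+191} <ffffffffa03d5432>{:nvidia:nv_vm_free_pages+283}
<ffffffffa03d3410>{:nvidia:nv_free_pages+734} <ffffffffa01ca083>{:nvidia:_nv001716rm+89}
<ffffffffa01b2acc>{:nvidia:_nv001228rm+150} <ffffffffa01b1aea>{:nvidia:_nv001241rm+154}
<ffffffffa01b1846>{:nvidia:_nv001246rm+60} <ffffffffa02e5713>{:nvidia:_nv004331rm+33}
<ffffffffa02e4eaf>{:nvidia:_nv004179rm+121} <ffffffffa01cf9d2>{:nvidia:_nv001226rm+96}
<ffffffffa01d0d50>{:nvidia:rm_free_unused_clients+128}
<ffffffffa03d1144>{:nvidia:nv_kern_ctl_close+175} <ffffffffa03d1282>{:nvidia:nv_kern_close+252}
<ffffffff80172bcf>{__fput+99} <ffffffff80171810>{filp_close+103}
"""


def parse_frames(trace: str) -> list[tuple[str, str, int]]:
    """Return (module, symbol, offset) for every frame, innermost first."""
    return [(m.group(1) or "kernel", m.group(2), int(m.group(3)))
            for m in FRAME_RE.finditer(trace)]


def first_module_caller(frames: list[tuple[str, str, int]]) -> Optional[tuple[str, str, int]]:
    """First frame, counting outward, that lies in a loadable module."""
    return next((f for f in frames if f[0] != "kernel"), None)


if __name__ == "__main__":
    frames = parse_frames(TRACE)
    print("faulting path entered from:", first_module_caller(frames))
    print("frames per module:", Counter(f[0] for f in frames))
```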

I also recommend updating to the U1 kernel, which fixes a large number of bugs.

*** This bug has been marked as a duplicate of 73733 ***

