Bug 245114

Summary: Bug in radix_tree_insert (probably hardware)
Product: [Fedora] Fedora
Reporter: Hartmut Horrer <hh>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 6
CC: jonstanley
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2008-02-08 04:28:12 UTC
Bug Blocks: 427887    

Description Hartmut Horrer 2007-06-21 08:42:24 UTC
Description of problem:
The server crashes sporadically.

Version-Release number of selected component (if applicable):
[root@koala ~]# uname -r
2.6.20-1.2952.fc6
[root@koala ~]# uname -a
Linux koala.horrynet.de 2.6.20-1.2952.fc6 #1 SMP Wed May 16 18:18:22 EDT 2007
x86_64 x86_64 x86_64 GNU/Linux

Dell PowerEdge 1800

How reproducible:


Steps to Reproduce:
1. No known steps; the crash seems to occur sporadically.

  
Actual results:
contents of /var/log/messages:

Jun 21 09:46:34 koala kernel: Unable to handle kernel paging request at ffff810034186688 RIP: 
Jun 21 09:46:34 koala kernel:  [<ffffffff8033fb9a>] radix_tree_insert+0x10e/0x18c
Jun 21 09:46:34 koala kernel: PGD 8063 PUD 9063 PMD 80000000340001e3 PTE 700a0d62696c2f70
Jun 21 09:46:34 koala kernel: Oops: 0000 [1] SMP 
Jun 21 09:46:34 koala kernel: last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:07.0/irq
Jun 21 09:46:34 koala kernel: CPU 3 
Jun 21 09:46:34 koala kernel: Modules linked in: nfsd exportfs lockd nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_multipath dm_mod video sbs i2c_ec i2c_core dock button battery asus_acpi backlight ac radeon drm ipv6 lp sg floppy e1000 e752x_edac serio_raw ide_cd iTCO_wdt edac_mc pcspkr parport_pc parport iTCO_vendor_support cdrom ata_piix libata mptspi mptscsih scsi_transport_spi mptbase shpchp aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Jun 21 09:46:34 koala kernel: Pid: 2508, comm: smbd Not tainted 2.6.20-1.2952.fc6 #1
Jun 21 09:46:34 koala kernel: RIP: 0010:[<ffffffff8033fb9a>]  [<ffffffff8033fb9a>] radix_tree_insert+0x10e/0x18c
Jun 21 09:46:34 koala kernel: RSP: 0018:ffff810040595b28  EFLAGS: 00010002
Jun 21 09:46:34 koala kernel: RAX: 2000000000000032 RBX: ffff8100cd6c9a28 RCX: 000000000003280c
Jun 21 09:46:34 koala kernel: RDX: ffff8100341864e0 RSI: 000000000003282e RDI: ffff8100cd6c9a28
Jun 21 09:46:34 koala kernel: RBP: ffff8100341864e0 R08: ffff81011fd19a90 R09: 0000000000000d73
Jun 21 09:46:34 koala kernel: R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000032
Jun 21 09:46:34 koala kernel: R13: 0000000000000002 R14: 0000000000000006 R15: ffff810002385d00
Jun 21 09:46:34 koala kernel: FS:  00002aaaae93edc0(0000) GS:ffff81011fd19940(0000) knlGS:0000000000000000
Jun 21 09:46:34 koala kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 21 09:46:34 koala kernel: CR2: ffff810034186688 CR3: 0000000041807000 CR4: 00000000000006e0
Jun 21 09:46:34 koala kernel: Process smbd (pid: 2508, threadinfo ffff810040594000, task ffff81010fd05040)
Jun 21 09:46:34 koala kernel: Stack:  000000000003282e ffff810002385d00 0000000000000000 ffff8100cd6c9a20
Jun 21 09:46:34 koala kernel:  000000000003282e 0000555555b7ea64 ffff810040595ee8 ffffffff8020c353
Jun 21 09:46:34 koala kernel:  0000000000000000 000000000003282e 0000000000000000 0000000000001000
Jun 21 09:46:34 koala kernel: Call Trace:
Jun 21 09:46:34 koala kernel:  [<ffffffff8020c353>] add_to_page_cache+0x3d/0x89
Jun 21 09:46:34 koala kernel:  [<ffffffff8020fcfc>] generic_file_buffered_write+0x1c4/0x6fd
Jun 21 09:46:34 koala kernel:  [<ffffffff8021606f>] __generic_file_aio_write_nolock+0x378/0x3eb
Jun 21 09:46:34 koala kernel:  [<ffffffff80260b4d>] lock_kernel+0x2c/0x48
Jun 21 09:46:34 koala kernel:  [<ffffffff80221559>] generic_file_aio_write+0x61/0xc1
Jun 21 09:46:34 koala kernel:  [<ffffffff8803718e>] :ext3:ext3_file_write+0x16/0x94
Jun 21 09:46:34 koala kernel:  [<ffffffff80217bae>] do_sync_write+0xc9/0x10c
Jun 21 09:46:34 koala kernel:  [<ffffffff802609ca>] _write_unlock_irq+0x9/0xc
Jun 21 09:46:34 koala kernel:  [<ffffffff80297bb4>] autoremove_wake_function+0x0/0x2e
Jun 21 09:46:34 koala kernel:  [<ffffffff8023894c>] fcntl_setlk+0x232/0x25f
Jun 21 09:46:34 koala kernel:  [<ffffffff8021649f>] vfs_write+0xce/0x177
Jun 21 09:46:34 koala kernel:  [<ffffffff80240e61>] sys_pwrite64+0x50/0x70
Jun 21 09:46:34 koala kernel:  [<ffffffff8022dc8b>] sys_fcntl+0x2da/0x2e6
Jun 21 09:46:34 koala kernel:  [<ffffffff8025a11e>] system_call+0x7e/0x83
Jun 21 09:46:34 koala kernel: 
Jun 21 09:46:34 koala kernel: 
Jun 21 09:46:34 koala kernel: Code: 48 8b 54 c2 18 45 85 ed 75 a6 48 85 d2 b8 ef ff ff ff 75 5e 
Jun 21 09:46:34 koala kernel: RIP  [<ffffffff8033fb9a>] radix_tree_insert+0x10e/0x18c
Jun 21 09:46:35 koala kernel:  RSP <ffff810040595b28>
Jun 21 09:46:35 koala kernel: CR2: ffff810034186688
Jun 21 09:46:35 koala kernel:  <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Jun 21 09:46:35 koala kernel: in_atomic():0, irqs_disabled():1
Jun 21 09:46:35 koala kernel: 
Jun 21 09:46:35 koala kernel: Call Trace:
Jun 21 09:46:35 koala kernel:  [<ffffffff80299d90>] down_read+0x15/0x23
Jun 21 09:46:35 koala kernel:  [<ffffffff802a6065>] acct_collect+0x42/0x18e
Jun 21 09:46:35 koala kernel:  [<ffffffff802151ea>] do_exit+0x20b/0x832
Jun 21 09:46:35 koala kernel:  [<ffffffff80262db8>] do_page_fault+0x74f/0x7ca
Jun 21 09:46:35 koala kernel:  [<ffffffff88021c3a>] :jbd:do_get_write_access+0x4d5/0x507
Jun 21 09:46:35 koala kernel:  [<ffffffff80260eed>] error_exit+0x0/0x84
Jun 21 09:46:35 koala kernel:  [<ffffffff8033fb9a>] radix_tree_insert+0x10e/0x18c
Jun 21 09:46:35 koala kernel:  [<ffffffff8020c353>] add_to_page_cache+0x3d/0x89
Jun 21 09:46:35 koala kernel:  [<ffffffff8020fcfc>] generic_file_buffered_write+0x1c4/0x6fd
Jun 21 09:46:35 koala kernel:  [<ffffffff8021606f>] __generic_file_aio_write_nolock+0x378/0x3eb
Jun 21 09:46:35 koala kernel:  [<ffffffff80260b4d>] lock_kernel+0x2c/0x48
Jun 21 09:46:35 koala kernel:  [<ffffffff80221559>] generic_file_aio_write+0x61/0xc1
Jun 21 09:46:35 koala kernel:  [<ffffffff8803718e>] :ext3:ext3_file_write+0x16/0x94
Jun 21 09:46:35 koala kernel:  [<ffffffff80217bae>] do_sync_write+0xc9/0x10c
Jun 21 09:46:35 koala kernel:  [<ffffffff802609ca>] _write_unlock_irq+0x9/0xc
Jun 21 09:46:35 koala kernel:  [<ffffffff80297bb4>] autoremove_wake_function+0x0/0x2e
Jun 21 09:46:35 koala kernel:  [<ffffffff8023894c>] fcntl_setlk+0x232/0x25f
Jun 21 09:46:35 koala kernel:  [<ffffffff8021649f>] vfs_write+0xce/0x177
Jun 21 09:46:35 koala kernel:  [<ffffffff80240e61>] sys_pwrite64+0x50/0x70
Jun 21 09:46:35 koala kernel:  [<ffffffff8022dc8b>] sys_fcntl+0x2da/0x2e6
Jun 21 09:46:35 koala kernel:  [<ffffffff8025a11e>] system_call+0x7e/0x83
Jun 21 09:46:35 koala kernel: 
Jun 21 09:52:39 koala syslogd 1.4.1: restart.

Comment 1 Chuck Ebbert 2007-06-22 19:10:59 UTC
49 63 c4                movslq %r12d,%rax
48 8b 54 c2 18          mov    0x18(%rdx,%rax,8),%rdx

[objdump is broken, opcode 49 63 is actually movsxq]

r12 == 0000000000000032
rax == 2000000000000032

This looks like a broken CPU to me: the bottom 32 bits of r12 were moved with
sign extension to rax, but one bit (bit 61) is now wrong in rax.
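
The register comparison in this comment can be rechecked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original report: the helper names (sign_extend_32_to_64, differing_bits, effective_address) are hypothetical, and the register values are copied from the oops in the description. It only redoes the arithmetic and does not by itself establish whether the hardware is at fault.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32_to_64(value):
    """Sign-extend the low 32 bits of value to 64 bits, as movslq does."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        low |= 0xFFFFFFFF00000000
    return low


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = (a ^ b) & MASK64
    return [i for i in range(64) if (diff >> i) & 1]


def effective_address(base, index, scale, disp):
    """Compute base + index*scale + disp with 64-bit wraparound."""
    return (base + index * scale + disp) & MASK64


# Register values copied from the oops in the description.
R12 = 0x0000000000000032
RAX = 0x2000000000000032
RDX = 0xFFFF8100341864E0
CR2 = 0xFFFF810034186688

expected_rax = sign_extend_32_to_64(R12)
flipped = differing_bits(expected_rax, RAX)

# Operand of: mov 0x18(%rdx,%rax,8),%rdx
addr_expected = effective_address(RDX, expected_rax, 8, 0x18)
addr_observed = effective_address(RDX, RAX, 8, 0x18)

print(f"expected rax             : {expected_rax:#018x}")
print(f"observed rax             : {RAX:#018x}")
print(f"differing bit positions  : {flipped}")
print(f"address with expected rax: {addr_expected:#018x}")
print(f"address with observed rax: {addr_observed:#018x}")
print(f"CR2 from the oops        : {CR2:#018x}")
```

Under these assumptions the two rax values differ only in bit 61. Because the index is scaled by 8, that bit is shifted out of the 64-bit result, so both computed addresses come out equal to the CR2 value in the oops.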

Comment 2 Hartmut Horrer 2007-06-23 06:28:20 UTC
Yesterday I ran the full Dell hardware test for the PE 1800 (in extended mode)
and everything was OK, with no errors.
Is there another tool I can test the CPUs with? Do you think the Dell test
tools might not be able to detect such an error?

[root@koala proc]# cat cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      :                   Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 2992.495
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5989.22
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      :                   Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 2992.495
cache size      : 2048 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5985.05
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      :                   Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 2992.495
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5985.12
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      :                   Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 2992.495
cache size      : 2048 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5985.05
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:
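
To relate the "CPU 3" reported in the oops to a physical processor, the listing above can be grouped by its "physical id" field. This is a minimal sketch in Python, assuming that the CPU number in the oops corresponds to the "processor" field of /proc/cpuinfo; the function names and the abbreviated SAMPLE text are illustrative, with the values copied from the listing.

```python
from collections import defaultdict


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            cpus.append(fields)
    return cpus


def group_by_package(cpus):
    """Map each 'physical id' to the logical processors that share it."""
    packages = defaultdict(list)
    for cpu in cpus:
        packages[int(cpu["physical id"])].append(int(cpu["processor"]))
    return dict(packages)


# Abbreviated sample with only the two fields used here (values from the listing).
SAMPLE = """\
processor       : 0
physical id     : 0

processor       : 1
physical id     : 3

processor       : 2
physical id     : 0

processor       : 3
physical id     : 3
"""

packages = group_by_package(parse_cpuinfo(SAMPLE))
for physical_id, logical in sorted(packages.items()):
    print(f"physical id {physical_id}: logical CPUs {logical}")
```

On a live system the same functions could be fed the real file, for example parse_cpuinfo(open("/proc/cpuinfo").read()). With the values above, logical processors 1 and 3 share physical id 3.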


Comment 3 Chuck Ebbert 2007-06-25 18:23:11 UTC
memtest86 is a good start for testing. You can run it by booting the Fedora
install CD/DVD and selecting it.

Also:
http://www.ibm.com/developerworks/library/l-hw1/

Many people swear by this one but it could overheat the system:
http://pages.sbcglobal.net/redelm/

Comment 4 Jon Stanley 2008-01-08 01:55:12 UTC
(This is a mass-update to all current FC6 kernel bugs in NEW state)

Hello,

I'm reviewing this bug list as part of the kernel bug triage project, an attempt
to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug; however, this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!

Comment 5 Jon Stanley 2008-02-08 04:28:12 UTC
Per the previous comment in this bug, I am closing it as INSUFFICIENT_DATA,
since no information has been lodged for over 30 days.

Please re-open this bug or file a new one if you can provide the requested data,
and thanks for filing the original report!