Bug 223374

Summary: LargeSMP Kernel Tainted w/ 16-Cores
Product: Red Hat Enterprise Linux 4 Reporter: James Sodini <jsodini>
Component: kernel Assignee: Jason Baron <jbaron>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: urgent Docs Contact:
Priority: medium    
Version: 4.4 CC: iboverma, jbaron, knoel, rlandry
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
URL: http://www.fabric7.com/products_q80.php
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-01-29 22:45:07 UTC Type: ---

Description James Sodini 2007-01-18 23:36:16 UTC
Description of problem:
Booting RHEL4u3 or RHEL4u4 on a system with eight sockets of dual-core
processors (16 cores total) shows a tainted kernel.


Version-Release number of selected component (if applicable):
Standard installation kernel for both U3 (2.6.9-34.ELlargesmp) & U4
(2.6.9-42.ELlargesmp)

How reproducible:
100% on 8-socket systems
0% on systems with fewer than 8 sockets

Steps to Reproduce:
1. Perform a fresh installation onto an 8-socket system
2. Boot & login to system
3. cat /proc/sys/kernel/tainted
  
Actual results:
[root@localhost ~]# cat /proc/sys/kernel/tainted
16

Expected results:
[root@localhost ~]# cat /proc/sys/kernel/tainted
0

Additional info:
This is preventing us from passing Red Hat Hardware Certification. The affected
system is the Fabric7 Q80, which is linked above. We have a machine at Red Hat
for debugging this very problem.

Comment 2 Jason Baron 2007-01-29 19:20:02 UTC
Hi James, any input on the cause of this is welcome :) Anyway, we apparently
have one of these boxes in our lab; however, I'm not able to log in to the box.
We were able to connect to the management port (via serial), but I couldn't
figure out how to get to the main system console. Can you please advise? Thanks.

Comment 3 Jason Baron 2007-01-29 19:44:47 UTC
Apparently the MCE records (an MCE is causing the taint to be set) are posted in
/sys/class/misc/mcelog. Do we have the contents of this directory from a system
exhibiting this problem?
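
For reference, the reported value of 16 from /proc/sys/kernel/tainted is a bitmask with only bit 4 set, which is consistent with the machine-check explanation above. The following is a minimal sketch of decoding that value. The bit names are taken from the mainline kernel's tainted-kernels documentation and are assumed (not verified here) to match the RHEL4 2.6.9 kernel; `TAINT_BITS` and `decode_taint` are illustrative names, not part of any kernel tool.

```python
# Sketch: decode a kernel taint bitmask into human-readable reasons.
# ASSUMPTION: bit meanings follow the mainline kernel documentation for the
# lowest six bits; the RHEL4 2.6.9 kernel is assumed to use the same layout.
TAINT_BITS = {
    0: "proprietary module loaded (P)",
    1: "module force-loaded (F)",
    2: "SMP kernel on hardware not designed for SMP (S)",
    3: "module force-unloaded (R)",
    4: "machine check exception occurred (M)",
    5: "bad page reference (B)",
}


def decode_taint(value):
    """Return the description of every known taint bit set in `value`."""
    return [name for bit, name in sorted(TAINT_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    # 16 is the value reported in this bug.
    for reason in decode_taint(16):
        print(reason)
```

On a live system, the argument would be the integer read from /proc/sys/kernel/tainted rather than the hard-coded 16.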

Comment 4 James Sodini 2007-01-29 21:47:40 UTC
The bug has been tracked down to bad memory. This was extremely difficult to
isolate because it could only be seen with all eight sockets and the LargeSMP
kernel.

Thank you for your effort!

Concerning use of the Q80 (which will probably be needed in the future for
support issues), there should be a printed copy of the usage manual. If not,
please send me an email with instructions on where to send a PDF.