Bug 847591 - mcelog: incorrectly reporting CPU model
Status: CLOSED CANTFIX
Product: Fedora
Classification: Fedora
Component: mcelog
Version: 17
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Prarit Bhargava
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2012-08-13 01:00 EDT by scott-brown
Modified: 2012-08-13 17:00 EDT

Doc Type: Bug Fix
Last Closed: 2012-08-13 10:24:54 EDT
Type: Bug

Description scott-brown 2012-08-13 01:00:57 EDT
Description of problem:

1. mcelog reports the i3-550 processor as Family 6 Model 25 (it is Family 6 Model 37).
2. mcelog reports the i3-550 processor as "unsupported new". This CPU is two architectures old now; it's not new by any stretch.

Version-Release number of selected component (if applicable):

Installed Packages
Name        : mcelog
Arch        : x86_64
Epoch       : 2
Version     : 1.0
Release     : 0.4.6e4e2a00.fc17
Size        : 112 k
Repo        : installed
From repo   : anaconda-0


How reproducible:

Start system - errors show in /var/log/messages

Steps to Reproduce:
1. Start system
2. Wait for errors to show
  
Actual results:

mcelog: Unsupported new Family 6 Model 25 CPU: only decoding architectural errors

Expected results:

No error messages, and no inappropriate behaviours based on incorrect Model decoding.

Additional info:

mcelog is able to determine the correct model: it prints it a few lines further down, in the CPUID line at the end of this excerpt.

Aug 13 00:16:33 nas01 mcelog[818]: mcelog: Unsupported new Family 6 Model 25 CPU: only decoding architectural errors
Aug 13 00:16:33 nas01 mcelog[818]: Hardware event. This is not a software error.
Aug 13 00:16:33 nas01 mcelog[818]: MCE 5
Aug 13 00:16:33 nas01 mcelog[818]: CPU 1 THERMAL EVENT TSC 130c934b92c58
Aug 13 00:16:33 nas01 mcelog[818]: TIME 1344831382 Mon Aug 13 00:16:22 2012
Aug 13 00:16:33 nas01 mcelog[818]: Processor 1 heated above trip temperature. Throttling enabled.
Aug 13 00:16:33 nas01 mcelog[818]: Please check your system cooling. Performance will be impacted
Aug 13 00:16:33 nas01 mcelog[818]: STATUS 880003c3 MCGSTATUS 0
Aug 13 00:16:33 nas01 mcelog[818]: MCGCAP c09 APICID 4 SOCKETID 0
Aug 13 00:16:33 nas01 mcelog[818]: CPUID Vendor Intel Family 6 Model 37
Comment 1 Prarit Bhargava 2012-08-13 10:20:00 EDT
37 = 0x25

I'll double check that upstream mcelog does not support this processor.

P.
Comment 2 Prarit Bhargava 2012-08-13 10:24:54 EDT
Current mcelog still errors out on

                if (model > 0x1a) {
                        Eprintf("Family 6 Model %x CPU: only decoding architectural errors\n",
                                model);
                        return CPU_INTEL;
                }

Intel has not submitted code to resolve non-arch errors on Family 6.

CLOSED/CANTFIX.

P.
Comment 3 scott-brown 2012-08-13 17:00:44 EDT
(In reply to comment #1)
> 37 = 0x25
> 

You could remove the confusion by reporting both Family and Model in the same numeric base:

                if (model > 0x1a) {
                        Eprintf("Family 6 Model %d CPU: only decoding architectural errors\n",
                                model);
                        return CPU_INTEL;
                }

or

                if (model > 0x1a) {
                        Eprintf("Family 0x06 Model 0x%x CPU: only decoding architectural errors\n",
                                model);
                        return CPU_INTEL;
                }

Consistency is always preferred... or be explicit. Just a thought for those of us digging through log files in the wee hours of the morning.
