Bug 572533

Summary: [abrt] informational WARN_ON in mtrr generates backtrace
Product: Red Hat Enterprise Linux 6 Reporter: Brock Organ <borgan>
Component: kernel Assignee: Prarit Bhargava <prarit>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0 CC: arozansk, esandeen, james.leddy
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: abrt_hash:62439878
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-04-06 13:56:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Brock Organ 2010-03-11 13:43:31 UTC
abrt 1.0.7 detected a crash.

architecture: x86_64
cmdline: not_applicable
comment: N/A
component: kernel
executable: kernel
kernel: 2.6.32-17.el6.x86_64
package: kernel
release: Red Hat Enterprise Linux release 6.0 Beta (Santiago)
How to reproduce: not sure how to reproduce the issue

kerneloops
-----
------------[ cut here ]------------
WARNING: at arch/x86/kernel/cpu/mtrr/generic.c:467 generic_get_mtrr+0x11e/0x140() (Not tainted)
Hardware name: 649325U
mtrr: your BIOS has set up an incorrect mask, fixing it up.
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.32-17.el6.x86_64 #1
Call Trace:
[<ffffffff81067eb3>] warn_slowpath_common+0x83/0xc0
[<ffffffff81067f51>] warn_slowpath_fmt+0x41/0x50
[<ffffffff81028f9e>] generic_get_mtrr+0x11e/0x140
[<ffffffff819c2028>] mtrr_cleanup+0x8c/0x405
[<ffffffff819c0e77>] ? get_mtrr_state+0x2ec/0x2fb
[<ffffffff819c09bb>] mtrr_bp_init+0x1ab/0x1d2
[<ffffffff819bcbbb>] setup_arch+0x471/0xa52
[<ffffffff8152b2d4>] ? printk+0x41/0x45
[<ffffffff819b7b7b>] start_kernel+0xd0/0x401
[<ffffffff819b733a>] x86_64_start_reservations+0x125/0x129
[<ffffffff819b7438>] x86_64_start_kernel+0xfa/0x109

Comment 2 RHEL Program Management 2010-03-11 14:44:59 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Eric Sandeen 2010-03-15 18:53:44 UTC
It's not a crash; it's the kernel fixing up a BIOS problem:

> mtrr: your BIOS has set up an incorrect mask, fixing it up.

Comment 4 Brock Organ 2010-03-16 13:18:11 UTC
Hi Eric, if the error is informational, do you think it is still proper behaviour to oops? Can the detection just be set to notify, or is there something else going on here that warrants this behaviour from the kernel? (I'm trying to figure out whether this is really a bug and, if not, how to keep the oops from triggering abrt so that many other users don't report a similar problem.)

Comment 5 Eric Sandeen 2010-03-16 14:05:45 UTC
As far as I can tell, it did not oops.  What makes you say it was an oops?

(backtrace != oops ... the kernel can dump_stack() for "how'd we get here" information any time it pleases)

Comment 6 Brock Organ 2010-03-16 18:24:34 UTC
I'm referring to the event that triggered abrt ... if the error is only informational, what changes would make sense to keep it from being reported repeatedly?

Comment 7 Eric Sandeen 2010-03-16 19:37:19 UTC
I'm not sure ... I don't know what heuristics abrt uses.  But AFAIK this is just an informational message.

Well, it appears we're not the only ones to notice:

commit 942fa3b63eb525aa0512ba28c42e656d8efc6787
Author: Alan Cox <alan.com>
Date:   Mon Feb 8 10:03:17 2010 +0000

    x86, mtrr: Kill over the top warn
    
    Fixes bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=12558
    Fixes bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=12317
    
    (and if this really needed to be a warn you'd be responding to the bugs left
    in bugzilla from it...)
    
    Signed-off-by: Alan Cox <alan.com>
    LKML-Reference: <20100208100239.2568.2940.stgit>
    Signed-off-by: H. Peter Anvin <hpa>


and from that bug:

"It's not an oops - it's just a noisy warning.  The kernel is boasting that
your bios is busted, and we fixed it up.

That warning should be toned down a bit - it just misleads people."

So perhaps we should look at backporting that fix.

-Eric

Comment 8 Prarit Bhargava 2010-03-16 20:04:40 UTC
Brock, Eric is 100% correct here.  A trace != oops/panic.

If abrt is only supposed to report panics or oopses it should differentiate between BUG() warnings and panics.  I'm not sure if it has the smarts to do so ...

Having said that -- what HW was this seen on?

P.
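Prarit's point, that a reporting tool should tell a WARN()-style backtrace apart from a real oops or panic, can be illustrated with a minimal sketch. It assumes a classifier that only searches for the marker strings the kernel prints: the 2.6.32-era `WARNING: at` header seen in the trace above, plus `Oops:`, `BUG:`, `kernel BUG at` and `Kernel panic`. `classify_klog()` and `enum klog_kind` are hypothetical names; this is not how abrt actually decides what to report, the marker list is not exhaustive, and later kernels format the WARNING line differently.

```c
#include <string.h>

/* Hypothetical categories for a captured kernel log excerpt. */
enum klog_kind { KLOG_OTHER, KLOG_WARNING, KLOG_OOPS_OR_BUG, KLOG_PANIC };

/*
 * Hypothetical helper (not abrt's logic): classify a kernel log excerpt
 * by marker strings. "WARNING: at" comes from WARN()/WARN_ON(): a
 * backtrace is printed but the kernel keeps running. "Oops:", "BUG:",
 * "kernel BUG at" and "Kernel panic" indicate a real fault.
 * Checks run from most to least severe, so the most severe marker wins.
 */
static enum klog_kind classify_klog(const char *log)
{
    if (strstr(log, "Kernel panic"))
        return KLOG_PANIC;
    if (strstr(log, "Oops:") || strstr(log, "BUG:") ||
        strstr(log, "kernel BUG at"))
        return KLOG_OOPS_OR_BUG;
    if (strstr(log, "WARNING: at"))
        return KLOG_WARNING;
    return KLOG_OTHER;
}
```

Under these assumptions, the trace in this report would be classified as `KLOG_WARNING`, which a tool could log without filing a crash report.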

Comment 9 Brock Organ 2010-03-16 20:25:32 UTC
(In reply to comment #7)
> I'm not sure ... I don't know what heuristics abrt uses.  But AFAIK this is
> just an informational message.

OK, I'm not sure either. I just don't want a flood of these informational messages cluttering the bug lists; I'll check with the abrt folks to see what they are looking for.

Comment 10 Brock Organ 2010-03-16 20:34:06 UTC
(In reply to comment #8)
> Brock, Eric is 100% correct here.  A trace != oops/panic.
> 
> If abrt is only supposed to report panics or oopses it should differentiate
> between BUG() warnings and panics.  I'm not sure if it has the smarts to do so
> ...
> 
> Having said that -- what HW was this seen on?

Hi Prarit,

a Lenovo ThinkStation D10 (649325U) desktop server.

Regards,

Brock

Comment 11 Prarit Bhargava 2010-04-06 13:56:10 UTC
I took a look at the code, and this is definitely not a bug.  The message is a valid warning about the BIOS on the system.  The BIOS has set an incorrect MTRR Physical Mask value, and the OS has detected the problem.

Bottom line: if this shows up in the field, it lets customers and partners know that something is wrong with their BIOS.

P.
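To make the "incorrect MTRR Physical Mask" explanation concrete, here is a simplified sketch of one way such a mask can be wrong and fixed up. It assumes the BIOS programmed the mask for a narrower physical-address width than the CPU implements, which leaves the top mask bits clear; the fixup extends the highest set bit upward to the CPU's width. `fixup_mtrr_mask()` is a hypothetical helper, not the kernel's `generic_get_mtrr()`, it ignores the flag bits of the real PhysMask MSR (such as the valid bit), and it does not claim to describe what this particular BIOS got wrong.

```c
#include <stdint.h>

/*
 * Hypothetical illustration, not kernel code. A variable-range MTRR mask
 * should be a contiguous run of 1s from the region-size bit up to the top
 * of the CPU's physical-address width. This handles only the "programmed
 * for a narrower width" case: it sets every bit from the highest set bit
 * up to phys_bits - 1, and reports via *fixed whether anything changed.
 */
static uint64_t fixup_mtrr_mask(uint64_t mask, unsigned phys_bits, int *fixed)
{
    /* Bits the CPU implements, e.g. 36 -> 0x0000000FFFFFFFFF. */
    uint64_t width_mask = (phys_bits >= 64) ? UINT64_MAX
                                            : ((UINT64_C(1) << phys_bits) - 1);
    uint64_t in = mask & width_mask;
    uint64_t out = in;
    int hi = -1;

    /* Find the index of the highest set bit, if any. */
    for (int b = 63; b >= 0; b--) {
        if (in & (UINT64_C(1) << b)) {
            hi = b;
            break;
        }
    }

    /* Fill bits hi .. phys_bits-1 with 1s. */
    if (hi >= 0)
        out |= ~((UINT64_C(1) << hi) - 1) & width_mask;

    *fixed = (out != in);
    return out;
}
```

For example, with 36 physical address bits a 256 MB region needs the mask 0xFF0000000; a mask of 0xF0000000 (correct only for a 32-bit width) is reported as fixed and returned as 0xFF0000000, while an already-correct mask comes back unchanged.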

Comment 12 Eric Sandeen 2010-04-30 16:28:55 UTC
Prarit, maybe we should change this to not WARN_ON, and just printk, as suggested in bug 579563 ?

Backtraces scare people (and abrt) and it's not useful in this case...

-Eric

Comment 13 James M. Leddy 2010-07-13 17:08:53 UTC
(In reply to comment #12)
> Prarit, maybe we should change this to not WARN_ON, and just printk, as
> suggested in bug 579563 ?
> 
> Backtraces scare people (and abrt) and it's not useful in this case...
> 
> -Eric    

This makes sense. We have a couple of bugs about this; perhaps we should mark them all as duplicates of bug 579563.

bug 614021
bug 607246
bug 579563