Bug 579563

Summary: Unwarranted WARN_ON on drivers/pci/dmar.c:616
Product: Red Hat Enterprise Linux 6
Reporter: Ben Woodard <woodard>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: low
Version: 6.0
CC: cww, cye, dvlasenk, esandeen, jfeeney, jwest, kklic, mschmidt, npajkovs, prarit, tao
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: abrt_hash:269740827
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-03-03 13:28:34 UTC
Type: ---

Description Ben Woodard 2010-04-05 20:41:16 UTC
abrt 1.0.7 detected a crash.

architecture: x86_64
cmdline: not_applicable
component: kernel
executable: kernel
kernel: 2.6.32.10-90.fc12.x86_64
package: kernel
release: Red Hat Enterprise Linux release 6.0 Beta (Santiago)

kerneloops
-----
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:616 check_zero_address+0x96/0x19b()
Hardware name: 4057W7N
Your BIOS is broken; DMAR reported at address zero!
BIOS vendor: LENOVO; Ver: 6EET49WW (3.09 ); Product Version: ThinkPad X301
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.32.10-90.fc12.x86_64 #1
Call Trace:
[<ffffffff81056350>] warn_slowpath_common+0x7c/0x94
[<ffffffff8145c243>] ? _etext+0x0/0x1
[<ffffffff810563bf>] warn_slowpath_fmt+0x41/0x43
[<ffffffff8183fd84>] check_zero_address+0x96/0x19b
[<ffffffff8128cc61>] ? acpi_tb_verify_table+0x57/0x5c
[<ffffffff8128c2bf>] ? acpi_get_table_with_size+0x5a/0xb4
[<ffffffff8145c243>] ? _etext+0x0/0x1
[<ffffffff8183fe9b>] detect_intel_iommu+0x12/0x8c
[<ffffffff8181eab5>] pci_iommu_alloc+0x5e/0x6c
[<ffffffff8182d644>] mem_init+0x19/0xec
[<ffffffff81817bf9>] start_kernel+0x20b/0x3ff
[<ffffffff818172c1>] x86_64_start_reservations+0xac/0xb0
[<ffffffff818173bd>] x86_64_start_kernel+0xf8/0x107

Comment 2 RHEL Program Management 2010-04-05 21:22:54 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Eric Sandeen 2010-04-06 15:43:15 UTC
This is the second abrt-reported bug I've seen that is not a kernel bug but a hardware bug that we cannot fix (the other was bug #572533).

I'm sympathetic to the difficulty of determining whether a backtrace is a kernel bug or not, but filing bugzillas when the kernel reports hardware problems is not helpful. Is there any way to filter these out?

Thanks,
-Eric

Comment 4 Prarit Bhargava 2010-04-06 17:46:25 UTC
Ditto here -- see 572533.

Is there any chance we could get better abrt messages in bugzilla?

P.

Comment 5 Jiri Moskovcak 2010-04-08 10:36:04 UTC
Direct reporting of kerneloops to BZ is planned only for beta1; after that, reports should go through the new GSS portal, where these false positives will be filtered out much better. Until then we will try to improve the heuristics.

J.
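
As a rough illustration of the kind of heuristic discussed above, the sketch below flags an oops as firmware-reported when its text contains one of a few marker strings. The markers are copied from the kernel output quoted in this bug; the function name and the marker list are hypothetical and are not abrt's actual filtering logic.

```python
# Hypothetical marker list: these strings appear in the oops text quoted in
# this report. They are NOT taken from abrt's real heuristics.
FIRMWARE_MARKERS = (
    "Your BIOS is broken",
    "[Firmware Warn]",
    "[Firmware Bug]",
)


def looks_like_firmware_report(oops_text: str) -> bool:
    """Return True if the oops text contains a known firmware-blame marker."""
    return any(marker in oops_text for marker in FIRMWARE_MARKERS)
```

A reporter could call this on the captured oops before filing and route matches somewhere other than the kernel component. Plain substring matching is deliberately simple; it will miss firmware complaints that use other wording.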

Comment 6 Denys Vlasenko 2010-04-30 10:01:25 UTC
There is a steady trickle of kernel patches that replace such ill-advised WARN_ONs with more appropriate messages. Try this search:

http://www.google.com/search?q=remove+WARN_ON

Example of WARN_ON removal:

https://kerneltrap.org/mailarchive/git-commits-head/2008/12/28/4528564

Here, WARN_ON is the wrong thing to use to print "Your BIOS is broken; DMAR reported at address zero!". This is not a kernel bug; it is the kernel detecting a BIOS bug. The backtrace that WARN_ON generates is not useful for fixing that BIOS bug.

I think we should reassign such bugs to kernel and change the summary to "Unwarranted WARN_ON on drivers/pci/dmar.c:616". Then the kernel people would either downgrade it to a printk or remove it.
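
A minimal sketch of how such a summary could be derived mechanically from the oops text. The regular expression is an assumption based only on the two WARNING lines quoted in this bug, and `summary_for` is a hypothetical helper, not part of abrt or Bugzilla.

```python
import re
from typing import Optional

# Matches lines such as:
#   WARNING: at drivers/pci/dmar.c:616 check_zero_address+0x96/0x19b()
# The pattern is inferred from the oops text in this report only.
WARNING_RE = re.compile(
    r"WARNING: at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\(\)"
)


def summary_for(oops_text: str) -> Optional[str]:
    """Build a bug summary from the first WARNING line, or None if absent."""
    match = WARNING_RE.search(oops_text)
    if match is None:
        return None
    return f"Unwarranted WARN_ON on {match['file']}:{match['line']}"
```

Keying the summary on file and line rather than on the full backtrace keeps reports from different machines grouped together, at the cost of splitting them again whenever the line number shifts between kernel builds.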

Comment 7 Nikola Pajkovsky 2010-04-30 14:06:16 UTC
*** Bug 587657 has been marked as a duplicate of this bug. ***

Comment 8 Eric Sandeen 2010-04-30 16:23:40 UTC
Ok, fair enough.

Comment 10 RHEL Program Management 2010-07-15 14:42:03 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 13 Chao Ye 2011-03-03 09:00:48 UTC
Reproduced on the latest kernel-2.6.32-119.el6 when running the kdump crashPanic test:
=========================================================================

------------[ cut here ]------------ 
WARNING: at drivers/pci/dmar.c:594 warn_invalid_dmar+0x7a/0x90() (Not tainted) 
Hardware name: HP xw4600 Workstation 
[Firmware Warn]: Your BIOS is broken; DMAR reported at address fed90000 returns all ones! 
BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.13; Product Version:   
Modules linked in: 
Pid: 0, comm: swapper Not tainted 2.6.32-119.el6.x86_64 #1 
Call Trace: 
 [<ffffffff81066ec7>] ? warn_slowpath_common+0x87/0xc0 
 [<ffffffff81066f5f>] ? warn_slowpath_fmt_taint+0x3f/0x50 
 [<ffffffff81036d9d>] ? native_set_pte_at+0xd/0x40 
 [<ffffffff810367d9>] ? native_flush_tlb_single+0x9/0x10 
 [<ffffffff81294caa>] ? warn_invalid_dmar+0x7a/0x90 
 [<ffffffff81bea2b0>] ? check_zero_address+0xd6/0x118 
 [<ffffffff812db2bb>] ? acpi_get_table_with_size+0x5a/0xb4 
 [<ffffffff814e35a8>] ? _etext+0x0/0x17c 
 [<ffffffff81bea304>] ? detect_intel_iommu+0x12/0x91 
 [<ffffffff81bc4831>] ? pci_iommu_alloc+0x5e/0x6c 
 [<ffffffff81bd6591>] ? mem_init+0x19/0xec 
 [<ffffffff81bbcd25>] ? start_kernel+0x21a/0x424 
 [<ffffffff81bbc33a>] ? x86_64_start_reservations+0x125/0x129 
 [<ffffffff81bbc438>] ? x86_64_start_kernel+0xfa/0x109 
---[ end trace a7919e7f17c0a725 ]--- 
Disabling lock debugging due to kernel taint 
AMD-Vi disabled by default: pass amd_iommu=on to enable 
Memory: 105384k/163820k available (5005k kernel code, 32772k absent, 25664k reserved, 6914k data, 1228k init) 
Hierarchical RCU implementation. 
NR_IRQS:33024 nr_irqs:440 
Extended CMOS year: 2000 
Spurious LAPIC timer interrupt on cpu 0 
do_IRQ: 0.169 No irq handler for vector (irq -1) 
Console: colour VGA+ 80x25 
console [ttyS0] enabled 
HPET: 4 timers in total, 0 timers will be used for per-cpu timer 
Fast TSC calibration using PIT 
Detected 2333.537 MHz processor. 
Calibrating delay loop (skipped), value calculated using timer frequency.. 4667.07 BogoMIPS (lpj=2333537) 
pid_max: default: 32768 minimum: 301 
Security Framework initialized 
SELinux:  Initializing. 
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes) 
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes) 
Mount-cache hash table entries: 256 
Initializing cgroup subsys ns 
Initializing cgroup subsys cpuacct 
Initializing cgroup subsys memory 
Initializing cgroup subsys devices 
Initializing cgroup subsys freezer 
Initializing cgroup subsys net_cls 
Initializing cgroup subsys blkio 
CPU: Physical Processor ID: 0 
CPU: Processor Core ID: 1 
mce: CPU supports 6 MCE banks 
using mwait in idle threads. 
SMP alternatives: switching to UP code 
ACPI: Core revision 20090903 
ftrace: converting mcount calls to 0f 1f 44 00 00 
ftrace: allocating 20654 entries in 81 pages 
DMAR: Host address width 36 
DMAR: DRHD base: 0x000000fed90000 flags: 0x0 
------------[ cut here ]------------ 
WARNING: at drivers/pci/dmar.c:594 warn_invalid_dmar+0x7a/0x90() (Tainted: G          I----------------  ) 
Hardware name: HP xw4600 Workstation 
[Firmware Warn]: Your BIOS is broken; DMAR reported at address fed90000 returns all ones! 
BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.13; Product Version:   
Modules linked in: 
Pid: 1, comm: swapper Tainted: G          I----------------   2.6.32-119.el6.x86_64 #1 
Call Trace: 
 [<ffffffff81066ec7>] ? warn_slowpath_common+0x87/0xc0 
 [<ffffffff81066f5f>] ? warn_slowpath_fmt_taint+0x3f/0x50 
 [<ffffffff810418a8>] ? __ioremap_caller+0x2a8/0x390 
 [<ffffffff81294caa>] ? warn_invalid_dmar+0x7a/0x90 
 [<ffffffff81294ec3>] ? alloc_iommu+0x203/0x2b0 
 [<ffffffff81bea8f9>] ? dmar_table_init+0x1bd/0x3be 
 [<ffffffff81bcc98d>] ? enable_IR_x2apic+0x23/0x1f9 
 [<ffffffff81bca871>] ? native_smp_prepare_cpus+0x143/0x389 
 [<ffffffff81bbc6f9>] ? kernel_init+0x112/0x2f9 
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 
 [<ffffffff81bbc5e7>] ? kernel_init+0x0/0x2f9 
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 
---[ end trace a7919e7f17c0a726 ]--- 
DMAR: parse DMAR table failure. 
Setting APIC routing to flat 
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 
CPU0: Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz stepping 0b 
Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only. 
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 53003c) 
NMI watchdog disabled for cpu0: unable to create perf event: -2 
Brought up 1 CPUs 
Total of 1 processors activated (4667.07 BogoMIPS). 

------------------------------------------------
https://beaker.engineering.redhat.com/recipes/119017
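
The console log above contains two separate warning blocks mixed in with ordinary boot messages. The following sketch shows one way to pull such blocks out of a log for triage. It assumes blocks are delimited by the "[ cut here ]" and "[ end trace" markers seen above; the function name is hypothetical.

```python
from typing import List, Optional

CUT_MARKER = "[ cut here ]"
END_MARKER = "[ end trace"


def split_warning_blocks(log_text: str) -> List[List[str]]:
    """Group log lines into blocks that start at a cut marker.

    A block ends at its end-trace line. If no end-trace line follows, the
    block ends at the next cut marker or at the end of the input.
    """
    blocks: List[List[str]] = []
    current: Optional[List[str]] = None
    for line in log_text.splitlines():
        if CUT_MARKER in line:
            if current is not None:
                blocks.append(current)  # previous block had no end marker
            current = [line.rstrip()]
        elif current is not None:
            current.append(line.rstrip())
            if END_MARKER in line:
                blocks.append(current)
                current = None
    if current is not None:
        blocks.append(current)
    return blocks
```

Lines outside any block, such as "Disabling lock debugging due to kernel taint", are dropped. The fallback for a missing end marker matters because the oops in the original description has a cut line but no end-trace line.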

Comment 19 Prarit Bhargava 2011-03-03 15:25:13 UTC
The code matches upstream.  It catches a well known issue with BIOS DMAR tables.

Removing this warning is a bad idea.  It notifies both customers and vendors that there is something wrong with their BIOS, and that the BIOS is in need of an upgrade.

P.
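
For readers unfamiliar with what the warning catches, here is a userspace model of the two conditions reported in this bug: a DMAR unit at address zero, and a unit whose registers read back as all ones. This is a sketch inferred from the warning text only, not a transcription of drivers/pci/dmar.c. The function name, the `cap` and `ecap` parameters, and the assumption that two 64-bit register reads are compared against all ones are illustrative.

```python
from typing import Optional

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF


def dmar_unit_problem(base_address: int, cap: int, ecap: int) -> Optional[str]:
    """Return a firmware complaint for a bad DMAR unit, or None if it looks sane.

    base_address: address the BIOS table reports for the unit.
    cap, ecap:    hypothetical 64-bit register values read at that address.
    """
    if base_address == 0:
        return "DMAR reported at address zero!"
    if cap == ALL_ONES_64 and ecap == ALL_ONES_64:
        return f"DMAR reported at address {base_address:x} returns all ones!"
    return None
```

Both conditions describe a table the BIOS filled in incorrectly, so nothing the kernel does at this point can make the unit usable. The only open question in this bug is how loudly to say so.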