Bug 579563
| Summary: | Unwarranted WARN_ON on drivers/pci/dmar.c:616 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Ben Woodard <woodard> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 6.0 | CC: | cww, cye, dvlasenk, esandeen, jfeeney, jwest, kklic, mschmidt, npajkovs, prarit, tao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | abrt_hash:269740827 | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-03-03 13:28:34 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Ben Woodard
2010-04-05 20:41:16 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

This is the 2nd abrt-reported bug I've seen which is not a kernel bug but a hardware bug that we cannot fix (the other was bug #572533).

I'm sympathetic to the difficulty of determining whether a backtrace is a kernel bug or not, but filing bugzillas when the kernel reports hardware problems is not helpful - is there any way to filter these out?

Thanks,
-Eric

Ditto here -- see 572533. Is there any chance we could get better abrt messages in bugzilla?

P.

The direct reporting of kerneloops to BZ is planned only for beta1; after that it should go through the new gss portal, and these false positives will be filtered out much better. Until then we will try to make the heuristics better.

J.

There is a steady trickle of patches to the kernel which replace such ill-advised WARN_ONs with more appropriate messages. Try this search:

http://www.google.com/search?q=remove+WARN_ON

Example of WARN_ON removal:

https://kerneltrap.org/mailarchive/git-commits-head/2008/12/28/4528564

Here, WARN_ON is the wrong thing to use to print "Your BIOS is broken; DMAR reported at address zero!" - it is not a kernel bug, it's the kernel detecting a BIOS bug. The backtrace which WARN_ON generates is not useful in fixing that BIOS bug.

I think we should reassign such bugs to kernel and change the description to "Unwarranted WARN_ON on drivers/pci/dmar.c:616". Then kernel people would either downgrade it to printk or remove it.

*** Bug 587657 has been marked as a duplicate of this bug. ***

Ok, fair enough.

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. It has been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

Reproduced on the latest kernel-2.6.32-119.el6 when running the kdump crashPanic test:

    =========================================================================
    ------------[ cut here ]------------
    WARNING: at drivers/pci/dmar.c:594 warn_invalid_dmar+0x7a/0x90() (Not tainted)
    Hardware name: HP xw4600 Workstation
    [Firmware Warn]: Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
    BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.13; Product Version:
    Modules linked in:
    Pid: 0, comm: swapper Not tainted 2.6.32-119.el6.x86_64 #1
    Call Trace:
    [<ffffffff81066ec7>] ? warn_slowpath_common+0x87/0xc0
    [<ffffffff81066f5f>] ? warn_slowpath_fmt_taint+0x3f/0x50
    [<ffffffff81036d9d>] ? native_set_pte_at+0xd/0x40
    [<ffffffff810367d9>] ? native_flush_tlb_single+0x9/0x10
    [<ffffffff81294caa>] ? warn_invalid_dmar+0x7a/0x90
    [<ffffffff81bea2b0>] ? check_zero_address+0xd6/0x118
    [<ffffffff812db2bb>] ? acpi_get_table_with_size+0x5a/0xb4
    [<ffffffff814e35a8>] ? _etext+0x0/0x17c
    [<ffffffff81bea304>] ? detect_intel_iommu+0x12/0x91
    [<ffffffff81bc4831>] ? pci_iommu_alloc+0x5e/0x6c
    [<ffffffff81bd6591>] ? mem_init+0x19/0xec
    [<ffffffff81bbcd25>] ? start_kernel+0x21a/0x424
    [<ffffffff81bbc33a>] ? x86_64_start_reservations+0x125/0x129
    [<ffffffff81bbc438>] ? x86_64_start_kernel+0xfa/0x109
    ---[ end trace a7919e7f17c0a725 ]---
    Disabling lock debugging due to kernel taint
    AMD-Vi disabled by default: pass amd_iommu=on to enable
    Memory: 105384k/163820k available (5005k kernel code, 32772k absent, 25664k reserved, 6914k data, 1228k init)
    Hierarchical RCU implementation.
    NR_IRQS:33024 nr_irqs:440
    Extended CMOS year: 2000
    Spurious LAPIC timer interrupt on cpu 0
    do_IRQ: 0.169 No irq handler for vector (irq -1)
    Console: colour VGA+ 80x25
    console [ttyS0] enabled
    HPET: 4 timers in total, 0 timers will be used for per-cpu timer
    Fast TSC calibration using PIT
    Detected 2333.537 MHz processor.
    Calibrating delay loop (skipped), value calculated using timer frequency.. 4667.07 BogoMIPS (lpj=2333537)
    pid_max: default: 32768 minimum: 301
    Security Framework initialized
    SELinux: Initializing.
    Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
    Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
    Mount-cache hash table entries: 256
    Initializing cgroup subsys ns
    Initializing cgroup subsys cpuacct
    Initializing cgroup subsys memory
    Initializing cgroup subsys devices
    Initializing cgroup subsys freezer
    Initializing cgroup subsys net_cls
    Initializing cgroup subsys blkio
    CPU: Physical Processor ID: 0
    CPU: Processor Core ID: 1
    mce: CPU supports 6 MCE banks
    using mwait in idle threads.
    SMP alternatives: switching to UP code
    ACPI: Core revision 20090903
    ftrace: converting mcount calls to 0f 1f 44 00 00
    ftrace: allocating 20654 entries in 81 pages
    DMAR: Host address width 36
    DMAR: DRHD base: 0x000000fed90000 flags: 0x0
    ------------[ cut here ]------------
    WARNING: at drivers/pci/dmar.c:594 warn_invalid_dmar+0x7a/0x90() (Tainted: G I---------------- )
    Hardware name: HP xw4600 Workstation
    [Firmware Warn]: Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
    BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.13; Product Version:
    Modules linked in:
    Pid: 1, comm: swapper Tainted: G I---------------- 2.6.32-119.el6.x86_64 #1
    Call Trace:
    [<ffffffff81066ec7>] ? warn_slowpath_common+0x87/0xc0
    [<ffffffff81066f5f>] ? warn_slowpath_fmt_taint+0x3f/0x50
    [<ffffffff810418a8>] ? __ioremap_caller+0x2a8/0x390
    [<ffffffff81294caa>] ? warn_invalid_dmar+0x7a/0x90
    [<ffffffff81294ec3>] ? alloc_iommu+0x203/0x2b0
    [<ffffffff81bea8f9>] ? dmar_table_init+0x1bd/0x3be
    [<ffffffff81bcc98d>] ? enable_IR_x2apic+0x23/0x1f9
    [<ffffffff81bca871>] ? native_smp_prepare_cpus+0x143/0x389
    [<ffffffff81bbc6f9>] ? kernel_init+0x112/0x2f9
    [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
    [<ffffffff81bbc5e7>] ? kernel_init+0x0/0x2f9
    [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
    ---[ end trace a7919e7f17c0a726 ]---
    DMAR: parse DMAR table failure.
    Setting APIC routing to flat
    ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
    CPU0: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz stepping 0b
    Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
    [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 53003c)
    NMI watchdog disabled for cpu0: unable to create perf event: -2
    Brought up 1 CPUs
    Total of 1 processors activated (4667.07 BogoMIPS).
    ------------------------------------------------

https://beaker.engineering.redhat.com/recipes/119017

with this guiltyfunc: bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008

The code matches upstream. It catches a well-known issue with BIOS DMAR tables. Removing this warning is a bad idea. It notifies both customers and vendors that there is something wrong with their BIOS, and that the BIOS is in need of an upgrade.

P.
|
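As a rough illustration of the filtering asked about earlier in this thread, here is a minimal Python sketch of a heuristic that separates firmware-blamed warnings from other kernel warnings before a report is filed. It is not abrt's actual logic. The function names (`split_warnings`, `is_firmware_report`, `should_file_kernel_bug`) and the filing policy are hypothetical; the only inputs taken from this bug are the marker strings that appear in the log above ("[Firmware Warn]", "[Firmware Bug]", "Your BIOS is broken") and the "cut here" / "end trace" delimiters.

```python
import re

# Markers the kernel prints when it blames firmware rather than itself.
# All three strings appear in the log in this bug; using them as a
# filter signal is an assumption of this sketch.
FIRMWARE_MARKERS = ("[Firmware Warn]", "[Firmware Bug]", "Your BIOS is broken")

CUT_HERE = "------------[ cut here ]------------"
END_TRACE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def split_warnings(log_text):
    """Return the text of each 'cut here' ... 'end trace' block in the log."""
    blocks = []
    for chunk in log_text.split(CUT_HERE)[1:]:
        match = END_TRACE.search(chunk)
        # Keep only the part up to the end-trace marker, if there is one.
        blocks.append(chunk[:match.end()] if match else chunk)
    return blocks


def is_firmware_report(block):
    """True if the warning block carries one of the firmware markers."""
    return any(marker in block for marker in FIRMWARE_MARKERS)


def should_file_kernel_bug(log_text):
    """Hypothetical policy: file only if some warning is not firmware-blamed."""
    return any(not is_firmware_report(b) for b in split_warnings(log_text))
```

Under this policy, both warnings in the log above would be treated as firmware reports and no kernel bug would be filed, while the warning itself would still reach the console for customers and vendors to see.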