|Summary:||Calgary: DMA error on CalIOC2 PHB 0x3|
|Product:||[Fedora] Fedora||Reporter:||Luc Stepniewski <lior>|
|Component:||kernel||Assignee:||Kernel Maintainer List <kernel-maint>|
|Status:||CLOSED WONTFIX||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||12||CC:||dougsland, dwheeler, gansalmon, itamar, kernel-maint, lior, mishu, ngaywood|
|Fixed In Version:||Doc Type:||Bug Fix|
|Last Closed:||2010-12-04 02:45:07 UTC||Type:||---|
Description Luc Stepniewski 2009-11-26 14:01:47 UTC
Description of problem:
When doing "intensive" I/O, the mpt* drivers crash the filesystem on Fedora 12. The problem occurs on an IBM x3580 M2 machine, using the integrated LSI SAS1078 C1 PCI-express Fusion-MPT SAS controller.

Steps to Reproduce:
1. Create a big allocated space (20GB for example)
2. dd if=/dev/vg/mybigspace of=/dev/null
3. After a few minutes, filesystem access becomes impossible.

Looking at dmesg, you get the following:

Calgary: DMA error on CalIOC2 PHB 0x3
Calgary: 0x80000000@CSR 0x00000000@PLSSR 0xb0008000@CSMR 0x00000000@MCK
Calgary: 0x00000000@0x810 0x00000000@0x820 0x00000000@0x830 0x00000000@0x840 0x00000000@0x850 0x00000000@0x860 0x00000000@0x870
Calgary: 0x40000000@0xcb0
irq 46: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 126.96.36.199-127.fc12.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff8109aefc>] __report_bad_irq+0x3d/0x8c
 [<ffffffff8109b063>] note_interrupt+0x118/0x17d
 [<ffffffff8109b6f2>] handle_fasteoi_irq+0xa1/0xc6
 [<ffffffff8101463c>] handle_irq+0x8b/0x93
 [<ffffffff8141e9cc>] do_IRQ+0x5c/0xbc
 [<ffffffff810126d3>] ret_from_intr+0x0/0x11
 <EOI> [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff81019021>] ? mwait_idle+0x33/0xae
 [<ffffffff8141d079>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff81010bb8>] ? enter_idle+0x25/0x27
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff814145be>] ? start_secondary+0x1f3/0x234
handlers:
[<ffffffffa00e3d7e>] (mpt_interrupt+0x0/0x8bb [mptbase])
Disabling IRQ #46
mptscsih: ioc0: attempting task abort! (sc=ffff880a0d8fa400)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 fb 5b a7 00 00 60 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a0d8fa400)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb3100)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 28 f5 ff 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a04bb3100)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb2600)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 61 de 27 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
[root@flanders ubuntu]#
Message from syslogd@mymachine at Nov 26 10:38:28 ...
 kernel:mpage_da_map_blocks block allocation failed for inode 11762 at logical offset 2 with max blocks 1 with error -30

Message from syslogd@mymachine at Nov 26 10:38:28 ...
 kernel:This should not happen.!! Data will be lost

The first error message ("Calgary: DMA error on CalIOC2 PHB 0x3") seems to be related to a bug in the Calgary code, as detailed in a thread on LKML: "The calgary code can give drivers addresses above 4GB which is very bad for hardware that is only 32bit DMA addressable" (http://firstname.lastname@example.org/2008-06/05248/Re:_%5BPATCH_-mm%5D_x86_calgary:_fix_handling_of_devces_that_aren%27t_behind_the_Calgary ). But that thread is from 2008; I thought this would have been corrected by now...
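To illustrate the concern quoted from the LKML thread, here is a minimal sketch of what "32bit DMA addressable" means for a buffer: every byte of the transfer must sit below 4 GB. The function name and constant are hypothetical and only model the idea; this is not kernel code and not the Calgary driver's logic.

```python
# Hypothetical illustration only: models a 32-bit DMA addressing limit.
DMA_32BIT_MASK = (1 << 32) - 1  # highest address a 32-bit-only device can reach


def fits_dma_mask(bus_addr: int, length: int, mask: int = DMA_32BIT_MASK) -> bool:
    """Return True if the whole buffer [bus_addr, bus_addr + length) is reachable."""
    if bus_addr < 0 or length <= 0:
        raise ValueError("bus_addr must be >= 0 and length must be > 0")
    last_byte = bus_addr + length - 1
    return last_byte <= mask


if __name__ == "__main__":
    # A 4 KiB buffer ending exactly at the 4 GB boundary is still reachable.
    print(fits_dma_mask(0xFFFFF000, 0x1000))
    # A buffer that starts at 4 GB is not reachable by a 32-bit-only device.
    print(fits_dma_mask(0x100000000, 0x1000))
```

If an IOMMU layer hands such a device an address for which this check fails, the device cannot complete the transfer, which would be consistent with the DMA error reported above.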
After looking at another bug report (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/343749 ), the temporary solution seems to be to set iommu=soft at boot. But I guess this affects performance... According to that bug report, the bug seems to be corrected on RH 9 ?! The bug exists in Fedora 12, and makes it unusable on an x3580 M2.
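For anyone checking whether their machine shows the same symptoms, the following is a rough sketch that scans saved dmesg output for the messages quoted in this report. The regular expressions are derived only from the log lines above, and the function and key names are hypothetical; other kernel versions may word these messages differently.

```python
import re

# Patterns taken from the log excerpt in this report; key names are made up here.
SIGNATURES = {
    "calgary_dma_error": re.compile(r"Calgary: DMA error on (\S+) PHB (0x[0-9a-fA-F]+)"),
    "irq_nobody_cared": re.compile(r"irq (\d+): nobody cared"),
    "ioc_not_operational": re.compile(
        r"(ioc\d+): WARNING - TaskMgmt type=\d+: IOC Not operational"
    ),
    "block_alloc_failed": re.compile(
        r"mpage_da_map_blocks block allocation failed for inode (\d+)"
    ),
}


def scan_dmesg(text):
    """Return a dict mapping each signature name to the list of captured groups."""
    hits = {name: [] for name in SIGNATURES}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits[name].append(match.groups())
    return hits


if __name__ == "__main__":
    import sys

    # Usage: dmesg > dmesg.txt ; python scan.py dmesg.txt
    with open(sys.argv[1], errors="replace") as handle:
        for name, found in scan_dmesg(handle.read()).items():
            print(name, len(found))
```

A Calgary DMA error followed by "IOC Not operational" messages would match the sequence described in this report, though a match alone does not prove the same root cause.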
Comment 1 David A. Wheeler 2009-12-29 00:03:03 UTC
I also had a SERIOUS problem installing Fedora 12 on a Dell Optiplex GX620 which *appears* to be the same thing.

I tried to install Fedora 12 on a Dell Optiplex GX620 as a 64-bit OS (x86_64), using BIOS revision A10. While the disk was busy installing things, it suddenly hung with this message:

kernel: mpage_da_map_blocks block allocation failed for inode 211 at logical offset 0 with max blocks 1 with error -30
kernel: This should not happen.!! Data will be lost

On each boot, I needed to add this to the kernel entry:
iommu=soft

Later I modified /boot/grub/grub.conf so that every kernel entry includes:
iommu=soft

Previously I had tried to upgrade the BIOS to rev. A11; this caused complete loss of the USB keyboard/mouse, so I re-installed revision A10.

I'm in the middle of an install now that the iommu=soft setting has been added; so far, this *appears* to have solved the problem.
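The grub.conf change described in this comment can be sketched as a small text transformation. This assumes the legacy GRUB layout in which each boot entry has a line whose first word is "kernel"; the function name is hypothetical, and the sketch only returns the modified text rather than writing to /boot/grub/grub.conf.

```python
def add_kernel_param(conf_text: str, param: str = "iommu=soft") -> str:
    """Append `param` to every legacy-GRUB 'kernel' line that lacks it."""
    result = []
    for line in conf_text.splitlines():
        tokens = line.split()
        # Only touch active kernel lines; commented-out lines start with '#'.
        if tokens and tokens[0] == "kernel" and param not in tokens:
            line = line.rstrip() + " " + param
        result.append(line)
    trailing_newline = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(result) + trailing_newline


if __name__ == "__main__":
    # Example entry; the paths and version strings here are placeholders.
    example = (
        "title Fedora\n"
        "\troot (hd0,0)\n"
        "\tkernel /vmlinuz-EXAMPLE ro root=/dev/sda2 rhgb quiet\n"
        "\tinitrd /initramfs-EXAMPLE.img\n"
    )
    print(add_kernel_param(example), end="")
```

Running the function twice leaves the file unchanged the second time, so it is safe to reapply after a kernel update adds new entries. Back up the original file before replacing it with the output.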
Comment 2 Norman Gaywood 2010-07-04 01:48:25 UTC
I notice that the Red Hat 6 Beta 2 release notes list this among the known kernel problems:

Calgary IOMMU default detection has been disabled in this release. If you require Calgary IOMMU support add 'iommu=calgary' as a boot parameter.

So perhaps the new Enterprise kernel is now hitting this problem?
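To see which iommu= setting a running system was actually booted with (for example iommu=soft from the workaround, or iommu=calgary from the release note), one can inspect the kernel command line. The sketch below parses a command-line string; the function name is hypothetical, and treating the value as a comma-separated list is an assumption about how the option is written.

```python
from typing import List


def iommu_options(cmdline: str) -> List[str]:
    """Collect the values of all iommu= parameters found in a kernel command line."""
    values = []
    for token in cmdline.split():
        if token.startswith("iommu="):
            # Assumption: several values may be given as a comma-separated list.
            values.extend(v for v in token[len("iommu="):].split(",") if v)
    return values


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as handle:
        print(iommu_options(handle.read()))
```

An empty list means no iommu= parameter was passed, so the kernel's default IOMMU detection applies.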
Comment 3 Bug Zapper 2010-11-04 05:16:08 UTC
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 4 Bug Zapper 2010-12-04 02:45:07 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.