Bug 863653

Summary: ahci_stop_engine: BUG: unable to handle kernel paging request at ffffc90005337018
Product: [Fedora] Fedora Reporter: Sanne Bregman <sanne>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 18CC: acooks, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, sanne
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: abrt_hash:dc7866292b3e388707aa8d573697da4e662c5a1b
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-27 16:02:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sanne Bregman 2012-10-06 10:03:07 UTC
Description of problem:
This is most likely related to bug #863652, seeing how both of them appeared in my system tray at the same moment. I copied the description of that bug into this one because of that reason.

I tried to start a FreeBSD VM in KVM with PCI passthrough set on an Asus USB3/SATA3 PCIe card. I forwarded all the PCIe bridges on that card and the USB3 and the SATA3 "card"/chip to the VM. The moment I started the VM, Fedora's virt-manager appeared to hang for a little while, after which I was dropped to a text console with the kernel panic.

The relevant part of lspci is:
01:00.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:01.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:05.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:07.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:09.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
03:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev ff)
04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9120 SATA 6Gb/s Controller (rev ff)

The kernel threw some errors during bootup, before Fedora's boot splash appeared. These were all relating to ata7 and 8, which I presume are on the SATA chip. Those errors were:
[   12.742612] ata7: softreset failed (1st FIS failed)
[   12.743609] ata8: softreset failed (1st FIS failed)
[   12.748917] dmar: DRHD: handling fault status reg 2
[   12.749685] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr fffe0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   12.751850] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr fffc0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   13.047266] dmar: DRHD: handling fault status reg 2
[   13.048034] dmar: DMAR:[DMA Read] Request device [04:00.1] fault addr fffe0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   22.730671] ata7: softreset failed (1st FIS failed)
[   22.731652] ata8: softreset failed (1st FIS failed)
[   22.736939] dmar: DRHD: handling fault status reg 2
[   22.737707] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr fffe0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   22.739872] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr fffc0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   23.035350] dmar: DRHD: handling fault status reg 2
[   23.036117] dmar: DMAR:[DMA Read] Request device [04:00.1] fault addr fffe0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   57.686396] ata7: softreset failed (1st FIS failed)
[   57.687154] ata7: limiting SATA link speed to 1.5 Gbps
[   57.687384] ata8: softreset failed (1st FIS failed)
[   57.688139] ata8: limiting SATA link speed to 1.5 Gbps
[   57.692756] dmar: DRHD: handling fault status reg 2
[   57.693523] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr fffe0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   57.695690] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr fffc0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   57.991054] dmar: DRHD: handling fault status reg 2
[   57.991821] dmar: DMAR:[DMA Read] Request device [04:00.1] fault addr fffe0000 
DMAR:[fault reason 02] Present bit in context entry is clear
[   62.679942] ata7: softreset failed (1st FIS failed)
[   62.680700] ata7: reset failed, giving up
[   62.681965] ata8: softreset failed (1st FIS failed)
[   62.682736] ata8: reset failed, giving up

Additional info:
libreport version: 2.0.15
abrt_version:   2.0.14
cmdline:        BOOT_IMAGE=/vmlinuz-3.6.0-0.rc7.git1.4.fc18.x86_64 root=UUID=502ceada-dbc4-43dc-92cb-88f9196f1f4e ro rd.md=0 rd.lvm=0 rd.dm=0 rd.luks=0 rhgb quiet intel_iommu=on
kernel:         3.6.0-0.rc7.git1.4.fc18.x86_64

backtrace:
:BUG: unable to handle kernel paging request at ffffc90005337018
:IP: [<ffffffff814987c9>] ahci_stop_engine+0x29/0x70
:PGD 1001d4067 PUD 10031a067 PMD 75662067 PTE 0
:Oops: 0000 [#1] SMP 
:Modules linked in: fuse ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc bnep bluetooth rfkill be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support coretemp microcode serio_raw i2c_i801 snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core snd_page_alloc snd_timer snd e1000e mei soundcore ppdev parport_pc parport vhost_net tun macvtap macvlan kvm_intel kvm uinput btrfs libcrc32c zlib_deflate crc32c_intel ghash_clmulni_intel i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video
:CPU 0 
:Pid: 681, comm: libvirtd Not tainted 3.6.0-0.rc7.git1.4.fc18.x86_64 #1                  /DH77DF
:RIP: 0010:[<ffffffff814987c9>]  [<ffffffff814987c9>] ahci_stop_engine+0x29/0x70
:RSP: 0018:ffff88004e465b78  EFLAGS: 00010206
:RAX: 0000000000000f00 RBX: 000000000000001e RCX: 000000000000001d
:RDX: ffffc90005337000 RSI: ffffc90005337018 RDI: ffff880037660000
:RBP: ffff88004e465b78 R08: 0000000000000000 R09: 0000000000000000
:R10: 0000000000000001 R11: 0000000000000000 R12: ffff880037660000
:R13: ffff88004e465bd0 R14: ffffc90005336000 R15: ffff88006bfe0ab8
:FS:  00007f841fcd7700(0000) GS:ffff880075a00000(0000) knlGS:0000000000000000
:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
:CR2: ffffc90005337018 CR3: 000000004d973000 CR4: 00000000001407f0
:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
:Process libvirtd (pid: 681, threadinfo ffff88004e464000, task ffff88004e44a680)
:Stack:
: ffff88004e465ba8 ffffffff81499944 ffffffff81a41b54 000000000000001e
: ffffc90005336000 000000000000001e ffff88004e465c08 ffffffff81499ac2
: 0000000000000000 ffff8800706c91e0 ffff88004e465be8 0000000000000000
:Call Trace:
: [<ffffffff81499944>] ahci_deinit_port+0x24/0xb0
: [<ffffffff81499ac2>] ahci_init_controller+0xa2/0xf0
: [<ffffffff814973f2>] ahci_pci_init_controller+0x42/0x50
: [<ffffffff81497d42>] ahci_init_one+0x5e2/0xa70
: [<ffffffff816de3df>] ? _raw_spin_unlock_irqrestore+0x3f/0x80
: [<ffffffff813760d9>] local_pci_probe+0x79/0x100
: [<ffffffff81376271>] pci_device_probe+0x111/0x120
: [<ffffffff8143f3eb>] driver_probe_device+0x8b/0x390
: [<ffffffff8143f7a0>] ? __driver_attach+0xb0/0xb0
: [<ffffffff8143f7eb>] __device_attach+0x4b/0x60
: [<ffffffff8143d2de>] bus_for_each_drv+0x4e/0xa0
: [<ffffffff8143f2f8>] device_attach+0xa8/0xc0
: [<ffffffff8143dfcb>] bus_rescan_devices_helper+0x4b/0x80
: [<ffffffff8143e02a>] store_drivers_probe+0x2a/0x60
: [<ffffffff8124c668>] ? sysfs_write_file+0xc8/0x150
: [<ffffffff8143d087>] bus_attr_store+0x27/0x30
: [<ffffffff8124c680>] sysfs_write_file+0xe0/0x150
: [<ffffffff811cf6cf>] vfs_write+0xaf/0x190
: [<ffffffff811cfa0d>] sys_write+0x4d/0x90
: [<ffffffff816e7369>] system_call_fastpath+0x16/0x1b
:Code: 00 00 55 48 89 e5 0f 1f 44 00 00 48 8b 87 10 36 00 00 48 8b 50 60 8b 47 28 48 8b 12 c1 e0 07 48 8d 94 02 00 01 00 00 48 8d 72 18 <8b> 4a 18 31 c0 f7 c1 01 80 00 00 75 0a 5d c3 0f 1f 84 00 00 00 
:RIP  [<ffffffff814987c9>] ahci_stop_engine+0x29/0x70
: RSP <ffff88004e465b78>
:CR2: ffffc90005337018

Comment 1 Sanne Bregman 2012-10-06 10:11:48 UTC
After removing the PCIe bridges (the PLX technology devices from above) I can try to start the VM, but it gives me an error that the USB and SATA devices are not assignable. So my best guess is that there might be an error in the redirection of PCIe bridges, or that I just did something very stupid :P

Comment 2 Josh Boyer 2012-11-12 21:00:44 UTC
*** Bug 863652 has been marked as a duplicate of this bug. ***

Comment 3 Josh Boyer 2012-11-12 21:00:58 UTC
Are you still able to recreate this with 3.6.6 or newer?

Comment 4 Sanne Bregman 2012-11-13 19:23:20 UTC
Tested this, but couldn't reproduce it. virt-manager now tells me that I can't assign the whole PCIe switch to a VM. So I guess this is "fixed", but at a higher level than the kernel I guess?

Comment 5 Andrew Cooks 2012-11-15 00:12:58 UTC
(In reply to comment #3)
> Are you still able to recreate this with 3.6.6 or newer?

No fix/quirk has been added to enable DMAR with Marvell 88SE91xx SATA controllers as of 3.6.6.

See also bug 757166.

Comment 6 Sanne Bregman 2012-11-15 16:58:57 UTC
What I also noticed, that the card (Asus U3S6) stopped complaining a LOT during bootup (it delayed bootup for ~30 seconds). So I don't know what happened, but it was something good!

Comment 7 Andrew Cooks 2013-03-04 02:39:57 UTC
Patrick, the delay may be gone, but I expected the DMAR errors to continue up to at least 3.9-rc1. Can you confirm whether you can use a drive attached to this controller with the iommu enabled?

Comment 8 Sanne Bregman 2013-03-04 12:29:04 UTC
I could use it, just not in a virtual machine. I've been running ZFS on Linux on that card for a little while, but I've recently replaced the card with a more suitable one.

I could try it again, but I won't have a chance for that before next weekend I'm afraid.

Comment 9 Sanne Bregman 2013-03-10 18:43:19 UTC
A bit late, but I had some hardware issues. Anyway.

I installed the card again, and I get the boot slowdown again (see log below).
In the virtual machine I ran GParted to check it, and I get a few FIS resets and after that I can't actually use it. I connected a 80GB disk to it, but GParted doesn't see the disk. It does see the controller, and I can see the controller BIOS running in the VM.

Long story short, I don't see attached storage in the VM but I do see the cards BIOS loading in the VM.

The log (with a bit of explanation):
--- This is the initial start of the host (running Fedora 18), same as before
[    0.023657] dmar: Host address width 36
[    0.023659] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[    0.023666] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020e60262 ecap f0101a
[    0.023667] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[    0.023671] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
[    0.023672] dmar: RMRR base: 0x000000da23e000 end: 0x000000da24bfff
[    0.023673] dmar: RMRR base: 0x000000db800000 end: 0x000000df9fffff
[    0.023744] IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
---

--- This is upon starting the VM, requests coming from a subdevice???
[  911.428243] pci-stub 0000:03:00.0: claimed by stub
[  911.429493] pci-stub 0000:04:00.0: claimed by stub
[  911.960154] device vnet0 entered promiscuous mode
[  911.971229] br0: port 2(vnet0) entered forwarding state
[  911.971252] br0: port 2(vnet0) entered forwarding state
[  912.070976] br0: port 2(vnet0) entered forwarding state
[  912.803708] assign device 0:3:0.0
[  912.826069] assign device 0:4:0.0
[ 1018.553004] kvm [6637]: vcpu0 unhandled rdmsr: 0x345
[ 1018.555037] kvm_set_msr_common: 22 callbacks suppressed
[ 1018.555039] kvm [6637]: vcpu0 unhandled wrmsr: 0x680 data 0
[ 1018.555042] kvm [6637]: vcpu0 unhandled wrmsr: 0x6c0 data 0
[ 1018.555043] kvm [6637]: vcpu0 unhandled wrmsr: 0x681 data 0
[ 1018.555045] kvm [6637]: vcpu0 unhandled wrmsr: 0x6c1 data 0
[ 1018.555047] kvm [6637]: vcpu0 unhandled wrmsr: 0x682 data 0
[ 1018.555049] kvm [6637]: vcpu0 unhandled wrmsr: 0x6c2 data 0
[ 1018.555051] kvm [6637]: vcpu0 unhandled wrmsr: 0x683 data 0
[ 1018.555052] kvm [6637]: vcpu0 unhandled wrmsr: 0x6c3 data 0
[ 1018.555054] kvm [6637]: vcpu0 unhandled wrmsr: 0x684 data 0
[ 1018.555056] kvm [6637]: vcpu0 unhandled wrmsr: 0x6c4 data 0
[ 1020.535437] pci-stub 0000:03:00.0: irq 49 for MSI/MSI-X
[ 1020.572358] pci-stub 0000:03:00.0: irq 49 for MSI/MSI-X
[ 1020.572368] pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
[ 1020.608267] pci-stub 0000:04:00.0: irq 52 for MSI/MSI-X
[ 1020.640122] dmar: DRHD: handling fault status reg 2
[ 1020.640132] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr 37240000 
DMAR:[fault reason 02] Present bit in context entry is clear
[ 1030.623429] dmar: DRHD: handling fault status reg 2
[ 1030.623438] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr 37240000 
DMAR:[fault reason 02] Present bit in context entry is clear
[ 1040.610126] dmar: DRHD: handling fault status reg 2
[ 1040.610134] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr 37240000 
DMAR:[fault reason 02] Present bit in context entry is clear
[ 1075.563641] dmar: DRHD: handling fault status reg 2
[ 1075.563650] dmar: DMAR:[DMA Write] Request device [04:00.1] fault addr 37240000 
DMAR:[fault reason 02] Present bit in context entry is clear
---

lspci for completeness:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/Ivy Bridge Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.7 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 8 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:01.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:05.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:07.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:09.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
03:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9120 SATA 6Gb/s Controller (rev 12)
08:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6315 Series Firewire Controller (rev 01)

As lspci shows, there is no PCIe device 04:00.1... I find this kinda weird, why is a non-existing device sending messages?

Hope this helps, if you need more information, just give me a shout!

Comment 10 Justin M. Forbes 2013-10-18 21:00:18 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Comment 11 Justin M. Forbes 2013-11-27 16:02:37 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  

It has been over a month since we asked you to test the 3.11 kernel updates and let us know if your issue has been resolved or is still a problem. When this happened, the bug was set to needinfo.  Because the needinfo is still set, we assume either this is no longer a problem, or you cannot provide additional information to help us resolve the issue.  As a result we are closing with insufficient data. If this is still a problem, we apologize, feel free to reopen the bug and provide more information so that we can work towards a resolution

If you experience different issues, please open a new bug report for those.

Comment 12 Sanne Bregman 2014-11-03 20:03:03 UTC
This is a comment to stop getting weekly mails about this old bug. Sorry to bump it!