Bug 1433593 - Random oops in radeon/drm
Summary: Random oops in radeon/drm
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-18 14:13 UTC by Christian Kellner
Modified: 2019-01-09 12:54 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-06 18:31:39 UTC
Type: Bug
Embargoed:


Attachments
lspci -vvv (62.99 KB, text/plain), 2017-03-18 14:24 UTC, Christian Kellner
dmesg output (145.33 KB, text/plain), 2017-03-18 14:25 UTC, Christian Kellner

Description Christian Kellner 2017-03-18 14:13:38 UTC
At seemingly random intervals during normal operation (sometimes while browsing with Firefox, but not limited to that), the screen goes blank and does not come back.

Kernel is: Linux pewter.cns.bzm 4.9.13-201.fc25.x86_64 #1 SMP Tue Mar 7 23:47:11 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

GPU: 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X]


[ 1610.794905] pciehp 0000:00:03.0:pcie004: Slot(5-3): Card not present
[ 1610.794911] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Down
[ 1610.794976] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Fatal) error received: id=0018
[ 1610.794994] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0018(Requester ID)
[ 1610.795029] pcieport 0000:00:03.0:   device [8086:0e08] error status/mask=00004020/00000000
[ 1610.795036] pcieport 0000:00:03.0:    [ 5] Surprise Down Error    (First)
[ 1610.795042] pcieport 0000:00:03.0:    [14] Completion Timeout    
[ 1610.795050] pcieport 0000:00:03.0: broadcast error_detected message
[ 1610.795053] radeon 0000:06:00.0: device has no AER-aware driver
[ 1610.795056] snd_hda_intel 0000:06:00.1: device has no AER-aware driver
[ 1610.795081] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Down event ignored; already powering off
[ 1610.805047] pciehp 0000:00:03.0:pcie004: Slot(5-3): Card present
[ 1610.805055] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up
[ 1610.806801] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up event ignored; already powering on
[ 1610.814966] radeon 0000:06:00.0: GPU fault detected: 146 0x05c6480c
[ 1610.814976] radeon 0000:06:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0003A7AE
[ 1610.814981] radeon 0000:06:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0604800C
[ 1610.814986] VM fault (0x0c, vmid 3) at page 239534, read from TC (72)
[ 1610.815117] radeon 0000:06:00.0: GPU fault detected: 146 0x04e6480c
[ 1610.815122] radeon 0000:06:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0003A7A7
[ 1610.815126] radeon 0000:06:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0604800C
[ 1610.815130] VM fault (0x0c, vmid 3) at page 239527, read from TC (72)
[ 1610.973306] Console: switching to colour dummy device 80x25
[ 1610.975706] ------------[ cut here ]------------
[ 1610.975729] WARNING: CPU: 0 PID: 86 at drivers/gpu/drm/drm_crtc.c:1154 drm_mode_config_cleanup+0x204/0x230 [drm]
[ 1610.975730] Modules linked in: rfcomm fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_mangle ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_mangle iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep nls_utf8 hfsplus b43 mac80211 cfg80211 intel_rapl sb_edac edac_core ssb x86_pkg_temp_thermal joydev mmc_core intel_powerclamp coretemp kvm_intel iTCO_wdt kvm iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore applesmc
[ 1610.975773]  input_polldev intel_rapl_perf btusb btrtl btbcm btintel zfs(POE) bluetooth zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) rfkill i2c_i801 lpc_ich i2c_smbus bcma snd_hda_codec_cirrus snd_hda_codec_generic mei_me snd_hda_codec_hdmi mei snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm thunderbolt apple_gmux apple_bl snd_timer video snd ioatdma soundcore dca shpchp tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc amdkfd amd_iommu_v2 radeon 8021q garp stp llc mrp i2c_algo_bit drm_kms_helper ttm drm crc32c_intel tg3 ptp pps_core fjes uas usb_storage
[ 1610.975820] CPU: 0 PID: 86 Comm: kworker/0:1 Tainted: P        W  OE   4.9.13-201.fc25.x86_64 #1
[ 1610.975822] Hardware name: Apple Inc. MacPro6,1/Mac-F60DEB81FF30ACF6, BIOS MP61.88Z.0116.B21.1610201524 10/20/2016
[ 1610.975827] Workqueue: pciehp-5 pciehp_power_thread
[ 1610.975830]  ffffb1e601dd7bb0 ffffffffbb3f48ed 0000000000000000 0000000000000000
[ 1610.975833]  ffffb1e601dd7bf0 ffffffffbb0a306b 00000482c01fb9c0 ffff9feb6c8d1c80
[ 1610.975836]  ffff9feb6c8d1800 ffff9feb6c8d1b48 ffff9feb68277140 ffff9fea3d0e5548
[ 1610.975839] Call Trace:
[ 1610.975846]  [<ffffffffbb3f48ed>] dump_stack+0x63/0x86
[ 1610.975849]  [<ffffffffbb0a306b>] __warn+0xcb/0xf0
[ 1610.975851]  [<ffffffffbb0a319d>] warn_slowpath_null+0x1d/0x20
[ 1610.975863]  [<ffffffffc01ea2e4>] drm_mode_config_cleanup+0x204/0x230 [drm]
[ 1610.975895]  [<ffffffffc031f6c5>] radeon_modeset_fini+0x95/0xb0 [radeon]
[ 1610.975912]  [<ffffffffc02f7703>] radeon_driver_unload_kms+0x43/0x80 [radeon]
[ 1610.975924]  [<ffffffffc01e53c6>] drm_dev_unregister+0x36/0xc0 [drm]
[ 1610.975935]  [<ffffffffc01e5ac2>] drm_put_dev+0x32/0x60 [drm]
[ 1610.975952]  [<ffffffffc02f3305>] radeon_pci_remove+0x15/0x20 [radeon]
[ 1610.975955]  [<ffffffffbb44ba59>] pci_device_remove+0x39/0xc0
[ 1610.975958]  [<ffffffffbb54bbc1>] __device_release_driver+0xa1/0x160
[ 1610.975960]  [<ffffffffbb54bca3>] device_release_driver+0x23/0x30
[ 1610.975963]  [<ffffffffbb443dda>] pci_stop_bus_device+0x8a/0xa0
[ 1610.975966]  [<ffffffffbb443ed2>] pci_stop_and_remove_bus_device+0x12/0x20
[ 1610.975968]  [<ffffffffbb45fd3a>] pciehp_unconfigure_device+0xaa/0x1a0
[ 1610.975971]  [<ffffffffbb45f822>] pciehp_disable_slot+0x52/0xe0
[ 1610.975973]  [<ffffffffbb45f93a>] pciehp_power_thread+0x8a/0xa0
[ 1610.975976]  [<ffffffffbb0bd4d4>] process_one_work+0x184/0x430
[ 1610.975978]  [<ffffffffbb0bd7ce>] worker_thread+0x4e/0x480
[ 1610.975980]  [<ffffffffbb0bd780>] ? process_one_work+0x430/0x430
[ 1610.975982]  [<ffffffffbb0bd780>] ? process_one_work+0x430/0x430
[ 1610.975985]  [<ffffffffbb0c3549>] kthread+0xd9/0xf0
[ 1610.975988]  [<ffffffffbb0c3470>] ? kthread_park+0x60/0x60
[ 1610.975991]  [<ffffffffbb81ded5>] ret_from_fork+0x25/0x30
[ 1610.975993] ---[ end trace 91376aec7e98af43 ]---
[ 1610.976184] [drm] radeon: finishing device.
[ 1611.818165] pcieport 0000:00:03.0: Root Port link has been reset
[ 1611.818175] pcieport 0000:00:03.0: AER: Device recovery failed
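
The "VM fault" lines in the log above are a decoded form of the two VM_CONTEXT1_PROTECTION_FAULT registers printed just before them. The sketch below shows how the printed values (0x0c, vmid 3, client 72, page 239534) can be recovered from the raw register words. The bit layout is an assumption recalled from the radeon SI register headers, not taken from this report; it is shown only because it reproduces the numbers the kernel printed here. The function name is hypothetical.

```python
# Assumed field layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS (not confirmed
# by this report; it merely reproduces the values printed in the log above):
#   bits  0-3   protection flags
#   bits 12-19  memory client id
#   bit  24     1 = write, 0 = read
#   bits 25-28  VMID
def decode_vm_fault(status, addr):
    """Split the raw fault registers into the fields shown in the log line."""
    return {
        "protections": status & 0xF,
        "memory_client_id": (status >> 12) & 0xFF,
        "is_write": bool((status >> 24) & 0x1),
        "vmid": (status >> 25) & 0xF,
        "page": addr,  # FAULT_ADDR is already a page number; the log prints it in decimal
    }


# Register values copied from the first fault in the log above.
fault = decode_vm_fault(0x0604800C, 0x0003A7AE)
print(fault)
```

Under that assumed layout, the first fault decodes to protections 0x0c, vmid 3, memory client 72, a read access, and page 239534, which matches the kernel's own "VM fault" line.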

Comment 1 Christian Kellner 2017-03-18 14:24:45 UTC
Created attachment 1264379 [details]
lspci -vvv

Machine is a MacPro6,1.

lspci output

00:00.0 Host bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 DMI2 (rev 04)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 1a (rev 04)
00:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 04)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3a (rev 04)
00:04.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 0 (rev 04)
00:04.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 1 (rev 04)
00:04.2 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 2 (rev 04)
00:04.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 3 (rev 04)
00:04.4 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 4 (rev 04)
00:04.5 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 5 (rev 04)
00:04.6 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 6 (rev 04)
00:04.7 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DMA Channel 7 (rev 04)
00:05.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 VTd/Memory Map/Misc (rev 04)
00:05.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Memory Hotplug (rev 04)
00:05.2 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 IIO RAS (rev 04)
00:05.4 PIC: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 IOAPIC (rev 04)
00:11.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Virtual Root Port (rev 06)
00:16.0 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #1 (rev 05)
00:1b.0 Audio device: Intel Corporation C600/X79 series chipset High Definition Audio Controller (rev 06)
00:1c.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 1 (rev b6)
00:1c.1 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 2 (rev b6)
00:1c.2 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 3 (rev b6)
00:1c.4 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 5 (rev b6)
00:1d.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation C600/X79 series chipset LPC Controller (rev 06)
00:1f.3 SMBus: Intel Corporation C600/X79 series chipset SMBus Host Controller (rev 06)
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X]
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X]
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
0b:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM57762 Gigabit Ethernet PCIe
0c:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM57762 Gigabit Ethernet PCIe
0d:00.0 Network controller: Broadcom Limited BCM4360 802.11ac Wireless Network Adapter (rev 03)
0e:00.0 SATA controller: Samsung Electronics Co Ltd Apple PCIe SSD (rev 01)
10:00.0 PCI bridge: PLX Technology, Inc. Device 8723 (rev ca)
11:01.0 PCI bridge: PLX Technology, Inc. Device 8723 (rev ca)
11:02.0 PCI bridge: PLX Technology, Inc. Device 8723 (rev ca)
11:08.0 PCI bridge: PLX Technology, Inc. Device 8723 (rev ca)
11:09.0 PCI bridge: PLX Technology, Inc. Device 8723 (rev ca)
11:0a.0 PCI bridge: PLX Technology, Inc. Device 8723 (rev ca)
12:00.0 USB controller: Fresco Logic FL1100 USB 3.0 Host Controller (rev 10)
14:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
15:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
15:03.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
15:04.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
15:05.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
15:06.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
16:00.0 System peripheral: Intel Corporation DSL5520 Thunderbolt 2 NHI [Falcon Ridge 4C 2013]
5b:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
5c:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
5c:03.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
5c:04.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
5c:05.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
5c:06.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
5d:00.0 System peripheral: Intel Corporation DSL5520 Thunderbolt 2 NHI [Falcon Ridge 4C 2013]
a2:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
a3:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
a3:03.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
a3:04.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
a3:05.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
a3:06.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt 2 Bridge [Falcon Ridge 4C 2013]
a4:00.0 System peripheral: Intel Corporation DSL5520 Thunderbolt 2 NHI [Falcon Ridge 4C 2013]
ff:08.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
ff:08.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link Reut 0 (rev 04)
ff:08.4 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link Reut 0 (rev 04)
ff:09.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
ff:09.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
ff:09.4 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link Reut 1 (rev 04)
ff:0a.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Power Control Unit 0 (rev 04)
ff:0a.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Power Control Unit 1 (rev 04)
ff:0a.2 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Power Control Unit 2 (rev 04)
ff:0a.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Power Control Unit 3 (rev 04)
ff:0b.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 UBOX Registers (rev 04)
ff:0b.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 UBOX Registers (rev 04)
ff:0c.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0c.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0d.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Unicast Registers (rev 04)
ff:0e.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
ff:0e.1 Performance counters: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
ff:0f.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 0 Target Address/Thermal Registers (rev 04)
ff:0f.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 0 RAS Registers (rev 04)
ff:0f.2 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:0f.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:0f.4 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:0f.5 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
ff:0f.6 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 DDRIO Registers (rev 04)
ff:10.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 0 (rev 04)
ff:10.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 1 (rev 04)
ff:10.2 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 0 (rev 04)
ff:10.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 1 (rev 04)
ff:10.4 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 2 (rev 04)
ff:10.5 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 3 (rev 04)
ff:10.6 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 2 (rev 04)
ff:10.7 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 3 (rev 04)
ff:11.0 System peripheral: Intel Corporation Device 0eb8 (rev 04)
ff:13.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 R2PCIe (rev 04)
ff:13.1 Performance counters: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 R2PCIe (rev 04)
ff:13.4 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Ring Registers (rev 04)
ff:13.5 Performance counters: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
ff:13.6 Performance counters: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
ff:16.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 System Address Decoder (rev 04)
ff:16.1 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
ff:16.2 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Broadcast Registers (rev 04)

Comment 2 Christian Kellner 2017-03-18 14:25:36 UTC
Created attachment 1264380 [details]
dmesg output

Comment 3 Christian Kellner 2017-03-18 18:15:43 UTC
Upgraded the machine to rawhide; it just happened again.
Kernel: Linux pewter.cns.bzm 4.11.0-0.rc2.git3.1.fc27.x86_64 #1 SMP Thu Mar 16 16:08:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


[ 1494.462866] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Down
[ 1494.463271] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0018
[ 1494.463403] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0018(Requester ID)
[ 1494.463422] pcieport 0000:00:03.0:   device [8086:0e08] error status/mask=00004020/00000000
[ 1494.463434] pcieport 0000:00:03.0:    [ 5] Surprise Down Error   
[ 1494.463445] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[ 1494.463462] pcieport 0000:00:03.0: broadcast error_detected message
[ 1494.463474] pcieport 0000:00:03.0: AER: Device recovery failed
[ 1494.463482] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Fatal) error received: id=0018
[ 1494.463514] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0018(Requester ID)
[ 1494.463526] pcieport 0000:00:03.0:   device [8086:0e08] error status/mask=00004020/00000000
[ 1494.463538] pcieport 0000:00:03.0:    [ 5] Surprise Down Error   
[ 1494.463551] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[ 1494.463571] pcieport 0000:00:03.0: broadcast error_detected message
[ 1494.463589] radeon 0000:06:00.0: device has no AER-aware driver
[ 1494.463618] snd_hda_intel 0000:06:00.1: device has no AER-aware driver
[ 1494.473138] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up
[ 1494.474192] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up event queued; currently getting powered off
[ 1494.737884] Console: switching to colour dummy device 80x25
[ 1494.740868] ------------[ cut here ]------------
[ 1494.740897] WARNING: CPU: 0 PID: 3 at drivers/gpu/drm/drm_mode_config.c:458 drm_mode_config_cleanup+0x290/0x2b0 [drm]
[ 1494.740899] Modules linked in: binfmt_misc rfcomm fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep nls_utf8 hfsplus b43 mac80211 joydev cfg80211 intel_rapl ssb mmc_core iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm applesmc input_polldev irqbypass crct10dif_pclmul crc32_pclmul
[ 1494.741268]  ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf btusb btrtl btbcm btintel bluetooth rfkill i2c_i801 lpc_ich mei_me bcma mei snd_hda_codec_cirrus snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm thunderbolt snd_timer snd soundcore ioatdma dca shpchp apple_gmux apple_bl video tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc amdkfd amd_iommu_v2 radeon crc32c_intel i2c_algo_bit drm_kms_helper ttm tg3 drm ptp pps_core uas usb_storage fjes
[ 1494.741486] CPU: 0 PID: 3 Comm: kworker/0:0 Tainted: G        W       4.11.0-0.rc2.git3.1.fc27.x86_64 #1
[ 1494.741491] Hardware name: Apple Inc. MacPro6,1/Mac-F60DEB81FF30ACF6, BIOS MP61.88Z.0116.B21.1610201524 10/20/2016
[ 1494.741501] Workqueue: pciehp-5 pciehp_power_thread
[ 1494.741512] Call Trace:
[ 1494.741523]  dump_stack+0x8e/0xd1
[ 1494.741533]  __warn+0xcb/0xf0
[ 1494.741545]  warn_slowpath_null+0x1d/0x20
[ 1494.741568]  drm_mode_config_cleanup+0x290/0x2b0 [drm]
[ 1494.741613]  radeon_modeset_fini+0x95/0xb0 [radeon]
[ 1494.741642]  radeon_driver_unload_kms+0x43/0x80 [radeon]
[ 1494.741664]  drm_dev_unregister+0x3c/0xe0 [drm]
[ 1494.741686]  drm_put_dev+0x36/0x70 [drm]
[ 1494.741711]  radeon_pci_remove+0x15/0x20 [radeon]
[ 1494.741723]  pci_device_remove+0x39/0xc0
[ 1494.741734]  device_release_driver_internal+0x160/0x210
[ 1494.741745]  device_release_driver+0x12/0x20
[ 1494.741751]  pci_stop_bus_device+0x8f/0xa0
[ 1494.741760]  pci_stop_and_remove_bus_device+0x12/0x20
[ 1494.741768]  pciehp_unconfigure_device+0xad/0x1b0
[ 1494.741781]  pciehp_disable_slot+0x5a/0xe0
[ 1494.741792]  pciehp_power_thread+0x93/0xb0
[ 1494.741804]  process_one_work+0x260/0x750
[ 1494.741809]  ? process_one_work+0x1db/0x750
[ 1494.741830]  worker_thread+0x4e/0x4a0
[ 1494.741845]  ? process_one_work+0x750/0x750
[ 1494.741851]  kthread+0x12c/0x150
[ 1494.741858]  ? kthread_create_on_node+0x60/0x60
[ 1494.741870]  ret_from_fork+0x31/0x40
[ 1494.742015] ---[ end trace b246c51c29aef901 ]---
[ 1494.744425] [drm] radeon: finishing device.
[ 1495.513039] pcieport 0000:00:03.0: Root Port link has been reset
[ 1495.513053] pcieport 0000:00:03.0: AER: Device recovery failed
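
To see how long the slot link was actually down, the pciehp "Link Down" and "Link Up" timestamps can be paired up. The following is a minimal sketch, assuming the dmesg line format shown above; the function name and the regular expression are illustrative and not part of any kernel tooling.

```python
import re

# Matches plain link events such as:
#   [ 1494.462866] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Down
# Lines like "Link Up event queued; ..." are skipped on purpose by the `$`.
EVENT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+pciehp\s+\S+\s+Slot\(([^)]+)\): Link (Down|Up)$"
)


def link_outages(lines):
    """Yield (slot, down_time, up_time) for each Link Down followed by a Link Up."""
    down_at = {}
    for line in lines:
        m = EVENT_RE.search(line.strip())
        if not m:
            continue
        ts, slot, state = float(m.group(1)), m.group(2), m.group(3)
        if state == "Down":
            down_at[slot] = ts
        elif slot in down_at:
            yield slot, down_at.pop(slot), ts


# Lines copied from the log above.
SAMPLE = [
    "[ 1494.462866] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Down",
    "[ 1494.473138] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up",
    "[ 1494.474192] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up event queued; currently getting powered off",
]

for slot, down, up in link_outages(SAMPLE):
    print(f"slot {slot}: link down for {(up - down) * 1000:.2f} ms")
```

For the log above this reports an outage of roughly 10 ms on slot 5-3, so the link comes back almost immediately while the hotplug code is still removing the device.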

Comment 4 Christian Kellner 2017-03-18 19:20:22 UTC
Actually, there is already an oops a bit earlier:

[    9.075044] ------------[ cut here ]------------
[    9.075050] WARNING: CPU: 6 PID: 693 at drivers/platform/x86/apple-gmux.c:700 gmux_probe+0x634/0x780 [apple_gmux]
[    9.075052] Modules linked in: apple_gmux(+) apple_bl video tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc amdkfd amd_iommu_v2 radeon crc32c_intel i2c_algo_bit drm_kms_helper ttm tg3 drm ptp pps_core uas usb_storage fjes
[    9.075107] CPU: 6 PID: 693 Comm: systemd-udevd Not tainted 4.11.0-0.rc2.git3.1.fc27.x86_64 #1
[    9.075109] Hardware name: Apple Inc. MacPro6,1/Mac-F60DEB81FF30ACF6, BIOS MP61.88Z.0116.B21.1610201524 10/20/2016
[    9.075111] Call Trace:
[    9.075118]  dump_stack+0x8e/0xd1
[    9.075125]  __warn+0xcb/0xf0
[    9.075132]  warn_slowpath_null+0x1d/0x20
[    9.075136]  gmux_probe+0x634/0x780 [apple_gmux]
[    9.075144]  ? gmux_update_status+0xd0/0xd0 [apple_gmux]
[    9.075150]  pnp_device_probe+0x65/0xc0
[    9.075157]  driver_probe_device+0x106/0x450
[    9.075163]  __driver_attach+0xa8/0xf0
[    9.075168]  ? driver_probe_device+0x450/0x450
[    9.075171]  bus_for_each_dev+0x75/0xc0
[    9.075178]  driver_attach+0x1e/0x20
[    9.075181]  bus_add_driver+0x1d3/0x270
[    9.075185]  ? 0xffffffffc01fc000
[    9.075190]  driver_register+0x60/0xe0
[    9.075194]  ? 0xffffffffc01fc000
[    9.075197]  pnp_register_driver+0x20/0x30
[    9.075202]  gmux_pnp_driver_init+0x10/0x1000 [apple_gmux]
[    9.075206]  do_one_initcall+0x50/0x1a0
[    9.075211]  ? rcu_read_lock_sched_held+0x79/0x80
[    9.075215]  ? kmem_cache_alloc_trace+0x273/0x2e0
[    9.075219]  ? do_init_module+0x27/0x1e8
[    9.075227]  do_init_module+0x5f/0x1e8
[    9.075233]  load_module+0x2392/0x29e0
[    9.075236]  ? __symbol_put+0x70/0x70
[    9.075243]  ? show_coresize+0x30/0x30
[    9.075264]  SYSC_init_module+0x193/0x1d0
[    9.075280]  SyS_init_module+0xe/0x10
[    9.075284]  do_syscall_64+0x6c/0x1f0
[    9.075291]  entry_SYSCALL64_slow_path+0x25/0x25
[    9.075294] RIP: 0033:0x7fd4d2e46b8a
[    9.075296] RSP: 002b:00007ffd989db6b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    9.075300] RAX: ffffffffffffffda RBX: 000055f680f12f70 RCX: 00007fd4d2e46b8a
[    9.075302] RDX: 00007fd4d397c9c5 RSI: 0000000000004df3 RDI: 000055f680f0c870
[    9.075304] RBP: 00007fd4d397c9c5 R08: 000055f680f13020 R09: 00007fd4d3100b38
[    9.075307] R10: 00007fd4d3100b00 R11: 0000000000000246 R12: 000055f680f0c870
[    9.075309] R13: 000055f680f354a0 R14: 0000000000020000 R15: 000055f67ff7ffca
[    9.075344] ---[ end trace b246c51c29aef900 ]---

Comment 5 Justin M. Forbes 2017-04-11 14:47:06 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 25 kernel bugs.

Fedora 25 has now been rebased to 4.10.9-200.fc25.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Comment 6 Christian Kellner 2017-04-16 14:02:34 UTC
Still happens on rawhide, with kernel 4.11.0-0.rc5.git2.1.fc27.x86_64

[ 7207.898252] DMA-API: debugging out of memory - disabling
[ 9418.381066] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0018
[ 9418.381100] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0018(Requester ID)
[ 9418.381105] pcieport 0000:00:03.0:   device [8086:0e08] error status/mask=00004000/00000000
[ 9418.381108] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[ 9418.381114] pcieport 0000:00:03.0: broadcast error_detected message
[ 9418.381121] pcieport 0000:00:03.0: AER: Device recovery failed
[ 9418.401192] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0018
[ 9418.401210] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0018(Requester ID)
[ 9418.401219] pcieport 0000:00:03.0:   device [8086:0e08] error status/mask=00004000/00000000
[ 9418.401226] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[ 9418.401237] pcieport 0000:00:03.0: broadcast error_detected message
[ 9418.401245] pcieport 0000:00:03.0: AER: Device recovery failed
[ 9418.413516] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Down
[ 9418.413549] pcieport 0000:00:03.0: AER: Uncorrected (Fatal) error received: id=0018
[ 9418.413564] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0018(Requester ID)
[ 9418.413573] pcieport 0000:00:03.0:   device [8086:0e08] error status/mask=00004020/00000000
[ 9418.413580] pcieport 0000:00:03.0:    [ 5] Surprise Down Error   
[ 9418.413586] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[ 9418.413597] pcieport 0000:00:03.0: broadcast error_detected message
[ 9418.413610] radeon 0000:06:00.0: device has no AER-aware driver
[ 9418.413630] snd_hda_intel 0000:06:00.1: device has no AER-aware driver
[ 9418.423692] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up
[ 9418.424037] pciehp 0000:00:03.0:pcie004: Slot(5-3): Link Up event queued; currently getting powered off
[ 9418.649818] Console: switching to colour dummy device 80x25
[ 9418.654131] ------------[ cut here ]------------
[ 9418.654171] WARNING: CPU: 0 PID: 3 at drivers/gpu/drm/drm_mode_config.c:458 drm_mode_config_cleanup+0x290/0x2b0 [drm]
[ 9418.654174] Modules linked in: binfmt_misc rfcomm fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep nls_utf8 hfsplus b43 mac80211 intel_rapl sb_edac edac_core cfg80211 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ssb mmc_core kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate
[ 9418.654358]  intel_uncore applesmc input_polldev intel_rapl_perf btusb btrtl btbcm btintel bluetooth rfkill snd_hda_codec_cirrus snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm joydev thunderbolt i2c_i801 lpc_ich bcma snd_timer snd mei_me mei soundcore ioatdma apple_gmux shpchp apple_bl dca video tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs xor amdkfd amd_iommu_v2 radeon raid6_pq crc32c_intel i2c_algo_bit drm_kms_helper ttm tg3 drm ptp pps_core uas usb_storage
[ 9418.654529] CPU: 0 PID: 3 Comm: kworker/0:0 Tainted: G        W       4.11.0-0.rc5.git2.1.fc27.x86_64 #1
[ 9418.654532] Hardware name: Apple Inc. MacPro6,1/Mac-F60DEB81FF30ACF6, BIOS MP61.88Z.0116.B25.1702171857 02/17/2017
[ 9418.654540] Workqueue: pciehp-5 pciehp_power_thread
[ 9418.654546] Call Trace:
[ 9418.654554]  dump_stack+0x8e/0xd1
[ 9418.654561]  __warn+0xcb/0xf0
[ 9418.654571]  warn_slowpath_null+0x1d/0x20
[ 9418.654586]  drm_mode_config_cleanup+0x290/0x2b0 [drm]
[ 9418.654621]  radeon_modeset_fini+0x95/0xb0 [radeon]
[ 9418.654643]  radeon_driver_unload_kms+0x43/0x80 [radeon]
[ 9418.654659]  drm_dev_unregister+0x3c/0xe0 [drm]
[ 9418.654676]  drm_put_dev+0x36/0x70 [drm]
[ 9418.654695]  radeon_pci_remove+0x15/0x20 [radeon]
[ 9418.654701]  pci_device_remove+0x39/0xc0
[ 9418.654709]  device_release_driver_internal+0x160/0x210
[ 9418.654717]  device_release_driver+0x12/0x20
[ 9418.654721]  pci_stop_bus_device+0x8f/0xa0
[ 9418.654728]  pci_stop_and_remove_bus_device+0x12/0x20
[ 9418.654734]  pciehp_unconfigure_device+0xad/0x1b0
[ 9418.654744]  pciehp_disable_slot+0x5a/0xe0
[ 9418.654752]  pciehp_power_thread+0x93/0xb0
[ 9418.654761]  process_one_work+0x25e/0x750
[ 9418.654776]  worker_thread+0x4e/0x4a0
[ 9418.654795]  ? process_one_work+0x750/0x750
[ 9418.654803]  kthread+0x12c/0x150
[ 9418.654812]  ? kthread_create_on_node+0x70/0x70
[ 9418.654826]  ret_from_fork+0x31/0x40
[ 9418.654895] ---[ end trace 621e866607429a91 ]---
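
The "error status" words in the log above (00004000 in the first two reports, then 00004020) are bitmasks; the bracketed numbers on the following lines are the bit positions that are set. A minimal sketch of that decoding follows. The bit names are recalled from the PCIe AER Uncorrectable Error Status register definition; only bits 5 and 14 are confirmed by this log, and the function name is hypothetical.

```python
# Bit names for the AER Uncorrectable Error Status register, as recalled from
# the PCIe specification. Only bits 5 and 14 appear in this bug's logs.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_uncorrectable_status(status):
    """Return (bit, name) for every bit set in the status word, lowest bit first."""
    return [
        (bit, UNCORRECTABLE_BITS.get(bit, "unknown/reserved"))
        for bit in range(32)
        if status & (1 << bit)
    ]


# Status words copied from the log above.
for status in (0x00004000, 0x00004020):
    print(f"{status:08x}:", decode_uncorrectable_status(status))
```

Read this way, the sequence in the log is a Completion Timeout reported twice on its own, followed by the same timeout together with a Surprise Down once the link drops.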

Comment 7 Laura Abbott 2018-04-06 18:31:39 UTC
While doing some pruning, I noticed that this bug is several kernel versions old. Please test on a newer kernel and reopen if the problem still exists.

