Description of problem:
Booting kernel-3.8.0-0.rc6.git0.1.fc19.x86_64 reports, in order:

  e100 0000:00:0e.0: DMA-API: device driver failed to check map error

followed by

  BUG: sleeping function called from invalid context at mm/slub.c:925

and then

  BUG: scheduling while atomic: X/1258/0x10000002

with traces apparently implicating the radeon driver. Details are in the attached dmesg output.

After all of this, an attempt to start a Gnome desktop session causes a total lockup of the computer (network connections also go away). OTOH one can log in to a desktop when running a sawfish session instead.

Version-Release number of selected component (if applicable):
kernel-3.8.0-0.rc6.git0.1.fc19.x86_64

How reproducible:
It looks consistent. I have not managed to start a Gnome session without a lockup even once.
Created attachment 693121 [details] dmesg output for 3.8.0-0.rc6.git0.1.fc19.x86_64 with multiple bugs
The mm/slub.c:925 BUG is bug 906296
(In reply to comment #2)
> The mm/slub.c:925 BUG is bug 906296

Correct. The other issue is an e100 driver bug, so we'll use this bug report for that. For the gnome thing, use the other bug.

[ 46.656594] ------------[ cut here ]------------
[ 46.657004] WARNING: at lib/dma-debug.c:933 check_unmap+0x47b/0x950()
[ 46.657004] Hardware name: To Be Filled By O.E.M.
[ 46.657004] e100 0000:00:0e.0: DMA-API: device driver failed to check map error[device address=0x000000007a4540fa] [size=90 bytes] [mapped as single]
[ 46.657004] Modules linked in:
[ 46.657004] w83627hf hwmon_vid snd_via82xx ppdev snd_ac97_codec ac97_bus snd_seq snd_pcm snd_mpu401 snd_mpu401_uart ns558 snd_rawmidi gameport parport_pc e100 snd_seq_device parport snd_page_alloc snd_timer snd soundcore skge shpchp k8temp mii edac_core i2c_viapro edac_mce_amd nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc uinput ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm firewire_ohci drm firewire_core pata_via sata_via i2c_core sata_promise crc_itu_t
[ 46.657004] Pid: 792, comm: ip Not tainted 3.8.0-0.rc6.git0.1.fc19.x86_64 #1
[ 46.657004] Call Trace:
[ 46.657004] <IRQ> [<ffffffff81065ed0>] warn_slowpath_common+0x70/0xa0
[ 46.657004] [<ffffffff81065f4c>] warn_slowpath_fmt+0x4c/0x50
[ 46.657004] [<ffffffff81364cfb>] check_unmap+0x47b/0x950
[ 46.657004] [<ffffffff8136522f>] debug_dma_unmap_page+0x5f/0x70
[ 46.657004] [<ffffffffa030f0f0>] ? e100_tx_clean+0x30/0x210 [e100]
[ 46.657004] [<ffffffffa030f1a8>] e100_tx_clean+0xe8/0x210 [e100]
[ 46.657004] [<ffffffffa030fc6f>] e100_poll+0x56f/0x6c0 [e100]
[ 46.657004] [<ffffffff8159dce1>] ? net_rx_action+0xa1/0x370
[ 46.657004] [<ffffffff8159ddb2>] net_rx_action+0x172/0x370
[ 46.657004] [<ffffffff810703bf>] __do_softirq+0xef/0x3d0
[ 46.657004] [<ffffffff816e4ebc>] call_softirq+0x1c/0x30
[ 46.657004] [<ffffffff8101c485>] do_softirq+0x85/0xc0
[ 46.657004] [<ffffffff81070885>] irq_exit+0xd5/0xe0
[ 46.657004] [<ffffffff816e5756>] do_IRQ+0x56/0xc0
[ 46.657004] [<ffffffff816dacb2>] common_interrupt+0x72/0x72
[ 46.657004] <EOI> [<ffffffff816da1eb>] ? _raw_spin_unlock_irqrestore+0x3b/0x70
[ 46.657004] [<ffffffff816d124d>] __slab_free+0x58/0x38b
[ 46.657004] [<ffffffff81214424>] ? fsnotify_clear_marks_by_inode+0x34/0x120
[ 46.657004] [<ffffffff811b0417>] ? kmem_cache_free+0x97/0x320
[ 46.657004] [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
[ 46.657004] [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
[ 46.657004] [<ffffffff811b0692>] kmem_cache_free+0x312/0x320
[ 46.657004] [<ffffffff8157fc14>] sock_destroy_inode+0x34/0x40
[ 46.657004] [<ffffffff811e8c28>] destroy_inode+0x38/0x60
[ 46.657004] [<ffffffff811e8d5e>] evict+0x10e/0x1a0
[ 46.657004] [<ffffffff811e9605>] iput+0xf5/0x180
[ 46.657004] [<ffffffff811e4338>] dput+0x248/0x310
[ 46.657004] [<ffffffff811ce0e1>] __fput+0x171/0x240
[ 46.657004] [<ffffffff811ce26e>] ____fput+0xe/0x10
[ 46.657004] [<ffffffff8108d54c>] task_work_run+0xac/0xe0
[ 46.657004] [<ffffffff8106c6ed>] do_exit+0x26d/0xc30
[ 46.657004] [<ffffffff8109eccc>] ? finish_task_switch+0x7c/0x120
[ 46.657004] [<ffffffff816dad58>] ? retint_swapgs+0x13/0x1b
[ 46.657004] [<ffffffff8106d139>] do_group_exit+0x49/0xc0
[ 46.657004] [<ffffffff8106d1c4>] sys_exit_group+0x14/0x20
[ 46.657004] [<ffffffff816e3b19>] system_call_fastpath+0x16/0x1b
[ 46.657004] ---[ end trace 4468c44e2156e7d1 ]---
[ 46.657004] Mapped at:
[ 46.657004] [<ffffffff813663d1>] debug_dma_map_page+0x91/0x140
[ 46.657004] [<ffffffffa030e8eb>] e100_xmit_prepare+0x12b/0x1c0 [e100]
[ 46.657004] [<ffffffffa030c924>] e100_exec_cb+0x84/0x140 [e100]
[ 46.657004] [<ffffffffa030e56a>] e100_xmit_frame+0x3a/0x190 [e100]
[ 46.657004] [<ffffffff8159ee89>] dev_hard_start_xmit+0x259/0x6c0
I've emailed the upstream developers about the e100 issue. Should be a simple fix, but we'll see what they say. http://article.gmane.org/gmane.linux.network/257893
*** Bug 929434 has been marked as a duplicate of this bug. ***
Created attachment 730284 [details]
[PATCH] e100: Add dma mapping error check

e100 uses pci_map_single, but fails to check for a dma mapping error after its use, resulting in a stack trace:

[ 46.656594] ------------[ cut here ]------------
[ 46.657004] WARNING: at lib/dma-debug.c:933 check_unmap+0x47b/0x950()
[ 46.657004] Hardware name: To Be Filled By O.E.M.
[ 46.657004] e100 0000:00:0e.0: DMA-API: device driver failed to check map error[device address=0x000000007a4540fa] [size=90 bytes] [mapped as single]
[ 46.657004] Modules linked in:
[ 46.657004] w83627hf hwmon_vid snd_via82xx ppdev snd_ac97_codec ac97_bus snd_seq snd_pcm snd_mpu401 snd_mpu401_uart ns558 snd_rawmidi gameport parport_pc e100 snd_seq_device parport snd_page_alloc snd_timer snd soundcore skge shpchp k8temp mii edac_core i2c_viapro edac_mce_amd nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc uinput ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm firewire_ohci drm firewire_core pata_via sata_via i2c_core sata_promise crc_itu_t
[ 46.657004] Pid: 792, comm: ip Not tainted 3.8.0-0.rc6.git0.1.fc19.x86_64 #1
[ 46.657004] Call Trace:
[ 46.657004] <IRQ> [<ffffffff81065ed0>] warn_slowpath_common+0x70/0xa0
[ 46.657004] [<ffffffff81065f4c>] warn_slowpath_fmt+0x4c/0x50
[ 46.657004] [<ffffffff81364cfb>] check_unmap+0x47b/0x950
[ 46.657004] [<ffffffff8136522f>] debug_dma_unmap_page+0x5f/0x70
[ 46.657004] [<ffffffffa030f0f0>] ? e100_tx_clean+0x30/0x210 [e100]
[ 46.657004] [<ffffffffa030f1a8>] e100_tx_clean+0xe8/0x210 [e100]
[ 46.657004] [<ffffffffa030fc6f>] e100_poll+0x56f/0x6c0 [e100]
[ 46.657004] [<ffffffff8159dce1>] ? net_rx_action+0xa1/0x370
[ 46.657004] [<ffffffff8159ddb2>] net_rx_action+0x172/0x370
[ 46.657004] [<ffffffff810703bf>] __do_softirq+0xef/0x3d0
[ 46.657004] [<ffffffff816e4ebc>] call_softirq+0x1c/0x30
[ 46.657004] [<ffffffff8101c485>] do_softirq+0x85/0xc0
[ 46.657004] [<ffffffff81070885>] irq_exit+0xd5/0xe0
[ 46.657004] [<ffffffff816e5756>] do_IRQ+0x56/0xc0
[ 46.657004] [<ffffffff816dacb2>] common_interrupt+0x72/0x72
[ 46.657004] <EOI> [<ffffffff816da1eb>] ? _raw_spin_unlock_irqrestore+0x3b/0x70
[ 46.657004] [<ffffffff816d124d>] __slab_free+0x58/0x38b
[ 46.657004] [<ffffffff81214424>] ? fsnotify_clear_marks_by_inode+0x34/0x120
[ 46.657004] [<ffffffff811b0417>] ? kmem_cache_free+0x97/0x320
[ 46.657004] [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
[ 46.657004] [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
[ 46.657004] [<ffffffff811b0692>] kmem_cache_free+0x312/0x320
[ 46.657004] [<ffffffff8157fc14>] sock_destroy_inode+0x34/0x40
[ 46.657004] [<ffffffff811e8c28>] destroy_inode+0x38/0x60
[ 46.657004] [<ffffffff811e8d5e>] evict+0x10e/0x1a0
[ 46.657004] [<ffffffff811e9605>] iput+0xf5/0x180
[ 46.657004] [<ffffffff811e4338>] dput+0x248/0x310
[ 46.657004] [<ffffffff811ce0e1>] __fput+0x171/0x240
[ 46.657004] [<ffffffff811ce26e>] ____fput+0xe/0x10
[ 46.657004] [<ffffffff8108d54c>] task_work_run+0xac/0xe0
[ 46.657004] [<ffffffff8106c6ed>] do_exit+0x26d/0xc30
[ 46.657004] [<ffffffff8109eccc>] ? finish_task_switch+0x7c/0x120
[ 46.657004] [<ffffffff816dad58>] ? retint_swapgs+0x13/0x1b
[ 46.657004] [<ffffffff8106d139>] do_group_exit+0x49/0xc0
[ 46.657004] [<ffffffff8106d1c4>] sys_exit_group+0x14/0x20
[ 46.657004] [<ffffffff816e3b19>] system_call_fastpath+0x16/0x1b
[ 46.657004] ---[ end trace 4468c44e2156e7d1 ]---
[ 46.657004] Mapped at:
[ 46.657004] [<ffffffff813663d1>] debug_dma_map_page+0x91/0x140
[ 46.657004] [<ffffffffa030e8eb>] e100_xmit_prepare+0x12b/0x1c0 [e100]
[ 46.657004] [<ffffffffa030c924>] e100_exec_cb+0x84/0x140 [e100]
[ 46.657004] [<ffffffffa030e56a>] e100_xmit_frame+0x3a/0x190 [e100]
[ 46.657004] [<ffffffff8159ee89>] dev_hard_start_xmit+0x259/0x6c0

Easy fix: modify the cb parameter to e100_exec_cb to return an error, and do the dma_mapping_error check in the obvious place.

This was reported previously here:
http://article.gmane.org/gmane.linux.network/257893

But nobody stepped up and fixed it.

Signed-off-by: Neil Horman <nhorman>
Reported-by: Michal Jaegermann <michal>
CC: Josh Boyer <jwboyer>
CC: "David S. Miller" <davem>
CC: Jeff Kirsher <jeffrey.t.kirsher>
CC: e1000-devel.net
---
 drivers/net/ethernet/intel/e100.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)
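
For anyone reading along without the attachment open, here is a minimal userspace sketch of the pattern the patch description refers to: the prepare callback returns an error code, the mapping result is checked right after the map call, and the caller refuses to queue the command block on failure. This is not the patch itself. Every `mock_*` name is hypothetical, the mapping functions only stand in for `pci_map_single` and the kernel's mapping-error check, and the `-ENOMEM` return value is an assumption.

```c
/* Userspace mock of "check the DMA mapping, propagate the error".
 * All mock_* names are hypothetical; this is NOT the e100 driver code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mock_dma_addr_t;
#define MOCK_DMA_ERROR ((mock_dma_addr_t)~0ULL)

/* Stand-in for pci_map_single(): here it fails on a NULL or empty buffer. */
static mock_dma_addr_t mock_map_single(const void *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return MOCK_DMA_ERROR;
    return (mock_dma_addr_t)(uintptr_t)buf;
}

/* Stand-in for the kernel's mapping-error check. */
static int mock_mapping_error(mock_dma_addr_t addr)
{
    return addr == MOCK_DMA_ERROR;
}

/* Hypothetical command block holding the mapped transmit buffer. */
struct mock_cb {
    mock_dma_addr_t dma_addr;
    size_t len;
};

/* The callback type returns int so a mapping failure can be reported. */
typedef int (*mock_cb_prepare_t)(struct mock_cb *cb, const void *buf,
                                 size_t len);

/* Map the buffer and check the result before touching the command block. */
static int mock_xmit_prepare(struct mock_cb *cb, const void *buf, size_t len)
{
    mock_dma_addr_t addr = mock_map_single(buf, len);

    if (mock_mapping_error(addr))
        return -ENOMEM; /* assumed error code */

    cb->dma_addr = addr;
    cb->len = len;
    return 0;
}

/* Run the prepare callback; on error, return without queuing the block. */
static int mock_exec_cb(struct mock_cb *cb, const void *buf, size_t len,
                        mock_cb_prepare_t prepare)
{
    int err;

    assert(cb != NULL && prepare != NULL);

    err = prepare(cb, buf, len);
    if (err)
        return err;

    /* A real driver would hand the command block to the hardware here. */
    return 0;
}
```

With a check like this in place, a failed mapping is never handed to the hardware or unmapped later, which is the condition the dma-debug warning in the trace complains about.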
Hey could you test out the above patch please. It should fix your problem. Thanks!
(In reply to comment #7)
> Hey could you test out the above patch please. It should fix your problem.
> Thanks!

Indeed, with that patch applied and the e100 driver recompiled for the 3.9.0-0.rc4.git0.1.fc20.x86_64 kernel, the error in question did not show up over a few reboots.
cool, I'll send this upstream shortly then, thanks!
http://marc.info/?l=e1000-devel&m=136491511705277&w=2 Posted upstream
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
(In reply to comment #11)
> This bug appears to have been reported against 'rawhide' during the Fedora
> 19 development cycle.
> Changing version to '19'.

This bug is present in the current rawhide kernels (and others too). The patch by Neil Horman (see comment 10 and attachment 730284 [details]) is working AFAICT, but it has not been applied to rawhide kernels so far.
This was fixed in 3.9-rc7.