Bug 485778
Summary: | dma-debug: tg3: dma free with wrong function | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Evan McNabb <emcnabb> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 11 | CC: | benlu, bruno, gavinflower, jfeeney, kernel-maint, kmcmartin, mcarlson, mgahagan, tmraz |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-06-28 11:17:53 UTC | Type: | --- |
Bug Blocks: | 487882 |
Description
Evan McNabb
2009-02-16 19:10:47 UTC
It looks like the bug might be in skb_dma_map() / skb_dma_unmap(). Where can I get access to the kernel sources to investigate further?

Thanks Matt. I see this on the latest Rawhide build which is based on 2.6.29-rc5. So I'm assuming it might be easiest to pull it down from kernel.org.

Can you tell me what tg3 device this bug was generated against?

This system has two BCM5715 NICs.

I think this is related: http://patchwork.kernel.org/patch/3142/

Source for that kernel is here. It has an earlier version of the DMA debug patches applied: http://kojipkgs.fedoraproject.org/packages/kernel/2.6.29/0.119.rc5.fc11/src/kernel-2.6.29-0.119.rc5.fc11.src.rpm

Ah. Thanks. Yes, this is version 2 of the DMA-API debugging patchset. This version is known to have false positives. Can you try with v3?

*** Bug 490306 has been marked as a duplicate of this bug. ***

This got worse over the weekend. But since there were over 1000 packages updated after the freeze ended I am not sure what is triggering the issue more often. I now get devices dropped in xfce without doing the things that were previously triggering the problem. I am trying an even simpler window manager to see if I can use some sort of graphic desktop while waiting for a fix.

Here is another trace from a failed install of the 4-1 rawhide tree. I am not sure if this actually caused the install to fail, looks like the system continued after the warning. It looks like it happened just after networkmanager brought up the interface and just before we tried to nfs mount the repository.
<4>------------[ cut here ]------------
<4>WARNING: at lib/dma-debug.c:479 check_unmap+0x2b4/0x3dd() (Not tainted)
<4>Hardware name: PowerEdge 860
<4>tg3 0000:05:00.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x000000007f09b000] [size=306 bytes] [mapped as page] [unmapped as single]
<4>Modules linked in: ata_generic pata_acpi radeon drm tg3 i2c_algo_bit pata_sil680 i2c_core iscsi_ibft iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext2 ext4 jbd2 crc16 squashfs pcspkr edd nfs lockd nfs_acl auth_rpcgss sunrpc vfat fat cramfs
<4>Pid: 0, comm: swapper Not tainted 2.6.29-21.fc11.x86_64 #1
<4>Call Trace:
<4> <IRQ> [<ffffffff8104cf6f>] warn_slowpath+0xbc/0xf0
<4> [<ffffffff8106e484>] ? graph_unlock+0x6b/0x77
<4> [<ffffffff81071b08>] ? __lock_acquire+0xb60/0xc06
<4> [<ffffffff81070200>] ? check_usage_backwards+0x8/0x53
<4> [<ffffffff81396308>] ? _spin_lock_irqsave+0x7d/0x8b
<4> [<ffffffff811a6cb5>] ? get_hash_bucket+0x28/0x34
<4> [<ffffffff81070632>] ? mark_held_locks+0x68/0x86
<4> [<ffffffff811a7560>] check_unmap+0x2b4/0x3dd
<4> [<ffffffff811a77d6>] debug_dma_unmap_page+0x50/0x52
<4> [<ffffffff812f765b>] dma_unmap_single+0x6c/0x75
<4> [<ffffffff812f768d>] dma_unmap_page+0x29/0x45
<4> [<ffffffff812f770e>] skb_dma_unmap+0x65/0x77
<4> [<ffffffffa01b2ead>] tg3_poll+0x12f/0x916 [tg3]
<4> [<ffffffff812f8b8e>] ? net_rx_action+0x1a5/0x1ee
<4> [<ffffffff812f8a9f>] net_rx_action+0xb6/0x1ee
<4> [<ffffffff812f8b8e>] ? net_rx_action+0x1a5/0x1ee
<4> [<ffffffff81052777>] __do_softirq+0x94/0x179
<4> [<ffffffff810127ac>] call_softirq+0x1c/0x30
<4> [<ffffffff8101392e>] do_softirq+0x52/0xb9
<4> [<ffffffff8105239a>] irq_exit+0x53/0x90
<4> [<ffffffff81013c47>] do_IRQ+0x12c/0x151
<4> [<ffffffff81011e93>] ret_from_intr+0x0/0x2e
<4> <EOI> [<ffffffff81017cca>] ? mwait_idle+0x9e/0xc7
<4> [<ffffffff81017cc1>] ? mwait_idle+0x95/0xc7
<4> [<ffffffff813993de>] ? atomic_notifier_call_chain+0xf/0x11
<4> [<ffffffff8101023c>] ? enter_idle+0x27/0x29
<4> [<ffffffff810102a6>] ? cpu_idle+0x68/0xb3
<4> [<ffffffff81381227>] ? rest_init+0x6b/0x6d
<4>---[ end trace c0c71be348be4c86 ]---

Has anyone looked into updating the DMA-API debugging patchset? There was a known bug in v2 of the patchset that could show itself like this. FWIW, the driver is using skb_dma_map and skb_dma_unmap correctly as far as I can tell.

We have updated it to v3...

With today's updates (some of which were gnome stuff) I can run gnome again without having my disk devices shutdown. So things are a lot more usable for me while waiting for the mptsas driver to get updated.

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Well it has happened to a fully updated version of Fedora 11 final...

# uname -a
Linux saturn 2.6.30.5-43.fc11.x86_64 #1 SMP Thu Aug 27 21:39:52 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

# up to date Fedora 11 install
AMD 810 quad core 64 bit
8 GB DDR3 RAM
5 * 500GB in software RAID-6 configuration
ASUS M4A78T-E motherboard

------------[ cut here ]------------
WARNING: at lib/dma-debug.c:549 check_unmap+0x312/0x4fe() (Not tainted)
Hardware name: System Product Name
ATL1E 0000:02:00.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x000000002006f2d2] [size=90 bytes] [mapped as single] [unmapped as page]
Modules linked in: sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 freq_table dm_multipath kvm_amd kvm snd_hda_codec_atihdmi snd_hda_codec_via snd_hda_intel snd_hda_codec ata_generic pata_acpi snd_hwdep snd_pcm serio_raw pcspkr snd_timer pata_atiixp i2c_piix4 joydev snd soundcore snd_page_alloc firewire_ohci atl1e firewire_core shpchp wmi floppy asus_atk0110 crc_itu_t hwmon raid456 raid6_pq async_xor async_memcpy async_tx xor radeon drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.30.5-43.fc11.x86_64.debug #1
Call Trace:
 <IRQ> [<ffffffff81059d97>] warn_slowpath_common+0x95/0xc3
 [<ffffffff81059e52>] warn_slowpath_fmt+0x50/0x66
 [<ffffffff8124fcf6>] ? get_hash_bucket+0x3b/0x5d
 [<ffffffff81250a99>] check_unmap+0x312/0x4fe
 [<ffffffff810197a1>] ? native_sched_clock+0x2d/0x54
 [<ffffffff81250f03>] debug_dma_unmap_page+0x66/0x7c
 [<ffffffffa0175571>] pci_unmap_page.clone.3+0x7a/0x99 [atl1e]
 [<ffffffffa01766b3>] atl1e_intr+0x32e/0x42a [atl1e]
 [<ffffffff810b8158>] handle_IRQ_event+0x62/0x13c
 [<ffffffff810ba5e4>] handle_edge_irq+0xde/0x13c
 [<ffffffff810858f8>] ? lock_release_holdtime+0x3f/0x147
 [<ffffffff81014ea5>] handle_irq+0x9a/0xb9
 [<ffffffff814c2991>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff810143f6>] do_IRQ+0x6f/0xee
 [<ffffffff81012a93>] ret_from_intr+0x0/0x16
 [<ffffffff810607f0>] ? __do_softirq+0x6a/0x1d2
 [<ffffffff8101328c>] ? call_softirq+0x1c/0x30
 [<ffffffff81014bf3>] ? do_softirq+0x5f/0xd7
 [<ffffffff81060337>] ? irq_exit+0x66/0xb7
 [<ffffffff810283a9>] ? smp_apic_timer_interrupt+0x99/0xbf
 [<ffffffff81012c93>] ? apic_timer_interrupt+0x13/0x20
 <EOI> [<ffffffff81032da5>] ? native_safe_halt+0xb/0xd
 [<ffffffff8101a632>] ? default_idle+0x5b/0x9a
 [<ffffffff8101a7a1>] ? c1e_idle+0x130/0x14b
 [<ffffffff81010f4e>] ? cpu_idle+0xbf/0x10a
 [<ffffffff814baf17>] ? start_secondary+0x211/0x268
---[ end trace 15e9581402876acc ]---

That stack trace is against the atl1e driver though.

This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

FWIW, I think I found the problem. Commit 32aa7ed75b3adaef6040d2cbe745fdd1c899415, entitled "tg3: Cleanup transmit error path" should fix the problem. This bug still exists in the latest versions of RH5.7 and RH6.1, so it is still relevant.
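
For readers unfamiliar with the warning itself, here is a toy user-space model of the kind of check that produces "frees DMA memory with wrong function". It is only a sketch, not the kernel's lib/dma-debug.c code: the class and method names (DmaDebugModel, map, unmap) are made up for illustration, and it assumes the checker simply remembers whether each device address was mapped as "single" or "page" and compares that against the unmap call. The real check_unmap() tracks more state than this.

```python
"""Toy model of a dma-debug style 'wrong function' check (NOT kernel code)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    dev_addr: int
    size: int
    kind: str  # "single" or "page": which map call created the entry


class DmaDebugModel:
    """Hypothetical checker: remembers live mappings and validates unmaps."""

    def __init__(self):
        self._live = {}     # dev_addr -> Mapping
        self.warnings = []  # collected messages, standing in for WARN()

    def map(self, dev_addr, size, kind):
        # Record how the buffer was mapped so the unmap can be checked later.
        self._live[dev_addr] = Mapping(dev_addr, size, kind)

    def unmap(self, dev_addr, size, kind):
        # Return True if the unmap matches the recorded mapping.
        entry = self._live.pop(dev_addr, None)
        if entry is None:
            self.warnings.append(
                f"unmap of unknown mapping [device address={dev_addr:#018x}]"
            )
            return False
        if entry.kind != kind:
            self.warnings.append(
                "device driver frees DMA memory with wrong function "
                f"[device address={dev_addr:#018x}] [size={size} bytes] "
                f"[mapped as {entry.kind}] [unmapped as {kind}]"
            )
            return False
        return True


if __name__ == "__main__":
    model = DmaDebugModel()
    # Values taken from the tg3 report above: mapped as page, unmapped as single.
    model.map(0x7F09B000, 306, "page")
    model.unmap(0x7F09B000, 306, "single")
    for message in model.warnings:
        print(message)
```

In this model a false positive appears whenever the checker sees a different "kind" than the one the driver actually requested. The first trace above shows dma_unmap_page calling dma_unmap_single before reaching debug_dma_unmap_page, which may be how the v2 patchset ended up comparing mismatched kinds, but that is an inference from the trace and not something the thread confirms.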