Bug 485778 - dma-debug: tg3: dma free with wrong function
Summary: dma-debug: tg3: dma free with wrong function
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Duplicates: 490306
Blocks: 487882
Reported: 2009-02-16 19:10 UTC by Evan McNabb
Modified: 2011-06-03 17:55 UTC (History)

Last Closed: 2010-06-28 11:17:53 UTC



Description Evan McNabb 2009-02-16 19:10:47 UTC
Description of problem:

When booting, I see the following kernel warning:

------------[ cut here ]------------
WARNING: at lib/dma-debug.c:448 check_unmap+0x2b4/0x3dd() (Tainted: G        W )
Hardware name: Anaheim
tg3 0000:08:04.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x00000001239fa000] [size=21 bytes] [mapped as page] [unmapped as single]
Modules linked in: sco bridge stp llc bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 freq_table dm_multipath uinput mptsas mptscsih mptbase tg3 pcspkr scsi_transport_sas i2c_piix4 tpm_infineon tpm joydev tpm_bios i2c_core pata_serverworks shpchp ata_generic pata_acpi sata_svw ext4 jbd2 crc16 [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Tainted: G        W  2.6.29-0.119.rc5.fc11.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff810488f6>] warn_slowpath+0xb7/0xe7
 [<ffffffff8137bd9f>] ? _spin_lock_irqsave+0x78/0x86
 [<ffffffff811985eb>] ? get_hash_bucket+0x28/0x34
 [<ffffffff8106b5da>] ? trace_hardirqs_on_caller+0x1f/0x153
 [<ffffffff81198c2f>] check_unmap+0x2b4/0x3dd
 [<ffffffff8137ba4e>] ? _spin_unlock_irqrestore+0x3c/0x53
 [<ffffffff81198ea5>] debug_dma_unmap_page+0x50/0x52
 [<ffffffff812e03cd>] dma_unmap_page+0x67/0x70
 [<ffffffff812e0432>] skb_dma_unmap+0x5c/0x75
 [<ffffffffa00cac82>] tg3_poll+0x12a/0x919 [tg3]
 [<ffffffff812e1720>] ? net_rx_action+0x1a0/0x1e9
 [<ffffffff812e1631>] net_rx_action+0xb1/0x1e9
 [<ffffffff812e1720>] ? net_rx_action+0x1a0/0x1e9
 [<ffffffff8104de7c>] __do_softirq+0x8f/0x173
 [<ffffffff810126ac>] call_softirq+0x1c/0x30
 [<ffffffff81013799>] do_softirq+0x4d/0xb4
 [<ffffffff8104dac7>] irq_exit+0x4e/0x8b
 [<ffffffff81013aa8>] do_IRQ+0x127/0x14b
 [<ffffffff81011d93>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff8137ed77>] ? __atomic_notifier_call_chain+0x0/0x86
 [<ffffffff81017a5a>] ? default_idle+0x47/0x77
 [<ffffffff8106b71b>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8102927a>] ? native_safe_halt+0x6/0x8
 [<ffffffff8106b71b>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81017a5f>] ? default_idle+0x4c/0x77
 [<ffffffff81017bb7>] ? c1e_idle+0x11f/0x126
 [<ffffffff81010240>] ? cpu_idle+0x63/0xae
 [<ffffffff8136719d>] ? rest_init+0x61/0x63
---[ end trace a751a6f228924727 ]---

I see several related BZs for e1000e / forcedeth (BZ 484494) and sky2 (BZ 484787). However, none of them covers tg3 as far as I can see. If this should be covered by one of those, feel free to close this as a dup.
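
When comparing this report against the related ones, it can help to pull the fields out of the DMA-API line mechanically. The following is a minimal Python sketch, not part of any kernel tooling: the regular expression is modelled only on the warning lines quoted in this bug, the names `DMA_WARNING` and `parse_dma_warning` are made up for illustration, and other kernel versions may format the line differently.

```python
import re

# Field layout assumed from the warnings quoted in this report only.
DMA_WARNING = re.compile(
    r"(?P<driver>[A-Za-z0-9_]+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"DMA-API: device driver frees DMA memory with wrong function "
    r"\[device address=(?P<addr>0x[0-9a-f]+)\] "
    r"\[size=(?P<size>\d+) bytes\] "
    r"\[mapped as (?P<mapped>\w+)\] "
    r"\[unmapped as (?P<unmapped>\w+)\]"
)


def parse_dma_warning(line):
    """Return the warning's fields as a dict, or None if the line does not match."""
    match = DMA_WARNING.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["size"] = int(fields["size"])  # size is reported in bytes
    return fields


if __name__ == "__main__":
    sample = (
        "tg3 0000:08:04.0: DMA-API: device driver frees DMA memory with wrong "
        "function [device address=0x00000001239fa000] [size=21 bytes] "
        "[mapped as page] [unmapped as single]"
    )
    print(parse_dma_warning(sample))
```

The `driver` field is the quickest way to tell whether a given trace belongs to tg3 or to one of the other drivers mentioned above.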

Version-Release number of selected component (if applicable):
kernel-2.6.29-0.119.rc5.fc11.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Boot system

Comment 1 Matt Carlson 2009-02-18 19:26:23 UTC
It looks like the bug might be in skb_dma_map() / skb_dma_unmap().  Where can I get access to the kernel sources to investigate further?

Comment 2 Evan McNabb 2009-02-18 19:43:58 UTC
Thanks Matt. I see this on the latest Rawhide build which is based on 2.6.29-rc5. So I'm assuming it might be easiest to pull it down from kernel.org.

Comment 3 Matt Carlson 2009-02-18 20:41:39 UTC
Can you tell me what tg3 device this bug was generated against?

Comment 4 Evan McNabb 2009-02-18 21:11:09 UTC
This system has two BCM5715 NICs.

I think this is related:
http://patchwork.kernel.org/patch/3142/

Comment 5 Chuck Ebbert 2009-02-18 22:35:53 UTC
Source for that kernel is here. It has an earlier version of the DMA debug patches applied:

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.29/0.119.rc5.fc11/src/kernel-2.6.29-0.119.rc5.fc11.src.rpm

Comment 6 Matt Carlson 2009-02-18 23:07:04 UTC
Ah.  Thanks.  Yes, this is version 2 of the DMA-API debugging patchset.  This version is known to have false positives.  Can you try with v3?

Comment 7 Dave Jones 2009-03-26 18:35:31 UTC
*** Bug 490306 has been marked as a duplicate of this bug. ***

Comment 8 Bruno Wolff III 2009-03-30 20:31:33 UTC
This got worse over the weekend. But since over 1000 packages were updated after the freeze ended, I am not sure what is triggering the issue more often.
I now get devices dropped in xfce without doing the things that previously triggered the problem.
I am trying an even simpler window manager to see if I can use some sort of graphical desktop while waiting for a fix.

Comment 9 Mike Gahagan 2009-04-01 21:49:19 UTC
Here is another trace, from a failed install of the 4-1 rawhide tree. I am not sure whether this actually caused the install to fail; the system appears to have continued after the warning. It looks like it happened just after NetworkManager brought up the interface and just before we tried to NFS-mount the repository.

<4>------------[ cut here ]------------
<4>WARNING: at lib/dma-debug.c:479 check_unmap+0x2b4/0x3dd() (Not tainted)
<4>Hardware name: PowerEdge 860
<4>tg3 0000:05:00.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x000000007f09b000] [size=306 bytes] [mapped as page] [unmapped as single]
<4>Modules linked in: ata_generic pata_acpi radeon drm tg3 i2c_algo_bit pata_sil680 i2c_core iscsi_ibft iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext2 ext4 jbd2 crc16 squashfs pcspkr edd nfs lockd nfs_acl auth_rpcgss sunrpc vfat fat cramfs
<4>Pid: 0, comm: swapper Not tainted 2.6.29-21.fc11.x86_64 #1
<4>Call Trace:
<4> <IRQ>  [<ffffffff8104cf6f>] warn_slowpath+0xbc/0xf0
<4> [<ffffffff8106e484>] ? graph_unlock+0x6b/0x77
<4> [<ffffffff81071b08>] ? __lock_acquire+0xb60/0xc06
<4> [<ffffffff81070200>] ? check_usage_backwards+0x8/0x53
<4> [<ffffffff81396308>] ? _spin_lock_irqsave+0x7d/0x8b
<4> [<ffffffff811a6cb5>] ? get_hash_bucket+0x28/0x34
<4> [<ffffffff81070632>] ? mark_held_locks+0x68/0x86
<4> [<ffffffff811a7560>] check_unmap+0x2b4/0x3dd
<4> [<ffffffff811a77d6>] debug_dma_unmap_page+0x50/0x52
<4> [<ffffffff812f765b>] dma_unmap_single+0x6c/0x75
<4> [<ffffffff812f768d>] dma_unmap_page+0x29/0x45
<4> [<ffffffff812f770e>] skb_dma_unmap+0x65/0x77
<4> [<ffffffffa01b2ead>] tg3_poll+0x12f/0x916 [tg3]
<4> [<ffffffff812f8b8e>] ? net_rx_action+0x1a5/0x1ee
<4> [<ffffffff812f8a9f>] net_rx_action+0xb6/0x1ee
<4> [<ffffffff812f8b8e>] ? net_rx_action+0x1a5/0x1ee
<4> [<ffffffff81052777>] __do_softirq+0x94/0x179
<4> [<ffffffff810127ac>] call_softirq+0x1c/0x30
<4> [<ffffffff8101392e>] do_softirq+0x52/0xb9
<4> [<ffffffff8105239a>] irq_exit+0x53/0x90
<4> [<ffffffff81013c47>] do_IRQ+0x12c/0x151
<4> [<ffffffff81011e93>] ret_from_intr+0x0/0x2e
<4> <EOI>  [<ffffffff81017cca>] ? mwait_idle+0x9e/0xc7
<4> [<ffffffff81017cc1>] ? mwait_idle+0x95/0xc7
<4> [<ffffffff813993de>] ? atomic_notifier_call_chain+0xf/0x11
<4> [<ffffffff8101023c>] ? enter_idle+0x27/0x29
<4> [<ffffffff810102a6>] ? cpu_idle+0x68/0xb3
<4> [<ffffffff81381227>] ? rest_init+0x6b/0x6d
<4>---[ end trace c0c71be348be4c86 ]---

Comment 11 Matt Carlson 2009-04-01 22:02:26 UTC
Has anyone looked into updating the DMA-API debugging patchset?   There was a known bug in v2 of the patchset that could show itself like this.  FWIW, the driver is using skb_dma_map and skb_dma_unmap correctly as far as I can tell.
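
For readers unfamiliar with what the checker is comparing, here is a toy Python model of the idea behind the "wrong function" warning: record how each device address was mapped, then compare that with the call used to unmap it. This is only an illustration under simplifying assumptions, not the code in lib/dma-debug.c. `DmaDebugModel` and its methods are hypothetical names, and the "unknown address" message is invented wording; only the mismatch message follows the text quoted in this report.

```python
class DmaDebugModel:
    """Toy model: remember how each device address was mapped, compare on unmap."""

    def __init__(self):
        # device address -> mapping type ("single" or "page")
        self._entries = {}

    def map(self, dev_addr, kind):
        """Record that dev_addr was mapped with the given mapping type."""
        self._entries[dev_addr] = kind

    def unmap(self, dev_addr, kind):
        """Return a warning string on a mismatch, or None if the unmap is consistent."""
        mapped_kind = self._entries.pop(dev_addr, None)
        if mapped_kind is None:
            # Invented wording for an address the model never saw mapped.
            return "unmap of an address with no recorded mapping"
        if mapped_kind != kind:
            return (
                "device driver frees DMA memory with wrong function "
                f"[mapped as {mapped_kind}] [unmapped as {kind}]"
            )
        return None


if __name__ == "__main__":
    model = DmaDebugModel()
    model.map(0x1239FA000, "page")
    print(model.unmap(0x1239FA000, "single"))
```

In this model a warning can come from either side: a driver that really pairs the calls wrongly, or a checker that records or reports the wrong type. The comment above argues for the second case here.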

Comment 12 Kyle McMartin 2009-04-01 23:14:39 UTC
We have updated it to v3...

Comment 13 Bruno Wolff III 2009-04-10 17:38:18 UTC
With today's updates (some of which were gnome stuff) I can run gnome again without having my disk devices shut down. So things are a lot more usable for me while waiting for the mptsas driver to get updated.

Comment 14 Bug Zapper 2009-06-09 11:25:27 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Nivag 2009-09-10 05:16:00 UTC
Well, it has happened on a fully updated Fedora 11 final...

# uname -a
Linux saturn 2.6.30.5-43.fc11.x86_64 #1 SMP Thu Aug 27 21:39:52 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

up to date Fedora 11 install
AMD 810 quad core 64 bit
8 GB DDR3 RAM
5 * 500GB in software RAID-6 configuration
ASUS M4A78T-E motherboard  


------------[ cut here ]------------
WARNING: at lib/dma-debug.c:549 check_unmap+0x312/0x4fe() (Not tainted)
Hardware name: System Product Name
ATL1E 0000:02:00.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x000000002006f2d2] [size=90 bytes] [mapped as single] [unmapped as page]
Modules linked in: sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 freq_table dm_multipath kvm_amd kvm snd_hda_codec_atihdmi snd_hda_codec_via snd_hda_intel snd_hda_codec ata_generic pata_acpi snd_hwdep snd_pcm serio_raw pcspkr snd_timer pata_atiixp i2c_piix4 joydev snd soundcore snd_page_alloc firewire_ohci atl1e firewire_core shpchp wmi floppy asus_atk0110 crc_itu_t hwmon raid456 raid6_pq async_xor async_memcpy async_tx xor radeon drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.30.5-43.fc11.x86_64.debug #1
Call Trace:
 <IRQ>  [<ffffffff81059d97>] warn_slowpath_common+0x95/0xc3
 [<ffffffff81059e52>] warn_slowpath_fmt+0x50/0x66
 [<ffffffff8124fcf6>] ? get_hash_bucket+0x3b/0x5d
 [<ffffffff81250a99>] check_unmap+0x312/0x4fe
 [<ffffffff810197a1>] ? native_sched_clock+0x2d/0x54
 [<ffffffff81250f03>] debug_dma_unmap_page+0x66/0x7c
 [<ffffffffa0175571>] pci_unmap_page.clone.3+0x7a/0x99 [atl1e]
 [<ffffffffa01766b3>] atl1e_intr+0x32e/0x42a [atl1e]
 [<ffffffff810b8158>] handle_IRQ_event+0x62/0x13c
 [<ffffffff810ba5e4>] handle_edge_irq+0xde/0x13c
 [<ffffffff810858f8>] ? lock_release_holdtime+0x3f/0x147
 [<ffffffff81014ea5>] handle_irq+0x9a/0xb9
 [<ffffffff814c2991>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff810143f6>] do_IRQ+0x6f/0xee
 [<ffffffff81012a93>] ret_from_intr+0x0/0x16
 [<ffffffff810607f0>] ? __do_softirq+0x6a/0x1d2
 [<ffffffff8101328c>] ? call_softirq+0x1c/0x30
 [<ffffffff81014bf3>] ? do_softirq+0x5f/0xd7
 [<ffffffff81060337>] ? irq_exit+0x66/0xb7
 [<ffffffff810283a9>] ? smp_apic_timer_interrupt+0x99/0xbf
 [<ffffffff81012c93>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81032da5>] ? native_safe_halt+0xb/0xd
 [<ffffffff8101a632>] ? default_idle+0x5b/0x9a
 [<ffffffff8101a7a1>] ? c1e_idle+0x130/0x14b
 [<ffffffff81010f4e>] ? cpu_idle+0xbf/0x10a
 [<ffffffff814baf17>] ? start_secondary+0x211/0x268
---[ end trace 15e9581402876acc ]---

Comment 16 Matt Carlson 2009-09-15 22:54:07 UTC
That stack trace is against the atl1e driver though.
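
One quick way to check which driver a pasted trace belongs to is to look at the module tags in square brackets at the end of the call-trace frames. The sketch below assumes the frame layout seen in the traces quoted in this bug (`[<address>] symbol+0xoff/0xlen [module]`); `FRAME` and `modules_in_trace` are illustrative names, not an existing tool.

```python
import re

# One call-trace frame, optionally prefixed by "? " and optionally followed
# by a "[module]" tag. Layout assumed from the traces quoted in this report.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<symbol>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def modules_in_trace(trace):
    """Return the distinct module names tagged on frames, in order of appearance."""
    seen = []
    for match in FRAME.finditer(trace):
        module = match.group("module")
        if module and module not in seen:
            seen.append(module)
    return seen


if __name__ == "__main__":
    frames = (
        " [<ffffffff81250f03>] debug_dma_unmap_page+0x66/0x7c\n"
        " [<ffffffffa0175571>] pci_unmap_page.clone.3+0x7a/0x99 [atl1e]\n"
        " [<ffffffffa01766b3>] atl1e_intr+0x32e/0x42a [atl1e]\n"
    )
    print(modules_in_trace(frames))
```

Frames without a module tag belong to the core kernel image, so only loadable drivers show up in the result.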

Comment 17 Bug Zapper 2010-04-27 12:59:23 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 18 Bug Zapper 2010-06-28 11:17:53 UTC
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 19 Matt Carlson 2011-06-03 17:54:27 UTC
FWIW, I think I found the problem.  Commit 32aa7ed75b3adaef6040d2cbe745fdd1c899415, entitled "tg3: Cleanup transmit error path", should fix it.

This bug still exists in the latest versions of RHEL 5.7 and RHEL 6.1, so it is still relevant.

