Bug 484494

Summary: forcedeth: driver frees DMA memory with wrong function
Product: [Fedora] Fedora
Reporter: sangu <sangu.fedora>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 11
CC: ellenshull, jesse.brandeburg, kernel-maint, madko, mike, riku.seppala, rjones, sdodson, tvujec, uwe, vedran
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-06 09:31:38 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 487882    
Attachments:
  Description                          Flags
  proposed fix for e1000e tx unmap     none
  correct e1000e only patch            none
  fedora kernel based e1000e patch     none

Description sangu 2009-02-07 12:47:16 UTC
Description of problem:
$ dmesg
[skip]
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:448 check_unmap+0x2f7/0x412() (Tainted: P          )
Hardware name: System Product Name
forcedeth 0000:00:0a.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x0000000032fca000] [size=100 bytes] [mapped as page] [unmapped as single]
Modules linked in: vfat fat fuse ipv6 cpufreq_ondemand powernow_k8 tuner_simple tuner_types lgdt330x snd_hda_codec_nvhdmi snd_emu10k1_synth snd_emux_synth snd_hda_codec_realtek snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_rawmidi snd_ac97_codec snd_hda_intel nvidia(P) snd_hda_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq dvb_usb_cxusb dib7000p dibx000_common snd_pcm_oss dvb_usb snd_mixer_oss dvb_core snd_pcm snd_seq_device snd_util_mem dib0070 firewire_ohci firewire_core snd_hwdep snd_timer snd forcedeth crc_itu_t snd_page_alloc pata_amd soundcore asus_atk0110 emu10k1_gp i2c_core gameport wmi hwmon pcspkr serio_raw joydev sata_nv ata_generic pata_acpi [last unloaded: scsi_wait_scan]
Pid: 3470, comm: clock-applet Tainted: P           2.6.29-0.85.rc3.git7.fc11.i686 #1
Call Trace:
 [<c042f695>] warn_slowpath+0x77/0xb3
 [<c044d4e7>] ? trace_hardirqs_off_caller+0x18/0xa3
 [<c044d57d>] ? trace_hardirqs_off+0xb/0xd
 [<c06e33e5>] ? _spin_unlock_irqrestore+0x39/0x50
 [<c060fdef>] ? ohci_urb_enqueue+0x69b/0x6af
 [<c061b280>] ? evdev_event+0xac/0xb8
 [<c05fc8c3>] ? usb_hcd_submit_urb+0x82e/0x920
 [<c044d57d>] ? trace_hardirqs_off+0xb/0xd
 [<c0544208>] ? check_unmap+0x58/0x412
 [<c05444a7>] check_unmap+0x2f7/0x412
 [<c0645421>] ? hid_input_report+0x198/0x1a9
 [<c060e416>] ? finish_urb+0x88/0xb1
 [<c0544763>] debug_dma_unmap_page+0x5a/0x62
 [<f7d7c643>] pci_unmap_page+0x4d/0x57 [forcedeth]
 [<f7d7e3b4>] nv_tx_done_optimized+0x3b/0x195 [forcedeth]
 [<f7d7eec1>] nv_nic_irq_optimized+0xa2/0x22e [forcedeth]
 [<c046f574>] handle_IRQ_event+0x1a/0x4f
 [<c0470627>] handle_edge_irq+0xac/0xed
 [<c047057b>] ? handle_edge_irq+0x0/0xed
 <IRQ>  [<c040412c>] ? common_interrupt+0x2c/0x34
---[ end trace 08821d239397ad94 ]---
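
For triage, the fields of the DMA-API line in the trace above can be pulled out mechanically. The following is a minimal sketch: the regular expression is an assumption derived only from the warning text quoted in this report, not from any kernel format specification, and `parse_dma_api_warning` is a hypothetical helper name.

```python
import re

# Pattern modelled on the DMA-API warning lines quoted in this report.
# Assumption: other kernels may format these lines differently.
DMA_API_RE = re.compile(
    r"^(?P<driver>\S+) (?P<device>[0-9a-f:.]+): DMA-API: (?P<message>[^\[]+?)\s*"
    r"\[device address=(?P<address>0x[0-9a-f]+)\] "
    r"\[size=(?P<size>\d+) bytes\]"
    # The mapped/unmapped pair is optional: not every DMA-API warning has it.
    r"(?: \[mapped as (?P<mapped_as>\w+)\] \[unmapped as (?P<unmapped_as>\w+)\])?"
)


def parse_dma_api_warning(line):
    """Return a dict of fields from one DMA-API warning line, or None."""
    match = DMA_API_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["size"] = int(fields["size"])
    return fields


line = (
    "forcedeth 0000:00:0a.0: DMA-API: device driver frees DMA memory with "
    "wrong function [device address=0x0000000032fca000] [size=100 bytes] "
    "[mapped as page] [unmapped as single]"
)
info = parse_dma_api_warning(line)
print(info["driver"], info["device"], info["size"],
      info["mapped_as"], info["unmapped_as"])
```

Grouping parsed lines by `driver` and `message` makes it easier to see whether duplicate reports describe the same mismatch.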


Version-Release number of selected component (if applicable):
2.6.29-0.85.rc3.git7.fc11.i686 and 2.6.29-0.93.rc3.git10.fc11.i686.PAE

How reproducible:
always


Comment 1 Richard W.M. Jones 2009-02-09 15:54:08 UTC
*** Bug 484615 has been marked as a duplicate of this bug. ***

Comment 2 Richard W.M. Jones 2009-02-09 15:55:54 UTC
The report and also bug 484615 are for the forcedeth driver.
The same thing occurs with the e1000e driver.  At boot I
get this message:

------------[ cut here ]------------
WARNING: at lib/dma-debug.c:448 check_unmap+0x2b4/0x3dd() (Not tainted)
Hardware name: To Be Filled By O.E.M.
e1000e 0000:00:19.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x0000000071556000] [size=21 bytes] [mapped as page] [unmapped as single]
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat bridge stp llc bnep sco l2cap bluetooth sunrpc ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm 8139cp i2c_i801 snd_timer i2c_core 8139too snd e1000e pcspkr soundcore mii snd_page_alloc joydev ata_generic pata_acpi [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.29-0.93.rc3.git10.fc11.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104881a>] warn_slowpath+0xb7/0xe7
 [<ffffffff8106923c>] ? graph_unlock+0x6b/0x77
 [<ffffffff8106c8c5>] ? __lock_acquire+0xb67/0xc0d
 [<ffffffff8137ba1f>] ? _spin_lock_irqsave+0x78/0x86
 [<ffffffff811983fb>] ? get_hash_bucket+0x28/0x34
 [<ffffffff8106b5c3>] ? trace_hardirqs_on_caller+0x118/0x153
 [<ffffffff81198a3f>] check_unmap+0x2b4/0x3dd
 [<ffffffff810174cb>] ? native_sched_clock+0x2d/0x5a
 [<ffffffff8137b6d5>] ? _spin_unlock_irqrestore+0x43/0x53
 [<ffffffff81198cb5>] debug_dma_unmap_page+0x50/0x52
 [<ffffffffa0040036>] pci_unmap_page+0x6d/0x76 [e1000e]
 [<ffffffffa0040068>] e1000_put_txbuf+0x29/0x4a [e1000e]
 [<ffffffffa0040185>] e1000_clean_tx_irq+0xc3/0x2bd [e1000e]
 [<ffffffffa00432e5>] ? e1000_clean+0x66/0x241 [e1000e]
 [<ffffffffa00432f1>] e1000_clean+0x72/0x241 [e1000e]
 [<ffffffff812e12ed>] net_rx_action+0xb1/0x1e9
 [<ffffffff812e13dc>] ? net_rx_action+0x1a0/0x1e9
 [<ffffffff8104de50>] __do_softirq+0x8f/0x173
 [<ffffffff810126ac>] call_softirq+0x1c/0x30
 [<ffffffff81013799>] do_softirq+0x4d/0xb4
 [<ffffffff8104da9b>] irq_exit+0x4e/0x8b
 [<ffffffff81013aa8>] do_IRQ+0x127/0x14b
 [<ffffffff81011d93>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff8106a634>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff81017911>] ? mwait_idle+0x6b/0x94
 [<ffffffff81017908>] ? mwait_idle+0x62/0x94
 [<ffffffff8137ea8c>] ? atomic_notifier_call_chain+0xf/0x11
 [<ffffffff810101bb>] ? enter_idle+0x22/0x24
 [<ffffffff81010220>] ? cpu_idle+0x63/0xae
 [<ffffffff813752a1>] ? start_secondary+0x199/0x19e
---[ end trace 55062643c8100105 ]---

Comment 3 Jesse Brandeburg 2009-02-11 02:26:42 UTC
(In reply to comment #2)
> The report and also bug 484615 are for the forcedeth driver.
> The same thing occurs with the e1000e driver.  At boot I
> get this message:

I'll attach a patch for testing.  Looking at the code, e1000_put_txbuf would sometimes call pci_unmap_page on a buffer that had been mapped with pci_map_single.
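
The mismatch described here can be illustrated with a toy model. This is only a sketch under stated assumptions: `DmaDebugModel` and `send_and_clean` are hypothetical names, the model is plain Python rather than kernel code, and it does not reproduce the attached patch. It only shows why a cleanup path that always uses the page variant trips a mapping checker, while one that remembers how each buffer was mapped does not.

```python
class DmaDebugModel:
    """Toy stand-in for a DMA mapping checker (not the kernel's dma-debug)."""

    def __init__(self):
        self._active = {}  # device address -> kind used when mapping

    def map(self, addr, kind):
        self._active[addr] = kind

    def unmap(self, addr, kind):
        """Return None if the unmap matches the map, else a problem string."""
        mapped_kind = self._active.pop(addr, None)
        if mapped_kind is None:
            return "freeing DMA memory that was never mapped"
        if mapped_kind != kind:
            return f"wrong function: mapped as {mapped_kind}, unmapped as {kind}"
        return None


def always_page(mapped_kind):
    # Cleanup as described in the comment: every buffer is unmapped as a page.
    return "page"


def matching(mapped_kind):
    # Cleanup that uses the kind recorded when the buffer was mapped.
    return mapped_kind


def send_and_clean(checker, cleanup):
    # Hypothetical transmit: the linear head is mapped with the "single"
    # call and a fragment with the "page" call. Addresses are made up.
    buffers = [(0x1000, "single"), (0x2000, "page")]
    for addr, kind in buffers:
        checker.map(addr, kind)
    problems = []
    for addr, kind in buffers:
        result = checker.unmap(addr, cleanup(kind))
        if result is not None:
            problems.append(result)
    return problems


print(send_and_clean(DmaDebugModel(), always_page))
print(send_and_clean(DmaDebugModel(), matching))
```

In a driver, the equivalent of `matching` would mean recording per buffer which mapping call was used and choosing the unmap call accordingly.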

Comment 4 Jesse Brandeburg 2009-02-11 02:31:44 UTC
Created attachment 331508 [details]
proposed fix for e1000e tx unmap

This patch should fix the issue, but it needs to be tested.  I don't have a solid reproduction of this here yet.

I haven't been able to notice any performance difference with this patch.

The code is specific to 2.6.28 and newer.

Comment 5 Jesse Brandeburg 2009-02-11 02:40:23 UTC
Good news everyone! I reproduced the issue.  More tomorrow.

Comment 6 Jesse Brandeburg 2009-02-11 02:47:02 UTC
Created attachment 331509 [details]
correct e1000e only patch

Comment 7 Jesse Brandeburg 2009-02-11 19:29:15 UTC
Created attachment 331613 [details]
fedora kernel based e1000e patch

This is the same patch (also with the sentinel descriptor workaround removed),
but against kernel-2.6.29-0.99.rc4.git1.fc11.src.rpm

Comment 8 Edouard Bourguignon 2009-03-19 07:23:02 UTC
Does this mean the patch also fixes this bug, which is about forcedeth? Also, I have to unplug the power cord at each reboot to get a network link.

WARNING: at lib/dma-debug.c:461 check_unmap+0xd4/0x3dd() (Not tainted)
Hardware name: System Product Name
forcedeth 0000:00:08.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000114e6c242] [size=90 bytes]
Modules linked in: ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand powernow_k8 freq_table dm_multipath uinput snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_hda_codec_atihdmi snd_hda_intel arc4 snd_rawmidi ecb snd_hda_codec snd_ac97_codec ppdev rt61pci crc_itu_t ac97_bus snd_seq_dummy rt2x00pci snd_seq_oss rt2x00lib snd_seq_midi_event snd_seq snd_pcm_oss rfkill snd_mixer_oss mac80211 snd_pcm snd_seq_device snd_util_mem radeon snd_hwdep cfg80211 snd_timer drm snd snd_page_alloc forcedeth eeprom_93cx6 soundcore pcspkr i2c_algo_bit k8temp parport_pc parport asus_atk0110 i2c_nforce2 hwmon i2c_core pata_amd ata_generic pata_acpi sata_nv [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.29-0.258.rc8.git2.fc11.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104bae3>] warn_slowpath+0xbc/0xf0
 [<ffffffff81396b58>] ? _spin_lock_irqsave+0x7d/0x8b
 [<ffffffff811a6a5d>] ? get_hash_bucket+0x28/0x34
 [<ffffffff813967df>] ? _spin_unlock_irqrestore+0x41/0x58
 [<ffffffff811a7128>] check_unmap+0xd4/0x3dd
 [<ffffffff811a757e>] debug_dma_unmap_page+0x50/0x52
 [<ffffffffa007e85d>] T.795+0x4b/0x54 [forcedeth]
 [<ffffffffa007ed00>] nv_tx_done_optimized+0x49/0x1d5 [forcedeth]
 [<ffffffffa007f22f>] nv_nic_irq_optimized+0xba/0x280 [forcedeth]
 [<ffffffff81098763>] handle_IRQ_event+0x27/0x63
 [<ffffffff8109a091>] handle_edge_irq+0xe0/0x129
 [<ffffffff81013c04>] do_IRQ+0xd9/0x151
 [<ffffffff81011e93>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff81399b76>] ? __atomic_notifier_call_chain+0x0/0x86
 [<ffffffff8102b4c8>] ? native_safe_halt+0xb/0xd
 [<ffffffff8106fbf9>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81017dd8>] ? default_idle+0x51/0x7c
 [<ffffffff81017f3a>] ? c1e_idle+0x124/0x12b
 [<ffffffff810102c7>] ? cpu_idle+0x68/0xb3
 [<ffffffff8139014e>] ? start_secondary+0x199/0x19e
---[ end trace a9285f26e6a634c1 ]---


00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
	Subsystem: ASUSTeK Computer Inc. Device 8239
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0 (250ns min, 5000ns max)
	Interrupt: pin A routed to IRQ 25
	Region 0: Memory at fe02a000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at b000 [size=8]
	Region 2: Memory at fe029000 (32-bit, non-prefetchable) [size=256]
	Region 3: Memory at fe028000 (32-bit, non-prefetchable) [size=16]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
	Capabilities: [70] MSI-X: Enable- Mask- TabSize=8
		Vector table: BAR=2 offset=00000000
		PBA: BAR=3 offset=00000000
	Capabilities: [50] MSI: Mask+ 64bit+ Count=1/8 Enable+
		Address: 00000000fee0100c  Data: 4189
		Masking: 000000fe  Pending: 00000000
	Capabilities: [6c] HyperTransport: MSI Mapping Enable+ Fixed+
	Kernel driver in use: forcedeth
	Kernel modules: forcedeth

Comment 9 Tomislav Vujec 2009-04-01 14:35:04 UTC
Oops still happens with e1000e on kernel-PAE-2.6.29-21.fc11.i686 whenever using wired network.

Comment 10 Riku Seppala 2009-04-02 14:23:11 UTC
*** Bug 493589 has been marked as a duplicate of this bug. ***

Comment 11 Edouard Bourguignon 2009-04-02 14:52:08 UTC
Still getting the oops with both forcedeth and e1000e on kernel 2.6.29.1-37.rc1.

Comment 12 Mike Chambers 2009-04-02 20:44:31 UTC
Have been experiencing the same thing myself for a while now.

Comment 13 Edouard Bourguignon 2009-04-03 06:28:10 UTC
With Intel e1000e or Nvidia forcedeth? Does your card work after a reboot? Mine fails to detect the Ethernet cable after a reboot; I have to disconnect the computer from mains power to reset everything. Maybe I should open a new bug for this problem?

Comment 14 Mike Chambers 2009-04-03 09:44:41 UTC
Mine is nvidia and it's built into the motherboard.  Mine doesn't crash, or at least I don't see it do anything.  The system mostly works with no problems.  I just see the oops now and again in my logs.

Comment 15 Jesse Brandeburg 2009-04-03 15:36:11 UTC
These two upstream commits (patches) should fix the e1000e issues.

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=8ddc951c73cbc317148c0b9973dde81eece57e4c

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=1b7719c4559dc1522065d4cfd033f8bb8f969159

Could someone build a test kernel so that the reporters can test?

Comment 16 Edouard Bourguignon 2009-04-03 16:41:25 UTC
This seems to be on its way to being fixed: no more kernel crash on boot. However, the forcedeth driver still doesn't release the card cleanly at shutdown/reboot, which prevents access to the Ethernet link ("Please check the LAN cable has been connected to Onboard LAN correctly" at boot).

Comment 17 Edouard Bourguignon 2009-04-03 16:42:31 UTC
Forgot to mention the kernel version:
2.6.29.1-46.fc11.x86_64

Comment 18 Tomislav Vujec 2009-04-04 15:59:42 UTC
Works with PAE-2.6.29.1-46.fc11.i686.

Comment 19 Mike Chambers 2009-04-04 18:27:56 UTC
I upgraded the kernel as well and don't think I've gotten the oops again either.

kernel-2.6.29.1-46.fc11.x86_64

Comment 20 Scott Dodson 2009-04-05 16:52:54 UTC
The errors went away because the debugging facilities to check for those errors were disabled, not necessarily because the problem was fixed.

* Thu Apr 02 2009 Chuck Ebbert <cebbert> 2.6.29.1-46 
- Enable debug builds and turn off debugging in the regular kernel. 
- Remove dma-debug patches. 
- Leave CONFIG_PCI_MSI_DEFAULT_ON set.
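
One way to tell "warning gone because the driver was fixed" from "warning gone because the check is no longer built in" is to look at the kernel's build configuration. A minimal sketch follows, with loud assumptions: `CONFIG_DMA_API_DEBUG` is the upstream option name for the DMA-API debugging facility, and it is assumed (not confirmed in this thread) that the Fedora dma-debug patches use the same symbol; `dma_debug_enabled` is a hypothetical helper.

```python
def dma_debug_enabled(config_text, option="CONFIG_DMA_API_DEBUG"):
    """Return True if `option` is built in (=y) in a kernel config dump.

    Assumption: the option name matches the one used by the running kernel.
    """
    wanted = f"{option}=y"
    for raw_line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set" and never match.
        if raw_line.strip() == wanted:
            return True
    return False


# Made-up excerpts standing in for a debug and a regular kernel config.
debug_config = "CONFIG_PCI_MSI=y\nCONFIG_DMA_API_DEBUG=y\n"
regular_config = "CONFIG_PCI_MSI=y\n# CONFIG_DMA_API_DEBUG is not set\n"

print(dma_debug_enabled(debug_config))
print(dma_debug_enabled(regular_config))
```

On many distributions the configuration of an installed kernel is available as a text file under /boot (named after the kernel release), which can be read and passed to this function. If the option is not set, the absence of the warning says nothing about whether the driver still mixes up its unmap calls.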

Comment 21 Edouard Bourguignon 2009-04-07 18:03:06 UTC
So the problem is still present with kernel 2.6.29.1-46.fc11.x86_64. I will try 2.6.29.1-52.fc11.

Comment 22 Edouard Bourguignon 2009-04-08 09:31:03 UTC
Same problem on 2.6.29.1-52.fc11

Comment 23 Edouard Bourguignon 2009-04-20 18:08:28 UTC
Still present on 2.6.29.1-100.fc11.x86_64

Comment 24 Edouard Bourguignon 2009-05-12 07:48:29 UTC
Not sure if it is related, but I have opened a new bug for my problem with rawhide breaking hardware: see bug #497052
Sorry to have nvidia hardware.

Comment 25 Bug Zapper 2009-06-09 11:06:56 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 26 Vedran Miletić 2009-09-06 07:14:07 UTC
Edouard, can you test Fedora 12 Snap1?

Comment 27 Edouard Bourguignon 2009-09-06 08:33:04 UTC
It is fixed in F11; I will try F12 Snap1 ASAP. Only Wi-Fi is still broken on F11.

Comment 28 Vedran Miletić 2009-09-06 09:31:38 UTC
Thanks for testing.