Bug 490306

Summary: Tracebacks at lib/dma-debug.c:479
Product: [Fedora] Fedora Reporter: Bruno Wolff III <bruno>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: kernel-maint, madko, quintela, sdodson
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-03-26 18:35:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
lspci output
none
Dmesg log
none
My smolt profile none

Description Bruno Wolff III 2009-03-15 03:22:42 UTC
Description of problem:
I have been getting tracebacks during boots for a while now on my machine with a sas host controller.

Version-Release number of selected component (if applicable):
2.6.29-0.247.rc8.fc11.x86_64

How reproducible:
100%

Steps to Reproduce:
1. reboot
2.
3.
  
Actual results:
Mar 13 15:41:36 cerberus kernel: ------------[ cut here ]------------
Mar 13 15:41:36 cerberus kernel: WARNING: at lib/dma-debug.c:479 check_unmap+0x2b4/0x3dd() (Tainted: G        W )
Mar 13 15:41:36 cerberus kernel: Hardware name: Precision WorkStation 690    
Mar 13 15:41:36 cerberus kernel: tg3 0000:0b:00.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x0000000069db4000] [size=32 byte
s] [mapped as page] [unmapped as single]
Mar 13 15:41:36 cerberus kernel: Modules linked in: ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss ppdev snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss dcdbas snd_mixer_oss serio_raw pcspkr i2c_i801 snd_pcm parport_pc snd_timer sndparport soundcore i5k_amb iTCO_wdt hwmon iTCO_vendor_support tg3 snd_page_alloc i5000_edac edac_core shpchp mptsas mptscsih mptbase scsi_transport_sas raid1 sha256_generic cbc aes_x86_64 aes_generic dm_crypt radeon drm i2c_algo_bit i2c_core [last unloaded: microcode]
Mar 13 15:41:36 cerberus kernel: Pid: 0, comm: swapper Tainted: G        W  2.6.29-0.247.rc8.fc11.x86_64 #1
Mar 13 15:41:36 cerberus kernel: Call Trace:
Mar 13 15:41:36 cerberus kernel: <IRQ>  [<ffffffff8104bae3>] warn_slowpath+0xbc/0xf0
Mar 13 15:41:36 cerberus kernel: [<ffffffffa00e88cc>] ? r1bio_pool_free+0xe/0x10 [raid1]
Mar 13 15:41:36 cerberus kernel: [<ffffffff81396a98>] ? _spin_lock_irqsave+0x7d/0x8b
Mar 13 15:41:36 cerberus kernel: [<ffffffff811a6a1d>] ? get_hash_bucket+0x28/0x34
Mar 13 15:41:36 cerberus kernel: [<ffffffff811a72c8>] check_unmap+0x2b4/0x3dd
Mar 13 15:41:36 cerberus kernel: [<ffffffff811a753e>] debug_dma_unmap_page+0x50/0x52
Mar 13 15:41:36 cerberus kernel: [<ffffffff812f76af>] dma_unmap_single+0x6c/0x75
Mar 13 15:41:36 cerberus kernel: [<ffffffff812f76e1>] dma_unmap_page+0x29/0x45
Mar 13 15:41:36 cerberus kernel: [<ffffffff812f775e>] skb_dma_unmap+0x61/0x7a
Mar 13 15:41:36 cerberus kernel: [<ffffffffa015cf30>] tg3_poll+0x12f/0x91e [tg3]
Mar 13 15:41:36 cerberus kernel: [<ffffffff812f8c1c>] ? net_rx_action+0x1a5/0x1ee
Mar 13 15:41:36 cerberus kernel: [<ffffffff812f8b2d>] net_rx_action+0xb6/0x1ee
Mar 13 15:41:36 cerberus kernel: [<ffffffff812f8c1c>] ? net_rx_action+0x1a5/0x1ee
Mar 13 15:41:36 cerberus kernel: [<ffffffff81051353>] __do_softirq+0x94/0x179
Mar 13 15:41:36 cerberus kernel: [<ffffffff810127ac>] call_softirq+0x1c/0x30
Mar 13 15:41:36 cerberus kernel: [<ffffffff8101393e>] do_softirq+0x52/0xb9
Mar 13 15:41:36 cerberus kernel: [<ffffffff81050f76>] irq_exit+0x53/0x90
Mar 13 15:41:36 cerberus kernel: [<ffffffff81013c57>] do_IRQ+0x12c/0x151
Mar 13 15:41:36 cerberus kernel: [<ffffffff81011e93>] ret_from_intr+0x0/0x2e
Mar 13 15:41:36 cerberus kernel: <EOI>  [<ffffffff81017ca5>] mwait_idle+0x99/0xcb
Mar 13 15:41:36 cerberus kernel: [<ffffffff8106fbf9>] ? trace_hardirqs_on+0xd/0xf
Mar 13 15:41:36 cerberus kernel: [<ffffffff81017cae>] ? mwait_idle+0xa2/0xcb
Mar 13 15:41:36 cerberus kernel: [<ffffffff81017ca5>] ? mwait_idle+0x99/0xcb
Mar 13 15:41:36 cerberus kernel: [<ffffffff81399b4b>] ? atomic_notifier_call_chain+0xf/0x11
Mar 13 15:41:36 cerberus kernel: [<ffffffff8101025d>] ? enter_idle+0x27/0x29
Mar 13 15:41:36 cerberus kernel: [<ffffffff810102c7>] ? cpu_idle+0x68/0xb3
Mar 13 15:41:36 cerberus kernel: [<ffffffff81390090>] ? start_secondary+0x199/0x19e
Mar 13 15:41:36 cerberus kernel: ---[ end trace 4eaa2a86a8e2da29 ]---


Expected results:
No tracebacks

Additional info:

Comment 1 Bruno Wolff III 2009-03-15 03:25:36 UTC
Created attachment 335241 [details]
lspci output

Comment 2 Bruno Wolff III 2009-03-17 05:28:40 UTC
Possibly related to this is a problem I am only seeing on the machine with the kernel tracebacks and not on my machine at home that has pretty much the same packages on it (but i586 instead of x86_64).
Logging in under gnome or running desktop applications that bring of file lists (such as saving a downloaded file in firefox) under xfce breaks the system.
Sometimes X is totally locked up, other times I can get to a VT, but login attempts fail between entering the login name and getting a password prompt.
Also in that case there are some log messages about dm-0 being read only and problems with directory entries and inodes. Nothing gets left in the logs after forcably rebooting the system.

Comment 3 Bruno Wolff III 2009-03-20 20:02:57 UTC
I retested the gnome issue with the hope of getting some sysrq info.
I didn't work like I thought, but by waiting for my time delayed sysrq commands to run eventually gnome died and I got access to a VT session I had had running.
I couldn't save anything to any of the file systems. A lot of commands didn't work, dying with bus errors or segfaults.
I noticed that none of the /dev/sd devices were listed any more. Though stuff from device mapper was.
I tried putting in an encrypted usb stick and that resulted in /dev/sdc and /dev/sdc1 being created as normal, but I couldn't successfully run cryptsetup.

There could also be a relation to a problem I started seeing a while back with smartctl. Using it on a device without forcing it to treat the drives as scsi instead of sas would knock out the device similar to what I saw with the gnome issue. However since I am using raid 1, I could still keep doing stuff. Though a reboot was needed to fix things.

So while something changed relatively recently with how this problem gets triggered, I suspect the underlying issue has been there for a while.

Also I saw that an issue with mptsas in 2.6.29 was mentioned on lmkl, though I think that was taken care of in rc3.

Comment 4 Bruno Wolff III 2009-03-21 14:47:28 UTC
The change in triggering may be related to Devicekit. Bug 489397 was closed with a fix that seems to be tied to dm devices and matches up with when I was having lockouts occur.
I'll be testing this on Monday. If gnome session startup no longer locks up my machine the severity of the underlying issue for me will be greatly reduced.

Comment 5 Scott Dodson 2009-03-21 16:41:59 UTC
I got the same thing during installation but it's not happening post install.

Smolt profile:
http://www.smolts.org/client/show/pub_d0cee8ac-9106-4efc-8460-8b2d43a9b780

This seems very similar to Bug 487894, not sure if it's a dupe or not.

------------[ cut here ]------------
WARNING: at lib/dma-debug.c:479 check_unmap+0x2b4/0x3dd() (Not tainted)
Hardware name: MS-7125
sky2 0000:03:00.0: DMA-API: device driver frees DMA memory with wrong function [device address=0x00000
0007aca9000] [size=209 bytes] [mapped as page] [unmapped as single]
Modules linked in: ata_generic nouveau drm i2c_core sky2 sata_sil firewire_ohci firewire_core crc_itu_
t forcedeth pata_acpi sata_nv pata_amd iscsi_ibft iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
 ext2 ext4 jbd2 crc16 squashfs pcspkr edd floppy nfs lockd nfs_acl auth_rpcgss sunrpc vfat fat cramfs
Pid: 0, comm: swapper Not tainted 2.6.29-0.258.rc8.git2.fc11.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104bae3>] warn_slowpath+0xbc/0xf0
 [<ffffffff8106d828>] ? graph_unlock+0x6b/0x77
 [<ffffffff81070eb3>] ? __lock_acquire+0xb67/0xc0d
 [<ffffffff811a6a5d>] ? get_hash_bucket+0x28/0x34
 [<ffffffff81396b58>] ? _spin_lock_irqsave+0x7d/0x8b
 [<ffffffff811a6a5d>] ? get_hash_bucket+0x28/0x34
 [<ffffffff8106f9d6>] ? mark_held_locks+0x68/0x86
 [<ffffffff811a7308>] check_unmap+0x2b4/0x3dd
 [<ffffffff810db081>] ? kmem_cache_free+0xd3/0xed
 [<ffffffff811a757e>] debug_dma_unmap_page+0x50/0x52
 [<ffffffffa01ea919>] dma_unmap_single+0x6c/0x75 [sky2]
 [<ffffffffa01ea95c>] pci_unmap_page+0x3a/0x57 [sky2]
 [<ffffffffa01eb2c5>] sky2_tx_complete+0xb7/0x153 [sky2]
 [<ffffffffa01eb5a9>] sky2_tx_done+0x2d/0x3a [sky2]
 [<ffffffffa01ed2b0>] sky2_poll+0x96f/0xabd [sky2]
 [<ffffffff812f8bad>] net_rx_action+0xb6/0x1ee
 [<ffffffff812f8c9c>] ? net_rx_action+0x1a5/0x1ee
 [<ffffffff81051353>] __do_softirq+0x94/0x179
 [<ffffffff810127ac>] call_softirq+0x1c/0x30
 [<ffffffff8101393e>] do_softirq+0x52/0xb9
 [<ffffffff81050f76>] irq_exit+0x53/0x90
 [<ffffffff81013c57>] do_IRQ+0x12c/0x151
 [<ffffffff81011e93>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff8102b4c8>] ? native_safe_halt+0xb/0xd
 [<ffffffff8106fbf9>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81017dd8>] ? default_idle+0x51/0x7c
 [<ffffffff81399c0b>] ? atomic_notifier_call_chain+0xf/0x11
 [<ffffffff8101025d>] ? enter_idle+0x27/0x29
 [<ffffffff810102c7>] ? cpu_idle+0x68/0xb3
 [<ffffffff813819e7>] ? rest_init+0x6b/0x6d
---[ end trace 5f893f17106c5153 ]---

Comment 6 Bruno Wolff III 2009-03-22 14:18:28 UTC
The message I remembered was actually on the patch list and if it was applied, would have been after rc3 since the complaint had it occuring in 2.6.29-0.66.rc3.fc11.x86_64.
http://patchwork.kernel.org/patch/5960/

Comment 7 Bruno Wolff III 2009-03-23 15:58:35 UTC
The DeviceKit-disks update didn't help with gnome triggering my disk devices getting removed. I still can't usefully use gnome.

Comment 8 Bruno Wolff III 2009-03-23 20:03:49 UTC
I have some dmesg output from after running smartctl -a -d sat /dev/sdb:
mptscsih: ioc0: attempting task abort! (sc=ffff88005cc08780)
sd 8:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptsas: ioc0: removing sata device, channel 0, id 1, phy 1
 port-8:1: mptsas: ioc0: delete port (1)
sd 8:0:1:0: [sdb] Synchronizing SCSI cache
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88005cc08780)
mptscsih: ioc0: attempting task abort! (sc=ffff88005cc083c0)
sd 8:0:1:0: [sdb] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88005cc083c0)
mptscsih: ioc0: attempting target reset! (sc=ffff88005cc08780)
sd 8:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
mptscsih: ioc0: target reset: SUCCESS (sc=ffff88005cc08780)
mptscsih: ioc0: attempting bus reset! (sc=ffff88005cc08780)
sd 8:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
mptscsih: ioc0: bus reset: SUCCESS (sc=ffff88005cc08780)
mptscsih: ioc0: attempting host reset! (sc=ffff88005cc08780)
mptbase: ioc0: Initiating recovery
mptscsih: ioc0: host reset: SUCCESS (sc=ffff88005cc08780)
sd 8:0:1:0: Device offlined - not ready after error recovery
sd 8:0:1:0: Device offlined - not ready after error recovery
end_request: I/O error, dev sdb, sector 168312775
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdb3, disabling device.
raid1: Operation continuing on 1 devices.
raid1: sdb3: rescheduling sector 58984048
program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
sd 8:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
scsi 8:0:1:0: rejecting I/O to dead device
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda3
 disk 1, wo:1, o:0, dev:sdb3
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda3
raid1: sda3: redirecting sector 58984048 to another mirror
scsi 8:0:1:0: rejecting I/O to dead device
scsi 8:0:1:0: rejecting I/O to dead device
end_request: I/O error, dev sdb, sector 299001472
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdb6, disabling device.
raid1: Operation continuing on 1 devices.
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda6
 disk 1, wo:1, o:0, dev:sdb6
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda6

Comment 9 Michal Nowak 2009-03-25 12:11:28 UTC
Created attachment 336628 [details]
Dmesg log

Comment 10 Michal Nowak 2009-03-25 12:12:42 UTC
Created attachment 336629 [details]
My smolt profile

Linux dhcp-lab-124.englab.brq.redhat.com 2.6.29-0.258.rc8.git2.fc11.x86_64 #1 SMP Mon Mar 16 20:53:26 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Comment 11 Dave Jones 2009-03-26 18:35:31 UTC

*** This bug has been marked as a duplicate of bug 485778 ***