Bug 751797 - [abrt] kernel: WARNING: at lib/dma-debug.c:965 check_sync+0x2a8/0x530()
Summary: [abrt] kernel: WARNING: at lib/dma-debug.c:965 check_sync+0x2a8/0x530()
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: FreeBSD
unspecified
unspecified
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:303f4a13d7372bf412012e4ef1f...
: 751005 752113 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-11-07 15:35 UTC by Clyde E. Kunkel
Modified: 2012-01-20 14:49 UTC (History)
9 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-12-13 16:30:45 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
patch to log debug actions in dma-debug library (4.68 KB, patch)
2011-11-10 21:12 UTC, Neil Horman
no flags Details | Diff
requested /var/log/messages after kernel error (344.29 KB, application/x-gzip)
2011-12-10 01:47 UTC, Clyde E. Kunkel
no flags Details

Description Clyde E. Kunkel 2011-11-07 15:35:17 UTC
libreport version: 2.0.6
abrt_version:   2.0.4.981
cmdline:        BOOT_IMAGE=/vmlinuz-3.2.0-0.rc0.git6.0.fc17.x86_64 root=/dev/mapper/VolGroup00-rawhide ro rd.dm=0 rd.md.uuid=b9438b55:1d815c8b:bfe78010:bc810f04 rd.lvm.lv=VolGroup00/rawhide KEYTABLE=us SYSFONT=latarcyrheb-sun16 rd.luks=0 LANG=en_US.UTF-8 rd.shell=1 debug log_buf_len=2M
comment:        Booted system, autologon with 10 sec delay to gnome shell desktop.  Notification appeared of kerneloops.
kernel:         undefined
reason:         WARNING: at lib/dma-debug.c:965 check_sync+0x2a8/0x530()
time:           Mon Nov  7 11:34:54 2011

backtrace:
:WARNING: at lib/dma-debug.c:965 check_sync+0x2a8/0x530()
:Hardware name: P5K-E
:sky2 0000:02:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000105258040] [size=60 bytes]
:Modules linked in: fuse lp parport ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle tun bridge lockd stp llc ip6t_REJECT nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 ip6table_filter xt_state nf_conntrack ip6_tables raid1 uvcvideo videodev snd_usb_audio snd_hda_codec_hdmi media v4l2_compat_ioctl32 snd_usbmidi_lib joydev snd_rawmidi snd_seq_device snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd microcode i2c_i801 iTCO_wdt iTCO_vendor_support serio_raw sky2 asus_atk0110 soundcore snd_page_alloc configfs virtio_net kvm_intel kvm uinput sunrpc raid10 btrfs zlib_deflate libcrc32c ata_generic pata_acpi firewire_ohci firewire_core crc_itu_t pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
:Pid: 2520, comm: boinc Not tainted 3.2.0-0.rc0.git6.0.fc17.x86_64 #1
:Call Trace:
: <IRQ>  [<ffffffff8107ce9f>] warn_slowpath_common+0x7f/0xc0
: [<ffffffff8107cf96>] warn_slowpath_fmt+0x46/0x50
: [<ffffffff81325658>] check_sync+0x2a8/0x530
: [<ffffffff81311c8e>] ? random32+0x2e/0x40
: [<ffffffff81325b62>] debug_dma_sync_single_for_cpu+0x42/0x50
: [<ffffffff81192cac>] ? ksize+0x1c/0xc0
: [<ffffffff813217cc>] ? is_swiotlb_buffer+0x3c/0x50
: [<ffffffff81321fe8>] ? swiotlb_sync_single+0x38/0x80
: [<ffffffff8132212c>] ? swiotlb_sync_single_for_cpu+0xc/0x10
: [<ffffffffa0331873>] sky2_poll+0x573/0xd90 [sky2]
: [<ffffffff815454e1>] ? net_rx_action+0xa1/0x460
: [<ffffffff815455a9>] net_rx_action+0x169/0x460
: [<ffffffff81020c89>] ? sched_clock+0x9/0x10
: [<ffffffff810ab9b5>] ? sched_clock_local+0x25/0x90
: [<ffffffff810858f8>] __do_softirq+0xc8/0x3a0
: [<ffffffff810ab9b5>] ? sched_clock_local+0x25/0x90
: [<ffffffff81685efc>] call_softirq+0x1c/0x30
: [<ffffffff8101b385>] do_softirq+0xa5/0xe0
: [<ffffffff81085f2e>] irq_exit+0xbe/0xf0
: [<ffffffff816867d3>] do_IRQ+0x63/0xe0
: [<ffffffff8167b673>] common_interrupt+0x73/0x73
: <EOI>  [<ffffffff8167b719>] ? retint_swapgs+0x13/0x1b

event_log:
:2011-11-07-11:36:29> Smolt profile successfully saved
:2011-11-07-11:36:58> Submitting oops report to http://submit.kerneloops.org/submitoops.php
:2011-11-07-11:38:01  Kernel oops has not been sent due to Couldn't connect to server
:2011-11-07-11:38:01* (exited with 1)

smolt_data:
:
:
:General
:=================================
:UUID: 84425b93-5f99-4baa-9bd0-f398c479a123
:OS: Fedora release 17 (Rawhide)
:Default run level: Unknown
:Language: en_US.utf8
:Platform: x86_64
:BogoMIPS: 5343.57
:CPU Vendor: GenuineIntel
:CPU Model: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
:CPU Stepping: 7
:CPU Family: 6
:CPU Model Num: 23
:Number of CPUs: 4
:CPU Speed: 2664
:System Memory: 3933
:System Swap: 3999
:Vendor: System manufacturer
:System: P5K-E System Version
:Form factor: Desktop
:Kernel: 3.2.0-0.rc0.git6.0.fc17.x86_64
:SELinux Enabled: 0
:SELinux Policy: targeted
:SELinux Enforce: Unknown
:MythTV Remote: Unknown
:MythTV Role: Unknown
:MythTV Theme: Unknown
:MythTV Plugin: 
:MythTV Tuner: -1
:
:
:Devices
:=================================
:(32902:10688:4163:33429) pci, None, HOST/PCI, 82G33/G31/P35/P31 Express DRAM Controller
:(32902:10689:4163:33429) pci, pcieport, PCI/PCI, 82G33/G31/P35/P31 Express PCI Express Root Port
:(32902:10544:4163:33399) pci, i801_smbus, SERIAL, 82801I (ICH9 Family) SMBus Controller
:(32902:10530:4163:33399) pci, ahci, STORAGE, 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller
:(32902:10518:4163:33399) pci, None, PCI/ISA, 82801IR (ICH9R) LPC Interface Controller
:(32902:9294:4163:33399) pci, None, PCI/PCI, 82801 PCI Bridge
:(32902:10549:4163:33399) pci, uhci_hcd, USB, 82801I (ICH9 Family) USB UHCI Controller #2
:(32902:10548:4163:33399) pci, uhci_hcd, USB, 82801I (ICH9 Family) USB UHCI Controller #1
:(32902:10553:4163:33399) pci, uhci_hcd, USB, 82801I (ICH9 Family) USB UHCI Controller #6
:(32902:10550:4163:33399) pci, uhci_hcd, USB, 82801I (ICH9 Family) USB UHCI Controller #3
:(32902:10554:4163:33399) pci, ehci_hcd, USB, 82801I (ICH9 Family) USB2 EHCI Controller #1
:(32902:10556:4163:33399) pci, ehci_hcd, USB, 82801I (ICH9 Family) USB2 EHCI Controller #2
:(6523:9059:4163:33359) pci, pata_jmicron, STORAGE, JMB362/JMB363 Serial ATA Controller
:(6523:9059:4163:33359) pci, ahci, STORAGE, JMB362/JMB363 Serial ATA Controller
:(32902:10560:4163:33399) pci, pcieport, PCI/PCI, 82801I (ICH9 Family) PCI Express Port 1
:(32902:10568:4163:33399) pci, pcieport, PCI/PCI, 82801I (ICH9 Family) PCI Express Port 5
:(32902:10558:4163:33399) pci, snd_hda_intel, MULTIMEDIA, 82801I (ICH9 Family) HD Audio Controller
:(32902:10551:4163:33399) pci, uhci_hcd, USB, 82801I (ICH9 Family) USB UHCI Controller #4
:(32902:10552:4163:33399) pci, uhci_hcd, USB, 82801I (ICH9 Family) USB UHCI Controller #5
:(4098:43528:4098:43528) pci, snd_hda_intel, MULTIMEDIA, RV630/M76 audio device [Radeon HD 2600 Series]
:(4098:38281:4098:834) pci, radeon, VIDEO, RV630 [Radeon HD 2600 Series]
:(32902:10570:4163:33399) pci, pcieport, PCI/PCI, 82801I (ICH9 Family) PCI Express Port 6
:(4545:22545:4163:33428) pci, firewire_ohci, FIREWIRE, IEEE 1394a Firewire Controller
:(4523:17252:4163:33272) pci, sky2, ETHERNET, 88E8056 PCI-E Gigabit Ethernet Controller
:
:
:Filesystem Information
:=================================
:device mtpt type bsize frsize blocks bfree bavail file ffree favail
:-------------------------------------------------------------------
:/dev/mapper/VolGroup00-rawhide / ext4 4096 4096 4089774 2592752 2551808 1024000 856690 856690
:/dev/sdf7 /boot ext4 1024 1024 508745 334333 308733 128016 127731 127731
:/dev/sdf3 WITHHELD ext4 4096 4096 1276204 1212681 1148681 320000 319799 319799
:/dev/sdf2 WITHHELD ext4 4096 4096 25561025 19593761 18313761 6406144 6405722 6405722
:/dev/mapper/VolGroup00-fedora15 WITHHELD ext4 4096 4096 4082111 2223196 2018396 1024000 797090 797090
:/dev/mapper/VolGroup00-fedora16 WITHHELD ext4 4096 4096 4082111 2626734 2421934 1024000 852120 852120
:/dev/mapper/VolGroup00-fedora14 WITHHELD ext4 4096 4096 4082111 2624419 2419619 1024000 848618 848618
:/dev/sdf1 WITHHELD ext4 1024 1024 508745 417090 391490 128016 127916 127916
:/dev/sdf8 WITHHELD ext4 1024 1024 508745 367304 341704 128016 127712 127712
:/dev/sdf11 WITHHELD ext4 1024 1024 495759 429379 403677 128520 128475 128475
:/dev/mapper/VolGroup01-clydehome /home ext4 4096 4096 5103904 4065992 3809992 1281120 1216431 1216431
:

Comment 1 Josh Boyer 2011-11-08 15:52:18 UTC
*** Bug 752113 has been marked as a duplicate of this bug. ***

Comment 2 Neil Horman 2011-11-09 15:33:18 UTC
clyde I see you're also the reporter on bz 751005, can you describe any common traits of these systems (same vendor/model/etc)?  I just note this is a sky2 card and the other bug is a tg3 card.  I'm wondering, given the fact that you're seeign this on multiple systems what the common thread might be.

Comment 3 Clyde E. Kunkel 2011-11-09 20:58:53 UTC
Same machine, rawhide, two different recent 3.2 kernels.

ASUS P5K-EWIFI mobo (wifi disabled in bios)

$ lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV630 [Radeon HD 2600 Series]
01:00.1 Audio device: ATI Technologies Inc RV630/M76 audio device [Radeon HD 2600 Series]
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
03:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
03:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
05:03.0 FireWire (IEEE 1394): Agere Systems FW322/323 (rev 70)

Comment 4 Neil Horman 2011-11-09 21:26:10 UTC
Ok, so their both using the software iotlb then.  Thats good to know.  I'll try to construct a stap script to see if we can trace where the failure is occuring.

Comment 5 Neil Horman 2011-11-10 21:12:04 UTC
Created attachment 532930 [details]
patch to log debug actions in dma-debug library

hey I know I said I'd write a stap script, but I decided it would be more efficient to do this as a patch, and since its rawhide I figured you wouldn't mind a rebuild :).  Anyway, this patch (currently untested) should log the last 4096 actions taken in the dma debug library.  When the error you are reporting is hit, it will stop preforming dma-debug.  Via debugfs, you should have a log file entry for the dma api now.  It acts a bit wierd (sorry, didn't want to mess up the buffer count management stuff you have to do with debugfs files).  At any rate, _writing_ anything to the log file, will cause the kernel to spit out the log history to /var/log/messages. Which you can then add in here.  Hopefully this will give us a better clue as to where this problem is originating.

Again, its untested.  I'll be testing in the AM, but It should work and I wanted to get it to you asap.  Thanks!

Comment 6 Clyde E. Kunkel 2011-11-11 18:01:46 UTC
Sorry, patches are beyond me. Unless rawhide of 20111111 includes it.  Also, rawhide is a bit borked for me right now due to xorg rebuilds.  Using F16 for now which does not exhibit this bug.

Comment 7 Neil Horman 2011-11-11 18:23:32 UTC
ok, I'll toss this into koji for you in a bit, you should be able to boot the kernel on an F16 userspace box

Comment 8 Neil Horman 2011-11-11 19:07:49 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=3508255

There you go

Comment 9 Clyde E. Kunkel 2011-12-07 17:05:56 UTC
[  120.955135] ------------[ cut here ]------------
[  120.955153] WARNING: at lib/dma-debug.c:1041 check_sync+0x56c/0x5f0()
[  120.955156] Hardware name: P5K-E
[  120.955159] sky2 0000:02:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x00000001063d8040] [size=60 bytes]
[  120.955163] Modules linked in: fuse lp parport ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle tun bridge stp llc lockd nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_state nf_conntrack ip6_tables raid1 uvcvideo snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device snd_hda_codec_hdmi snd_hda_codec_analog videodev joydev media v4l2_compat_ioctl32 snd_hda_intel snd_hda_codec snd_hwdep sky2 microcode serio_raw snd_pcm iTCO_wdt snd_timer snd i2c_i801 iTCO_vendor_support soundcore snd_page_alloc asus_atk0110 configfs virtio_net kvm_intel kvm sunrpc uinput raid10 btrfs zlib_deflate libcrc32c ata_generic pata_acpi firewire_ohci firewire_core crc_itu_t pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[  120.955259] Pid: 2651, comm: setiathome-5.28 Not tainted 3.2.0-0.rc1.git2.1.fc17.x86_64 #1
[  120.955262] Call Trace:
[  120.955265]  <IRQ>  [<ffffffff8107ceaf>] warn_slowpath_common+0x7f/0xc0
[  120.955276]  [<ffffffff8107cfa6>] warn_slowpath_fmt+0x46/0x50
[  120.955284]  [<ffffffff81326efc>] check_sync+0x56c/0x5f0
[  120.955289]  [<ffffffff81312e8e>] ? random32+0x2e/0x40
[  120.955293]  [<ffffffff81327352>] debug_dma_sync_single_for_cpu+0x42/0x50
[  120.955299]  [<ffffffff811930ec>] ? ksize+0x1c/0xc0
[  120.955303]  [<ffffffff813229cc>] ? is_swiotlb_buffer+0x3c/0x50
[  120.955307]  [<ffffffff813231e8>] ? swiotlb_sync_single+0x38/0x80
[  120.955312]  [<ffffffff8132332c>] ? swiotlb_sync_single_for_cpu+0xc/0x10
[  120.955322]  [<ffffffffa0371ff3>] sky2_poll+0x573/0xd90 [sky2]
[  120.955329]  [<ffffffff81109672>] ? check_for_new_grace_period+0xc2/0x160
[  120.955335]  [<ffffffff815472f1>] ? net_rx_action+0xa1/0x460
[  120.955339]  [<ffffffff815473b9>] net_rx_action+0x169/0x460
[  120.955344]  [<ffffffff81020cc9>] ? sched_clock+0x9/0x10
[  120.955351]  [<ffffffff810aba75>] ? sched_clock_local+0x25/0x90
[  120.955357]  [<ffffffff810859b8>] __do_softirq+0xc8/0x3a0
[  120.955367]  [<ffffffff810aba75>] ? sched_clock_local+0x25/0x90
[  120.955372]  [<ffffffff81687ebc>] call_softirq+0x1c/0x30
[  120.955378]  [<ffffffff8101b385>] do_softirq+0xa5/0xe0
[  120.955381]  [<ffffffff81085fee>] irq_exit+0xbe/0xf0
[  120.955385]  [<ffffffff81688793>] do_IRQ+0x63/0xe0
[  120.955392]  [<ffffffff8167d633>] common_interrupt+0x73/0x73
[  120.955394]  <EOI>  [<ffffffff8167d6d9>] ? retint_swapgs+0x13/0x1b
[  120.955400] ---[ end trace 9ea8b974ec78d8cd ]---

Comment 10 Neil Horman 2011-12-07 18:26:57 UTC
Ok, I'm glad you re-hit the problem, but I need the history log out of debugfs that the patch provides (as per comment 5).  You'll find the log in /sys/kernel/debug/dma_debug/log, IIRC

Comment 11 Clyde E. Kunkel 2011-12-08 01:37:22 UTC
(In reply to comment #10)
> Ok, I'm glad you re-hit the problem, but I need the history log out of debugfs
> that the patch provides (as per comment 5).  You'll find the log in
> /sys/kernel/debug/dma_debug/log, IIRC

No log there.

Comment 12 Neil Horman 2011-12-08 11:44:56 UTC
Is the debugfs filesystem mounted?  Can you tell me:

1) What the output of the mount command
2 What the contents of the directory are from (1) that specifies the location of the debugfs filesystem?

Comment 13 Clyde E. Kunkel 2011-12-08 16:14:11 UTC
(In reply to comment #12)
> Is the debugfs filesystem mounted?  Can you tell me:
> 
> 1) What the output of the mount command
> 2 What the contents of the directory are from (1) that specifies the location
> of the debugfs filesystem?

[kunkelc@P5K-EWIFI <17> ~]$ uname -r
3.2.0-0.rc1.git2.1.fc17.x86_64
[kunkelc@P5K-EWIFI <17> ~]$ mount | grep debugfs
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
[kunkelc@P5K-EWIFI <17> ~]$ sudo ls -ahl /sys/kernel/debug
total 0
drwxr-xr-x 27 root root 0 Dec  8 06:03 .
drwxr-xr-x  7 root root 0 Dec  8 06:03 ..
drwxr-xr-x  2 root root 0 Dec  8 06:03 acpi
drwxr-xr-x  2 root root 0 Dec  8 11:03 asus_atk0110
drwxr-xr-x 42 root root 0 Dec  8 06:03 bdi
drwxr-xr-x  3 root root 0 Dec  8 06:03 boot_params
drwxr-xr-x  2 root root 0 Dec  8 06:03 debug_objects
drwxr-xr-x  2 root root 0 Dec  8 06:03 dma-api
drwxr-xr-x  4 root root 0 Dec  8 06:03 dri
drwxr-xr-x  2 root root 0 Dec  8 06:03 dynamic_debug
drwxr-xr-x  2 root root 0 Dec  8 06:03 ext4
drwxr-xr-x  2 root root 0 Dec  8 06:03 extfrag
drwxr-xr-x  2 root root 0 Dec  8 06:03 fail_io_timeout
drwxr-xr-x  2 root root 0 Dec  8 06:03 fail_make_request
drwxr-xr-x  2 root root 0 Dec  8 06:03 fail_page_alloc
drwxr-xr-x  2 root root 0 Dec  8 06:03 failslab
-r--r--r--  1 root root 0 Dec  8 06:03 gpio
drwxr-xr-x  5 root root 0 Dec  8 06:03 hid
drwxr-xr-x  2 root root 0 Dec  8 06:03 jbd2
-rw-------  1 root root 0 Dec  8 06:03 kernel_page_tables
drwxr-xr-x  2 root root 0 Dec  8 06:03 kprobes
drwxr-xr-x  2 root root 0 Dec  8 11:03 kvm
drwxr-xr-x  2 root root 0 Dec  8 06:03 mce
drwxr-xr-x  2 root root 0 Dec  8 06:03 regmap
-rw-r--r--  1 root root 0 Dec  8 06:03 sched_features
-r--r--r--  1 root root 0 Dec  8 06:03 suspend_stats
drwxr-xr-x  6 root root 0 Dec  8 06:03 tracing
drwxr-xr-x  3 root root 0 Dec  8 06:03 usb
drwxr-xr-x  2 root root 0 Dec  8 06:03 virtio-ports
-r--r--r--  1 root root 0 Dec  8 06:03 wakeup_sources
drwxr-xr-x  2 root root 0 Dec  8 06:03 x86
drwxr-xr-x  3 root root 0 Dec  8 06:03 xen
[kunkelc@P5K-EWIFI <17> ~]$

Comment 14 Neil Horman 2011-12-08 19:29:56 UTC
sorry, the log file is under dma-api/

Comment 15 Clyde E. Kunkel 2011-12-09 15:55:19 UTC
Things are getting weird, error occurred today, but no debugfs.  However, dmesg trace was larger:


[  106.656435] ------------[ cut here ]------------
[  106.656446] WARNING: at lib/dma-debug.c:1041 check_sync+0x56c/0x5f0()
[  106.656448] Hardware name: P5K-E
[  106.656451] sky2 0000:02:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000104e40040] [size=91 bytes]
[  106.656453] Modules linked in: fuse lp parport ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle lockd tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ip6table_filter nf_conntrack ip6_tables raid1 snd_hda_codec_hdmi snd_hda_codec_analog uvcvideo snd_hda_intel snd_usb_audio videodev snd_hda_codec snd_usbmidi_lib snd_rawmidi media snd_seq_device v4l2_compat_ioctl32 joydev snd_hwdep snd_pcm snd_timer snd i2c_i801 microcode sky2 iTCO_wdt soundcore iTCO_vendor_support snd_page_alloc serio_raw asus_atk0110 configfs virtio_net kvm_intel kvm uinput sunrpc raid10 btrfs zlib_deflate libcrc32c ata_generic firewire_ohci pata_acpi firewire_core crc_itu_t pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[  106.656517] Pid: 2160, comm: tracker-store Not tainted 3.2.0-0.rc1.git2.1.fc17.x86_64 #1
[  106.656519] Call Trace:
[  106.656521]  <IRQ>  [<ffffffff8107ceaf>] warn_slowpath_common+0x7f/0xc0
[  106.656529]  [<ffffffff8107cfa6>] warn_slowpath_fmt+0x46/0x50
[  106.656535]  [<ffffffff81326efc>] check_sync+0x56c/0x5f0
[  106.656538]  [<ffffffff81312e8e>] ? random32+0x2e/0x40
[  106.656541]  [<ffffffff81327352>] debug_dma_sync_single_for_cpu+0x42/0x50
[  106.656546]  [<ffffffff811930ec>] ? ksize+0x1c/0xc0
[  106.656549]  [<ffffffff813229cc>] ? is_swiotlb_buffer+0x3c/0x50
[  106.656553]  [<ffffffff813231e8>] ? swiotlb_sync_single+0x38/0x80
[  106.656556]  [<ffffffff8132332c>] ? swiotlb_sync_single_for_cpu+0xc/0x10
[  106.656563]  [<ffffffffa033eff3>] sky2_poll+0x573/0xd90 [sky2]
[  106.656568]  [<ffffffff815472f1>] ? net_rx_action+0xa1/0x460
[  106.656571]  [<ffffffff815473b9>] net_rx_action+0x169/0x460
[  106.656575]  [<ffffffff810bfe86>] ? mark_held_locks+0x86/0x150
[  106.656579]  [<ffffffff810aba75>] ? sched_clock_local+0x25/0x90
[  106.656583]  [<ffffffff810859b8>] __do_softirq+0xc8/0x3a0
[  106.656586]  [<ffffffff810aba75>] ? sched_clock_local+0x25/0x90
[  106.656590]  [<ffffffff81687ebc>] call_softirq+0x1c/0x30
[  106.656594]  [<ffffffff8101b385>] do_softirq+0xa5/0xe0
[  106.656596]  [<ffffffff81085fee>] irq_exit+0xbe/0xf0
[  106.656600]  [<ffffffff81688793>] do_IRQ+0x63/0xe0
[  106.656604]  [<ffffffff8167d633>] common_interrupt+0x73/0x73
[  106.656606]  <EOI>  [<ffffffff810bfe86>] ? mark_held_locks+0x86/0x150
[  106.656612]  [<ffffffff8167362c>] ? __slab_alloc+0x441/0x45b
[  106.656615]  [<ffffffff810bfe86>] ? mark_held_locks+0x86/0x150
[  106.656618]  [<ffffffff811916c2>] ? set_track+0x62/0x1a0
[  106.656622]  [<ffffffff811471a5>] ? mempool_alloc_slab+0x15/0x20
[  106.656625]  [<ffffffff811471a5>] ? mempool_alloc_slab+0x15/0x20
[  106.656628]  [<ffffffff810c00fd>] ? trace_hardirqs_on+0xd/0x10
[  106.656631]  [<ffffffff81312e8e>] ? random32+0x2e/0x40
[  106.656633]  [<ffffffff811471a5>] ? mempool_alloc_slab+0x15/0x20
[  106.656636]  [<ffffffff81196c19>] kmem_cache_alloc+0x239/0x250
[  106.656639]  [<ffffffff811471a5>] mempool_alloc_slab+0x15/0x20
[  106.656642]  [<ffffffff811474e9>] mempool_alloc+0x59/0x150
[  106.656645]  [<ffffffff811471a5>] ? mempool_alloc_slab+0x15/0x20
[  106.656648]  [<ffffffff81196c19>] ? kmem_cache_alloc+0x239/0x250
[  106.656650]  [<ffffffff811471a5>] ? mempool_alloc_slab+0x15/0x20
[  106.656655]  [<ffffffff811e543e>] bio_alloc_bioset+0x3e/0xf0
[  106.656659]  [<ffffffff814e3cbc>] clone_bio+0x4c/0x100
[  106.656663]  [<ffffffff814e539c>] __split_and_process_bio+0x56c/0x660
[  106.656666]  [<ffffffff814e4e61>] ? __split_and_process_bio+0x31/0x660
[  106.656669]  [<ffffffff810abc08>] ? sched_clock_cpu+0xa8/0x110
[  106.656673]  [<ffffffff810b993d>] ? trace_hardirqs_off+0xd/0x10
[  106.656676]  [<ffffffff810abcdf>] ? local_clock+0x6f/0x80
[  106.656679]  [<ffffffff810ba1d5>] ? lock_release_holdtime.part.9+0x15/0x1a0
[  106.656682]  [<ffffffff814e5608>] dm_request+0x178/0x220
[  106.656685]  [<ffffffff814e54ce>] ? dm_request+0x3e/0x220
[  106.656689]  [<ffffffff812e93aa>] generic_make_request+0xca/0x100
[  106.656692]  [<ffffffff812e9467>] submit_bio+0x87/0x110
[  106.656696]  [<ffffffff81144ea0>] ? add_to_page_cache_locked+0xf0/0x170
[  106.656699]  [<ffffffff811ebcd5>] mpage_readpages+0x105/0x130
[  106.656704]  [<ffffffff81235a40>] ? noalloc_get_block_write+0x30/0x30
[  106.656707]  [<ffffffff81020c73>] ? native_sched_clock+0x13/0x60
[  106.656710]  [<ffffffff81235a40>] ? noalloc_get_block_write+0x30/0x30
[  106.656713]  [<ffffffff810b993d>] ? trace_hardirqs_off+0xd/0x10
[  106.656716]  [<ffffffff810abcdf>] ? local_clock+0x6f/0x80
[  106.656719]  [<ffffffff81230fcd>] ext4_readpages+0x1d/0x20
[  106.656723]  [<ffffffff8115130a>] __do_page_cache_readahead+0x23a/0x2e0
[  106.656726]  [<ffffffff811511aa>] ? __do_page_cache_readahead+0xda/0x2e0
[  106.656730]  [<ffffffff81151732>] ondemand_readahead+0x1e2/0x2f0
[  106.656733]  [<ffffffff81151690>] ? ondemand_readahead+0x140/0x2f0
[  106.656735]  [<ffffffff811446e0>] ? iov_iter_copy_from_user+0xd0/0xd0
[  106.656739]  [<ffffffff81151916>] page_cache_sync_readahead+0x36/0x50
[  106.656742]  [<ffffffff811463b8>] generic_file_aio_read+0x478/0x7e0
[  106.656744]  [<ffffffff81020c73>] ? native_sched_clock+0x13/0x60
[  106.656749]  [<ffffffff811ab932>] do_sync_read+0xd2/0x110
[  106.656753]  [<ffffffff812a8826>] ? security_file_permission+0x96/0xb0
[  106.656756]  [<ffffffff811abe01>] ? rw_verify_area+0x61/0x100
[  106.656759]  [<ffffffff811ac2f3>] vfs_read+0xb3/0x180
[  106.656762]  [<ffffffff811ac40d>] sys_read+0x4d/0x90
[  106.656766]  [<ffffffff81685bc2>] system_call_fastpath+0x16/0x1b
[  106.656768] ---[ end trace 4a1c47639abbd7a2 ]---
[kunkelc@P5K-EWIFI <17> ~]$

Comment 16 Neil Horman 2011-12-09 18:23:44 UTC
The depth of the stack trace isn't relevant, it just reflects the fact that the stack was already pretty deep before we got an interrupt from sky2 hw that led us to the rx path that hits the error.

What do you mean by 'no debugfs'?

I need that log if we want to figure out whats going on here.

Comment 17 Clyde E. Kunkel 2011-12-09 18:43:17 UTC
Probably because on that boot there was no traceback.  subsequently there was a traceback; however:


[kunkelc@P5K-EWIFI <17> ~]$ sudo ls -ahl /sys/kernel/debug/dma-api
total 0
drwxr-xr-x  2 root root 0 Dec  9 05:31 .
drwxr-xr-x 27 root root 0 Dec  9 05:31 ..
-rw-r--r--  1 root root 0 Dec  9 05:31 all_errors
-r--r--r--  1 root root 0 Dec  9 05:31 disabled
-rw-r--r--  1 root root 0 Dec  9 05:31 driver_filter
-r--r--r--  1 root root 0 Dec  9 05:31 error_count
--w--w--w-  1 root root 0 Dec  9 05:31 log
-r--r--r--  1 root root 0 Dec  9 05:31 min_free_entries
-rw-r--r--  1 root root 0 Dec  9 05:31 num_errors
-r--r--r--  1 root root 0 Dec  9 05:31 num_free_entries
[kunkelc@P5K-EWIFI <17> ~]$ sudo cat /sys/kernel/debug/dma-api/log
cat: /sys/kernel/debug/dma-api/log: Invalid argument
[kunkelc@P5K-EWIFI <17> ~]$ 

empty file?

Comment 18 Neil Horman 2011-12-09 19:44:33 UTC
No, you're doing it wrong (sorry, I know its wierd, but it was easier to write the debug file this way with the limited time I had).  Note my description in comment 5 - you have to echo a value into the log file there, and as a result, the contents of the dma debug history log will get dumped to /var/log/messages.

Comment 19 Clyde E. Kunkel 2011-12-10 01:47:06 UTC
Created attachment 544821 [details]
requested /var/log/messages after kernel error

Yeah, so weird, I just glossed over the write part and thought you were talking about your patch.  Sorry for all the confusion.  hope the attached helps.

Comment 20 Neil Horman 2011-12-12 14:36:32 UTC
Thank you, yes, that helps.  Unfortunately the news isn't good.  The first thing that jumps out at me, even before I go trying to figure out what exactly happened to the sky2 card is that the log shows several devices (the ata controller, the video card and the sky2 card), all mapping devices using the same pci device pointer.  That suggests that all of those devices are pointing to the same  device structure, which shouldn't happen.  It could be a flaw in my debug code, but seeing as I take a global lock and each entry updates the device pointer, I don't see how.  I'm worried that something is getting rather corrupted in your system.  Given that the only commonality is the sky2 driver, I think the best thing to do is build a rawhide kernel for you with a back level sky2 driver to see if the problem clears, then move forward in drivers commits, bisecting until we find the problem.

I'll have a kernel for you shortly.

Comment 21 Neil Horman 2011-12-12 15:24:35 UTC
 http://koji.fedoraproject.org/koji/taskinfo?taskID=3579977


There you go, its a rawhide kernel with the 2.6.41 sky2 driver.  Lets see if that makes the problem go away.

Comment 22 Clyde E. Kunkel 2011-12-12 15:39:57 UTC
BuildError: error building package (arch noarch), mock exited with status 1; see build.log for more information

Comment 23 Neil Horman 2011-12-12 16:18:28 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=3580092

Stupid patching issue, my bad.

Comment 24 Konrad Rzeszutek Wilk 2011-12-13 14:32:28 UTC
I think this is fixed by


commit 91ec37cc1015220965e39bf342fb846810d19e79
Author: Thomas Jarosch <thomas.jarosch>
Date:   Thu Nov 17 20:31:02 2011 +0100

    Fix comparison using wrong pointer variable in dma debug code
    
    cppcheck reported:
    [lib/dma-debug.c:248] -> [lib/dma-debug.c:248]: (style) Same expression on both sides of '=='.
    
    Signed-off-by: Thomas Jarosch <thomas.jarosch>
    Signed-off-by: Joerg Roedel <joerg.roedel>


as I don't see the tg3 error anymore with 3.2-rc5.

Comment 25 Clyde E. Kunkel 2011-12-13 14:57:43 UTC
Not seeing issue with test1 kernel 3.2.0-0.rc5.git0.1.test.1.fc17.x86_64.

Comment 26 Clyde E. Kunkel 2011-12-13 15:28:57 UTC
Nor with 3.2.0-0.rc5.git0.1.fc17.x86_64.  Only a sample of two system boots, however.  Will try more as day wears on.

Comment 27 Neil Horman 2011-12-13 16:30:45 UTC
yeah, that would certainly do it.  Closing as NEXTRELEASE.  Please re-open if the problem re-appears.  Thanks!

Comment 28 Neil Horman 2012-01-20 14:49:04 UTC
*** Bug 751005 has been marked as a duplicate of this bug. ***


Note You need to log in before you can comment on or make changes to this bug.