Description of problem:

Version-Release number of selected component (if applicable):
2.6.35-0.19.rc3.git4.fc14.x86_64

How reproducible:
Always (so far, anyway)

Steps to Reproduce:
1. ifconfig a cxgb3 as 172.31.1.1
2. ping the remote machine 172.31.1.2
3. look at dmesg

Actual results:
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:902 check_sync+0xdd/0x48c()
Hardware name: To be filled by O.E.M.
cxgb3 0000:01:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x00000000fff97800] [size=1984 bytes]
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer e1000e snd soundcore r8169 cxgb3 iTCO_wdt snd_page_alloc mii shpchp i2c_i801 iTCO_vendor_support mdio microcode firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 1818, comm: ifconfig Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1
Call Trace:
 [<ffffffff81050f71>] warn_slowpath_common+0x85/0x9d
 [<ffffffff8105102c>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff8124658e>] ? check_sync+0x39/0x48c
 [<ffffffff8107c470>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81246632>] check_sync+0xdd/0x48c
 [<ffffffff81246ca6>] debug_dma_sync_single_for_device+0x3f/0x41
 [<ffffffffa011615c>] ? pci_map_page+0x84/0x97 [cxgb3]
 [<ffffffffa0117bc3>] pci_dma_sync_single_for_device.clone.0+0x65/0x6e [cxgb3]
 [<ffffffffa0117ed1>] refill_fl+0x305/0x30a [cxgb3]
 [<ffffffffa011857d>] t3_sge_alloc_qset+0x6a7/0x821 [cxgb3]
 [<ffffffffa010a07b>] cxgb_up+0x4d0/0xe62 [cxgb3]
 [<ffffffff81086037>] ? __module_text_address+0x12/0x58
 [<ffffffffa010aa4c>] cxgb_open+0x3f/0x309 [cxgb3]
 [<ffffffff813e9f6c>] __dev_open+0x8e/0xbc
 [<ffffffff813e7ca5>] __dev_change_flags+0xbe/0x142
 [<ffffffff813e9ea8>] dev_change_flags+0x21/0x57
 [<ffffffff81445937>] devinet_ioctl+0x29a/0x54b
 [<ffffffff811f9a87>] ? inode_has_perm+0xaa/0xce
 [<ffffffff81446ed2>] inet_ioctl+0x8f/0xa7
 [<ffffffff813d683a>] sock_do_ioctl+0x29/0x48
 [<ffffffff813d6c83>] sock_ioctl+0x213/0x222
 [<ffffffff81137f78>] vfs_ioctl+0x32/0xa6
 [<ffffffff811384e2>] do_vfs_ioctl+0x47a/0x4b3
 [<ffffffff81138571>] sys_ioctl+0x56/0x79
 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
---[ end trace 69a4d4cc77b58004 ]---

Expected results:
no warning in dmesg

Additional info:
Is it still happening in 2.6.35-rc5?
Can you report this upstream? Looks like the conversion to the generic DMA API exposed this bug.
I can reproduce it with CONFIG_DMA_API_DEBUG on.
In my setup, the warning happens invariably:
- only once, right after a reboot
- on the second dma sync the driver executes.

I am starting to believe that the issue lies in the dma debug code.
It is as if the first dma sync created the bucket entry for the device, and the subsequent first lookup for the debug ref fails.

For debug purposes, I hacked the driver to allocate a page, map it and sync it twice, and do it only once, prior to filling up the Rx descriptors.
The warning appears in this sequence on the second sync.

diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 775152c..b641434 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -501,6 +501,7 @@ static int refill_fl(struct adapter *adap, struct sge_fl *q, int n, gfp_t gfp)
 	struct rx_desc *d = &q->desc[q->pidx];
 	unsigned int count = 0;
+	static int dbg = 0;
 	while (n--) {
 		dma_addr_t mapping;
 		int err;
@@ -515,6 +516,8 @@ nomem: q->alloc_failed++;
 			dma_unmap_addr_set(sd, dma_addr, mapping);
 			add_one_rx_chunk(mapping, d, q->gen);
+			if (dbg++ < 4)
+				printk("%s: mapping %x\n", __func__, mapping);
 			pci_dma_sync_single_for_device(adap->pdev, mapping,
 						q->buf_size - SGE_PG_RSVD,
 						PCI_DMA_FROMDEVICE);
@@ -2992,6 +2995,34 @@ int t3_sge_alloc_qset(struct adapter *adapter, unsigned int id, int nports,
 {
 	int i, avail, ret = -ENOMEM;
 	struct sge_qset *q = &adapter->sge.qs[id];
+	static int alloc_page_once = 0;
+
+	if (!alloc_page_once) {
+		struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, 0);
+
+		dma_addr_t mapping = pci_map_page(adapter->pdev, page, 0,
+						  PAGE_SIZE,
+						  PCI_DMA_FROMDEVICE);
+
+		/* Sync page's first half */
+		pci_dma_sync_single_for_device(adapter->pdev, mapping,
+					       PAGE_SIZE / 2,
+					       PCI_DMA_FROMDEVICE);
+
+		/* Sync page's second half */
+		pci_dma_sync_single_for_device(adapter->pdev, mapping + PAGE_SIZE / 2,
+					       PAGE_SIZE / 2,
+					       PCI_DMA_FROMDEVICE);
+
+		/* Unmap the whole page */
+		pci_unmap_page(adapter->pdev, mapping,
+			       PAGE_SIZE,
+			       PCI_DMA_FROMDEVICE);
+
+		/* Free the page */
+		__free_pages(page, 0);
+		alloc_page_once = 1;
+	}
+
+
+	printk("%s: dev %s, queue set %d\n", __func__, dev->name, id);

which gives this output:

WARNING: at /mnt/net-next-2.6/lib/dma-debug.c:902 check_sync+0xc8/0x497()
Hardware name: X6DH8-XG2
cxgb3 0000:04:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000007adcf800] [size=2048 bytes]
Modules linked in: autofs4 sunrpc ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad iw_nes iw_cxgb3 mlx4_ib mlx4_core ib_mthca ib_mad ib_core dm_multipath scsi_dh lp parport option usb_wwan usbserial e1000 ide_cd_mod cdrom cxgb3 mdio rtc_cmos rtc_core rtc_lib serio_raw pcspkr i2c_i801 intel_rng e752x_edac shpchp edac_core floppy dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 2970, comm: ip Tainted: G W 2.6.35-rc1 #4
Call Trace:
 [<ffffffff8117072e>] ? check_sync+0xc8/0x497
 [<ffffffff81039142>] warn_slowpath_common+0x80/0x99
 [<ffffffff8103923e>] warn_slowpath_fmt+0x69/0x6b
 [<ffffffff8105f1ea>] ? __module_text_address+0xd/0x5d
 [<ffffffff8105f243>] ? is_module_text_address+0x9/0x14
 [<ffffffff8117072e>] check_sync+0xc8/0x497
 [<ffffffff81005149>] ? dump_trace+0x2ef/0x2fe
 [<ffffffff81170d21>] debug_dma_sync_single_for_device+0x3f/0x41
 [<ffffffff811716cc>] ? add_dma_entry+0x39/0x44
 [<ffffffff81171f21>] ? debug_dma_map_page+0x10c/0x11b
 [<ffffffffa01d94c8>] t3_sge_alloc_qset+0x193/0x803 [cxgb3]
 [<ffffffffa01cfd1e>] ? t3_phy_lasi_intr_clear+0x1b/0x26 [cxgb3]
 [<ffffffffa01c8702>] cxgb_up+0x362/0xd8d [cxgb3]
 [<ffffffff812c4c84>] ? inetdev_event+0x23/0x397
 [<ffffffffa01c96dc>] cxgb_open+0x3a/0x270 [cxgb3]
 [<ffffffff8127bc50>] __dev_open+0x89/0xb2
 [<ffffffff81279f72>] __dev_change_flags+0xa8/0x12a
 [<ffffffff8127b846>] dev_change_flags+0x1c/0x52
 [<ffffffff812c5293>] devinet_ioctl+0x269/0x5da
 [<ffffffff8112ddf3>] ? inode_has_perm+0x77/0x89
 [<ffffffff812c5dbe>] inet_ioctl+0x92/0xaa
 [<ffffffff8126b1ff>] sock_do_ioctl+0x26/0x45
 [<ffffffff8126b425>] sock_ioctl+0x207/0x216
 [<ffffffff810cf7d4>] vfs_ioctl+0x2a/0x9d
 [<ffffffff810cfcd1>] do_vfs_ioctl+0x412/0x463
 [<ffffffff810cfd79>] sys_ioctl+0x57/0x7a
 [<ffffffff810028eb>] system_call_fastpath+0x16/0x1b
---[ end trace 6d450e935ee1897e ]---
t3_sge_alloc_qset: dev toe0, queue set 0
refill_fl: mapping 7adcf000
refill_fl: mapping 7adcf800
refill_fl: mapping 7a201000
refill_fl: mapping 7a201800
t3_sge_alloc_qset: dev toe0, queue set 1
t3_sge_alloc_qset: dev toe0, queue set 2
t3_sge_alloc_qset: dev toe0, queue set 3
t3_sge_alloc_qset: dev toe1, queue set 4
t3_sge_alloc_qset: dev toe1, queue set 5
t3_sge_alloc_qset: dev toe1, queue set 6
t3_sge_alloc_qset: dev toe1, queue set 7

The warning now triggers in the debug code added to t3_sge_alloc_qset.
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I agree with Divy. It appears the debug_dma_sync_single_for_device code uses a hash bucket to track dma mappings and presumes that the dma_handle is immutable during the lifetime of a mapping (i.e. that the address returned by pci_map_[page|single|etc] is what will always be passed to the sync_* api calls, without any offset). The dma debug code needs to be taught that a particular sync may not start at the beginning of a mapping, and may cross mapping boundaries. I'll try to put together a patch.
Created attachment 504916
patch to dma library to allow flexibility in entry discovery

Jay, Divy, would one of you mind testing this out to see if it resolves this problem? Thanks!
ok, been almost a month here, any feedback on this patch?
The patch quiets the warning.
Thank you, sent upstream: http://marc.info/?l=linux-kernel&m=131109732210343&w=2
new patch version sent to that thread based on reviews.
Jay, the new version of the patch, with some minor adjustments, has been accepted upstream. I just noted today that F16 alpha is on its way out. Do you still want this backported to F14, or should I close it as NEXTRELEASE?
NEXTRELEASE is fine with me.
copy that, it should get pulled into mainline and rawhide by the time F16 releases.