Bug 610964 - cxgb3 driver attempts to DMA sync memory it didn't allocate
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: x86_64 Linux
Priority: low
Severity: medium
Assigned To: Neil Horman
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-07-02 17:45 EDT by Jay Fenlason
Modified: 2014-08-31 19:29 EDT (History)
Doc Type: Bug Fix
Last Closed: 2011-08-23 14:14:58 EDT


Attachments
patch to dma library to allow flexibility in entry discovery (4.49 KB, patch)
2011-06-15 13:51 EDT, Neil Horman

Description Jay Fenlason 2010-07-02 17:45:17 EDT
Description of problem:


Version-Release number of selected component (if applicable):
2.6.35-0.19.rc3.git4.fc14.x86_64

How reproducible:
Always (so far, anyway)

Steps to Reproduce:
1. Configure a cxgb3 interface with ifconfig as 172.31.1.1
2. Ping the remote machine at 172.31.1.2
3. Look at dmesg
  
Actual results:
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:902 check_sync+0xdd/0x48c()
Hardware name: To be filled by O.E.M.
cxgb3 0000:01:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x00000000fff97800] [size=1984 bytes]
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer e1000e snd soundcore r8169 cxgb3 iTCO_wdt snd_page_alloc mii shpchp i2c_i801 iTCO_vendor_support mdio microcode firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 1818, comm: ifconfig Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1
Call Trace:
 [<ffffffff81050f71>] warn_slowpath_common+0x85/0x9d
 [<ffffffff8105102c>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff8124658e>] ? check_sync+0x39/0x48c
 [<ffffffff8107c470>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81246632>] check_sync+0xdd/0x48c
 [<ffffffff81246ca6>] debug_dma_sync_single_for_device+0x3f/0x41
 [<ffffffffa011615c>] ? pci_map_page+0x84/0x97 [cxgb3]
 [<ffffffffa0117bc3>] pci_dma_sync_single_for_device.clone.0+0x65/0x6e [cxgb3]
 [<ffffffffa0117ed1>] refill_fl+0x305/0x30a [cxgb3]
 [<ffffffffa011857d>] t3_sge_alloc_qset+0x6a7/0x821 [cxgb3]
 [<ffffffffa010a07b>] cxgb_up+0x4d0/0xe62 [cxgb3]
 [<ffffffff81086037>] ? __module_text_address+0x12/0x58
 [<ffffffffa010aa4c>] cxgb_open+0x3f/0x309 [cxgb3]
 [<ffffffff813e9f6c>] __dev_open+0x8e/0xbc
 [<ffffffff813e7ca5>] __dev_change_flags+0xbe/0x142
 [<ffffffff813e9ea8>] dev_change_flags+0x21/0x57
 [<ffffffff81445937>] devinet_ioctl+0x29a/0x54b
 [<ffffffff811f9a87>] ? inode_has_perm+0xaa/0xce
 [<ffffffff81446ed2>] inet_ioctl+0x8f/0xa7
 [<ffffffff813d683a>] sock_do_ioctl+0x29/0x48
 [<ffffffff813d6c83>] sock_ioctl+0x213/0x222
 [<ffffffff81137f78>] vfs_ioctl+0x32/0xa6
 [<ffffffff811384e2>] do_vfs_ioctl+0x47a/0x4b3
 [<ffffffff81138571>] sys_ioctl+0x56/0x79
 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
---[ end trace 69a4d4cc77b58004 ]---


Expected results:
no warning in dmesg

Additional info:
Comment 1 Chuck Ebbert 2010-07-14 02:59:33 EDT
Is it still happening in 2.6.35-rc5?
Comment 2 Chuck Ebbert 2010-07-22 16:09:01 EDT
Can you report this upstream? Looks like the conversion to the generic DMA API exposed this bug.
Comment 3 Divy Le Ray 2010-07-23 13:50:33 EDT
I can reproduce it with CONFIG_DMA_API_DEBUG on.

In my setup, the warning happens invariably:
- only once, right after a reboot
- on the second dma sync the driver executes.

I am starting to believe that the issue lies in the dma debug code. It is as if the
first dma sync created the bucket entry for the device, and the first subsequent lookup of the debug ref then fails.

For debug purposes, I hacked the driver to allocate a page, map it, and sync it twice. It does this only once, before filling up the Rx descriptors.
The warning appears in this sequence on the second sync.
 
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 775152c..b641434 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -501,6 +501,7 @@ static int refill_fl(struct adapter *adap, struct sge_fl *q, int n, gfp_t gfp)
        struct rx_desc *d = &q->desc[q->pidx];
        unsigned int count = 0;

+       static int dbg = 0;
        while (n--) {
                dma_addr_t mapping;
                int err;
@@ -515,6 +516,8 @@ nomem:                              q->alloc_failed++;
                        dma_unmap_addr_set(sd, dma_addr, mapping);

                        add_one_rx_chunk(mapping, d, q->gen);
+                       if (dbg++ < 4)
+                               printk("%s: mapping %x\n", __func__, mapping);
                        pci_dma_sync_single_for_device(adap->pdev, mapping,
                                                q->buf_size - SGE_PG_RSVD,
                                                PCI_DMA_FROMDEVICE);
@@ -2992,6 +2995,34 @@ int t3_sge_alloc_qset(struct adapter *adapter, unsigned int id, int nports,
 {
        int i, avail, ret = -ENOMEM;
        struct sge_qset *q = &adapter->sge.qs[id];
+       static int alloc_page_once = 0;
+
+       if (!alloc_page_once) {
+               struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, 0);
+
+               dma_addr_t mapping = pci_map_page(adapter->pdev, page, 0,
+                                                 PAGE_SIZE,
+                                                 PCI_DMA_FROMDEVICE);
+
+               /* Sync page's first half */
+               pci_dma_sync_single_for_device(adapter->pdev, mapping,
+                                               PAGE_SIZE / 2,
+                                               PCI_DMA_FROMDEVICE);
+
+               /* Sync page's second half */
+               pci_dma_sync_single_for_device(adapter->pdev, mapping + PAGE_SIZE / 2,
+                                               PAGE_SIZE / 2,
+                                               PCI_DMA_FROMDEVICE);
+
+               /* Unmap the whole page */
+               pci_unmap_page(adapter->pdev, mapping,
+                                PAGE_SIZE, PCI_DMA_FROMDEVICE);
+
+               /* Free the page */
+               __free_pages(page, 0);
+               alloc_page_once = 1;
+       }
+
+
+       printk("%s: dev %s, queue set %d\n", __func__, dev->name, id);

which gives this output:

WARNING: at /mnt/net-next-2.6/lib/dma-debug.c:902 check_sync+0xc8/0x497()
Hardware name: X6DH8-XG2
cxgb3 0000:04:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000007adcf800] [size=2048 bytes]
Modules linked in: autofs4 sunrpc ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad iw_nes iw_cxgb3 mlx4_ib mlx4_core ib_mthca ib_mad ib_core dm_multipath scsi_dh lp parport option usb_wwan usbserial e1000 ide_cd_mod cdrom cxgb3 mdio rtc_cmos rtc_core rtc_lib serio_raw pcspkr i2c_i801 intel_rng e752x_edac shpchp edac_core floppy dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 2970, comm: ip Tainted: G        W   2.6.35-rc1 #4
Call Trace:
 [<ffffffff8117072e>] ? check_sync+0xc8/0x497
 [<ffffffff81039142>] warn_slowpath_common+0x80/0x99
 [<ffffffff8103923e>] warn_slowpath_fmt+0x69/0x6b
 [<ffffffff8105f1ea>] ? __module_text_address+0xd/0x5d
 [<ffffffff8105f243>] ? is_module_text_address+0x9/0x14
 [<ffffffff8117072e>] check_sync+0xc8/0x497
 [<ffffffff81005149>] ? dump_trace+0x2ef/0x2fe
 [<ffffffff81170d21>] debug_dma_sync_single_for_device+0x3f/0x41
 [<ffffffff811716cc>] ? add_dma_entry+0x39/0x44
 [<ffffffff81171f21>] ? debug_dma_map_page+0x10c/0x11b
 [<ffffffffa01d94c8>] t3_sge_alloc_qset+0x193/0x803 [cxgb3]
 [<ffffffffa01cfd1e>] ? t3_phy_lasi_intr_clear+0x1b/0x26 [cxgb3]
 [<ffffffffa01c8702>] cxgb_up+0x362/0xd8d [cxgb3]
 [<ffffffff812c4c84>] ? inetdev_event+0x23/0x397
 [<ffffffffa01c96dc>] cxgb_open+0x3a/0x270 [cxgb3]
 [<ffffffff8127bc50>] __dev_open+0x89/0xb2
 [<ffffffff81279f72>] __dev_change_flags+0xa8/0x12a
 [<ffffffff8127b846>] dev_change_flags+0x1c/0x52
 [<ffffffff812c5293>] devinet_ioctl+0x269/0x5da
 [<ffffffff8112ddf3>] ? inode_has_perm+0x77/0x89
 [<ffffffff812c5dbe>] inet_ioctl+0x92/0xaa
 [<ffffffff8126b1ff>] sock_do_ioctl+0x26/0x45
 [<ffffffff8126b425>] sock_ioctl+0x207/0x216
 [<ffffffff810cf7d4>] vfs_ioctl+0x2a/0x9d
 [<ffffffff810cfcd1>] do_vfs_ioctl+0x412/0x463
 [<ffffffff810cfd79>] sys_ioctl+0x57/0x7a
 [<ffffffff810028eb>] system_call_fastpath+0x16/0x1b
---[ end trace 6d450e935ee1897e ]---
t3_sge_alloc_qset: dev toe0, queue set 0
refill_fl: mapping 7adcf000
refill_fl: mapping 7adcf800
refill_fl: mapping 7a201000
refill_fl: mapping 7a201800
t3_sge_alloc_qset: dev toe0, queue set 1
t3_sge_alloc_qset: dev toe0, queue set 2
t3_sge_alloc_qset: dev toe0, queue set 3
t3_sge_alloc_qset: dev toe1, queue set 4
t3_sge_alloc_qset: dev toe1, queue set 5
t3_sge_alloc_qset: dev toe1, queue set 6
t3_sge_alloc_qset: dev toe1, queue set 7

The warning now triggers in the debug code added to t3_sge_alloc_qset (see the call trace), rather than in refill_fl.
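
Both warnings in this report name a device address ending in 0x800 (0x00000000fff97800 in the original report, 0x000000007adcf800 in the trace above). Assuming 4 KiB pages, that places the flagged sync 2048 bytes into a page rather than at its start, which is consistent with the second-half sync in the debug hack. A minimal sketch of that arithmetic follows; the helper names and ASSUMED_PAGE_SIZE are illustrative and do not come from the driver or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the usual x86_64 base page size. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* The masks below only work for a power-of-two page size. */
static_assert((ASSUMED_PAGE_SIZE & (ASSUMED_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* Byte offset of a device address within its (assumed 4 KiB) page. */
static uint64_t page_offset(uint64_t dev_addr)
{
    return dev_addr & (ASSUMED_PAGE_SIZE - 1);
}

/* Start address of the page that contains dev_addr. */
static uint64_t page_base(uint64_t dev_addr)
{
    return dev_addr & ~(ASSUMED_PAGE_SIZE - 1);
}
```

Under that assumption, page_offset() applied to either reported address gives 0x800 (2048 bytes), and page_base(0x7adcf800) gives 0x7adcf000, the first address printed by refill_fl in the log above.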
Comment 4 Bug Zapper 2010-07-30 08:24:28 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 5 Neil Horman 2011-06-15 08:52:28 EDT
I agree with Divy. It appears the debug_dma_sync_single_for_device code uses a hash bucket to track dma mappings that presumes the dma_handle is immutable during the lifetime of a mapping (i.e. the address returned by pci_map_[page|single|etc] is what will always be passed to the sync_* api calls, without any offset). The dma debug code needs to be taught that a particular sync may not start at the beginning of a mapping, and may cross mapping boundaries. I'll try to put together a patch.
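
The difference between the two lookup strategies described above can be sketched in userspace C. This is only a simplified model under stated assumptions: struct map_entry, find_exact and find_containing are hypothetical names, the table is a flat array rather than the hash buckets used by the dma debug code, and a sync that crosses mapping boundaries is not handled. It is not the patch attached to this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one tracked DMA mapping. */
struct map_entry {
    uint64_t dev_addr;  /* address returned when the buffer was mapped */
    uint64_t size;      /* length of the mapping in bytes */
};

/* Exact-match lookup: finds an entry only when the sync address equals
 * the address the mapping started at. */
static const struct map_entry *
find_exact(const struct map_entry *tbl, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].dev_addr == addr)
            return &tbl[i];
    return NULL;
}

/* Range lookup: finds an entry when [addr, addr + len) lies entirely
 * inside one tracked mapping, so a sync at an offset is still matched. */
static const struct map_entry *
find_containing(const struct map_entry *tbl, size_t n,
                uint64_t addr, uint64_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t start = tbl[i].dev_addr;
        uint64_t end = start + tbl[i].size;

        if (addr >= start && addr < end && len <= end - addr)
            return &tbl[i];
    }
    return NULL;
}
```

With a single 4096-byte entry at 0x7adcf000 (illustrative values chosen to match the addresses in Divy's log), find_exact() returns NULL for a sync at 0x7adcf800, while find_containing() with a length of 2048 returns the entry. That is the behaviour the debug code would need in order to accept a sync of the second half of a page.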
Comment 6 Neil Horman 2011-06-15 13:51:08 EDT
Created attachment 504916 [details]
patch to dma library to allow flexibility in entry discovery

Jay, Divy, would one of you mind testing this out to see if it resolves this problem?  Thanks!
Comment 7 Neil Horman 2011-07-12 09:41:45 EDT
OK, it has been almost a month here. Any feedback on this patch?
Comment 8 Jay Fenlason 2011-07-19 13:23:30 EDT
The patch quiets the warning.
Comment 9 Neil Horman 2011-07-19 13:43:10 EDT
Thank you, sent upstream:
http://marc.info/?l=linux-kernel&m=131109732210343&w=2
Comment 10 Neil Horman 2011-08-08 15:18:17 EDT
new patch version sent to that thread based on reviews.
Comment 11 Neil Horman 2011-08-23 13:30:39 EDT
Jay, the new version of the patch, with some minor adjustments, has been accepted upstream.  I just noticed today that F16 alpha is on its way out.  Do you still want this backported to F14, or should I close it as NEXTRELEASE?
Comment 12 Jay Fenlason 2011-08-23 13:52:31 EDT
NEXTRELEASE is fine with me.
Comment 13 Neil Horman 2011-08-23 14:14:58 EDT
copy that, it should get pulled into mainline and rawhide by the time F16 releases.
