Bug 610964
| Summary: | cxgb3 driver attempts to DMA sync memory it didn't allocate | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jay Fenlason <fenlason> | ||||
| Component: | kernel | Assignee: | Neil Horman <nhorman> | ||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 14 | CC: | anton, divy, dougsland, gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, nhorman | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-08-23 18:14:58 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Jay Fenlason
2010-07-02 21:45:17 UTC
Is it still happening in 2.6.35-rc5? Can you report this upstream?
Looks like the conversion to the generic DMA API exposed this bug. I can reproduce it with CONFIG_DMA_API_DEBUG on.
In my setup, the warning happens invariably:
- only once, right after a reboot
- on the second dma sync the driver executes.
I am starting to believe that the issue lies in the dma debug code. It is as if the first dma sync created the bucket entry for the device, and the subsequent first lookup for the debug ref fails.
For debug purposes, I hacked the driver to allocate a page, map it and sync it twice, and do it only once, prior to filling up the Rx descriptors.
The warning appears in this sequence on the second sync.
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 775152c..b641434 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -501,6 +501,7 @@ static int refill_fl(struct adapter *adap, struct sge_fl *q, int n, gfp_t gfp)
struct rx_desc *d = &q->desc[q->pidx];
unsigned int count = 0;
+ static int dbg = 0;
while (n--) {
dma_addr_t mapping;
int err;
@@ -515,6 +516,8 @@ nomem: q->alloc_failed++;
dma_unmap_addr_set(sd, dma_addr, mapping);
add_one_rx_chunk(mapping, d, q->gen);
+ if (dbg++ < 4)
+ printk("%s: mapping %x\n", __func__, mapping);
pci_dma_sync_single_for_device(adap->pdev, mapping,
q->buf_size - SGE_PG_RSVD,
PCI_DMA_FROMDEVICE);
@@ -2992,6 +2995,34 @@ int t3_sge_alloc_qset(struct adapter *adapter, unsigned int id, int nports,
{
int i, avail, ret = -ENOMEM;
struct sge_qset *q = &adapter->sge.qs[id];
+ static int alloc_page_once = 0;
+
+ if (!alloc_page_once) {
+ struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, 0);
+
+ dma_addr_t mapping = pci_map_page(adapter->pdev, page, 0,
+ PAGE_SIZE,
+ PCI_DMA_FROMDEVICE);
+
+ /* Sync the page's first half */
+ pci_dma_sync_single_for_device(adapter->pdev, mapping,
+ PAGE_SIZE / 2,
+ PCI_DMA_FROMDEVICE);
+
+ /* Sync the page's second half */
+ pci_dma_sync_single_for_device(adapter->pdev, mapping + PAGE_SIZE / 2,
+ PAGE_SIZE / 2,
+ PCI_DMA_FROMDEVICE);
+
+ /* Unmap the whole page */
+ pci_unmap_page(adapter->pdev, mapping,
+ PAGE_SIZE, PCI_DMA_FROMDEVICE);
+
+ /* Free the page */
+ __free_pages(page, 0);
+ alloc_page_once = 1;
+ }
+
+
+ printk("%s: dev %s, queue set %d\n", __func__, dev->name, id);
which gives this output:
WARNING: at /mnt/net-next-2.6/lib/dma-debug.c:902 check_sync+0xc8/0x497()
Hardware name: X6DH8-XG2
cxgb3 0000:04:00.0: DMA-API: device driver tries to sync DMA memory it has not
allocated [device address=0x000000007adcf800] [size=2048 bytes]
Modules linked in: autofs4 sunrpc ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad
iw_nes iw_cxgb3 mlx4_ib mlx4_core ib_mthca ib_mad ib_core dm_multipath scsi_dh
lp parport option usb_wwan usbserial e1000 ide_cd_mod cdrom cxgb3 mdio rtc_cmos
rtc_core rtc_lib serio_raw pcspkr i2c_i801 intel_rng e752x_edac shpchp
edac_core floppy dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod
pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
[last unloaded: microcode]
Pid: 2970, comm: ip Tainted: G W 2.6.35-rc1 #4
Call Trace:
[<ffffffff8117072e>] ? check_sync+0xc8/0x497
[<ffffffff81039142>] warn_slowpath_common+0x80/0x99
[<ffffffff8103923e>] warn_slowpath_fmt+0x69/0x6b
[<ffffffff8105f1ea>] ? __module_text_address+0xd/0x5d
[<ffffffff8105f243>] ? is_module_text_address+0x9/0x14
[<ffffffff8117072e>] check_sync+0xc8/0x497
[<ffffffff81005149>] ? dump_trace+0x2ef/0x2fe
[<ffffffff81170d21>] debug_dma_sync_single_for_device+0x3f/0x41
[<ffffffff811716cc>] ? add_dma_entry+0x39/0x44
[<ffffffff81171f21>] ? debug_dma_map_page+0x10c/0x11b
[<ffffffffa01d94c8>] t3_sge_alloc_qset+0x193/0x803 [cxgb3]
[<ffffffffa01cfd1e>] ? t3_phy_lasi_intr_clear+0x1b/0x26 [cxgb3]
[<ffffffffa01c8702>] cxgb_up+0x362/0xd8d [cxgb3]
[<ffffffff812c4c84>] ? inetdev_event+0x23/0x397
[<ffffffffa01c96dc>] cxgb_open+0x3a/0x270 [cxgb3]
[<ffffffff8127bc50>] __dev_open+0x89/0xb2
[<ffffffff81279f72>] __dev_change_flags+0xa8/0x12a
[<ffffffff8127b846>] dev_change_flags+0x1c/0x52
[<ffffffff812c5293>] devinet_ioctl+0x269/0x5da
[<ffffffff8112ddf3>] ? inode_has_perm+0x77/0x89
[<ffffffff812c5dbe>] inet_ioctl+0x92/0xaa
[<ffffffff8126b1ff>] sock_do_ioctl+0x26/0x45
[<ffffffff8126b425>] sock_ioctl+0x207/0x216
[<ffffffff810cf7d4>] vfs_ioctl+0x2a/0x9d
[<ffffffff810cfcd1>] do_vfs_ioctl+0x412/0x463
[<ffffffff810cfd79>] sys_ioctl+0x57/0x7a
[<ffffffff810028eb>] system_call_fastpath+0x16/0x1b
---[ end trace 6d450e935ee1897e ]---
t3_sge_alloc_qset: dev toe0, queue set 0
refill_fl: mapping 7adcf000
refill_fl: mapping 7adcf800
refill_fl: mapping 7a201000
refill_fl: mapping 7a201800
t3_sge_alloc_qset: dev toe0, queue set 1
t3_sge_alloc_qset: dev toe0, queue set 2
t3_sge_alloc_qset: dev toe0, queue set 3
t3_sge_alloc_qset: dev toe1, queue set 4
t3_sge_alloc_qset: dev toe1, queue set 5
t3_sge_alloc_qset: dev toe1, queue set 6
t3_sge_alloc_qset: dev toe1, queue set 7
The warning now triggers in the debug code.
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I agree with Divy. It appears the debug_dma_sync_single_for_device code uses a hash bucket to track dma mappings, and that it presumes the dma_handle is immutable during the lifetime of a mapping (i.e. the address returned by pci_map_[page|single|etc] is what will always be passed to the sync_* api calls, without any offset). The dma debug code needs to be taught that a particular sync may not start at the beginning of a mapping, and may cross mapping boundaries. I'll try to put together a patch.
Created attachment 504916
patch to dma library to allow flexibility in entry discovery
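The lookup problem described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `track_mapping`, `find_exact` and `find_containing` are hypothetical names, the flat array stands in for the real hash buckets, and none of it is taken from the attached patch or from lib/dma-debug.c. It only shows why a sync issued at `mapping + PAGE_SIZE / 2` misses when the tracker compares start addresses, and hits when the tracker checks whether the synced range lies inside a recorded mapping. For simplicity the model rejects ranges that run past the end of a mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and sizes are illustrative, not the kernel's. */
#define MAX_ENTRIES 16

struct map_entry {
	int      dev_id;    /* stands in for a struct device pointer */
	uint64_t dev_addr;  /* DMA handle returned when the buffer was mapped */
	uint64_t size;      /* length of the mapping in bytes */
};

static struct map_entry entries[MAX_ENTRIES];
static size_t nr_entries;

/* Record a mapping, as a debug layer would at map time. */
static void track_mapping(int dev_id, uint64_t dev_addr, uint64_t size)
{
	assert(nr_entries < MAX_ENTRIES);
	entries[nr_entries++] = (struct map_entry){ dev_id, dev_addr, size };
}

/* Exact lookup: succeeds only if the sync starts at the mapping's handle. */
static const struct map_entry *find_exact(int dev_id, uint64_t addr)
{
	for (size_t i = 0; i < nr_entries; i++)
		if (entries[i].dev_id == dev_id && entries[i].dev_addr == addr)
			return &entries[i];
	return NULL;
}

/* Containing lookup: succeeds if [addr, addr + len) lies inside a mapping. */
static const struct map_entry *find_containing(int dev_id, uint64_t addr,
					       uint64_t len)
{
	for (size_t i = 0; i < nr_entries; i++) {
		const struct map_entry *e = &entries[i];

		if (e->dev_id != dev_id || addr < e->dev_addr)
			continue;
		/* Offset of the sync within this mapping; no overflow here. */
		uint64_t off = addr - e->dev_addr;
		if (off <= e->size && len <= e->size - off)
			return e;
	}
	return NULL;
}
```

With a 4096-byte mapping tracked at 0x7adcf000, `find_exact` returns NULL for the address 0x7adcf800 from the warning, while `find_containing` with a length of 2048 finds the entry.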
Jay, Divy, would one of you mind testing this out to see if it resolves this problem? Thanks!
ok, been almost a month here, any feedback on this patch?
The patch quiets the warning.
Thank you, sent upstream: http://marc.info/?l=linux-kernel&m=131109732210343&w=2
new patch version sent to that thread based on reviews.
Jay, the new version of the patch, with some minor adjustments, has been accepted upstream. I just noted today that F16 alpha is on its way out. Do you still want this backported to F14 or should I close it as NEXTRELEASE?
NEXTRELEASE is fine with me.
copy that, it should get pulled into mainline and rawhide by the time F16 releases.