57471 – pci window memcpy optimizations


Summary: pci window memcpy optimizations

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	eCos
Classification:	Retired
Component:	Patches and contributions
Sub Component:
Version:	1.5.2
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	ecc-bugs-int
QA Contact:	ecc-bugs-int
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-12-13 14:58 UTC by Andrew Lunn
Modified:	2007-04-18 16:38 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-06-20 16:08:47 UTC
Embargoed:

Attachments
The patch (for 1.5.2) (29.35 KB, patch) 2001-12-13 14:59 UTC, Andrew Lunn	no flags
.pdf file describing the work/results (for interest only) (105.64 KB, application/octet-stream) 2001-12-13 15:00 UTC, Andrew Lunn	no flags

Description Andrew Lunn 2001-12-13 14:58:55 UTC

Description of Contribution:

The i82559 network device drivers do lots of half-word-aligned memcpy()
calls to/from the PCI window. memcpy is not optimized for this case and so
falls back to byte-by-byte copies. On the EBSA platform (and probably afe1),
every access to the PCI window is slow, since the window is uncached,
unbuffered, non-burst, etc. It's much more efficient to do two half-word
accesses to normal memory and one word access to PCI window memory. This
patch adds functions which do this. Tests have shown speed-ups ranging from
40% to nearly 4x.
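A minimal sketch of the idea described above, assuming a little-endian CPU, a word-aligned buffer in the PCI window, a half-word-aligned buffer in normal memory and a length that is a multiple of 4 bytes. The names pci_window_copy_to() and pci_window_copy_from() are hypothetical; they are not the functions in the attached patch. Each 32-bit PCI window access is paired with two 16-bit accesses to cached RAM, so the number of slow window accesses is half that of a half-word copy and a quarter of a byte-by-byte copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not the patch's actual code. Assumptions:
 * little-endian CPU, PCI window side word aligned, normal-memory side
 * half-word aligned, len a multiple of 4 bytes. */

/* Normal memory -> PCI window: two half-word reads from RAM, one word write. */
static void pci_window_copy_to(volatile uint32_t *dst, const uint16_t *src,
                               size_t len)
{
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 1u) == 0);
    assert(len % 4 == 0);
    for (size_t words = len / 4; words > 0; words--) {
        uint32_t lo = src[0];     /* lower address -> low half on little-endian */
        uint32_t hi = src[1];
        *dst++ = lo | (hi << 16); /* single uncached PCI write */
        src += 2;
    }
}

/* PCI window -> normal memory: one word read, two half-word writes to RAM. */
static void pci_window_copy_from(uint16_t *dst, const volatile uint32_t *src,
                                 size_t len)
{
    assert(((uintptr_t)src & 3u) == 0 && ((uintptr_t)dst & 1u) == 0);
    assert(len % 4 == 0);
    for (size_t words = len / 4; words > 0; words--) {
        uint32_t w = *src++;      /* single uncached PCI read */
        dst[0] = (uint16_t)(w & 0xffffu);
        dst[1] = (uint16_t)(w >> 16);
        dst += 2;
    }
}
```

A real driver would also have to deal with a trailing half-word, since frame lengths are not always multiples of 4; the sketch leaves that out to keep the pairing of accesses easy to see.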

This patch could be made more generic. At the moment it just modifies the
two i82559 drivers, and it will only work on little-endian machines. It may
be interesting to make these functions part of the PCI library: by default
memcpy could be used, but the hardware-specific part of the PCI code could
provide its own implementation optimised for the architecture. Just an
idea...
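One way the "default to memcpy, let the platform override it" idea could look, as a sketch only: HAL_PCI_WINDOW_COPY_TO, HAL_PCI_WINDOW_COPY_FROM and the two driver_* functions are invented names, not existing eCos interfaces. A platform with slow PCI window accesses would define the macros (for example to little-endian helpers like the ones in the patch) before this point; every other platform keeps plain memcpy(), so the generic driver source carries no endian-specific code of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical configuration hooks (invented for illustration). A platform
 * HAL may define these before this point; otherwise memcpy() is used. */
#ifndef HAL_PCI_WINDOW_COPY_TO
#define HAL_PCI_WINDOW_COPY_TO(dst, src, len)   memcpy((dst), (src), (len))
#endif
#ifndef HAL_PCI_WINDOW_COPY_FROM
#define HAL_PCI_WINDOW_COPY_FROM(dst, src, len) memcpy((dst), (src), (len))
#endif

/* Generic, cross-platform driver code only ever calls the hooks. */
static void driver_load_tx_buffer(void *pci_buf, const void *frame, size_t len)
{
    HAL_PCI_WINDOW_COPY_TO(pci_buf, frame, len);
}

static void driver_unload_rx_buffer(void *frame, const void *pci_buf, size_t len)
{
    HAL_PCI_WINDOW_COPY_FROM(frame, pci_buf, len);
}
```

Resolving the hook at compile time, rather than through a function pointer, keeps the default case as cheap as a direct memcpy() call.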

Version-Release number of selected component (if applicable): 1.5.2

Comment 1 Andrew Lunn 2001-12-13 14:59:45 UTC

Created attachment 40487
The patch (for 1.5.2)

Comment 2 Andrew Lunn 2001-12-13 15:00:57 UTC

Created attachment 40488
.pdf file describing the work/results (for interest only)

Comment 3 Jonathan Larmour 2001-12-13 16:31:58 UTC

While this is obviously a good patch for you to have, I'm not entirely sure
about it going in generally. Firstly, the 82559 driver is generic, i.e.
cross-platform, and so we can't put in endian-specific dependencies.

Secondly, as alluded to in the recent eCos thread, it would be better to:
a) fix the generic memcpy to be more efficient for unaligned copies, possibly
also with a configuration-dependent choice of using a Duff's device copy

b) pull in architecture/target-specific optimizations. This requires a
framework to be defined, though.
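Duff's device, mentioned in a) above, is a classic way to unroll a byte-copy loop: a switch jumps into the middle of an 8-way unrolled do/while so that len % 8 bytes are copied first, and every later pass copies 8 bytes. The sketch below (duff_copy() is a hypothetical name) only illustrates the technique; it does nothing about alignment by itself, and whether it beats a plain loop depends on the compiler and target.

```c
#include <stddef.h>

/* Byte copy unrolled 8x using Duff's device. Sketch only, not a tuned
 * memcpy() replacement. */
static void *duff_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (len == 0)
        return dst;

    size_t passes = (len + 7) / 8;  /* iterations of the do/while */
    switch (len % 8) {              /* enter mid-loop for the remainder */
    case 0: do { *d++ = *s++;
    case 7:      *d++ = *s++;
    case 6:      *d++ = *s++;
    case 5:      *d++ = *s++;
    case 4:      *d++ = *s++;
    case 3:      *d++ = *s++;
    case 2:      *d++ = *s++;
    case 1:      *d++ = *s++;
            } while (--passes > 0);
    }
    return dst;
}
```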

Comment 4 Andrew Lunn 2001-12-13 17:17:57 UTC

The arm/xscale code will not help (much). It's designed for symmetric access
times for src and dst, which is very untrue for PCI window accesses. E.g. for
aligned word copies within normal memory I get around 90 Mbyte/s; for word
copies between the PCI window and normal memory I get about 16 Mbyte/s max.

So we need a memcpy optimized for normal memory and a memcpy optimized for
PCI window memory. I would put the PCI memcpy into the PCI library.

I really see this code as half the code needed for the PCI library. It
should not be too hard to write big-endian code for the other half. (I don't
have a big-endian embedded system, but I could at least do some testing on a
Sun SPARC machine.)
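The big-endian half mostly comes down to how two consecutive half-words map onto one 32-bit word. A sketch, assuming GCC or Clang, which predefine __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__; pack_halves() and unpack_halves() are hypothetical names. It does not address whether a particular big-endian host bridge byte-swaps PCI accesses, which would need checking on real hardware.

```c
#include <stdint.h>

#if !defined(__BYTE_ORDER__)
#error "this sketch relies on the GCC/Clang __BYTE_ORDER__ macro"
#endif

/* Returns the 32-bit value whose in-memory bytes equal half-word `first`
 * followed by half-word `second`, i.e. what two adjacent uint16_t stores
 * would leave in memory. */
static inline uint32_t pack_halves(uint16_t first, uint16_t second)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big-endian: the lower address holds the most significant half. */
    return ((uint32_t)first << 16) | second;
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* Little-endian: the lower address holds the least significant half. */
    return ((uint32_t)second << 16) | first;
#else
#error "unsupported byte order"
#endif
}

/* Inverse of pack_halves(): split a word read from the PCI window into the
 * two half-words it holds, in address order. */
static inline void unpack_halves(uint32_t word, uint16_t *first,
                                 uint16_t *second)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    *first  = (uint16_t)(word >> 16);
    *second = (uint16_t)(word & 0xffffu);
#else
    *first  = (uint16_t)(word & 0xffffu);
    *second = (uint16_t)(word >> 16);
#endif
}
```

With these two helpers in place of the hard-coded shifts, the same copy loops could serve both byte orders.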

Comment 5 Alex Schuilenburg 2003-06-20 16:08:47 UTC

This bug has moved to http://bugs.ecos.sourceware.org/show_bug.cgi?id=57471
