Bug 57471

Summary: pci window memcpy optimizations
Product: [Retired] eCos Reporter: Andrew Lunn <andrew.lunn>
Component: Patches and contributions Assignee: ecc-bugs-int
Status: CLOSED WONTFIX QA Contact: ecc-bugs-int
Severity: medium Docs Contact:
Priority: medium    
Version: 1.5.2 CC: jlarmour
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2003-06-20 16:08:47 UTC Type: ---
Attachments:
Description Flags
The patch (for 1.5.2) none
.pdf file describing the work/results (for interest only) none

Description Andrew Lunn 2001-12-13 14:58:55 UTC
Description of Contribution:

The i82559 network device drivers do lots of half-word-aligned memcpys
to/from the PCI window. The memcpy function is not optimized for this and
so uses its fallback of byte-by-byte copies. For the EBSA platform (and
probably afe1) any accesses to the PCI window are slow since they are
uncached, unbuffered, non-burst etc. It's much more efficient to do two
half-word accesses to normal memory and one word access to PCI window
memory. This patch adds functions which do this. Tests have shown various
degrees of speed-up, from 40% to nearly 4x.
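
As a rough illustration of the access pattern described above (not the
attached patch itself), here is a minimal sketch in C. The function names
are hypothetical, the count is assumed to be given in whole 32-bit words,
and the packing order is little endian only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, little endian only.
 * Copies nwords 32-bit words from a half-word-aligned buffer in normal
 * memory to a word-aligned destination in the PCI window. Each iteration
 * does two cheap half-word reads and a single word write to the window.
 */
void pci_copy_to_window_le(volatile uint32_t *dst, const uint16_t *src,
                           size_t nwords)
{
    while (nwords--) {
        uint32_t lo = src[0];       /* lower address = low half on LE */
        uint32_t hi = src[1];
        *dst++ = lo | (hi << 16);   /* one word access to the window */
        src += 2;
    }
}

/*
 * Hypothetical helper, little endian only.
 * The reverse direction: one word read from the PCI window, then two
 * half-word writes to the half-word-aligned buffer in normal memory.
 */
void pci_copy_from_window_le(uint16_t *dst, const volatile uint32_t *src,
                             size_t nwords)
{
    while (nwords--) {
        uint32_t w = *src++;        /* one word access to the window */
        dst[0] = (uint16_t)(w & 0xffffu);
        dst[1] = (uint16_t)(w >> 16);
        dst += 2;
    }
}
```

A real driver would also need to handle lengths that are not a multiple of
four bytes, for example by finishing with half-word or byte accesses.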

This patch could be made more generic. At the moment it just modifies the
two i82559 drivers. It will only work on little endian machines. What may
be interesting is to make these functions part of the pci library. By
default memcpy could be used, but the hardware specific part of the pci
code could provide its own implementation optimised for the architecture.
Just an idea...
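
One possible shape for that idea, sketched in C. The macro name
PCI_WINDOW_MEMCPY and the function driver_copy_frame are made up for
illustration and are not existing eCos names; the assumption is that a
platform header could define the macro before this point to supply its
own routine.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical hook: a platform specific PCI header may define
 * PCI_WINDOW_MEMCPY to point at an optimised routine. If it does not,
 * fall back to plain memcpy.
 */
#ifndef PCI_WINDOW_MEMCPY
#define PCI_WINDOW_MEMCPY(dst, src, len) memcpy((dst), (src), (len))
#endif

/*
 * Hypothetical driver-side helper: the generic driver only ever calls
 * the hook, so it carries no endian or platform specific code itself.
 */
void driver_copy_frame(void *dst, const void *src, size_t len)
{
    PCI_WINDOW_MEMCPY(dst, src, len);
}
```

With this arrangement the driver stays cross platform and the choice of
copy routine is made at build time by the platform code.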

Version-Release number of selected component (if applicable): 1.5.2

Comment 1 Andrew Lunn 2001-12-13 14:59:45 UTC
Created attachment 40487 [details]
The patch (for 1.5.2)

Comment 2 Andrew Lunn 2001-12-13 15:00:57 UTC
Created attachment 40488 [details]
.pdf file describing the work/results (for interest only)

Comment 3 Jonathan Larmour 2001-12-13 16:31:58 UTC
While this is obviously a good patch for you to have, I'm not entirely sure
about this going in generally. Firstly, the 82559 driver is generic, i.e.
cross platform, and so we can't put in endian-specific dependencies.

Secondly, as alluded to in the recent eCos thread, it would be better to just:
a) fix the generic memcpy to be more efficient for unaligned copies, possibly
also with a configuration dependent choice of using a Duff's device copy

b) pull in architecture/target specific optimizations. This requires a
framework to be defined though.
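
For reference, a Duff's device copy as mentioned in a) could look roughly
like the following sketch. The function name duff_copy is hypothetical;
this is a plain byte copy unrolled eight times, not the eCos memcpy.

```c
#include <stddef.h>

/*
 * Hypothetical byte copy using Duff's device: the switch jumps into the
 * middle of the unrolled loop to handle the count % 8 leftover bytes,
 * then the do/while runs full groups of eight. The fall through between
 * case labels is intentional.
 */
void duff_copy(unsigned char *dst, const unsigned char *src, size_t count)
{
    size_t n;

    if (count == 0)
        return;                 /* the loop below assumes count > 0 */

    n = (count + 7) / 8;        /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

Whether this beats a simple loop depends on the compiler and target, which
is presumably why it would be a configuration dependent choice.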


Comment 4 Andrew Lunn 2001-12-13 17:17:57 UTC
The arm/xscale code will not help (much). It's designed for symmetric access
times for src & dst. That's very untrue for PCI window accesses, e.g. for
aligned word copies between normal memory I get around 90 Mbyte/s. For word
copies between the PCI window and normal memory I get about 16 Mbyte/s max.

So we need a memcpy optimized for normal memory and a memcpy optimized for PCI
window memory. I would put the PCI memcpy into the PCI library.

I would really see this code as half the code needed for the PCI library. It
should not be too hard to write big endian code for the other half. (I don't
have a big endian embedded system, but I could at least do some testing on a
Sun SPARC machine).
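
A sketch of how the two halves might sit side by side in C. The names
PACK_HALFWORDS and pci_copy_to_window are hypothetical, and the endian
test relies on the __BYTE_ORDER__ macro predefined by GCC and Clang; with
other compilers this sketch silently assumes little endian. It only
preserves the byte order that memcpy would produce as seen by the CPU, so
any byte swapping done by a particular PCI bridge is out of scope.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combine two consecutive half words into the word that occupies the
 * same four bytes of memory. On big endian the first half word is the
 * high half; on little endian it is the low half.
 */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define PACK_HALFWORDS(first, second) \
    (((uint32_t)(first) << 16) | (uint32_t)(second))
#else   /* assumed little endian */
#define PACK_HALFWORDS(first, second) \
    ((uint32_t)(first) | ((uint32_t)(second) << 16))
#endif

/*
 * Hypothetical helper: copy nwords 32-bit words from a half-word-aligned
 * buffer in normal memory to a word-aligned PCI window destination,
 * using one word write to the window per pair of half-word reads.
 */
void pci_copy_to_window(volatile uint32_t *dst, const uint16_t *src,
                        size_t nwords)
{
    while (nwords--) {
        *dst++ = PACK_HALFWORDS(src[0], src[1]);
        src += 2;
    }
}
```

On either byte order the result should match what memcpy would have
written, which gives a simple way to test it on a workstation.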

Comment 5 Alex Schuilenburg 2003-06-20 16:08:47 UTC
This bug has moved to http://bugs.ecos.sourceware.org/show_bug.cgi?id=57471