Red Hat Bugzilla – Bug 57471
pci window memcpy optimizations
Last modified: 2007-04-18 12:38:42 EDT
Description of Contribution:
The i82559 network device drivers do a lot of half-word-aligned memcpy
calls to and from the PCI window. memcpy is not optimized for this case, so
it uses its fallback of byte-by-byte copies. On the EBSA platform (and
probably afe1), every access to the PCI window is slow because it is
uncached, unbuffered, non-burst, etc. It is much more efficient to do two
half-word accesses to normal memory and one word access to PCI window
memory. This patch adds functions that do this. Tests have shown speed-ups
ranging from 40% to nearly 4x.
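A minimal sketch of the write direction, assuming a little-endian CPU and a word-aligned destination in the PCI window. The function name pci_win_write_hw is hypothetical and is not taken from the attached patch. Each destination word is built from two half-word reads of normal memory and stored with a single 32-bit access:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords 32-bit words into a word-aligned PCI
 * window from a source that is only guaranteed to be half-word aligned.
 * The source must hold 2 * nwords half-words; an odd trailing half-word
 * is not handled here.
 * Little-endian assumption: the first half-word in memory becomes the
 * low 16 bits of the word, which matches the in-memory byte order only
 * on little-endian CPUs. */
static void pci_win_write_hw(volatile uint32_t *dst, const uint16_t *src,
                             size_t nwords)
{
    while (nwords--) {
        uint32_t lo = src[0];          /* half-word read, normal memory */
        uint32_t hi = src[1];          /* half-word read, normal memory */
        *dst++ = lo | (hi << 16);      /* one word write to PCI window  */
        src += 2;
    }
}
```

The shift is what ties this form to little-endian byte order; the number of slow window accesses is halved compared with two half-word stores, and quartered compared with byte stores.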
This patch could be made more generic. At the moment it only modifies the
two i82559 drivers, and it will only work on little-endian machines. It may
be interesting to make these functions part of the PCI library. By default
memcpy could be used, but could the hardware-specific part of the PCI code
provide its own implementation optimised for the architecture? Just
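One way to express the default-plus-override idea, as a sketch only: HAL_PCI_WINDOW_COPY and pci_window_copy are hypothetical names, not existing eCos macros or functions. A platform header that defines the macro before this point supplies its own copy routine; otherwise the generic code falls back to memcpy:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook: a hardware-specific PCI header may define
 * HAL_PCI_WINDOW_COPY to an architecture-optimised routine. If it does
 * not, the generic PCI library uses plain memcpy. */
#ifndef HAL_PCI_WINDOW_COPY
#define HAL_PCI_WINDOW_COPY(dst, src, len) memcpy((dst), (src), (len))
#endif

/* Hypothetical PCI-library entry point that drivers such as the i82559
 * would call instead of calling memcpy directly. */
static void pci_window_copy(void *dst, const void *src, size_t len)
{
    (void)HAL_PCI_WINDOW_COPY(dst, src, len);
}
```

A compile-time macro keeps the generic path free of any function-pointer indirection, which matters little for memcpy itself but keeps the fallback identical to today's behaviour.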
Version-Release number of selected component (if applicable): 1.5.2
Created attachment 40487
The patch (for 1.5.2)
Created attachment 40488
.pdf file describing the work/results (for interest only)
While this is obviously a good patch for you to have, I'm not entirely sure
about it going in generally. Firstly, the 82559 driver is generic, i.e. cross
platform, and so we can't put in endian-specific dependencies.
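A sketch of one possible endian-neutral variant, assuming the host bridge presents PCI window bytes in the same order as normal memory (a bridge that byte-swaps would need different handling). The helper name pack_two_halfwords is hypothetical. Instead of shifting, the two half-words are stored into a union and read back as one word, so the word holds the same bytes in the same order as the source on either a big- or little-endian CPU:

```c
#include <stdint.h>

/* Two half-words overlaid on one 32-bit word. Reading w after writing
 * h[0] and h[1] reinterprets the same four bytes (union type punning,
 * permitted in C99 TC3 and C11). */
union hw_word {
    uint16_t h[2];
    uint32_t w;
};

/* Hypothetical helper: two half-word reads from normal memory, returned
 * as one word whose in-memory byte order matches the source, with no
 * endian-specific shift. The caller then does a single word write. */
static uint32_t pack_two_halfwords(const uint16_t *src)
{
    union hw_word u;
    u.h[0] = src[0];
    u.h[1] = src[1];
    return u.w;
}
```

Whether this is actually correct on a given big-endian target still depends on how its PCI bridge maps byte lanes, which is why testing on real big-endian hardware would be needed.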
Secondly, as alluded to in the recent eCos thread, it would be better to just
a) make the generic memcpy more efficient for unaligned copies, possibly also
with a configuration-dependent choice of using a Duff's device copy, and
b) pull in architecture/target-specific optimizations. This requires a
framework to be defined, though.
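For reference, a Duff's device byte copy of the kind mentioned in (a), as a sketch; duff_memcpy is a hypothetical name, not an eCos function. It unrolls the byte loop eight ways to cut loop overhead, but it still performs one access per byte, so on its own it would not reduce the number of PCI window accesses:

```c
#include <stddef.h>

/* Byte-by-byte copy unrolled 8 ways with Duff's device. The switch jumps
 * into the middle of the loop to handle len % 8 leftover bytes first;
 * each further pass copies 8 bytes. */
static void *duff_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t passes;

    if (len == 0)
        return dst;

    passes = (len + 7) / 8;   /* number of trips through the loop body */
    switch (len % 8) {
    case 0: do { *d++ = *s++;
    case 7:      *d++ = *s++;
    case 6:      *d++ = *s++;
    case 5:      *d++ = *s++;
    case 4:      *d++ = *s++;
    case 3:      *d++ = *s++;
    case 2:      *d++ = *s++;
    case 1:      *d++ = *s++;
            } while (--passes > 0);
    }
    return dst;
}
```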
The arm/xscale code will not help (much). It's designed for symmetric access
times for src and dst, which is very untrue for PCI window accesses. E.g. for
aligned word copies between normal memory I get around 90 Mbyte/s; for word
copies between the PCI window and normal memory I get about 16 Mbyte/s max.
So we need a memcpy optimized for normal memory and a memcpy optimized for PCI
window memory. I would put the PCI memcpy into the PCI library.
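A sketch of the read direction of such a PCI-library copy, assuming a little-endian CPU and a word-aligned source in the PCI window; the name pci_win_read_hw is hypothetical. One 32-bit read from the PCI window is split into two half-word writes to normal memory, so the slow side sees only word accesses:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords 32-bit words out of a word-aligned
 * PCI window into a destination that is only half-word aligned. The
 * destination must have room for 2 * nwords half-words.
 * Little-endian assumption: the low 16 bits of each word go to the
 * lower address. */
static void pci_win_read_hw(uint16_t *dst, const volatile uint32_t *src,
                            size_t nwords)
{
    while (nwords--) {
        uint32_t w = *src++;                 /* one word read, PCI window */
        dst[0] = (uint16_t)(w & 0xffffu);    /* half-word write, normal   */
        dst[1] = (uint16_t)(w >> 16);        /* half-word write, normal   */
        dst += 2;
    }
}
```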
I would really see this code as half of the code needed for the PCI library. It
should not be too hard to write big-endian code for the other half. (I don't
have a big-endian embedded system, but I could at least do some testing on a Sun
This bug has moved to http://bugs.ecos.sourceware.org/show_bug.cgi?id=57471