Red Hat Bugzilla – Bug 67199
2.4.18-4,5: very slow 'sync' nfs writes (~50k/sec)
Last modified: 2014-01-21 17:48:03 EST
kernel-2.4.18-4 and kernel-2.4.18-5 (errata) yield very slow nfs writes (~50K/sec)
on mounts using the 'sync' option (and on mounts that do not specify either
sync or async) against both a 2.4.18-5 and 2.2.19-6.2.16 (rh62) nfs server.
Both nfs server and client are on fast ethernet switches. I tried varying
nfsvers=2,3 and rsize/wsize=1024,4096,8192 all with similar results.
Sorry, one exception: kernel-2.4.18-4 nfs client and kernel-2.2.19... nfs
server does not appear to experience the same write performance degradation
with the 'sync' option.
FYI, all machines in question here have Intel network cards, tested with both
the e100 and eepro100 network drivers.
Note that 2.4.18 kernels default to sync operation instead of async. Does
adding the async option to /etc/exports on the server improve performance?
Mounting with the 'async' option yields ~3-4MB/sec writes.
Now that you mention it, is async/sync an exports or mount option (or
both)? So far... I've only been manually using async/sync when using the
It is both. Previous kernels defaulted to async exports and mounts,
but newer kernels default to sync.
There seems to be an important difference in nfs client speed (large slowdown)
between 2.4.18-4 and 2.4.18-5. Are all the defaults the same between those
kernel versions? The mount options I am using (to a Solaris nfs server) are
I am also using an eepro100 NIC with the default redhat driver. This nfs
problem has rendered the linux machines in our lab almost unusable. I will try
playing with different options but the ones I am using now were fine with 2.4.18-4.
I think this may be related to bug 64921.
I'm using Red Hat 7.3 as a client to a Solaris 8 nfs server. In kernel 2.4.18-3
everything worked fine, 2.4.18-4 would hang server and client, unless I put in
rsize,wsize of 8192 or less. With 2.4.18-5, it doesn't hang client or server,
but I get a ratio of client rpc calls to retrans (in nfsstat) of about 10:1 so
it's _real_ slow. Under the 2.4.18-4 setup it's like 100000:1, sync or async.
So something is still busted when using Solaris as a server, which wasn't busted
with the defaults in 2.4.18-3.
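To put the two ratios above side by side, here is a minimal Python sketch that turns a calls:retrans ratio into a retransmission percentage. The function name is hypothetical, and the two counts are assumed to be read by hand from the client rpc section of nfsstat; nothing here parses nfsstat output.

```python
def retrans_percentage(calls, retrans):
    """Return retransmissions as a percentage of client RPC calls.

    Both arguments are assumed to be copied by hand from the client rpc
    counters reported by nfsstat; this helper is illustrative only.
    """
    if calls <= 0:
        raise ValueError("calls must be a positive count")
    if retrans < 0:
        raise ValueError("retrans must not be negative")
    return 100.0 * retrans / calls


# The ratios quoted above, expressed as percentages:
slow_case = retrans_percentage(10, 1)       # 10:1 ratio seen with 2.4.18-5
good_case = retrans_percentage(100000, 1)   # 100000:1 ratio seen with 2.4.18-4
print(f"2.4.18-5: {slow_case}% retrans, 2.4.18-4: {good_case}% retrans")
```

A 10:1 ratio means roughly one call in ten is being retransmitted, which is consistent with the very slow transfers described.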
Here's more data on write speeds on switched fast ethernet. This was just a
quick test, but highlights nicely the problem combination, sync on both server
Server   Client   Speed
async    async    8.7MB/sec
async    sync     6.8MB/sec
sync     async    350K/sec
sync     sync     50K/sec
Is this really the kind of performance I should expect to get? (I hope not)
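For anyone wanting to reproduce numbers like these, the following is a rough Python sketch of a write-throughput test. It is not the method used for the figures above (that method is not stated). The function name, the default sizes, and the example path are all assumptions. The optional per-write fsync only approximates synchronous behaviour from the application side; it is not the same thing as the 'sync' mount or export option.

```python
import os
import time


def measure_write_throughput(path, total_bytes=4 * 1024 * 1024,
                             block_size=8192, sync_each_write=False):
    """Write total_bytes to path in block_size chunks; return MB/sec.

    Hypothetical helper: point path at a file on the NFS mount under test.
    """
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    # buffering=0 sends each write straight to the OS, with no Python-level buffer.
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            chunk = block[: total_bytes - written]
            written += f.write(chunk)
            if sync_each_write:
                os.fsync(f.fileno())  # force each block out before continuing
        os.fsync(f.fileno())  # make sure everything is flushed before timing stops
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return written / (1024 * 1024) / elapsed


# Example use (the path is an assumption, adjust to your own mount point):
#   rate = measure_write_throughput("/mnt/nfs/testfile", sync_each_write=True)
#   print(f"{rate:.2f} MB/sec")
```

Running it once per server/client option combination, with the same file size and block size each time, should give figures that are at least comparable with one another.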
2.4.18-5e fixes the slowness problems I was seeing. But sync performance does
suck pretty harshly.
What changed in 2.4.18-5e from 2.4.18-5?
The changelog does not appear to have been updated and I was wondering what the
5e fixed a problem in the eepro100 driver.
To keep people up to date: yes, we are aware of the slowness of sync/sync writes,
but the fix is not going to be ready for a while.
If I have 2.4.18-5e on both ends, async/async performance is great reading or
writing. If, however, the server is not 2.4.18-5e, then even async/async read
performance bites. I've tried both almost vanilla 2.4.18 (patched with XFS 1.1
from SGI) on a 7.3 system as well as 2.4.9-31 on a 7.1 system as the server. In
both cases async/async read performance is under 2MB/s. Write performance is
OK, but not stellar (~6.5MB/s).
Just trying to point out that this isn't just a problem of sync/sync
performance, which really shouldn't be blazing.
A performance issue that is fixed with 5e is a bug in the network driver, and
shouldn't be lumped in with the NFS problem.
I am seeing substantial improvements running 2.4.18-5e on my desktop (3c59x
network driver) for writes to a RH 7.2 server (2.4.9-31 kernel), but read speeds
are abysmal (however, the stock 2.4.18-5 is bad for both reads and writes).
Reads and writes to Solaris 2.7 servers are better with the 2.4.18-5e kernel.
The interaction with the 2.4.9-31 machines has me concerned the most (async
ameliorates that a bit, but not as much as I would like).
So I guess the question I have is if/when we might see an errata kernel that
fixes this. I ask b/c I'm trying to determine if I should push 2.4.18-5e to the
machines I maintain or if I should wait for the errata.
If it's not too far out I'll wait for the errata, but if it's going to be a while....
The errata will be ready when it's ready.
I was mostly interested if it was in QA's queue yet or not.
I wasn't trying to be snippy, just trying to avoid duplicating work for myself.
The next errata kernels could be out this week, or several weeks; I just can't
say for certain.
I just tried one of the test kernels (2.4.18-7) as referenced in bug #67461,
and write speeds are back! I'm now getting more than 5MB/sec for any
combination of async/sync.
*** This bug has been marked as a duplicate of 67461 ***