Kernels 2.4.18-4 and 2.4.18-5 (errata) yield very slow NFS writes (~50 KB/sec) on mounts using the 'sync' option (and on mounts that specify neither sync nor async) against both a 2.4.18-5 and a 2.2.19-6.2.16 (RH 6.2) NFS server. Both the NFS server and client are on fast Ethernet switches. I tried nfsvers=2 and 3, and rsize/wsize of 1024, 4096, and 8192, all with similar results.
Sorry, one exception: the kernel-2.4.18-4 NFS client with the kernel-2.2.19... NFS server does not appear to suffer the same write performance degradation with the 'sync' option.
FYI, all machines in question here use Intel network cards, tested with both the e100 and eepro100 network drivers.
Note that 2.4.18 kernels default to sync operation instead of async. Does adding the async option to /etc/exports on the server improve performance?
Mounting with the 'async' option yields ~3-4 MB/sec writes. Now that you mention it, is async/sync an exports option or a mount option (or both)? So far I've only been setting async/sync manually on the mount command.
It is both. Previous kernels defaulted to async for both exports and mounts, but newer kernels default to sync.
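Because the setting exists on both the server (exports) and the client (mount), and falls back to a kernel default when omitted, it can help to work out the effective mode for each side separately. A minimal Python sketch follows; the helper name is hypothetical, it assumes that when both options appear the later one wins, and it takes the default as a parameter rather than detecting the kernel version.

```python
def effective_sync_mode(options: str, default: str = "sync") -> str:
    """Return 'sync' or 'async' for a comma-separated NFS option string.

    Assumption (hypothetical helper): if both 'sync' and 'async' appear,
    the later one wins; if neither appears, `default` applies ('sync' on
    the newer 2.4.18 kernels discussed here, 'async' on older ones).
    """
    mode = default
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("sync", "async"):
            mode = opt
    return mode


# A mount that specifies neither option inherits the kernel default.
print(effective_sync_mode("rw,rsize=8192,wsize=8192"))
print(effective_sync_mode("rw,async"))
print(effective_sync_mode("rw", default="async"))
```

Run once for the export options and once for the mount options to see which of the four server/client combinations is actually in effect.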
There seems to be a significant difference in NFS client speed (a large slowdown) between 2.4.18-4 and 2.4.18-5. Are all the defaults the same between those kernel versions? The mount options I am using (to a Solaris NFS server) are rw,rsize=32768,wsize=32768,nfsvers=3,proto=tcp. I am also using an eepro100 NIC with the default Red Hat driver. This NFS problem has rendered the Linux machines in our lab almost unusable. I will try different options, but the ones I am using now were fine with 2.4.18-4.
I think this may be related to bug 64921. I'm using Red Hat 7.3 as a client to a Solaris 8 NFS server. In kernel 2.4.18-3 everything worked fine; 2.4.18-4 would hang both server and client unless I set rsize/wsize to 8192 or less. With 2.4.18-5 neither client nor server hangs, but the ratio of client RPC calls to retransmissions (retrans in nfsstat) is about 10:1, so it's _really_ slow. Under the 2.4.18-4 setup it's about 100000:1, sync or async. So something that worked with the defaults in 2.4.18-3 is still broken when Solaris is the server.
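To put those nfsstat figures in perspective, a minimal Python sketch that converts a calls:retrans ratio into a retransmission percentage. The function name is hypothetical, and the inputs are assumed to be the client RPC "calls" and "retrans" counters read off nfsstat by hand rather than parsed from its output.

```python
def retrans_percent(calls: int, retrans: int) -> float:
    """Return retransmissions as a percentage of client RPC calls."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return 100.0 * retrans / calls


# The two ratios quoted above, expressed as (calls, retrans) pairs.
print(f"2.4.18-5 (~10:1):     {retrans_percent(10, 1):.3f}%")
print(f"2.4.18-4 (~100000:1): {retrans_percent(100000, 1):.3f}%")
```

A 10:1 ratio means roughly one call in ten is retransmitted, versus about one in a hundred thousand before. Since a retransmission is only sent after the client's RPC timeout expires, even a 10% rate can dominate elapsed time.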
Here's more data on write speeds over switched fast Ethernet. This was just a quick test, but it nicely highlights the problem combination: sync on both server and client.
Server (export)  Client (mount)  Write speed
async            async           8.7 MB/sec
async            sync            6.8 MB/sec
sync             async           350 KB/sec
sync             sync            50 KB/sec
Is this really the kind of performance I should expect? (I hope not.)
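The thread doesn't say how the quick test was run. One way to repeat this kind of measurement is the minimal Python sketch below: it writes a fixed amount of data into a directory on the mount under test and times it, calling fsync() before stopping the clock so data still cached by an async mount is counted. The function name, the 16 MiB total, and the 8 KiB block size are assumptions, not what the poster used.

```python
import os
import tempfile
import time


def write_throughput(target_dir: str, total_mb: int = 16, block_kb: int = 8) -> float:
    """Write total_mb MiB in block_kb KiB chunks under target_dir; return MiB/sec.

    Hypothetical test helper. The file is fsync'd before timing stops and is
    removed afterwards.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as f:  # fdopen takes ownership of fd
            start = time.perf_counter()
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force data to the server before timing stops
            elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mb / elapsed if elapsed > 0 else float("inf")
```

For example, call write_throughput("/mnt/nfs") with a hypothetical mount point, once for each export/mount combination, to fill in a table like the one above.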
2.4.18-5e fixes the slowness problems I was seeing, but sync performance is still pretty bad.
What changed in 2.4.18-5e from 2.4.18-5? The changelog does not appear to have been updated, and I was wondering what the fixes were. Thanks, -sv
5e fixed a problem in the eepro100 driver.
To keep people up to date: yes, we are aware of the slowness of sync/sync writes, but the fix is not going to be ready for a while.
If I have 2.4.18-5e on both ends, async/async performance is great for both reads and writes. If, however, the server is not running 2.4.18-5e, even async/async read performance is poor. I've tried both a nearly vanilla 2.4.18 (patched with XFS 1.1 from SGI) on a 7.3 system and 2.4.9-31 on a 7.1 system as the server. In both cases async/async read performance is under 2 MB/s. Write performance is OK but not stellar (~6.5 MB/s). I'm just trying to point out that this isn't only a sync/sync problem; sync/sync isn't expected to be blazing anyway.
The performance issue fixed in 5e is a bug in the network driver and shouldn't be lumped in with the NFS problem.
I am seeing substantial improvements running 2.4.18-5e on my desktop (3c59x network driver) for writes to a RH 7.2 server (2.4.9-31 kernel), but read speeds are abysmal (though the stock 2.4.18-5 is bad for both reads and writes). Reads and writes to Solaris 2.7 servers are better with the 2.4.18-5e kernel. The interaction with the 2.4.9-31 machines concerns me the most (async helps a bit, but not as much as I would like). -Sean
So I guess my question is if and when we might see an errata kernel that fixes this. I ask because I'm trying to decide whether to push 2.4.18-5e to the machines I maintain or wait for the errata. If it's not too far out I'll wait for the errata, but if it's going to be a while... Thanks
The errata will be ready when it's ready.
I was mostly interested in whether it was in QA's queue yet. I wasn't trying to be snippy, just trying to avoid duplicating work for myself.
The next errata kernels could be out this week or in several weeks; I just can't say for certain.
I just tried one of the test kernels (2.4.18-7) referenced in bug #67461, and write speeds are back! I'm now getting over 5 MB/sec for any combination of async/sync.
*** This bug has been marked as a duplicate of 67461 ***