Description of Problem: I have observed that there is some resource, allocated on a per-mount basis, that becomes a performance bottleneck under high NFS I/O loads. The symptoms are that file operations on the affected mount slow noticeably under load (4 seconds to run 'df' on the mount, stat() taking 2-5 seconds, etc). Other mounts to the same NFS server still respond in a reasonable timeframe, even if they are simply a second mount of the same export. These symptoms lead me to believe that there is either a lock being held or a linked list being walked linearly that fails to scale under a high number of NFS operations (1300-1800 NFS ops/sec, >20Mbytes/sec I/O). Were any NFS patches rolled into 2.4.18 that may address this? If not, has anyone been in touch with Trond about the current stability of his latest NFS patch sets?
Version-Release number of selected component (if applicable): 2.4.9-31 and 2.4.9-34
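One way to quantify the slow stat() symptom described above is to time a burst of stat() calls against the mount point. This is a hypothetical measurement sketch, not taken from the report; MNT and the iteration count are placeholders (point MNT at the slow NFS mount):

```shell
# Hypothetical sketch: time a burst of stat() calls on a mount point.
# MNT defaults to "." only as a placeholder; pass the NFS mount path.
MNT=${1:-.}
start=$(date +%s)
i=0
while [ "$i" -lt 100 ]; do
    stat "$MNT" > /dev/null
    i=$((i + 1))
done
end=$(date +%s)
echo "100 stat() calls on $MNT took $((end - start)) seconds"
```

Run against a healthy second mount of the same export for comparison; a large gap between the two mounts would support the per-mount-contention theory.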
I have also tested 2.4.18-5, and the maximum performance under that kernel is around 30Mb/s. The long waits for stat() and friends did not show up, but that could simply be because the server is not handling as much traffic under the new upper performance limit in 2.4.18-5. I have also observed that on 2 of the 4 hosts I installed it on, processes have gone into device wait and not recovered, while new processes have no problem accessing the same NFS mount point. One of those hosts has 1G of RAM, the other has 2G of RAM.
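A quick way to confirm the stuck-process symptom is to list tasks in uninterruptible sleep ("D" state, i.e. device wait) along with the kernel function they are blocked in. This is a generic diagnostic sketch, not something from the report:

```shell
# Show any processes stuck in uninterruptible sleep ("D" state),
# with the kernel wait channel (wchan) they are blocked in.
dwait=$(ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/')
if [ -n "$dwait" ]; then
    echo "processes in device wait:"
    echo "$dwait"
else
    echo "no processes in device wait"
fi
```

If the hung processes show an NFS- or RPC-related wchan, that would narrow down where in the client they are blocked.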
2.4.18-17.7.x still exhibits the same device-wait behavior that 2.4.18-5 and 2.4.18-10 did.
How did you generate the "high NFS I/O loads"? How did you get these numbers (1300-1800 NFS ops/sec, >20Mbytes/sec I/O)?