NFS is an important and valuable workload for a cluster file system. GFS2 needs to match or exceed the performance of ext3 under NFS. The common benchmark used is SPECsfs.
Which RHEL version are we targeting?
Do we have information on current performance, and on how far the current base is from ext3's capabilities? Withholding devel ack until we have a baseline run.
The present configuration: 4 clients communicating via gigabit Ethernet to the server (BIGI), an HP DL580 with 4 Xeon processor sockets (8 logical CPUs [HT disabled]), 16GB RAM, 4 NICs, and 2 dual-ported QLogic FC adapters. Each FC port is directly connected to one of 4 HP MSA1000 arrays, each presenting 14 RAID1 LUNs (for a total of 56 filesystems). Note that jumbo frames are enabled but have only shown a benefit for ext3 (not gfs1 or gfs2). SELinux is disabled.

Server tuning includes elevated TCP/IP buffers, elevated inode/dentry hash tables, and 128 NFSD threads. Filesystem tuning for EXT3 is a reduction in journal size, i.e. -J size=4. Some GFS2 tuning has been attempted, but with little effect. GFS2 filesystems are built lock_nolock.

The SPECsfs benchmark is configured such that each client defines 56 processes, one operating on each filesystem. Only TCP/IP and NFS v3 are tested. The workload is run in increments of 2000 Ops/sec (2000, 4000, ...).

As of RHEL5.1 (2.6.18-52.el5):
EXT3 achieved 34000 Ops/sec, overall response time of 1.43 msec/op
GFS2 achieved 24500 Ops/sec, overall response time of 1.43 msec/op

So peak performance is presently off by 28%. While the overall response time is identical, if I calculate the EXT3 overall response time only up to the peak GFS2 sustained workload (24000 Ops requested), we get 1.03 msec/op. This is significantly better than GFS2 and needs to be considered too.

Here are the raw SFSSUM results (Requested_Ops Actual_Ops ResponseTime ...):
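For reference, the tuning described above roughly corresponds to the commands below. This is an illustrative sketch only: device names and the exact buffer/journal values are placeholders, not the ones used on BIGI.

```shell
# Elevated TCP/IP buffers (values illustrative, not the ones used on BIGI)
sysctl -w net.core.rmem_max=1048576
sysctl -w net.core.wmem_max=1048576

# Larger inode/dentry hash tables are boot-time kernel parameters,
# e.g. ihash_entries=... dhash_entries=... on the kernel command line.

# 128 NFSD threads (RPCNFSDCOUNT=128 in /etc/sysconfig/nfs, or at runtime):
echo 128 > /proc/fs/nfsd/threads

# EXT3 with a reduced journal: -J size=4 creates a 4MB journal
mke2fs -j -J size=4 /dev/sdX1

# GFS2 built with the no-op lock manager (single journal, no cluster locking)
mkfs.gfs2 -p lock_nolock -j 1 /dev/sdX1
```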
GFS2: 2000 1982 0.6 594064 299 3 T 20284992 4 56 2 2 3.0 4000 4032 0.7 1207562 299 3 T 40564160 4 56 2 2 3.0 6000 6076 0.8 1818141 299 3 T 60843328 4 56 2 2 3.0 8000 8099 0.8 2421659 299 3 T 81122496 4 56 2 2 3.0 10000 10062 0.9 3016076 299 3 T 101401664 4 56 2 2 3.0 12000 12096 1.1 3628919 300 3 T 121680832 4 56 2 2 3.0 14000 14135 1.3 4240568 300 3 T 141965824 4 56 2 2 3.0 16000 16174 1.6 4852077 300 3 T 162244992 4 56 2 2 3.0 18000 18194 2.0 5458129 300 3 T 182524160 4 56 2 2 3.0 20000 20259 2.5 6067425 299 3 T 202803328 4 56 2 2 3.0 22000 22322 4.0 6674153 299 3 T 223082496 4 56 2 2 3.0 24000 24418 6.8 7300984 299 3 T 243361664 4 56 2 2 3.0 26000 24425 10.7 7303017 299 3 T 263640832 4 56 2 2 3.0 EXT3: 2000 1983 0.5 594423 299 3 T 20284992 4 56 2 2 3.0 4000 4030 0.7 1208997 300 3 T 40564160 4 56 2 2 3.0 6000 6075 0.7 1819493 299 3 T 60843328 4 56 2 2 3.0 8000 8096 0.8 2420813 299 3 T 81122496 4 56 2 2 3.0 10000 10055 0.9 3016539 300 3 T 101401664 4 56 2 2 3.0 12000 12115 1.0 3631562 299 3 T 121680832 4 56 2 2 3.0 14000 14156 1.0 4239739 299 3 T 141965824 4 56 2 2 3.0 16000 16175 1.1 4852574 300 3 T 162244992 4 56 2 2 3.0 18000 18192 1.2 5457744 300 3 T 182524160 4 56 2 2 3.0 20000 20269 1.3 6070461 299 3 T 202803328 4 56 2 2 3.0 22000 22332 1.4 6688303 299 3 T 223082496 4 56 2 2 3.0 24000 24354 1.7 7299979 299 3 T 243361664 4 56 2 2 3.0 26000 26386 2.1 7902556 299 3 T 263640832 4 56 2 2 3.0 28000 28432 2.8 8515250 299 3 T 283925824 4 56 2 2 3.0 30000 30469 3.8 9110304 299 3 T 304204992 4 56 2 2 3.0 32000 32421 4.3 9718310 299 3 T 324484160 4 56 2 2 3.0 34000 34552 6.6 10339762 299 3 T 344763328 4 56 2 2 3.0 36000 33343 7.6 10002776 300 3 T 365042496 4 56 2 2 3.0 Barry
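The headline numbers above can be sanity-checked from the raw rows. This is a rough sketch: the official SPECsfs overall-response-time metric is derived from the area under the load/response curve, but a simple mean of the per-load-point response times is close enough here to reproduce the 1.03 msec/op figure for EXT3 up to 24000 requested Ops.

```python
# Per-load-point response times (msec/op) copied from the SFSSUM rows above,
# in order of requested load (2000, 4000, ...).
gfs2_rt = [0.6, 0.7, 0.8, 0.8, 0.9, 1.1, 1.3, 1.6, 2.0, 2.5, 4.0, 6.8, 10.7]
ext3_rt = [0.5, 0.7, 0.7, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 1.4, 1.7,
           2.1, 2.8, 3.8, 4.3, 6.6, 7.6]

# Peak sustained throughput gap: GFS2 tops out near 24500 Ops/sec
# versus 34000 Ops/sec for EXT3, i.e. roughly 28% lower.
gap = 1 - 24500 / 34000
print(f"peak throughput gap: {gap:.1%}")

# Mean response time over the load points both runs completed
# (2000..24000 requested Ops, the first 12 points of each run):
# EXT3 comes out near 1.03 msec/op, GFS2 near 1.93 msec/op.
ext3_mean = sum(ext3_rt[:12]) / 12
gfs2_mean = sum(gfs2_rt[:12]) / 12
print(f"EXT3 up to 24000 Ops: {ext3_mean:.2f} msec/op")
print(f"GFS2 up to 24000 Ops: {gfs2_mean:.2f} msec/op")
```

Comparing the two filesystems at the same sustained load, rather than at their respective peaks, shows GFS2 behind on latency well before it falls over on throughput.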
Need to re-run on the latest code to see where we stand.
Barry, when will you get a chance to rerun this on the latest 5.3 kernel/gfs2?
There will not be any code changes for 5.3, so removing that flag. Really we just want to quantify where we stand for 5.3 gfs2 with this bug and address any shortcomings in the future.
There are already other bugs open to address performance issues, so I think we can close this one.