Red Hat Bugzilla – Bug 994209
libgfapi has poor sequential performance at small transfer sizes
Last modified: 2015-12-03 12:18:59 EST
Description of problem:
Overall libgfapi is performing well, but on reads here is a case where libgfapi should outperform FUSE yet loses badly. This means we are not maximizing the benefit of libgfapi on writes either. This defeats the whole purpose of libgfapi, which is to reduce Gluster filesystem overhead.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Download and compile glfs_io_test.c (see the comments at its top) from this URL:
2. Create a Gluster volume like the one shown below, with a 10-GbE link between client and server. Replication is not needed for this test.
3. Create a 16-GB file in the volume, then read it with the glfs_io_test program using the parameters shown below.
# GFAPI_HOSTNAME=perf86-ib GFAPI_VOLNAME=nossd GFAPI_FSZ=16384 GFAPI_RECSZ=1 GFAPI_LOAD=seq-rd ./glfs_io_test
GLUSTER: vol=nossd xport=tcp host=perf86-ib port=24007 fuse?No
WORKLOAD: type = seq-rd , file name = x.tmp , file size = 16384 MB, record size = 1 KB
total transfers = 16777216
elapsed time = 144.59 sec
throughput = 113.31 MB/sec
IOPS = 116033.97 (sequential read)
should be able to run at line speed because readahead buffer is already in user process's address space, no context switching required.
"top" with the "H" option (per-thread display) shows that one glfs_io_test thread is at 99% utilization, so we have a CPU bottleneck.
The attached screenshot of perf top shows where the hotspot is in libgfapi, at this URL:
Here are graphs of libgfapi performance; note that sequential writes also fall short of the desired performance at small transfer sizes.
This gluster volume profile shows that the readahead translator is doing its job: all reads across the wire are at the maximum RPC size of 128 KB. The file is cached in memory on the brick server.
Interval 6 Stats:
Block Size: 131072b+
No. of Reads: 31392
No. of Writes: 0
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 140.98 us 37.00 us 60440.00 us 31392 READ
Duration: 35 seconds
Data Read: 4114612224 bytes
Data Written: 0 bytes
Here are the volume parameters:
Volume Name: nossd
Volume ID: e8b8997f-d5b6-4c05-ac7b-2283402e0640
Number of Bricks: 1
This may be affected by the upcoming fix for bz 1009134; needs retest.
Folks, this bug still exists in RHS 2.1 U2. I see the same behavior on reads as before with small record sizes. I think I know something more about what it's doing. See files at
r.log contains throughput for a single-threaded gfapi sequential read of an 8-GB file as a function of record size. It hasn't changed since initial post.
s.log is a system call trace with a 1-KB record size. You'll note that the following sequence is repeated roughly 128 times between RPCs (writev/readv):
31499 geteuid() = 0
31499 getegid() = 0
31499 getgroups(200, ) = 1
I think this is what happens every time the app reads 1 KB. Why does it need to poll security info this often? I could see doing that once per RPC, but once the data is already in the user's address space, the battle is lost!
In rhs21u2-gfapi-perf-top-rsz1k.jpg you'll see perf top output. There is some sort of interaction with Gluster logging; I suspect you need to avoid calling the Gluster logging routine (which forces construction of its arguments?) unless the DEBUG log level is established.
I'm running RPMs from the build at
[root@gprfc093 ~]# rpm -qa | grep glusterfs
and on server:
[root@gprfc093 ~]# ssh gprfs048 rpm -qa | grep glusterfs
Since libgfapi's glfs.h is no longer in the -devel RPM, I have to pull it from the source RPM to compile my test program, which hasn't changed.
This benchmark is now open-source at https://github.com/bengland2/parallel-libgfapi .
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/
If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.