Turning on performance.read-ahead results in a huge increase in network traffic during reads, up to 2.5X the amount of traffic expected.

Environment: Red Hat Enterprise Linux Server release 6.6 (Santiago), Red Hat Storage Server 3.0 Update 2, 8-node cluster, 10G network, 6 JBOD bricks per node, replication=2.

This is the test protocol (a reproduction sketch follows below):

1. Run: gluster volume set HadoopVol performance.read-ahead on

2. Write a 1 GB file to gluster. We monitor performance stats during the tests with collectl and colmux (collectl.sourceforge.net). When writing a 1 GB file, we see 2 GB leave the client node over the network. Each server node (replication=2) receives 1 GB over the net and writes 1 GB to disk, as expected.

3. Drop the Linux page cache on all nodes.

4. Read the file sequentially on the node that wrote it, with an I/O size of 256 KB. We see one of the two servers read 1 GB from disk and send 2.5 GB to the client node. That is 2.5 times the expected network traffic.

The 2.5X factor does not depend on the file size (tested from 10 MB up to 10 GB), but it does depend on the I/O size: for a read size of 256 KB or less, the factor is about 2.5X; for a 1 MB read size, it is 1.6X; for a 16 MB read size, the extra traffic is negligible. It looks as if each read causes an unnecessary 500-600 KB of traffic. When we turn off performance.read-ahead, the problem goes away.

In case there was a problem with the counters used by collectl, we also captured tcpdump traces during the tests and added up the packet sizes. These results agree with the collectl data.

Hank
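For reference, a minimal reproduction sketch of the protocol above. The volume name HadoopVol is from this setup; the client mount point /mnt/glusterfs is an assumption, dd stands in for whatever test I/O tool is used, and the cache drop must be run on the client and on every server node:

    # 1. Enable read-ahead on the volume
    gluster volume set HadoopVol performance.read-ahead on

    # 2. Write a 1 GB file through the client mount (mount point assumed)
    dd if=/dev/zero of=/mnt/glusterfs/testfile bs=1M count=1024 conv=fsync

    # 3. Drop the Linux page cache -- run on the client AND on every server node
    sync; echo 3 > /proc/sys/vm/drop_caches

    # 4. Read the file back sequentially with a 256 KB I/O size, while
    #    watching network counters with collectl/colmux on all nodes
    dd if=/mnt/glusterfs/testfile of=/dev/null bs=256K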
From the client logs attached to bz 1393419, I can see that reads from the kernel are interspersed with attr (fstat) calls. These fstat calls flush the read-ahead cache, so data is read more than once: once by read-ahead and again when the application actually issues the read. This explains the extra data read over the network. From the same logs, it also looks like the read-ahead logic is a bit aggressive, which makes the problem more prominent. Had there been no fstat calls from the kernel, the prefetched data would eventually have been consumed as cache hits and it would not have been a problem.
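One way to observe the effect from the outside is gluster's built-in profiling, which counts the FOPs each brick serves. This is a sketch: whether the interspersed (f)stat calls actually reach the bricks depends on md-cache settings, and the interesting signal here is the READ byte count on the brick exceeding the file size:

    # Count the FOPs and bytes the bricks see during the sequential read
    gluster volume profile HadoopVol start
    dd if=/mnt/glusterfs/testfile of=/dev/null bs=256K
    gluster volume profile HadoopVol info    # compare READ bytes vs. file size
    gluster volume profile HadoopVol stop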
REVIEW: https://review.gluster.org/20510 (performance/read-ahead: stricter adherence to force-atime-update) posted (#1) for review on master by Raghavendra G
COMMIT: https://review.gluster.org/20510 committed in master by "Raghavendra G" <rgowdapp> with a commit message:

performance/read-ahead: stricter adherence to force-atime-update

Throw away the read-ahead cache in fstat only if force-atime-update is set. Note that fstat flushes the read-ahead cache only for atime consistency. However, if atime consistency is needed, the user is required to set force-atime-update, which updates atime on the backend fs even though application reads are served from the read-ahead cache. So, if the user has not set force-atime-update, atime won't be accurate anyway, and there is no point in flushing the read-ahead cache in fstat. Mounts requiring atime consistency must set force-atime-update.

Also note that the kernel normally intersperses reads with fstat, so read-ahead is not effective when fstats flush the read-ahead cache; instead it regresses performance due to wasted network reads. It is recommended to turn off read-ahead if applications require atime consistency.

This patch is aimed at applications which don't require atime consistency. When atime consistency is not required, the read-ahead cache is effective and increases the performance of sequential reads.

Change-Id: I122bbc410cee96661823f9c4b934383495c18446
Signed-off-by: Raghavendra G <rgowdapp>
Fixes: bz#1601166
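In practical terms, after this fix the tuning choice looks roughly like the sketch below. HadoopVol is the volume from the original report, and performance.force-atime-update is the volume-set key I believe maps onto the read-ahead translator's force-atime-update option; verify the exact key on your glusterfs version:

    # Common case: no atime consistency needed -- the read-ahead cache now
    # survives the kernel's interspersed fstats and speeds up sequential reads
    gluster volume set HadoopVol performance.read-ahead on

    # Case where atime consistency IS required: force atime updates on the
    # backend fs (mandatory for accurate atime with read-ahead) ...
    gluster volume set HadoopVol performance.force-atime-update on
    # ... or simply disable read-ahead, as the commit message recommends
    gluster volume set HadoopVol performance.read-ahead off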
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/