Turning on performance.read-ahead results in a huge increase in network traffic during reads, up to 2.5X the amount of traffic expected.

Environment: Red Hat Enterprise Linux Server release 6.6 (Santiago), Red Hat Storage Server 3.0 Update 2, 8-node cluster, 10G network, 6 JBOD bricks per node, replication=2.

This is the test protocol (a reproduction sketch follows below):

1. Run: gluster volume set HadoopVol performance.read-ahead on

2. Write a 1 GB file to gluster. We monitor performance stats during the tests with collectl and colmux (collectl.sourceforge.net). When writing a 1 GB file, we see 2 GB leave the client node over the network. Each server node (replication=2) receives 1 GB over the net and writes 1 GB to disk, as expected.

3. Drop the Linux page cache on all nodes.

4. Read the file sequentially on the node that wrote it, with an I/O size of 256 KB. We see one of the two servers read 1 GB from disk and send 2.5 GB to the client node. That is 2.5 times the expected network traffic.

The 2.5X factor does not depend on the file size (tested from 10 MB up to 10 GB), but it does depend on the I/O size: for a read size of 256 KB or less, the factor is about 2.5X; for a 1 MB read size, it is 1.6X; for a 16 MB read size, the extra traffic is negligible. It looks as if each read causes an unnecessary 500-600 KB of traffic. When we turn off performance.read-ahead, the problem goes away.

In case there was a problem with the counters used by collectl, we also captured tcpdump traces during the tests and added up the packet sizes. These results agree with the collectl data.

Hank
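For reference, a minimal reproduction sketch of the protocol above. The volume name HadoopVol is from this setup; the client mount point /mnt/glusterfs is an assumption, dd stands in for whatever test I/O tool is used, and the cache drop must be run on the client and on every server node:

    # 1. Enable read-ahead on the volume
    gluster volume set HadoopVol performance.read-ahead on

    # 2. Write a 1 GB file through the client mount (mount point assumed)
    dd if=/dev/zero of=/mnt/glusterfs/testfile bs=1M count=1024 conv=fsync

    # 3. Drop the Linux page cache -- run on the client AND on every server node
    sync; echo 3 > /proc/sys/vm/drop_caches

    # 4. Read the file back sequentially with a 256 KB I/O size, while
    #    watching network counters with collectl/colmux on all nodes
    dd if=/mnt/glusterfs/testfile of=/dev/null bs=256K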
From the client logs attached to bz 1393419, I can see that reads from the kernel are interspersed with attr (fstat) calls. These fstat calls flush the read-ahead cache, so data is read more than once: once by read-ahead and again when the application actually issues the read. This explains the extra data read over the network. From the same logs, it also looks like the read-ahead logic is a bit aggressive, which makes the problem more prominent. Had there been no fstat calls from the kernel, the prefetched data would eventually have been consumed as cache hits and it would not have been a problem.
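One way to observe the effect from the outside is gluster's built-in profiling, which counts the FOPs each brick serves. This is a sketch: whether the interspersed (f)stat calls actually reach the bricks depends on md-cache settings, and the interesting signal here is the READ byte count on the brick exceeding the file size:

    # Count the FOPs and bytes the bricks see during the sequential read
    gluster volume profile HadoopVol start
    dd if=/mnt/glusterfs/testfile of=/dev/null bs=256K
    gluster volume profile HadoopVol info    # compare READ bytes vs. file size
    gluster volume profile HadoopVol stop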
REVIEW: https://review.gluster.org/20510 (performance/read-ahead: stricter adherence to force-atime-update) posted (#1) for review on master by Raghavendra G
COMMIT: https://review.gluster.org/20510 committed in master by "Raghavendra G" <rgowdapp> with a commit message:

performance/read-ahead: stricter adherence to force-atime-update

Throw away the read-ahead cache in fstat only if force-atime-update is set. Note that fstat flushes the read-ahead cache only for atime consistency. However, if atime consistency is needed, the user is required to set force-atime-update, which updates atime on the backend fs even though application reads are served from the read-ahead cache. So, if the user has not set force-atime-update, atime won't be accurate anyway, and there is no point in flushing the read-ahead cache in fstat. Mounts requiring atime consistency must set force-atime-update.

Also note that the kernel normally intersperses reads with fstat, so read-ahead is not effective when fstats flush the read-ahead cache; instead it regresses performance due to wasted network reads. It is recommended to turn off read-ahead if applications require atime consistency.

This patch is aimed at applications which don't require atime consistency. When atime consistency is not required, the read-ahead cache is effective and increases the performance of sequential reads.

Change-Id: I122bbc410cee96661823f9c4b934383495c18446
Signed-off-by: Raghavendra G <rgowdapp>
Fixes: bz#1601166
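In practical terms, after this fix the tuning choice looks roughly like the sketch below. HadoopVol is the volume from the original report, and performance.force-atime-update is the volume-set key I believe maps onto the read-ahead translator's force-atime-update option; verify the exact key on your glusterfs version:

    # Common case: no atime consistency needed -- the read-ahead cache now
    # survives the kernel's interspersed fstats and speeds up sequential reads
    gluster volume set HadoopVol performance.read-ahead on

    # Case where atime consistency IS required: force atime updates on the
    # backend fs (mandatory for accurate atime with read-ahead) ...
    gluster volume set HadoopVol performance.force-atime-update on
    # ... or simply disable read-ahead, as the commit message recommends
    gluster volume set HadoopVol performance.read-ahead off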
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/