1214489 – performance.read-ahead causes huge increase in unnecessary network traffic

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	read-ahead
Sub Component:
Version:	rhgs-3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.5.0
Assignee:	Csaba Henk
QA Contact:	Sayalee
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1601166 1696807

Reported:	2015-04-22 20:30 UTC by Hank Jakiela
Modified:	2019-10-30 12:20 UTC
CC List:	11 users
Fixed In Version:	glusterfs-6.0-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1601166
Environment:
Last Closed:	2019-10-30 12:19:36 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2019:3249	0	None	None	None	2019-10-30 12:20:11 UTC

Description Hank Jakiela 2015-04-22 20:30:38 UTC

Turning on performance.read-ahead results in a huge increase in network traffic during reads, up to 2.5X the amount of traffic expected.

Red Hat Enterprise Linux Server release 6.6 (Santiago)
Red Hat Storage Server 3.0 Update 2

8-node cluster, 10G network, 6 JBOD bricks per node, replication=2

This is the test protocol:

1. Run: gluster volume set HadoopVol performance.read-ahead on

2. Write a 1 GB file to gluster. We monitor performance stats during tests with collectl and colmux (collectl.sourceforge.net). When writing a 1 GB file, we see 2 GB leave the client node over the network. Each server node (replication=2) receives 1 GB over the net and writes 1 GB to disk, as expected.

3. Drop the Linux page cache on all nodes.

4. Read the file sequentially on the node that wrote it, with an I/O size of 256 KB. We see one of the two servers read 1 GB from disk and send 2.5 GB to the client node. That's 2.5 times the amount of network traffic expected.

The factor of 2.5X does not depend on the file size, from 10 MB up to 10 GB.

The factor does depend on the I/O size. For a read size of 256KB or less, the factor is about 2.5X. For a 1 MB read size, the factor is 1.6X. For a read size of 16 MB, the extra traffic is negligible. It looks like each read causes an unnecessary 500-600 KB of traffic.
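The per-read overhead can be checked with a little arithmetic. The sketch below assumes a simple model that is not stated in the report: every read of a given size pulls a fixed amount of extra data, so factor = (io_size + extra) / io_size. The function names are made up for illustration.

```python
# Amplification factors quoted in the report, keyed by read size in KiB.
observed = {256: 2.5, 1024: 1.6}


def extra_per_read_kib(io_size_kib, factor):
    """Extra traffic per read implied by factor = (io_size + extra) / io_size."""
    return io_size_kib * (factor - 1.0)


def predicted_factor(io_size_kib, extra_kib):
    """Amplification expected if every read drags along extra_kib of traffic."""
    return (io_size_kib + extra_kib) / io_size_kib


for size, factor in observed.items():
    extra = extra_per_read_kib(size, factor)
    print(f"{size} KiB reads at {factor}x -> ~{extra:.0f} KiB extra per read")

# With 500-600 KiB of overhead per read, 16 MiB reads barely notice it.
for extra in (500, 600):
    print(f"16 MiB reads, {extra} KiB extra -> {predicted_factor(16 * 1024, extra):.3f}x")
```

Under this model the 1 MB case implies roughly 614 KiB of extra traffic per read and the 256 KB case roughly 384 KiB, while 500-600 KiB on top of a 16 MiB read is under 4%, which matches the "negligible" observation.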

When we turn off performance.read-ahead, this problem goes away.

Just in case there was a problem with the counters used by collectl, we captured tcpdump traces during the tests and added up the packet sizes. These results agree with the collectl data.
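For anyone who wants to repeat the cross-check, here is a minimal sketch of adding up packet sizes from a capture. It is not the script used for this report. It assumes the trace was saved in the classic libpcap file format (not pcapng) and was already filtered to the traffic of interest when it was captured; the function name is hypothetical.

```python
import struct


def sum_pcap_bytes(path):
    """Return (packet_count, total_bytes_on_wire) for a classic libpcap file."""
    with open(path, "rb") as f:
        header = f.read(24)  # global header is 24 bytes
        if len(header) < 24:
            raise ValueError("file too short for a pcap global header")
        magic = header[:4]
        # The magic number tells us the byte order of the record headers
        # (microsecond and nanosecond timestamp variants are both accepted).
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")

        count = 0
        total = 0
        while True:
            record = f.read(16)  # per-packet header is 16 bytes
            if len(record) < 16:
                break
            _ts_sec, _ts_frac, incl_len, orig_len = struct.unpack(
                endian + "IIII", record
            )
            f.seek(incl_len, 1)  # skip the captured bytes of this packet
            count += 1
            total += orig_len    # original length, even if the capture was truncated
        return count, total
```

The total includes link-layer and IP/TCP headers, so it will sit slightly above the payload size; that is small next to a 2.5X discrepancy.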

The cluster is in the Phoenix lab. Contact me for access.

Hank

Comment 5 Raghavendra G 2018-06-08 09:20:24 UTC

Looks to be a duplicate of bz 1220845

Comment 6 Raghavendra G 2018-06-08 21:14:09 UTC

From client logs attached to bz 1393419, I could see that reads from the kernel are interspersed with attr calls. These fstat calls flush the read-ahead cache, so data is read more than once: once by read-ahead and again when the application actually issues the read. This explains the extra data read over the network.

From the same logs, it also looks like the read-ahead logic is a bit aggressive, which makes this problem more prominent. Had there been no fstat calls from the kernel, the prefetched data would eventually have been consumed as cache hits and it would not have been a problem.

Even with stat-prefetch/md-cache turned on we can hit this bug, as the default timeout for md-cache is 1s and there is a very good chance that this cache has already timed out by the time it sees an fstat. So, if we are using md-cache, we need to turn on the "group metadata-cache" profile, which sets larger timeouts on md-cache and uses upcall to handle cache-coherency issues.
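The double read described in this comment can be illustrated with a toy model. This is not the read-ahead translator's actual logic: the prefetch window size, the flush-on-every-fstat behaviour and all names below are assumptions made for illustration, with the window picked so that the numbers resemble the report.

```python
def simulate(file_size, io_size, window, fstat_every):
    """Return bytes fetched over the network for one sequential pass.

    fstat_every=0 means no fstat calls; otherwise the prefetched data is
    dropped after every `fstat_every`-th application read.
    """
    fetched = 0
    cached_upto = 0  # cache holds the range [offset, cached_upto)
    offset = 0
    reads = 0
    while offset < file_size:
        end = min(offset + io_size, file_size)
        # Serve the application read, fetching whatever the cache lacks.
        if cached_upto < end:
            fetched += end - cached_upto
            cached_upto = end
        # Prefetch up to `window` bytes past the end of this read.
        target = min(end + window, file_size)
        if cached_upto < target:
            fetched += target - cached_upto
            cached_upto = target
        offset = end
        reads += 1
        # An interleaved fstat throws away data prefetched but not yet read.
        if fstat_every and reads % fstat_every == 0:
            cached_upto = offset
    return fetched


KIB = 1024
size = 1024 * 1024 * KIB  # 1 GiB file
for label, every in (("no fstat", 0), ("fstat after every read", 1)):
    total = simulate(size, 256 * KIB, 384 * KIB, every)
    print(f"{label}: {total / size:.2f}x of file size fetched")
```

With no fstat calls every byte is fetched once, so the prefetched data is simply consumed later as cache hits. With a flush after every read, each 256 KiB read pays for its own data plus a discarded window, which with the assumed 384 KiB window comes out close to the 2.5X seen in the report.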

Comment 13 Sweta Anandpara 2018-12-11 07:18:55 UTC

At the release stakeholders meeting this morning, it was agreed to push this out of the proposed list for 3.4.3, to be considered for a future batch update.

Comment 34 errata-xmlrpc 2019-10-30 12:19:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249
