Bug 1673058 - Network throughput usage increased x5
Summary: Network throughput usage increased x5
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Poornima G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: Gluster_5_Affecting_oVirt_4.3 1692093 1692101 1693935
 
Reported: 2019-02-06 15:22 UTC by Jacob
Modified: 2019-04-26 13:49 UTC
CC: 15 users

Fixed In Version: glusterfs-5.6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1692093 1692101
Environment:
Last Closed: 2019-04-08 14:15:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
client throughput graph (102.05 KB, image/png)
2019-02-06 15:22 UTC, Jacob


Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 22404 0 None Merged client-rpc: Fix the payload being sent on the wire 2019-04-08 14:15:00 UTC

Description Jacob 2019-02-06 15:22:09 UTC
Created attachment 1527539 [details]
client throughput graph

Description of problem:

Client network throughput in the OUT direction increased 5x after upgrading the servers from 3.11/3.12 to 5.3.
I now have ~110 Mbps of OUT traffic for each client, and on the server side a total of ~1450 Mbps for each Gluster server.
See the attached graph for the network throughput before and after the upgrade.

Version-Release number of selected component (if applicable):

5.3

How reproducible:

Upgrade from 3.11/3.12 to 5.3.

Steps to Reproduce:
1. https://docs.gluster.org/en/v3/Upgrade-Guide/upgrade_to_3.12/
2. https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_5/

Actual results:

Network throughput usage increased x5

Expected results:

Just the features and bug fixes of the 5.3 release, with no change in network throughput.

Cluster Information:

2 nodes with 1 volume, with 2 distributed bricks on each node

Number of Peers: 1

Hostname: 10.2.0.180
Uuid: 368055db-9e90-433f-9a56-bfc1507a25c5
State: Peer in Cluster (Connected)

Volume Information:

Volume Name: storage_other
Type: Distributed-Replicate
Volume ID: 6857bf2b-c97d-4505-896e-8fbc24bd16e8
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.2.0.181:/mnt/storage-brick1/data
Brick2: 10.2.0.180:/mnt/storage-brick1/data
Brick3: 10.2.0.181:/mnt/storage-brick2/data
Brick4: 10.2.0.180:/mnt/storage-brick2/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on

Status of volume: storage_other
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.2.0.181:/mnt/storage-brick1/data   49152     0          Y       1165
Brick 10.2.0.180:/mnt/storage-brick1/data   49152     0          Y       1149
Brick 10.2.0.181:/mnt/storage-brick2/data   49153     0          Y       1166
Brick 10.2.0.180:/mnt/storage-brick2/data   49153     0          Y       1156
Self-heal Daemon on localhost               N/A       N/A        Y       1183
Self-heal Daemon on 10.2.0.180              N/A       N/A        Y       1166

Task Status of Volume storage_other
------------------------------------------------------------------------------
There are no active volume tasks

Comment 1 Nithya Balachandran 2019-02-21 07:53:44 UTC
Is this high throughput consistent?
Please provide a tcpdump from the client machine for about 30 s to 1 min during the high throughput, so we can see what packets Gluster is sending:

In a terminal on the client machine:
tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22

Wait for 30s-1min and stop the capture. Send us the pcap file.

Another user reported that turning off readdir-ahead worked for him. Please try that after capturing the statedump and see if it helps you.
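
For reference, a rough sketch of both steps (volume name and mount point are placeholders; the statedump path may differ by distribution):

# on any server node: turn off readdir-ahead for the volume
gluster volume set <volname> performance.readdir-ahead off

# on the client: trigger a statedump of the fuse mount process
# (dumps are usually written to /var/run/gluster)
kill -SIGUSR1 $(pgrep -f 'glusterfs.*<mountpoint>')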

Comment 2 Alberto Bengoa 2019-02-21 11:17:22 UTC
(In reply to Nithya Balachandran from comment #1)
> Is this high throughput consistent?
> Please provide a tcpdump of the client process for about 30s to 1 min during
> the high throughput to see what packets gluster is sending:
> 
> In a terminal to the client machine:
> tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22
> 
> Wait for 30s-1min and stop the capture. Send us the pcap file.
> 
> Another user reported that turning off readdir-ahead worked for him. Please
> try that after capturing the statedump and see if it helps you.

I'm that other user, and I can confirm the same behaviour here.

In our tests we:

- Mounted the new cluster servers (running version 5.3) using a 5.3 client
- Started a find . -type d on a directory with lots of directories
- This generated outgoing traffic (on the client) of around 90 Mbps (and thus inbound traffic on the Gluster server)

We repeated the same test using a 3.8 client (against the 5.3 cluster) and the outgoing traffic on the client was only around 1.3 Mbps.
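
For anyone wanting to reproduce this, a minimal sketch of the test (server address and volume name are placeholders taken from the cluster info in the description; iftop/nload are just example traffic monitors):

mount -t glusterfs 10.2.0.181:/storage_other /mnt/test
cd /mnt/test && find . -type d > /dev/null &
iftop -i eth0        # or: nload eth0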

I can provide pcaps if needed.

Cheers,
Alberto Bengoa

Comment 3 Nithya Balachandran 2019-02-22 04:09:41 UTC
Assigning this to Amar to be reassigned appropriately.

Comment 4 Jacob 2019-02-25 13:42:45 UTC
I'm not able to upload it to the Bugzilla portal due to the size of the pcap.
You can download from here:

https://mega.nz/#!FNY3CS6A!70RpciIzDgNWGwbvEwH-_b88t9e1QVOXyLoN09CG418

Comment 5 Poornima G 2019-03-04 15:23:14 UTC
Did disabling readdir-ahead fix the issue?

Comment 6 Hubert 2019-03-04 15:32:17 UTC
We seem to have the same problem with a fresh install of GlusterFS 5.3 on Debian Stretch. We migrated from an existing setup (version 4.1.6, distribute-replicate) to a new setup (version 5.3, replicate), and traffic on the clients went up significantly, possibly causing massive iowait on the clients during high-traffic times. Here are some Munin graphs:

network traffic on high iowait client: https://abload.de/img/client-eth1-traffic76j4i.jpg
network traffic on old servers: https://abload.de/img/oldservers-eth1nejzt.jpg
network traffic on new servers: https://abload.de/img/newservers-eth17ojkf.jpg

performance.readdir-ahead is on by default. I could deactivate it tomorrow morning (07:00 CEST), and provide tcpdump data if necessary.


Regards,
Hubert

Comment 7 Hubert 2019-03-05 12:03:11 UTC
I set performance.readdir-ahead to off and have been watching network traffic for about 2 hours now, but traffic is still just as high: 5-8 times higher than it was with the old 4.1.x volumes.

Just curious: I see hundreds of thousands of these messages:

[2019-03-05 12:02:38.423299] W [dict.c:761:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/quick-read.so(+0x6df4) [0x7f0db452edf4] -->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/io-cache.so(+0xa39d) [0x7f0db474039d] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) [0x7f0dbb7e4a38] ) 5-dict: dict is NULL [Invalid argument]

See https://bugzilla.redhat.com/show_bug.cgi?id=1674225 - could this be related?
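
A quick way to quantify the flood, assuming the default client log location (the log file is named after the mount path):

grep -c 'dict is NULL' /var/log/glusterfs/<mount-path>.log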

Comment 8 Jacob 2019-03-06 09:54:26 UTC
Disabling readdir-ahead doesn't change the throughput.

Comment 9 Alberto Bengoa 2019-03-06 10:07:59 UTC
Nor for me.

BTW, shouldn't read-ahead/readdir-ahead generate traffic in the opposite direction (server -> client)?

Comment 10 Nithya Balachandran 2019-03-06 11:40:49 UTC
(In reply to Jacob from comment #4)
> i'm not able to upload in the bugzilla portal due to the size of the pcap.
> You can download from here:
> 
> https://mega.nz/#!FNY3CS6A!70RpciIzDgNWGwbvEwH-_b88t9e1QVOXyLoN09CG418

@Poornima,

The following are the call counts per procedure from the capture above:
    104 proc-1  (stat)
   8259 proc-11 (open)
     46 proc-14 (statfs)
   8239 proc-15 (flush)
      8 proc-18 (getxattr)
     68 proc-2  (readlink)
   5576 proc-27 (lookup)
   8388 proc-41 (forget)

Not sure if it helps.
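
For anyone who wants to derive these counts from the pcap themselves, something along these lines should work (a sketch, assuming a tshark build whose ONC-RPC dissector decodes the GlusterFS program):

tshark -r dirls.pcap -Y rpc -T fields -e rpc.procedure | sort | uniq -c | sort -rn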

Comment 11 Hubert 2019-03-07 08:34:21 UTC
I made a tcpdump as well:

tcpdump -i eth1 -s 0 -w /tmp/dirls.pcap tcp and not port 2222
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
259699 packets captured
259800 packets received by filter
29 packets dropped by kernel

The file is 1.1 GB; I gzipped it and uploaded it here: https://ufile.io/5h6i2

Hope this helps.

Comment 12 Hubert 2019-03-07 09:00:12 UTC
Maybe I should add that the relevant IP addresses of the Gluster servers are: 192.168.0.50, 192.168.0.51, 192.168.0.52
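
A per-connection breakdown limited to those servers can be pulled from the capture with tshark's conversation statistics (a sketch; adjust the path and filter as needed):

tshark -q -r /tmp/dirls.pcap -z 'conv,tcp,ip.addr==192.168.0.50 || ip.addr==192.168.0.51 || ip.addr==192.168.0.52'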

Comment 13 Hubert 2019-03-18 13:45:51 UTC
FYI: on a test setup (Debian Stretch, after an upgrade 5.3 -> 5.5) I did a little test (sketch below):

- copied 11 GB of data
- via rsync: rsync --bwlimit=10000 --inplace --- bandwidth limit of max. 10000 KB/s
- rsync pulled the data over interface eth0
- rsync stats: sent 1,484,200 bytes  received 11,402,695,074 bytes  5,166,106.13 bytes/sec
- so the external traffic average was about 5 MByte/s
- the result was internal traffic of up to 350 MBit/s (> 40 MByte/s) on eth1 (LAN interface)
- graph of internal traffic: https://abload.de/img/if_eth1-internal-trafdlkcy.png
- graph of external traffic: https://abload.de/img/if_eth0-external-trafrejub.png
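
The test setup, roughly, for anyone who wants to repeat it (source host/path and interface names are placeholders; vnstat/iftop are just example monitors):

# in one terminal: the copy onto the gluster mount
rsync --bwlimit=10000 --inplace -a <source-host>:/data/ /mnt/<glustermount>/data/
# in other terminals: watch both interfaces live
vnstat -l -i eth0
vnstat -l -i eth1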

Comment 14 Poornima G 2019-03-19 06:15:50 UTC
Apologies for the delay. There have been some changes to the quick-read feature, which reads the content of a file as part of the lookup fop if the file is smaller than 64 KB. I suspect that with 5.3 the increase in bandwidth may be due to a larger number of small-file reads generated by quick-read.

Please try the following:
gluster vol set <volname> quick-read off
gluster vol set <volname> read-ahead off
gluster vol set <volname> io-cache off


Let us know if the network bandwidth consumption decreases; meanwhile I will try to reproduce the same locally.
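
The effective values can be double-checked afterwards with volume get (a small sketch; "all" lists every option with its current value):

gluster volume get <volname> all | grep -E 'quick-read|read-ahead|io-cache'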

Comment 15 Hubert 2019-03-19 08:12:04 UTC
I deactivated the 3 params and did the same test again.

- same rsync params: rsync --bwlimit=10000 --inplace
- rsync stats: sent 1,491,733 bytes  received 11,444,330,300 bytes  6,703,263.27 bytes/sec
- so ~6.7 MByte/s or ~54 MBit/s on average (peak of 60 MBit/s) over the external network interface
- traffic graph of the server running the rsync command: https://abload.de/img/if_eth1-internal-traf4zjow.png
- so the server is sending at an average of ~110 MBit/s and a peak of ~125 MBit/s over the LAN interface
- traffic graph of one of the replica servers (disregard the first curve: that is the delete of the old data): https://abload.de/img/if_enp5s0-internal-trn5k9v.png
- so one of the replicas receives data at ~55 MBit/s average and ~62 MBit/s peak
- as a comparison - traffic before and after changing the 3 params (rsync server, highest curve is relevant):
- https://abload.de/img/if_eth1-traffic-befortvkib.png

So it looks like the traffic was reduced to about a third. Is this what you expected?

If so: traffic would still be a bit higher when I compare 4.1.6 and 5.3 - here's a graph of one client in our live system after switching from 4.1.6 (~20 MBit/s) to 5.3 (~100 MBit/s in March):

https://abload.de/img/if_eth1-comparison-gly8kyx.png

So if this traffic gets reduced to 1/3, it would be ~33 MBit/s. Way better, I think. And could that be "normal"?

Thx so far :-)

Comment 16 Poornima G 2019-03-19 09:23:48 UTC
Awesome, thank you for trying it out. I was able to reproduce this issue locally; one of the major culprits was quick-read. The other two options had no effect on reducing the bandwidth consumption. So for now, as a workaround, you can disable quick-read:

# gluster vol set <volname> quick-read off

Quick-read alone reduced the bandwidth consumption by 70% for me. I am debugging the remaining 30% increase. Meanwhile, I am planning to make this bug a blocker for our next Gluster 6 release.

Will keep the bug updated with the progress.

Comment 17 Hubert 2019-03-19 10:07:35 UTC
I'm running another test, just alongside... simply deleting and copying data, no big effort. Just curious :-)

2 little questions:

- does disabling quick-read have any performance issues for certain setups/scenarios?
- is the bug a blocker only for the v6 release? Is an update for v5 planned?

Comment 18 Poornima G 2019-03-19 10:36:20 UTC
(In reply to Hubert from comment #17)
> i'm running another test, just alongside... simply deleting and copying
> data, no big effort. Just curious :-)
I think if the volume hosts small files, then any kind of operation around these files will see increased bandwidth usage.

> 
> 2 little questions:
> 
> - does disabling quick-read have any performance issues for certain
> setups/scenarios?
Small-file reads (files with size <= 64 KB) will see reduced performance, e.g. in a web server use case.
> - bug only blocker for v6 release? update for v5 planned?
Yes, there will be an update for v5, though I'm not sure when. Updates for major releases are made roughly once every 3 or 4 weeks. For critical bugs the release will be made earlier.

Comment 19 Alberto Bengoa 2019-03-19 11:54:58 UTC
Hello guys,

Thanks for your update Poornima.

I was already running with quick-read off here, so in my case I noticed the traffic growing consistently after enabling it.

I've run some tests in my scenario, and I wasn't able to reproduce your 70% reduction. For me, it's closer to a 46% traffic reduction (from around 103 Mbps to around 55 Mbps; graph attached here: https://pasteboard.co/I68s9qE.png )

What I'm doing is just running a find . -type d on a directory with loads of directories/files.

Poornima, if you don't mind answering a question: why are we seeing this traffic on the inbound side of the Gluster servers (outbound on the clients)? In my particular case the traffic should basically be in the opposite direction, I think, and I'm very curious about that.
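
One way to see both directions side by side on the client while the find is running (a sketch, assuming sysstat is installed and eth0 is the relevant interface; compare the rxkB/s and txkB/s columns):

sar -n DEV 1 30 | grep eth0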

Thank you,

Alberto

Comment 20 Poornima G 2019-03-22 17:42:54 UTC
Thank you all for the report. We have the RCA and are working on the patch; it will be posted shortly.
The issue was with the size of the payload being sent from the client to the server for operations like lookup and readdirp. Hence workloads involving lookup and readdir would consume a lot of bandwidth.

Comment 21 Worker Ant 2019-03-24 10:42:32 UTC
REVIEW: https://review.gluster.org/22404 (client-rpc: Fix the payload being sent on the wire) posted (#1) for review on release-5 by Poornima G

Comment 22 Pavel Znamensky 2019-03-28 13:25:15 UTC
Unfortunately, it's a blocker for us too. Like Jacob, we've seen a 4x increase in outgoing traffic on the clients.
Disabling read-ahead and readdir-ahead didn't help. Disabling quick-read helped a little bit.
We look forward to the fix and hope this bug is marked as critical so that a fix for the 5.x branch is released earlier.

Comment 23 Worker Ant 2019-03-29 14:29:58 UTC
REVISION POSTED: https://review.gluster.org/22404 (client-rpc: Fix the payload being sent on the wire) posted (#2) for review on release-5 by Poornima G

Comment 24 Poornima G 2019-03-29 14:39:53 UTC
(In reply to Znamensky Pavel from comment #22)
> Unfortunately, it's blocker for us too. As Jacob, we've faced with 4x
> increasing outgoing traffic on clients.
> Disabling read-ahead and readdir-ahead didn't help. Disabling quick-read
> helped a little bit.
> Look forward to the fix and hope this bug is marked as critical so fix for
> the 5x branch will be released earlier.

I will try to make a release as soon as the patch is merged. Thanks for your update.
I have posted the patch; the link can be found in the previous comment.

Comment 25 Pavel Znamensky 2019-03-29 14:45:44 UTC
>Will try to make a release as soon as the patch is merged. Thanks for your update.
>Have posted the patch, the link can be found in the previous comment.

Thanks for the quick fix!

Comment 26 Worker Ant 2019-03-29 15:24:06 UTC
REVIEW: https://review.gluster.org/22404 (client-rpc: Fix the payload being sent on the wire) posted (#3) for review on release-5 by Poornima G

Comment 27 Worker Ant 2019-04-08 14:15:01 UTC
REVIEW: https://review.gluster.org/22404 (client-rpc: Fix the payload being sent on the wire) merged (#4) on release-5 by Shyamsundar Ranganathan

Comment 28 Shyamsundar 2019-04-18 11:10:09 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.6, please open a new bug report.

glusterfs-5.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-April/000123.html
[2] https://www.gluster.org/pipermail/gluster-users/
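
For Debian-based clients, the upgrade itself would look roughly like this (a sketch, assuming the gluster.org apt repository is already configured and the mount is defined in fstab):

apt-get update
apt-get install --only-upgrade glusterfs-client
umount /mnt/<glustermount> && mount /mnt/<glustermount>
glusterfs --version     # should now report 5.6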

Comment 29 Poornima G 2019-04-24 04:49:22 UTC
Can anyone verify that the issue is no longer seen in 5.6?

Comment 30 Alberto Bengoa 2019-04-24 08:00:36 UTC
(In reply to Poornima G from comment #29)
> Can anyone verify if the issue is not seen in 5.6 anymore?

I'm planning to test it soon. My environment is partially in production so I need to arrange a maintenance window to do that.

I will send an update here as soon as I finish.

Comment 31 Hubert 2019-04-24 09:25:49 UTC
I'll do some tests, probably next week, tcpdump included.

Comment 32 Alberto Bengoa 2019-04-26 13:49:02 UTC
Hello Poornima,

I did some tests today and, in my scenario, it seems fixed.

What I did this time:

- Mounted the new cluster (running version 5.6) using a client running version 5.5
- Started a find . -type d on a directory with lots of directories
- This generated outgoing traffic (on the client) of around 40 Mbps [1]

Then I upgraded the client to version 5.6 and re-ran the tests, and saw around 800 Kbps of network traffic [2]. Really good!

I've run a couple more tests with quick-read enabled [3][4]. It may have slightly increased my network traffic, but nothing really significant.
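
For reference, the quick-read re-enable test in [3][4] amounted to roughly this (the mount path is a placeholder; "volume" is the volume name from the info below):

gluster volume set volume performance.quick-read on
find /mnt/<glustermount> -type d > /dev/null   # re-run the directory walk and watch client traffic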


[1] - https://pasteboard.co/IbVwWTP.png
[2] - https://pasteboard.co/IbVxgVU.png
[3] - https://pasteboard.co/IbVxuaJ.png
[4] - https://pasteboard.co/IbVxCbZ.png

This is my current volume info:

Volume Name: volume
Type: Replicate
Volume ID: 1d8f7d2d-bda6-4f1c-aa10-6ad29e0b7f5e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs02tmp:/var/data/glusterfs/volume/brick
Brick2: fs01tmp:/var/data/glusterfs/volume/brick
Options Reconfigured:
network.ping-timeout: 10
performance.flush-behind: on
performance.write-behind-window-size: 16MB
performance.cache-size: 1900MB
performance.io-thread-count: 32
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
server.allow-insecure: on
server.event-threads: 4
client.event-threads: 4
performance.readdir-ahead: off
performance.read-ahead: off
performance.open-behind: on
performance.write-behind: off
performance.stat-prefetch: off
performance.quick-read: off
performance.strict-o-direct: on
performance.io-cache: off
performance.read-after-open: yes
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000


Let me know if you need anything else. 

Cheers,

Alberto Bengoa

