I'm seeing high CPU utilization with glusterfsd; CPU utilization goes beyond 700%. Please let me know if it's somehow misconfigured on my end.

Volume Name: dist-vol
Type: Distribute
Volume ID: 330157c4-49a7-4982-9013-86b5f2e65fc6
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: storage-1:/tank/gshare
Brick2: storage-2:/tank/gshare
Brick3: storage-3:/tank/gshare
Options Reconfigured:
cluster.min-free-disk: 1%
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
performance.cache-size: 2GB
server.allow-insecure: on
performance.io-thread-count: 64
performance.quick-read: on
performance.io-cache: on
performance.write-behind: on
performance.read-ahead: on
performance.write-behind-window-size: 1GB
performance.cache-max-file-size: 2GB
performance.stat-prefetch: on
performance.flush-behind: on
cluster.data-self-heal: off
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
cluster.lookup-optimize: on
server.outstanding-rpc-limit: 128
server.event-threads: 3
client.event-threads: 3
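To help narrow down where the CPU time is going, a sketch of the kind of diagnostics that could be gathered on an affected node (assuming the volume name `dist-vol` from the output above; the 60-second sample window is arbitrary):

```shell
# Show per-thread CPU usage of the brick processes (-H lists threads),
# which reveals whether io-threads, event threads, or something else is hot
top -b -n 1 -H -p "$(pgrep -d, glusterfsd)" | head -n 40

# Enable per-FOP latency profiling on the volume, let the workload run,
# then dump the accumulated statistics
gluster volume profile dist-vol start
sleep 60
gluster volume profile dist-vol info
```

The `profile info` output breaks down call counts and latencies per file operation, which is useful for a small-file-heavy workload like this one.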
Are you using Red Hat Gluster Storage or the community version of GlusterFS? This BZ has been filed against the former. If you meant the community version, please change the product to GlusterFS. A few more details would also help us debug further: the exact version, the workload type, and the client and brick logs. The information provided so far looks inadequate.
Thanks Atin, I have corrected the info. The interesting thing is that even with the diagnostics settings at ERROR level, no logs are being recorded under /var/log/glusterfs/bricks. The load is heavy SMB usage: we have thousands of small (KB-sized) and large (several-GB) files, and the cluster's high load and high glusterfsd CPU utilization tend to shift from node to node. Is there a known bug for this type of behavior? I wasn't able to find an answer online, nor was anyone on IRC able to shed light on it; they encouraged filing a bug for this behavior.
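Two things worth checking given the missing brick logs: the effective option values and a statedump of the brick processes. A sketch, again assuming the volume name `dist-vol`:

```shell
# Confirm the log-level options actually in effect on the volume
# (reconfigured values can differ from what was intended)
gluster volume get dist-vol diagnostics.brick-log-level
gluster volume get dist-vol diagnostics.client-log-level

# Capture a statedump of the brick processes for offline analysis;
# the dump files land under /var/run/gluster by default
gluster volume statedump dist-vol
```

The statedump includes per-translator memory and call-stack information, which is often requested when debugging runaway glusterfsd CPU usage.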
I am moving this BZ to the gluster-smb component since, as per comment 3, the workload is on SMB.
top - 10:50:42 up 3 days, 4:06, 1 user, load average: 51.29, 47.47, 44.14
Tasks: 645 total, 1 running, 644 sleeping, 0 stopped, 0 zombie
%Cpu0 : 50.0 us, 45.3 sy, 0.0 ni, 3.4 id, 0.3 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu1 : 45.9 us, 49.3 sy, 0.0 ni, 4.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 46.1 us, 48.5 sy, 0.0 ni, 3.7 id, 1.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu3 : 48.1 us, 47.8 sy, 0.0 ni, 3.0 id, 0.3 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu4 : 31.5 us, 60.6 sy, 0.0 ni, 4.1 id, 1.7 wa, 0.0 hi, 2.1 si, 0.0 st
%Cpu5 : 38.0 us, 55.8 sy, 0.0 ni, 2.1 id, 3.1 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu6 : 40.2 us, 52.0 sy, 0.0 ni, 4.7 id, 1.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu7 : 39.0 us, 52.9 sy, 0.0 ni, 3.4 id, 2.4 wa, 0.0 hi, 2.4 si, 0.0 st
KiB Mem:  13200491+total, 13131869+used,   686212 free,    11228 buffers
KiB Swap: 62465916 total,   256332 used, 62209584 free. 11889336+cached Mem

  PID USER  PR NI    VIRT    RES  SHR S  %CPU %MEM   TIME+   COMMAND
 6267 root  20  0 8935784 6.642g 3972 S 676.8  5.3 4243:49   glusterfsd
 7156 root  20  0 2224496 1.547g 3896 S  23.9  1.2 1490:24   glusterfs
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.