Created attachment 1230893 [details]
statedump from gluster node with high load

Description of problem:
3 x node Fedora cluster in AWS (m4.xlarge, Fedora 23 Cloud Edition) with a 2.5TB volume. One node out of 3 gets high CPU and the number of healing entries increases. Logs on one node with low CPU usage keep getting messages similar to this:

I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 6954047: SETATTR /ftpdata/<removed>/2_kamih2.zip (c46c2f49-9688-4617-9541-a7181b495f80) ==> (Operation not permitted) [Operation not permitted]

I have asked about this message on the mailing list before but got no answer. The logs on the other two nodes (one with high CPU and one with low) are quiet, with occasional heal messages. To reduce the load we have to stop reading and writing to the volumes. After the trouble described in https://bugzilla.redhat.com/show_bug.cgi?id=1402621 we upgraded the cluster to 3.8. The cluster of 3 x m4.xlarge hosts (4 CPU, 16G RAM) supports only 12 clients at most.

Version-Release number of selected component (if applicable):
3.8

How reproducible:
Performance issue occurs regularly.

Steps to Reproduce:
Running `find /path -type f -exec stat {} \;` can show a significant load increase from just one host.
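One way to reproduce the load and see which Gluster process is consuming the CPU (a sketch, not the reporter's exact procedure; `/mnt/gluster` is a hypothetical client mount point and `pidstat` needs the sysstat package installed):

```shell
# Metadata-heavy sweep over the mount, as in the reproducer above.
find /mnt/gluster -type f -exec stat {} \; > /dev/null &

# Sample per-process CPU usage every 5 seconds for the brick
# (glusterfsd) and self-heal (glustershd) daemons on each server
# to see which one spikes during the sweep.
pidstat -u -p "$(pgrep -d, -f 'glusterfsd|glustershd')" 5
```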
Additional info:

sudo gluster volume info

Volume Name: marketplace_nfs
Type: Distributed-Replicate
Volume ID: 528de1b5-0bd5-488b-83cf-c4f3f747e6cd
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.90.5.105:/data/data0/marketplace_nfs
Brick2: 10.90.3.14:/data/data3/marketplace_nfs
Brick3: 10.90.4.195:/data/data0/marketplace_nfs
Brick4: 10.90.5.105:/data/data1/marketplace_nfs
Brick5: 10.90.3.14:/data/data1/marketplace_nfs
Brick6: 10.90.4.195:/data/data1/marketplace_nfs
Options Reconfigured:
performance.client-io-threads: on
performance.io-thread-count: 12
server.event-threads: 3
client.event-threads: 3
server.outstanding-rpc-limit: 256
cluster.self-heal-readdir-size: 16KB
cluster.self-heal-window-size: 3
diagnostics.brick-log-level: INFO
network.ping-timeout: 15
cluster.quorum-type: none
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
performance.cache-size: 1024MB
cluster.lookup-optimize: on
cluster.data-self-heal-algorithm: diff
nfs.disable: off
cluster.server-quorum-ratio: 51%

sudo gluster volume status

Status of volume: marketplace_nfs
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.90.5.105:/data/data0/marketplace_nfs   49155     0          Y       20611
Brick 10.90.3.14:/data/data3/marketplace_nfs    49158     0          Y       23161
Brick 10.90.4.195:/data/data0/marketplace_nfs   49155     0          Y       5504
Brick 10.90.5.105:/data/data1/marketplace_nfs   49156     0          Y       20616
Brick 10.90.3.14:/data/data1/marketplace_nfs    49159     0          Y       23166
Brick 10.90.4.195:/data/data1/marketplace_nfs   49156     0          Y       5509
NFS Server on localhost                         2049      0          Y       23250
Self-heal Daemon on localhost                   N/A       N/A        Y       23262
NFS Server on ip-10-90-4-195.ec2.internal       2049      0          Y       25289
Self-heal Daemon on ip-10-90-4-195.ec2.internal N/A       N/A        Y       25297
NFS Server on ip-10-90-5-105.ec2.internal       2049      0          Y       8405
Self-heal Daemon on ip-10-90-5-105.ec2.internal N/A       N/A        Y       8416

Task Status of Volume marketplace_nfs
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: marketplace_uploads
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.90.4.195:/data/data2/uploads           49157     0          Y       5528
Brick 10.90.3.14:/data/data2/uploads            49160     0          Y       23180
Brick 10.90.5.105:/data/data2/uploads           49157     0          Y       20621
NFS Server on localhost                         2049      0          Y       23250
Self-heal Daemon on localhost                   N/A       N/A        Y       23262
NFS Server on ip-10-90-4-195.ec2.internal       2049      0          Y       25289
Self-heal Daemon on ip-10-90-4-195.ec2.internal N/A       N/A        Y       25297
NFS Server on ip-10-90-5-105.ec2.internal       2049      0          Y       8405
Self-heal Daemon on ip-10-90-5-105.ec2.internal N/A       N/A        Y       8416

Task Status of Volume marketplace_uploads
------------------------------------------------------------------------------
There are no active volume tasks
On one node with low CPU:

[fedora@ip-10-90-3-14 ~]$ sudo grep 9b886490-6a45-433f-aa49-a398c4ef385d -rs /var/log/glusterfs/*
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-12 22:56:45.703001] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 61001: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-12 23:51:11.958781] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 2730089: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:03:47.098154] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 305778: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:04:56.252942] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 385474: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:05:55.345749] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 430351: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:10:39.747917] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 557974: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]

On the node with high CPU:

$ sudo grep 9b886490-6a45-433f-aa49-a398c4ef385d -rs /var/log/glusterfs/*
/var/log/glusterfs/glustershd.log:[2016-12-13 00:14:25.053114] I [MSGID: 108026] [afr-self-heal-common.c:1076:afr_log_selfheal] 0-marketplace_nfs-replicate-0: Completed data selfheal on 9b886490-6a45-433f-aa49-a398c4ef385d. sources=[0] 2 sinks=1
/var/log/glusterfs/glustershd.log:[2016-12-13 00:14:25.100564] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-marketplace_nfs-replicate-0: performing metadata selfheal on 9b886490-6a45-433f-aa49-a398c4ef385d
/var/log/glusterfs/glustershd.log:[2016-12-13 00:14:25.123101] I [MSGID: 108026] [afr-self-heal-common.c:1076:afr_log_selfheal] 0-marketplace_nfs-replicate-0: Completed metadata selfheal on 9b886490-6a45-433f-aa49-a398c4ef385d. sources=[0] 2 sinks=1

Is that informational log pointing to a problem (Operation not permitted)?
High CPU usage is a known problem; we are working on introducing a throttling mechanism to regulate self-heal traffic. Meanwhile, some queries:

1. Which process was consuming the high CPU (the brick, glustershd, or gluster NFS process)?
2. How many files were pending heal when you noticed the high CPU load (i.e. 'Number of entries' for each brick when `gluster volume heal volname info` is run)?
3. The [Operation not permitted] should normally not occur. Can you share the complete brick log, and the filename corresponding to the gfid 9b886490-6a45-433f-aa49-a398c4ef385d? Is there a difference in file owner/permissions/times for this file amongst the 3 bricks?

Also, some tips:

1. cluster.quorum-type: none --> This has to be set to 'auto' to avoid split-brains.
2. Can you try using cgroups for regulation and let us know if that helps? (http://blog.gluster.org/2016/09/making-gluster-play-nicely-with-others/)
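Roughly, the checks and tips above could look like this (a sketch only: the cgroup name and CPU quota are made up for illustration, the libcgroup tools must be installed, and the volume name is taken from this report):

```shell
# Query 2: count pending heal entries per brick.
gluster volume heal marketplace_nfs info | grep 'Number of entries'

# Tip 1: switch client quorum to auto to avoid split-brain.
gluster volume set marketplace_nfs cluster.quorum-type auto

# Tip 2: throttle the self-heal daemon with a cgroup (cgroup v1,
# libcgroup tools). 50000/100000 us caps it at ~50% of one CPU.
cgcreate -g cpu:glusterheal
cgset -r cpu.cfs_quota_us=50000 glusterheal
cgclassify -g cpu:glusterheal "$(pgrep glustershd)"
```

This is only one way to express the cgroup idea; the blog post linked above walks through the approach the developers had in mind.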
Created attachment 1235451 [details]
glusterfsd process CPU

High CPU and dstat output.
This bug is most likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1402621 (also reported by me), which has some more statedumps and a client fuse log. We have had to rip out our GlusterFS installation as it was too unstable, and the instances I stopped have been cleaned up, so I can't get any further logs. I captured a screenshot while it was misbehaving and have attached it, in case it provides any further insight; it includes dstat output showing the elevated disk reads/writes and network traffic at the time.

1. The CPU was being hogged by glusterfsd.
2. The number of entries on those bricks was as high as 1700. When we stopped the hosts accessing the gluster mounts it would go down, but it quickly started climbing again as soon as they were brought back online, and CPU would climb rapidly (we are talking about at most 12 client nodes). The node most recently added as a replacement always showed a lower entry count for its brick.
3. We did find a directory that was causing 'mount point does not exist' errors. Removing it from the gluster nodes seemed to help, and a rebalance (fix-layout and migrate data) also appeared to help after everything calmed down.

The file 9b886490-6a45-433f-aa49-a398c4ef385d ('/ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg') that was reported with the SETATTR error on one of the low-CPU nodes (10.90.3.14) was just one of many files with similar errors. The files were zero bytes, with the same timestamps as on the other nodes. At that point they did not have their permissions set correctly, as that happens after they receive the data (I assume): they were owned by root, and after they received data they had the correct permissions. I could remove one of these zero-byte files from the gluster node filesystem and it would be recreated (zero bytes).
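The owner/permission mismatch described above can be checked directly on the brick backends, as question 3 asked. A hedged sketch (the brick roots and relative file path are placeholders for your own layout, and `getfattr` comes from the attr package):

```shell
# Compare metadata for one file across two local brick backends.
REL='ftpdata/example/photo.jpg'   # hypothetical path relative to the brick root
for BRICK in /data/data0/marketplace_nfs /data/data1/marketplace_nfs; do
  stat -c '%n owner=%U:%G mode=%a size=%s mtime=%y' "$BRICK/$REL"
  # AFR bookkeeping xattrs; non-zero trusted.afr.* values indicate
  # pending heals for this file on the other replicas.
  getfattr -d -m 'trusted.afr' -e hex "$BRICK/$REL"
done
```

Running this on each of the 3 nodes would show whether the zero-byte, root-owned copies differ from the healthy replicas.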
Before we removed the cluster it was behaving normally; heal info showed a normal number of entries (up to 20).
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.
For those who might have this problem, here is my situation and workaround. On glusterfs 3.7.6-1ubuntu1 (Ubuntu 16.04), we had high CPU usage on one of the gluster servers. It used all available CPU resources without writing anything meaningful to the logs; there were just more lines than usual like this:

[2018-03-29 11:35:22.081500] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 2-gfs-cmj-dht: Setting layout of /tmp/salt_www/dashman/cf37firsesc549t9a2a12o3653 with [Subvol_name: gfs-cmj-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],

Any file access on the glusterfs filesystem was slow, each operation taking 10 seconds or more. I found that this command mitigates the problem: the high CPU went away and glusterfs became fast again:

gluster volume set gfs-cmj performance.readdir-ahead off

Maybe this will help someone (it took me many hours to find out).
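When applying this workaround it may be worth confirming the option actually took effect (a sketch; `gluster volume get` is available in 3.7+ builds, and the volume name is from the report above):

```shell
# Check the current value, change it, then verify it stuck.
gluster volume get gfs-cmj performance.readdir-ahead
gluster volume set gfs-cmj performance.readdir-ahead off
gluster volume get gfs-cmj performance.readdir-ahead
```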