Created attachment 1230893 [details]
statedump from gluster node with high load

Description of problem:
3 x node Fedora cluster in AWS (m4.xlarge, Fedora 23 Cloud Edition) with a 2.5TB volume. One node out of 3 gets high CPU and the number of healing entries increases. Logs on one node with low CPU usage keep getting messages similar to this:

I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 6954047: SETATTR /ftpdata/<removed>/2_kamih2.zip (c46c2f49-9688-4617-9541-a7181b495f80) ==> (Operation not permitted) [Operation not permitted]

I have asked about this message on the mailing list before but got no answer. The logs on the other two nodes (one with high CPU and one with low) are quiet, with occasional heal messages. To reduce the load we have to stop reading and writing to the volumes. After the trouble described in https://bugzilla.redhat.com/show_bug.cgi?id=1402621 we upgraded the cluster to 3.8. The cluster of 3 x m4.xlarge hosts (4 CPU, 16G RAM) supports only 12 clients at most.

Version-Release number of selected component (if applicable):
3.8

How reproducible:
Performance issue occurs regularly.

Steps to Reproduce:
Running `find /path -type f -exec stat {} \;` can show a significant load increase from just one host.
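One way to reproduce the load and see which Gluster process is consuming the CPU (a sketch, not the reporter's exact procedure; `/mnt/gluster` is a hypothetical client mount point and `pidstat` needs the sysstat package installed):

```shell
# Metadata-heavy sweep over the mount, as in the reproducer above.
find /mnt/gluster -type f -exec stat {} \; > /dev/null &

# Sample per-process CPU usage every 5 seconds for the brick
# (glusterfsd) and self-heal (glustershd) daemons on each server
# to see which one spikes during the sweep.
pidstat -u -p "$(pgrep -d, -f 'glusterfsd|glustershd')" 5
```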
Additional info:

sudo gluster volume info

Volume Name: marketplace_nfs
Type: Distributed-Replicate
Volume ID: 528de1b5-0bd5-488b-83cf-c4f3f747e6cd
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.90.5.105:/data/data0/marketplace_nfs
Brick2: 10.90.3.14:/data/data3/marketplace_nfs
Brick3: 10.90.4.195:/data/data0/marketplace_nfs
Brick4: 10.90.5.105:/data/data1/marketplace_nfs
Brick5: 10.90.3.14:/data/data1/marketplace_nfs
Brick6: 10.90.4.195:/data/data1/marketplace_nfs
Options Reconfigured:
performance.client-io-threads: on
performance.io-thread-count: 12
server.event-threads: 3
client.event-threads: 3
server.outstanding-rpc-limit: 256
cluster.self-heal-readdir-size: 16KB
cluster.self-heal-window-size: 3
diagnostics.brick-log-level: INFO
network.ping-timeout: 15
cluster.quorum-type: none
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
performance.cache-size: 1024MB
cluster.lookup-optimize: on
cluster.data-self-heal-algorithm: diff
nfs.disable: off
cluster.server-quorum-ratio: 51%

sudo gluster volume status

Status of volume: marketplace_nfs
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.90.5.105:/data/data0/marketplace_nfs   49155     0          Y       20611
Brick 10.90.3.14:/data/data3/marketplace_nfs    49158     0          Y       23161
Brick 10.90.4.195:/data/data0/marketplace_nfs   49155     0          Y       5504
Brick 10.90.5.105:/data/data1/marketplace_nfs   49156     0          Y       20616
Brick 10.90.3.14:/data/data1/marketplace_nfs    49159     0          Y       23166
Brick 10.90.4.195:/data/data1/marketplace_nfs   49156     0          Y       5509
NFS Server on localhost                         2049      0          Y       23250
Self-heal Daemon on localhost                   N/A       N/A        Y       23262
NFS Server on ip-10-90-4-195.ec2.internal       2049      0          Y       25289
Self-heal Daemon on ip-10-90-4-195.ec2.internal N/A       N/A        Y       25297
NFS Server on ip-10-90-5-105.ec2.internal       2049      0          Y       8405
Self-heal Daemon on ip-10-90-5-105.ec2.internal N/A       N/A        Y       8416

Task Status of Volume marketplace_nfs
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: marketplace_uploads
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.90.4.195:/data/data2/uploads           49157     0          Y       5528
Brick 10.90.3.14:/data/data2/uploads            49160     0          Y       23180
Brick 10.90.5.105:/data/data2/uploads           49157     0          Y       20621
NFS Server on localhost                         2049      0          Y       23250
Self-heal Daemon on localhost                   N/A       N/A        Y       23262
NFS Server on ip-10-90-4-195.ec2.internal       2049      0          Y       25289
Self-heal Daemon on ip-10-90-4-195.ec2.internal N/A       N/A        Y       25297
NFS Server on ip-10-90-5-105.ec2.internal       2049      0          Y       8405
Self-heal Daemon on ip-10-90-5-105.ec2.internal N/A       N/A        Y       8416

Task Status of Volume marketplace_uploads
------------------------------------------------------------------------------
There are no active volume tasks
On one node with low CPU:

[fedora@ip-10-90-3-14 ~]$ sudo grep 9b886490-6a45-433f-aa49-a398c4ef385d -rs /var/log/glusterfs/*
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-12 22:56:45.703001] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 61001: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-12 23:51:11.958781] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 2730089: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:03:47.098154] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 305778: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:04:56.252942] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 385474: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:05:55.345749] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 430351: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]
/var/log/glusterfs/bricks/data-data3-marketplace_nfs.log:[2016-12-13 00:10:39.747917] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-marketplace_nfs-server: 557974: SETATTR /ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg (9b886490-6a45-433f-aa49-a398c4ef385d) ==> (Operation not permitted) [Operation not permitted]

On the node with high CPU:

$ sudo grep 9b886490-6a45-433f-aa49-a398c4ef385d -rs /var/log/glusterfs/*
/var/log/glusterfs/glustershd.log:[2016-12-13 00:14:25.053114] I [MSGID: 108026] [afr-self-heal-common.c:1076:afr_log_selfheal] 0-marketplace_nfs-replicate-0: Completed data selfheal on 9b886490-6a45-433f-aa49-a398c4ef385d. sources=[0] 2 sinks=1
/var/log/glusterfs/glustershd.log:[2016-12-13 00:14:25.100564] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-marketplace_nfs-replicate-0: performing metadata selfheal on 9b886490-6a45-433f-aa49-a398c4ef385d
/var/log/glusterfs/glustershd.log:[2016-12-13 00:14:25.123101] I [MSGID: 108026] [afr-self-heal-common.c:1076:afr_log_selfheal] 0-marketplace_nfs-replicate-0: Completed metadata selfheal on 9b886490-6a45-433f-aa49-a398c4ef385d. sources=[0] 2 sinks=1

Is that informational log pointing to a problem (Operation not permitted)?
High CPU usage is a known problem; we are working on introducing a throttling mechanism to regulate self-heal traffic. Meanwhile, some queries:

1. Which process was consuming the high CPU (the brick, glustershd, or gluster NFS process)?
2. How many files were pending heal when you noticed the high CPU load (i.e. 'Number of entries' for each brick when `gluster volume heal volname info` is run)?
3. The [Operation not permitted] should normally not occur. Can you share the complete brick log, and the filename corresponding to the gfid 9b886490-6a45-433f-aa49-a398c4ef385d? Is there a difference in file owner/permissions/times for this file amongst the 3 bricks?

Also, some tips:

1. cluster.quorum-type: none --> This has to be set to 'auto' to avoid split-brains.
2. Can you try using cgroups for regulation and let us know if that helps? (http://blog.gluster.org/2016/09/making-gluster-play-nicely-with-others/)
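Roughly, the checks and tips above could look like this (a sketch only: the cgroup name and CPU quota are made up for illustration, the libcgroup tools must be installed, and the volume name is taken from this report):

```shell
# Query 2: count pending heal entries per brick.
gluster volume heal marketplace_nfs info | grep 'Number of entries'

# Tip 1: switch client quorum to auto to avoid split-brain.
gluster volume set marketplace_nfs cluster.quorum-type auto

# Tip 2: throttle the self-heal daemon with a cgroup (cgroup v1,
# libcgroup tools). 50000/100000 us caps it at ~50% of one CPU.
cgcreate -g cpu:glusterheal
cgset -r cpu.cfs_quota_us=50000 glusterheal
cgclassify -g cpu:glusterheal "$(pgrep glustershd)"
```

This is only one way to express the cgroup idea; the blog post linked above walks through the approach the developers had in mind.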
Created attachment 1235451 [details]
glusterfsd process CPU

High CPU and dstat output.
This bug is most likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1402621 (also reported by me), which has some more statedumps and a client fuse log. We have had to rip out our GlusterFS installation as it was too unstable, and the instances I stopped have been cleaned up, so I can't get any further logs. I captured a screenshot while it was misbehaving and have attached it, in case it provides any further insight; it includes dstat output showing the elevated disk reads/writes and network traffic at the time.

1. The CPU was being hogged by glusterfsd.
2. The number of entries on those bricks was as high as 1700. When we stopped the hosts accessing the gluster mounts it would go down, but it quickly started climbing again as soon as they were brought back online, and CPU would climb rapidly (we are talking about at most 12 client nodes). The node most recently added as a replacement always showed a lower entry count for its brick.
3. We did find a directory that was causing 'mount point does not exist' errors. Removing it from the gluster nodes seemed to help, and a rebalance (fix-layout and migrate data) also appeared to help after everything calmed down.

The file 9b886490-6a45-433f-aa49-a398c4ef385d ('/ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg') that was reported with the SETATTR error on one of the low-CPU nodes (10.90.3.14) was just one of many files with similar errors. The files were zero bytes, with the same timestamps as on the other nodes. At that point they did not have their permissions set correctly, as that happens after they receive the data (I assume): they were owned by root, and after they received data they had the correct permissions. I could remove one of these zero-byte files from the gluster node filesystem and it would be recreated (zero bytes).
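The owner/permission mismatch described above can be checked directly on the brick backends, as question 3 asked. A hedged sketch (the brick roots and relative file path are placeholders for your own layout, and `getfattr` comes from the attr package):

```shell
# Compare metadata for one file across two local brick backends.
REL='ftpdata/example/photo.jpg'   # hypothetical path relative to the brick root
for BRICK in /data/data0/marketplace_nfs /data/data1/marketplace_nfs; do
  stat -c '%n owner=%U:%G mode=%a size=%s mtime=%y' "$BRICK/$REL"
  # AFR bookkeeping xattrs; non-zero trusted.afr.* values indicate
  # pending heals for this file on the other replicas.
  getfattr -d -m 'trusted.afr' -e hex "$BRICK/$REL"
done
```

Running this on each of the 3 nodes would show whether the zero-byte, root-owned copies differ from the healthy replicas.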
Before we removed the cluster it was behaving normally; heal info showed a normal number of entries (up to 20).
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.
For those who might have this problem, here is my situation and workaround. On glusterfs 3.7.6-1ubuntu1 (Ubuntu 16.04), we had high CPU usage on one of the gluster servers. It used all available CPU resources without writing anything meaningful to the logs; there were just more lines than usual like this:

[2018-03-29 11:35:22.081500] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 2-gfs-cmj-dht: Setting layout of /tmp/salt_www/dashman/cf37firsesc549t9a2a12o3653 with [Subvol_name: gfs-cmj-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],

Any file access on the glusterfs filesystem was slow, each operation taking 10 seconds or more. I found that this command mitigates the problem: the high CPU went away and glusterfs became fast again:

gluster volume set gfs-cmj performance.readdir-ahead off

Maybe this will help someone (it took me many hours to find out).
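When applying this workaround it may be worth confirming the option actually took effect (a sketch; `gluster volume get` is available in 3.7+ builds, and the volume name is from the report above):

```shell
# Check the current value, change it, then verify it stuck.
gluster volume get gfs-cmj performance.readdir-ahead
gluster volume set gfs-cmj performance.readdir-ahead off
gluster volume get gfs-cmj performance.readdir-ahead
```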