Previously, Gluster repeatedly retried healing files whose heal had failed and that remained unhealed, which consumed a significant amount of CPU.
With this enhancement, Gluster better detects when continuous healing is necessary and reduces CPU utilization when pending heals cannot be healed immediately.
Description Leela Venkaiah Gangavarapu
2020-07-01 08:06:20 UTC
Created attachment 1699439
CPU usage on server and client
Description of problem:
High CPU usage is observed after an in-service upgrade of one node in a 3-node cluster.
Version-Release number of selected component (if applicable):
glusterfs-server-6.0-37.1.el7rhgs.x86_64
How reproducible:
Consistent
Steps to Reproduce:
1. Set up a cluster with 3 nodes hosting a 4X(4+2) dist-disp vol and a 3X3 repl vol
2. Upgrade one of the nodes when the dist-disp vol is ~5% full and the repl vol is ~35% full
3. Monitor CPU during the heal process and after it
4. A sudden CPU spike appears and continues even after the heal is complete
5. Observe the CPU spikes for the gluster processes on the servers
6. Command used on client and server: "$ top -c -p $(pgrep -d',' -f gluster)"
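The interactive `top` command in step 6 can also be run in batch mode so that the readings can be logged over time. The following is only a sketch, not part of the original report: the function names `sum_top_cpu` and `gluster_cpu_total` are hypothetical, and the parsing assumes the default procps `top` column layout, where `%CPU` is the 9th field.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `top -b` output on stdin and print the summed
# %CPU of the process rows in the LAST iteration only.
# Assumes the default top layout, where %CPU is field 9.
sum_top_cpu() {
    LC_ALL=C awk '
        /^top -/           { total = 0 }      # a new iteration starts: reset
        $1 ~ /^[0-9]+$/    { total += $9 }    # process rows begin with a PID
        END                { printf "%.1f\n", total }
    '
}

# Hypothetical helper: combined %CPU of all processes matched by the same
# pgrep pattern as in step 6. Two iterations are taken because the first
# batch sample is not an interval measurement. Note that top accepts a
# limited number of PIDs with -p (20 according to the procps man page).
gluster_cpu_total() {
    local pids
    pids=$(pgrep -d',' -f gluster) || { echo "0.0"; return 0; }
    LC_ALL=C top -b -n 2 -d 5 -c -p "$pids" | sum_top_cpu
}
```

Calling `gluster_cpu_total` in a loop, with a timestamp added to each line, would give a simple log that can be compared against the ~700-800% spikes reported here.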
Actual results:
CPU usage spikes, reaching ~700-800%
Expected results:
CPU usage should be moderate (<100%)
Additional info:
- No heals are pending
- The other two nodes are on version `glusterfs-6.0-37.el7rhgs.x86_64`
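One way to confirm the "no heals are pending" observation is to total the per-brick counts printed by `gluster volume heal <VOLNAME> info`. The sketch below is not from the original report: the function name `count_pending_heals` is hypothetical, and it assumes each brick is reported with a `Number of entries: N` line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read heal-info output on stdin and print the total
# of all numeric "Number of entries:" values. Non-numeric values (for
# example "-" for a brick that could not be queried) are skipped.
count_pending_heals() {
    awk -F': ' '
        /^Number of entries:/ && $2 ~ /^[0-9]+[ \t]*$/ { total += $2 }
        END { print total + 0 }
    '
}
```

With a volume name substituted for the placeholder, `gluster volume heal <VOLNAME> info | count_pending_heals` should print 0 when nothing is pending.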
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:5603