Description of problem:
Here is the mail sent by Lindsay Mathieson:

2 node replicate setup. Everything had been stable for days until I had occasion to reboot one of the nodes. Since then (past hour) glusterfsd has been pegging the CPU(s), with utilization ranging from 1% to 1000%! On average it's around 500%.

This is a VM server, so there are only 27 VM images, for a total of 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.

- What does glusterfsd do?
- What can I do to fix this?

thanks,
------------------------

We found the root cause: when the rebooted brick came back, the mount started self-heal of all the VM images at once. These heals use the diff algorithm, so the bricks compute checksums over every image in parallel, and that checksumming is what drives the high CPU usage. We need a way to throttle the number of parallel self-heals.

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:
1. Have a lot of VMs on a replicated volume.
2. Bring one brick down and do some write activity on all the VMs.
3. Bring the brick back up while the VM operations are in progress.
4. This will lead to self-heal of all the VMs by the mount.
5. That will cause high CPU usage on the bricks because of the checksums.

Expected results:
Bricks should not use so much CPU. There should be some kind of throttling of parallel self-heals.
With sharding enabled and the full self-heal algorithm, this problem does not occur, so closing this bug.
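The closing comment can be made concrete with a rough cost model in Python. It is a simplification, not a description of GlusterFS internals: the function names are hypothetical, the 64 MiB shard size is an assumed value (the real size comes from the `features.shard-block-size` volume option), the 30 GiB image is roughly 800GB spread over 27 images, and the write offsets are made up. It compares the bytes a diff heal of one unsharded image must read and checksum on the two bricks with the bytes a full heal copies when only the shards written during the outage are pending heal.

```python
# Hypothetical cost model; the numbers below are illustrative assumptions.
from typing import Iterable, Set

MiB = 1024 * 1024
GiB = 1024 * MiB


def diff_heal_checksum_bytes(image_size: int) -> int:
    """Unsharded image, diff algorithm: every block is read and checksummed
    on both the source and the sink brick, however little actually changed."""
    return 2 * image_size


def shards_to_heal(dirty_offsets: Iterable[int], shard_size: int) -> Set[int]:
    """Indices of the shards touched by writes the rebooted brick missed."""
    return {off // shard_size for off in dirty_offsets}


def sharded_full_heal_copy_bytes(image_size: int, dirty_offsets: Iterable[int],
                                 shard_size: int) -> int:
    """Sharded image, full algorithm: only dirty shards are healed, and each
    one is copied whole with no checksumming."""
    total = 0
    for idx in shards_to_heal(dirty_offsets, shard_size):
        start = idx * shard_size
        total += min(shard_size, image_size - start)  # last shard may be short
    return total


if __name__ == "__main__":
    image = 30 * GiB                 # roughly 800GB / 27 images
    writes = [0, 10 * MiB, 5 * GiB]  # hypothetical offsets written during the outage
    shard = 64 * MiB                 # assumed shard size
    print(diff_heal_checksum_bytes(image) // GiB, "GiB checksummed without sharding")
    print(sharded_full_heal_copy_bytes(image, writes, shard) // MiB,
          "MiB copied with sharding and full heal")
```

Under these assumptions the unsharded diff heal checksums 60 GiB, while the sharded full heal copies 128 MiB (two dirty shards) without computing any checksums.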