Description of problem:

The current upstream bitrot daemon (bitd) runs at full throttle on CPU and disk for a set of objects (files). In practice this consumes nearly all CPU cycles and likely thrashes the disk heads. An htop snapshot while bitd is signing:

1 [||||||||||||||||||||||||||||||||||||||||||||||||||||||93.3%]   Tasks: 163, 499 thr; 4 running
2 [||||||||||||||||||||||||||||||||||||||||||||||||||||||89.7%]   Load average: 7.17 6.23 4.37
3 [|||||||||||||||||||||||||||||||||                      55.2%]   Uptime: 01:20:49
4 [|||||||||||||||||||||||||||||||||||||||||||||          73.6%]
Mem[||||||||||||||||||||||||||||||||||||||||||||||||3456/7860MB]   Swp[|  0/7999MB]

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always

Steps to Reproduce:
1. Create a Gluster volume (of any topology)
2. Start the volume and enable bitrot
3. Execute the following (or any similar workload):
   # wget http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.19.tar.gz
   # tar -zxvf linux-3.19.tar.gz
   # cd linux-3.19
   # make defconfig
   # make

Actual results:
All CPU cores run close to 100%.

Expected results:
Avoid maxing out CPU usage by introducing some form of synthetic delay (induced throttling) or by using cgroups where possible (cgroups are Linux-specific, so portability to NetBSD and other platforms needs consideration). One portable option is a token-bucket limiter, sketched under Additional info below.

Additional info:
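The reviews below pursue token-bucket based throttling. As a rough illustration of the idea only (a minimal sketch assuming pthreads; tbf_init()/tbf_throttle() and the rates are hypothetical names, not the actual bitd code), the signing loop would request tokens proportional to the bytes it is about to checksum and sleep until the bucket has refilled, which caps sustained CPU/disk usage:

/*
 * Minimal token-bucket sketch (illustrative only, not the bitd
 * implementation). All names and rates here are hypothetical.
 */
#include <pthread.h>
#include <stdint.h>
#include <time.h>

struct tbf {
        pthread_mutex_t lock;
        uint64_t        tokens;      /* tokens currently available */
        uint64_t        rate;        /* tokens added per second    */
        uint64_t        burst;       /* maximum bucket size        */
        struct timespec last_fill;   /* time of last refill        */
};

static void tbf_init(struct tbf *t, uint64_t rate, uint64_t burst)
{
        pthread_mutex_init(&t->lock, NULL);
        t->tokens = burst;
        t->rate = rate;
        t->burst = burst;
        clock_gettime(CLOCK_MONOTONIC, &t->last_fill);
}

static void tbf_refill(struct tbf *t)
{
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        uint64_t elapsed_ms = (now.tv_sec - t->last_fill.tv_sec) * 1000
                            + (now.tv_nsec - t->last_fill.tv_nsec) / 1000000;
        uint64_t add = (t->rate * elapsed_ms) / 1000;
        if (add) {
                t->tokens = (t->tokens + add > t->burst) ? t->burst
                                                         : t->tokens + add;
                t->last_fill = now;
        }
}

/* Block until 'want' tokens are available, then consume them. */
static void tbf_throttle(struct tbf *t, uint64_t want)
{
        pthread_mutex_lock(&t->lock);
        for (;;) {
                tbf_refill(t);
                if (t->tokens >= want) {
                        t->tokens -= want;
                        break;
                }
                pthread_mutex_unlock(&t->lock);
                struct timespec nap = { 0, 10 * 1000 * 1000 };  /* 10 ms */
                nanosleep(&nap, NULL);
                pthread_mutex_lock(&t->lock);
        }
        pthread_mutex_unlock(&t->lock);
}

With the rate set to, say, a few MB/s worth of tokens, checksum work is spread out over time instead of saturating every core and the disk.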
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#1) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#2) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#3) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#4) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#5) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#6) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10511 (features/bitrot: Throttle filesystem scrubber) posted (#2) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#7) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling [WIP]) posted (#8) for review on master by Gaurav Kumar Garg (ggarg)
REVIEW: http://review.gluster.org/10511 (features/bitrot: Throttle filesystem scrubber) posted (#4) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10307 (features/bit-rot: Token Bucket based throttling) posted (#10) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10511 (features/bitrot: Throttle filesystem scrubber) posted (#5) for review on master by Venky Shankar (vshankar)
REVIEW: http://review.gluster.org/10511 (features/bitrot: Throttle filesystem scrubber) posted (#8) for review on master by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/10511 committed in master by Vijay Bellur (vbellur)
------
commit 9ba8963999bca431ec14a25961a163810cfe1e5b
Author: Venky Shankar <vshankar>
Date:   Mon Apr 27 21:34:34 2015 +0530

    features/bitrot: Throttle filesystem scrubber

    This patch introduces a multithreaded filesystem scrubber based on the
    throttling option configured for a particular volume. The implementation
    "logically" separates scanning from scrubbing, with the number of
    scrubber threads auto-configured according to the throttle configuration.

    Scanning (crawling) is left single threaded (per brick), with entries
    scrubbed in bulk. On reaching this "bulk" watermark, the scanner waits
    until the entries are scrubbed. Bricks of a particular volume have a set
    of thread(s) assigned for scrubbing, with entries for each brick scrubbed
    in round-robin fashion to avoid scrub "stalls" when one brick (out of N
    bricks) is under active scrubbing.

    This mechanism also makes "pause/resume" easy to implement: all one needs
    to do is clean up the scrubber threads and let the main scanner thread
    wait until scrubbing is resumed (at which point the scrubber thread(s)
    are spawned again), thereby continuing where it left off (unless the
    daemons are restarted, in which case the crawl starts from the root
    directory again, but that is acceptable).

    [ NOTE: Throttling is optional for the signer daemon, which otherwise
      runs at full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" in
      CFLAGS enables CPU throttling (during checksum calculation), thereby
      avoiding high CPU usage. ]

    Subsequent patches will introduce CPU throttling during hash calculation
    for the scrubber.

    Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f
    BUG: 1207020
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/10511
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
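To make the committed design concrete, here is a conceptual sketch (assuming pthreads; the names, brick count, and watermark are hypothetical, and this is not the committed code) of the scanner/scrubber split the commit message describes: the per-brick scanner blocks once its queue reaches the bulk watermark, and a scrubber pool drains the per-brick queues in round-robin order.

/*
 * Conceptual sketch only, not the actual scrubber implementation.
 */
#include <pthread.h>
#include <time.h>

#define NBRICKS               2    /* assumed brick count     */
#define SCRUB_BULK_WATERMARK  64   /* assumed bulk batch size */

struct brick_queue {
        pthread_mutex_t lock;
        pthread_cond_t  drained;   /* scanner waits here           */
        int             pending;   /* entries queued for scrubbing */
};

/* Scanner (crawler): a single thread per brick queues one entry,
 * then waits if the bulk watermark has been reached. */
void scanner_enqueue(struct brick_queue *q)
{
        pthread_mutex_lock(&q->lock);
        q->pending++;
        while (q->pending >= SCRUB_BULK_WATERMARK)
                pthread_cond_wait(&q->drained, &q->lock);
        pthread_mutex_unlock(&q->lock);
}

/* Scrubber pool: each thread walks bricks round-robin, taking one
 * queued entry per visit and waking the scanner as the queue drains,
 * so one busy brick does not stall the others. */
void *scrubber_thread(void *arg)
{
        struct brick_queue *bricks = arg;
        struct timespec idle = { 0, 1000 * 1000 };   /* 1 ms back-off */

        for (int i = 0; ; i = (i + 1) % NBRICKS) {
                struct brick_queue *q = &bricks[i];
                int did_work = 0;

                pthread_mutex_lock(&q->lock);
                if (q->pending > 0) {
                        q->pending--;   /* take one entry for checksum + compare */
                        did_work = 1;
                        pthread_cond_signal(&q->drained);
                }
                pthread_mutex_unlock(&q->lock);

                if (!did_work)
                        nanosleep(&idle, NULL);      /* avoid busy spinning */
        }
        return NULL;
}

Pause/resume then falls out naturally: tearing down the scrubber threads simply leaves each scanner blocked on its condition variable until scrubbing resumes and the scrubber threads are spawned again.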
This bug is being closed because a release is available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user