Problem: The glusterfind pre command seems to hang for days when listing millions of files, with changelog.py processes consuming 100% CPU.

RCA: The changelog.py processes do consume 100% CPU, due to CPU-intensive SQLite operations during changelog parsing. Part of the time spent in changelog parsing was caused by a bug that made previously parsed changelogs get parsed again.
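To illustrate the second point in the RCA, here is a minimal sketch of parsing each changelog at most once. It is not the code from changelog.py::get_changes(); the function name parse_new_changelogs, the already_parsed set, and the parse_one callback are all hypothetical names chosen for this example.

```python
def parse_new_changelogs(changelog_paths, already_parsed, parse_one):
    """Parse each changelog at most once.

    changelog_paths: iterable of changelog file paths (hypothetical input).
    already_parsed:  set of paths handled in earlier calls; updated in place.
    parse_one:       callback doing the expensive work for a single path.
    Returns the list of paths that were parsed during this call.
    """
    parsed_now = []
    for path in changelog_paths:
        if path in already_parsed:
            # Skip work that was already done on a previous pass.
            continue
        parse_one(path)
        already_parsed.add(path)
        parsed_now.append(path)
    return parsed_now
```

Without the membership check, every pass would re-run the expensive parsing step for all changelogs seen so far, which is the kind of repeated work the RCA describes.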
REVIEW: http://review.gluster.org/15609 (tools/glusterfind: kill remote processes and separate run-time directories) posted (#7) for review on master by Milind Changire (mchangir)
REVIEW: http://review.gluster.org/15609 (tools/glusterfind: kill remote processes and separate run-time directories) posted (#8) for review on master by Milind Changire (mchangir)
REVIEW: http://review.gluster.org/15609 (tools/glusterfind: kill remote processes and separate run-time directories) posted (#9) for review on master by Milind Changire (mchangir)
REVIEW: http://review.gluster.org/15609 (tools/glusterfind: kill remote processes and separate run-time directories) posted (#10) for review on master by Milind Changire (mchangir)
COMMIT: http://review.gluster.org/15609 committed in master by Aravinda VK (avishwan)
------
commit feea851fad4f89b48bfe89fe3b75250cc7bd6501
Author: Milind Changire <mchangir>
Date: Mon Oct 17 12:16:36 2016 +0530

    tools/glusterfind: kill remote processes and separate run-time directories

    Problem #1:
    Hitting CTRL+C leaves stale processes on remote nodes if glusterfind pre
    has been initiated.

    Solution #1:
    Adding "-t -t" to ssh command-line forces pseudo-terminal to be assigned
    to remote process. When local process receives Keyboard Interrupt,
    SIGHUP is immediately conveyed to the remote terminal causing remote
    changelog.py process to terminate immediately.

    Problem #2:
    Concurrent glusterfind pre runs are not possible on the same glusterfind
    session in case of a runaway process.

    Solution #2:
    glusterfind pre runs now add random directory name to the working
    directory to store and manage temporary database and changelog
    processing.
    If KeyboardInterrupt is received, the function call
    run_cmd_nodes("cleanup", args, tmpfilename=gtmpfilename)
    cleans up the remote run specific directory.

    Patch:
    7571380 cli/xml: Fix wrong XML format in volume get command
    broke "gluster volume get <vol> changelog.rollover-time --xml"
    Now fixed function utils.py::get_changelog_rollover_time()

    Fixed spurious trailing space getting written if second path is empty in
    main.py::write_output()

    Fixed repetitive changelog processing in changelog.py::get_changes()

    Change-Id: Ia8d96e2cd47bf2a64416bece312e67631a1dbf29
    BUG: 1382236
    Signed-off-by: Milind Changire <mchangir>
    Reviewed-on: http://review.gluster.org/15609
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Aravinda VK <avishwan>
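The two solutions in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the glusterfind implementation: build_ssh_command and run_in_private_workdir are hypothetical helper names, and the cleanup shown here removes a local directory, whereas the commit cleans up the run-specific directory on the remote nodes through run_cmd_nodes("cleanup", ...).

```python
import shutil
import tempfile


def build_ssh_command(host, remote_cmd):
    """Return an ssh argv that forces pseudo-terminal allocation.

    Repeating -t makes ssh allocate a tty even when it has no local tty,
    so the remote process is attached to a terminal that can be hung up.
    """
    return ["ssh", "-t", "-t", host] + list(remote_cmd)


def run_in_private_workdir(base_dir, work):
    """Run work(run_dir) in a new, randomly named directory under base_dir.

    A separate directory per run keeps concurrent runs from sharing a
    temporary database or changelog processing state.
    """
    run_dir = tempfile.mkdtemp(prefix="run-", dir=base_dir)
    try:
        return work(run_dir)
    except KeyboardInterrupt:
        # Assumption for this sketch: on CTRL+C, discard the run-specific
        # directory, then let the interrupt propagate to the caller.
        shutil.rmtree(run_dir, ignore_errors=True)
        raise
```

For example, build_ssh_command("node1", ["python", "changelog.py"]) yields an argument list suitable for subprocess.Popen; the host name and remote command here are placeholders.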
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/