| Summary: | GenericWorkers and PriorityWorkers are spawning threads that are not cleaned up during provisioning in 5.5.3.2 | |||
|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Alex Krzos <akrzos> | |
| Component: | Performance | Assignee: | dmetzger | |
| Status: | CLOSED ERRATA | QA Contact: | Alex Krzos <akrzos> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 5.5.0 | CC: | akrzos, cpelland, dajohnso, jhardy, jrafanie, mfeifer, mkanoor, obarenbo, simaishi | |
| Target Milestone: | GA | Keywords: | ZStream | |
| Target Release: | 5.6.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | perf:worker | |||
| Fixed In Version: | 5.6.0.5 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1342278 1342280 (view as bug list) | Environment: | ||
| Last Closed: | 2016-06-29 15:45:32 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 1342278, 1342280 | |||
| Attachments: | ||||
|
Description
Alex Krzos
2016-03-29 12:09:47 UTC
Created attachment 1141221 [details]
CPU Graph during 2 hour workload
Created attachment 1141222 [details]
Memory Usage during 2 hour workload
Created attachment 1141226 [details]
Generic/Priority Workers Threads during 2 hour workload
Created attachment 1141228 [details]
Generic/Priority RSS Memory Usage during 2 hour workload
*** Bug 1322394 has been marked as a duplicate of this bug. *** Question in automate gitter room:
Is this DRb.start_service[1] and this one[2] run in the same process?
one is in the RUBY_METHOD_PREAMBLE and one is in the setup_drb_for_ruby_method
two start_service calls on the same URI with a one or more stop_service calls, leaks threads, or so it appears
irb(main):004:0> require 'drb'; 10.times { puts Thread.list.length; DRb.start_service("druby://127.0.0.1:0"); DRb.start_service("druby://127.0.0.1:0"); DRb.stop_service; }
1
2
3
4
5
6
7
8
9
10
[1]
https://github.com/ManageIQ/manageiq/blob/3f5a48410bf8023774584ba4a2616264a0511936/lib/miq_automation_engine/engine/miq_ae_method.rb#L230
[2] https://github.com/ManageIQ/manageiq/blob/3f5a48410bf8023774584ba4a2616264a0511936/lib/miq_automation_engine/engine/miq_ae_method.rb#L129
This relates to Automate methods and how we use Drb provided caching. For caching objects sent over by the DrbServer we use a Ruby implementation called TimerIdConv. This caches all the objects based on a passed in timeout, we pass in an hour. For that time period there is a thread that holds on to the cached objects. When the method ends, this thread has not been terminated. When the next work is picked up by the Automate worker and it dispatches a method it creates another TimerIdConv thread. These threads would all terminate after 1 hour. https://github.com/ruby/ruby/blob/32674b167bddc0d737c38f84722986b0f228b44b/lib/drb/timeridconv.rb#L78 One of the solutions we worked on was to kill this thread when the method ends. New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/1428c1e9a855e6ca22cd1187dff6bbf7fa4c93cc commit 1428c1e9a855e6ca22cd1187dff6bbf7fa4c93cc Author: Madhu Kanoor <mkanoor> AuthorDate: Mon May 2 17:59:22 2016 -0400 Commit: Madhu Kanoor <mkanoor> CommitDate: Tue May 3 12:10:20 2016 -0400 DRb TimerIdConv cleanup https://bugzilla.redhat.com/show_bug.cgi?id=1321934 The TimerIdConv thread was not being terminated after a method ends. The TimerIdConv thread sleeps for a hour before it GC's the objects lib/miq_automation_engine/engine/miq_ae_method.rb | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) New commit detected on cfme/5.5.z: https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=e3112414d3feecd65d823970049a292565f88a60 commit e3112414d3feecd65d823970049a292565f88a60 Author: Madhu Kanoor <mkanoor> AuthorDate: Mon May 2 17:59:22 2016 -0400 Commit: Madhu Kanoor <mkanoor> CommitDate: Fri May 13 15:59:30 2016 -0400 DRb TimerIdConv cleanup https://bugzilla.redhat.com/show_bug.cgi?id=1321934 The TimerIdConv thread was not being terminated after a method ends. The TimerIdConv thread sleeps for a hour before it GC's the objects lib/miq_automation_engine/engine/miq_ae_method.rb | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) New commit detected on cfme/5.5.z: https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=418e1a6628339842597a045911269da2f860308b commit 418e1a6628339842597a045911269da2f860308b Merge: 1b6e399 aac6e54 Author: Greg McCullough <gmccullo> AuthorDate: Wed May 25 16:16:31 2016 -0400 Commit: Greg McCullough <gmccullo> CommitDate: Wed May 25 16:16:31 2016 -0400 Merge branch 'bugzilla_1321934_5_5_z' into '5.5.z' Thread leakage due to Drb's TimerIdConv thread not being terminated To prevent Drb recycled error we delegated the caching of referenced objects to DRb's builtin TimerIdConv, which uses a thread and that thread never gets terminated, causing a thread leakage. https://bugzilla.redhat.com/show_bug.cgi?id=1321934 Original PRs: https://github.com/ManageIQ/manageiq/pull/8399 (Clean Cherry Pick) https://github.com/ManageIQ/manageiq/pull/8419 (2 commits clean cherry pick, 1 commit had rspec 3 changes) See merge request !939 lib/miq_automation_engine/engine/miq_ae_method.rb | 13 +- .../miq_automation_engine/miq_ae_method_spec.rb | 240 +++++++++++---------- 2 files changed, 138 insertions(+), 115 deletions(-) Resolved by https://github.com/ManageIQ/manageiq/pull/8399 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1348 |