Bug 1297097

Summary: CF 3.2: Automation code throwing "0x000... is recycled object" exceptions
Product: Red Hat CloudForms Management Engine
Reporter: Josh Carter <jocarter>
Component: Automate
Assignee: mkanoor
Status: CLOSED ERRATA
QA Contact: Milan Falešník <mfalesni>
Severity: urgent
Priority: unspecified
Version: 5.4.0
CC: cpelland, jdeubel, jhardy, jocarter, jprause, mfalesni, mfeifer, mkanoor, nachandr, obarenbo, sshveta, tfitzger
Target Milestone: GA
Keywords: ZStream
Target Release: 5.6.0   
Hardware: All   
OS: All   
Whiteboard: automate
Fixed In Version: 5.6.0.0
Doc Type: Bug Fix
Clones: 1297499, 1297952
Last Closed: 2016-06-29 15:26:55 UTC
Type: Bug
Bug Blocks: 1297499, 1297952, 1304859    
Attachments:
Updated Automated Method with additional logging (flags: none)

Description Josh Carter 2016-01-09 04:53:51 UTC
Description of problem:

After migrating to CloudForms 3.2, we started bulk provisioning tests.

About 20% of the time, provisioning fails with "recycled object" errors. This comes from Ruby garbage collection, and I wasn't expecting objects that are still referenced to be garbage-collected.

The error is thrown while executing a best_fit_code method, specifically while iterating over a collection of clusters. In a nutshell, the code works like this:

prov = $evm.root["miq_provision"]         # current provisioning request
template = prov.vm_template               # template being provisioned from
ems = template.ext_management_system      # provider that owns the template

clusters = ems.ems_clusters.select do |cluster|
  # selection criteria
end

clusters.each do |cluster|
  # select on hosts, then on storages
end


Version-Release number of selected component (if applicable): 5.4.3



Comment 6 mkanoor 2016-01-13 17:03:50 UTC
Created attachment 1114474
Updated Automated Method with additional logging

This version of the method adds logging of class names within the Automate method.
To test it:
(1) Create a new domain called TEST_DOMAIN and make sure it is enabled.
(2) Copy the Automate method /intuit/Intuit/Provisioning/VMMethods/best_fit_with_scope into the new TEST_DOMAIN domain.
(3) Overwrite the contents of the Automate method /TEST_DOMAIN/Intuit/Provisioning/VMMethods/best_fit_with_scope with the attached file.
(4) Run a provision request.

Comment 10 CFME Bot 2016-01-18 20:51:01 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/20fa1a1d427dae46cae611b404e865f7358bf7f7

commit 20fa1a1d427dae46cae611b404e865f7358bf7f7
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Thu Jan 14 17:09:23 2016 -0500
Commit:     Madhu Kanoor <mkanoor>
CommitDate: Mon Jan 18 12:46:25 2016 -0500

    Use DRb's builtin caching mechanism
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1297097
    
    Previously we have been using our own caching mechanism to
    cache objects sent as references to the Automate Methods. We have
    gotten DRb recycled object errors, since some of the objects which
    were sent as references were not being cached.
    DRb has a builtin caching mechanism based on TimerIdConv, which
    we are starting to use in this PR.

 lib/miq_automation_engine/engine/miq_ae_method.rb     | 8 ++++++++
 lib/miq_automation_engine/engine/miq_ae_service.rb    | 2 +-
 spec/lib/miq_automation_engine/miq_ae_service_spec.rb | 9 ---------
 3 files changed, 9 insertions(+), 10 deletions(-)
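The commit message above names DRb's TimerIdConv as the replacement for the custom cache. The sketch below illustrates that mechanism with Ruby's standard drb library only; it is not the ManageIQ code, and the Cluster class and variable names are hypothetical. DRb hands out an id for each object passed by reference and resolves that id when the client calls back. In the Ruby versions of that period, the default converter, DRb::DRbIdConv, resolved ids with ObjectSpace._id2ref and held no reference of its own, so an object that nothing else on the server kept alive could be collected, and the later lookup failed with "... is recycled object". TimerIdConv instead keeps each handed-out object in an internal table for a timeout.

```ruby
require 'drb/drb'
require 'drb/timeridconv'

# Hypothetical stand-in for an object an Automate method receives by
# reference (for example, one entry of ems.ems_clusters).
class Cluster
  include DRb::DRbUndumped # marshal as a DRb reference, never as a copy

  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# TimerIdConv stores each object it hands out an id for, so the object stays
# reachable for at least roughly the keep-alive period (in seconds) after it
# was last handed out or looked up.
timer_conv = DRb::TimerIdConv.new(600)

cluster = Cluster.new("cluster-1")

ref = timer_conv.to_id(cluster)    # id that would be sent to the DRb client
resolved = timer_conv.to_obj(ref)  # lookup performed when the client calls back
puts resolved.name                 # → cluster-1

# A server opts in by passing the converter as the :idconv option, e.g.
#   DRb.start_service("druby://localhost:0", front, idconv: timer_conv)
```

The trade-off is memory: objects are retained for the timeout even when the client no longer needs them, in exchange for references that stay valid while an Automate method is still iterating over them.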

Comment 11 CFME Bot 2016-01-18 20:51:07 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/4a566c82234c92d99f1c81b01c111e9a6cebe5b9

commit 4a566c82234c92d99f1c81b01c111e9a6cebe5b9
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Mon Jan 18 12:55:44 2016 -0500
Commit:     Madhu Kanoor <mkanoor>
CommitDate: Mon Jan 18 12:55:44 2016 -0500

    Removed references to drb_return
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1297097

 lib/miq_automation_engine/engine/miq_ae_service.rb      | 17 +++++------------
 .../engine/miq_ae_service_model_base.rb                 | 14 +++-----------
 .../mixins/miq_ae_service_miq_provision_mixin.rb        |  2 +-
 3 files changed, 9 insertions(+), 24 deletions(-)

Comment 13 CFME Bot 2016-02-09 22:47:47 UTC
New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=0945d50d5b662f11feeff242b802ae7c1b50c82c

commit 0945d50d5b662f11feeff242b802ae7c1b50c82c
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Thu Jan 14 17:09:23 2016 -0500
Commit:     Madhu Kanoor <mkanoor>
CommitDate: Tue Jan 19 11:04:22 2016 -0500

    Use DRb's builtin caching mechanism
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1297097
    
    Previously we have been using our own caching mechanism to
    cache objects sent as references to the Automate Methods. We have
    gotten DRb recycled object errors, since some of the objects which
    were sent as references were not being cached.
    DRb has a builtin caching mechanism based on TimerIdConv, which
    we are starting to use in this PR.

 vmdb/lib/miq_automation_engine/engine/miq_ae_method.rb    |  8 ++++++++
 vmdb/lib/miq_automation_engine/engine/miq_ae_service.rb   |  2 +-
 .../spec/lib/miq_automation_engine/miq_ae_service_spec.rb | 15 ---------------
 3 files changed, 9 insertions(+), 16 deletions(-)

Comment 14 Greg McCullough 2016-02-10 20:22:31 UTC
*** Bug 1304745 has been marked as a duplicate of this bug. ***

Comment 17 Milan Falešník 2016-04-19 18:14:48 UTC
Verified in 5.6.0.1-beta2 using a script from https://bugzilla.redhat.com/show_bug.cgi?id=1284573 and also by running provisioning requests.

Comment 19 errata-xmlrpc 2016-06-29 15:26:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348