Bug 1096332 - All content-related workflows broken on Agent
Summary: All content-related workflows broken on Agent
Status: ON_QA
Alias: None
Product: RHQ Project
Classification: Other
Component: Plugin Container
Version: 4.10,4.11,4.12
Hardware: Unspecified
OS: Unspecified
unspecified
urgent vote
Target Milestone: ---
: RHQ 4.13
Assignee: Thomas Segismont
QA Contact: Mike Foley
URL:
Whiteboard:
Keywords: Reopened
: 1068630 (view as bug list)
Depends On:
Blocks: 1125343
TreeView+ depends on / blocked
 
Reported: 2014-05-09 17:07 UTC by Lukas Krejci
Modified: 2018-01-30 17:57 UTC (History)
4 users (show)

(edit)
Clone Of:
: 1125343 (view as bug list)
(edit)
Last Closed: 2014-05-16 09:15:24 UTC


Attachments (Terms of Use)

Description Lukas Krejci 2014-05-09 17:07:50 UTC
Description of problem:
Due to overly eager initialization of resource context, the content services are not available to the resource components, essentially rendering any content related work (CreateChildFacet and ContentFacet) unusable. (Any attempt to use ContentContext.getContentServices() (needed for example to download packages from the server) results in an NPE.

Version-Release number of selected component (if applicable):
4.10.0

How reproducible:
always

Steps to Reproduce:
1. Try to deploy a WAR into JBoss AS7 through the create child facility of RHQ

Actual results:
Always fails

Expected results:
Succeeds

Additional info:
This is a regression caused by BZ 991149

Comment 1 Lukas Krejci 2014-05-12 12:54:00 UTC
*** Bug 1068630 has been marked as a duplicate of this bug. ***

Comment 2 Lukas Krejci 2014-05-12 12:55:42 UTC
See BZ 1068630 for a concrete example of the NPE this produces in the AS7 plugin.

Comment 3 John Mazzitelli 2014-05-12 13:44:45 UTC
I reopened bug #991149 since that's the one that introduced the regression.

Comment 4 Lukas Krejci 2014-05-16 09:15:24 UTC
Seems to have been a problem with my local build. I can't really explain what happened, but I can no longer replicate this. Closing.

Comment 5 Thomas Segismont 2014-07-31 15:14:35 UTC
*** Bug 1068630 has been marked as a duplicate of this bug. ***

Comment 6 Thomas Segismont 2014-07-31 15:31:06 UTC
Just hit the problem again. New investigation shows that this regression was indeed introduced by the work for Bug 991149. Here's the issue.

All PC subsystems are initialized in org.rhq.core.pc.PluginContainer#initialize

InventoryManager is initialized *before* the ContentManager

When InventoryManager is initialized, it activates already inventoried resources recursively. Inventoried resources which have child types creatable by content are given a org.rhq.core.pluginapi.content.ContentContext instance. The problem is the ContentContext holds a null reference to ContentManager, because at this point the Content manager is not initialized, and not even created.

The consequences are the following:
* resources added to inventory *after* the PC started will be able to use the content services, because when the resource context is created, the content manager is already initialized
* resources *already* in inventory when the PC is started will not be able to use the content services.

A good test to check if this issue is resolved is:

1. create new RHQ environment
2. inventory an AS7 / EAP6 standalone server
3. create a new Deployment child resource
4. check the child resource creation works
5. delete the child resource
6. restart the agent
5. create the Deployment child resource again

Step 5 five won't work without a fix

Working on the fix ASAP

Comment 7 Thomas Segismont 2014-08-01 16:24:06 UTC
Fixed in master

commit 8668677a01100d22e6799ce3f73f58b92a1b030c
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Fri Aug 1 18:22:34 2014 +0200

All *ContextImpl classes now lazy load plugin container services

The content subsystem was the only one affected as:
* some of the other managers were created and initialized before the inventory manager
* some of the *ContextImpl classes were already lazy loading the container services

Comment 8 Thomas Segismont 2014-08-01 17:14:49 UTC
Additional commit in master, some itest classes were broken

commit 82fba05c62d5a5036455fa936cfa045f60b4724c
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Fri Aug 1 19:13:20 2014 +0200

Comment 9 Lukas Krejci 2014-08-04 14:58:54 UTC
Commit 8668677a01100d22e6799ce3f73f58b92a1b030c reintroduces the lazy obtaining of plugin container services.

I am not sure that is an advisable thing to do because it opens up the door to difficult to debug race conditions with different plugin container services being requested by another at different points in time before, during or after any of them being initialized.

In https://github.com/rhq-project/rhq/pull/32 I offer a different approach to solving this issue without reintroducing the possibility of "initialization loops".

Of course even that is not a bulletproof solution because it leaves various resource contexts uninitialized during resource upgrade phase (which by the way is the current state, too).

A proper solution to this problem would possibly have to formally admit to there being certain initialization stages of the plugin container during which different services are available.

I'm leaving the commits in master but reverting the issue back to ON_DEV so that we can discuss the different approaches before committing this to QA.

Thomas, what is your stance on this?

Comment 10 Thomas Segismont 2014-08-19 17:28:03 UTC
(In reply to Lukas Krejci from comment #9)
> Thomas, what is your stance on this?

I think you're right :)

commit 24baf850306bcf58ecf35b1d327ae33ceca3c999
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Tue Aug 19 16:53:16 2014 +0200

This reverts commit 8668677a01100d22e6799ce3f73f58b92a1b030c and 82fba05c62d5a5036455fa936cfa045f60b4724c
    
Re-introducing lazy loading is a bad idea.

Comment 11 Thomas Segismont 2014-08-19 17:29:18 UTC
Fixed in master

commit 2f359251836f5ce82b0fcd28903e423082326051
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Tue Aug 19 19:23:53 2014 +0200
    
All *ContextImpl classes now user eagerly loaded plugin container services
    
The content subsystem was the only one affected as:
* some of the other managers were created and initialized before the inventory manager
* some of the *ContextImpl classes were still lazy loading container services

Comment 12 Thomas Segismont 2014-08-21 10:36:07 UTC
Lukas,
I've created a pull request following your latest comments. I'll merge it as soon as you approve.
Thanks

Comment 13 Lukas Krejci 2014-08-21 13:14:06 UTC
Merged https://github.com/rhq-project/rhq/pull/111

Comment 14 Thomas Segismont 2014-08-27 09:07:20 UTC
Lukas has merged the 2 commits in release/jon3.3.x

commit 6a0cb4d336db879539e98e3695a4002dd69f80b6
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Tue Aug 19 19:23:53 2014 +0200

    (cherry picked from commit 2f359251836f5ce82b0fcd28903e423082326051)
    Signed-off-by: Lukas Krejci <lkrejci@redhat.com>

commit 0d7980daf2b46ca3abf19740a25aa5faf22b1a1f
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Thu Aug 21 12:33:38 2014 +0200
    
    (cherry picked from commit aefe29a44961fc723d5241d6b67bea2f2f3ccaf7)
    Signed-off-by: Lukas Krejci <lkrejci@redhat.com>

Comment 15 Thomas Segismont 2014-08-27 09:12:11 UTC
Back to MODIFIED as the commits are not in ER1

Comment 16 Thomas Segismont 2014-08-27 09:14:29 UTC
(In reply to Thomas Segismont from comment #15)
> Back to MODIFIED as the commits are not in ER1

Forget this, this the upstream BZ


Note You need to log in before you can comment on or make changes to this bug.