Bug 1129729 - Plugins are not updated after upgrade from jon3.2.2 to jon3.3.dr01
Summary: Plugins are not updated after upgrade from jon3.2.2 to jon3.3.dr01
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Upgrade
Version: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ER04
Target Release: JON 3.3.0
Assignee: John Mazzitelli
QA Contact: Filip Brychta
URL:
Whiteboard:
Keywords: Reopened
Depends On:
Blocks:
Reported: 2014-08-13 14:38 UTC by Filip Brychta
Modified: 2014-12-11 14:00 UTC (History)
Clone Of:
Last Closed: 2014-12-11 14:00:46 UTC


Attachments
server.log (1.45 MB, text/x-log)
2014-08-13 14:38 UTC, Filip Brychta
agent.log (26.84 KB, text/x-log)
2014-08-13 14:38 UTC, Filip Brychta
complete logs (325.71 KB, application/octet-stream)
2014-09-03 13:16 UTC, Filip Brychta
complete logs #2 (202.49 KB, application/octet-stream)
2014-09-19 12:24 UTC, Filip Brychta


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Bugzilla 1128780 None None None Never
Red Hat Bugzilla 1129705 None None None Never

Internal Trackers: 1128780 1129705

Description Filip Brychta 2014-08-13 14:38:13 UTC
Created attachment 926468 [details]
server.log

Description of problem:
The JON server must be restarted after the upgrade for the plugins to be updated. The ordinary upgrade process doesn't update them.

Version-Release number of selected component (if applicable):
Version :	
3.3.0.DR01
Build Number :	
6468454:dda0a47

How reproducible:
2/2

Steps to Reproduce:
1. jon3.2.2 is running using oracle db (not sure if used db is relevant)
2. upgrade JON to version 3.3.DR01:
  a. stop jon3.2.0.GA (rhqctl stop)
  b. unzip jon-server-3.3.0.DR01.zip
  c. unzip plugins and copy jars to jon-server-3.3.0.DR01/plugins/ 
  d. edit rhq-server.properties and set rhq.autoinstall.server.admin.password to workaround bz 1128151 (this can be any string)
  e. run upgrade (./rhqctl upgrade --from-server-dir /home/hudson/jon-server-3.2.0.GA/ --from-agent-dir /home/hudson/rhq-agent)
3. start JON (rhqctl start)

Actual results:
Plugins are not updated (visible in Administration->Agent Plugins)
Imported resources are in unknown status

Expected results:
Plugins are updated and resources are up

Additional info:
Workaround:
./rhqctl restart

Full logs attached

Comment 1 Filip Brychta 2014-08-13 14:38:38 UTC
Created attachment 926469 [details]
agent.log

Comment 2 Heiko W. Rupp 2014-08-14 13:20:30 UTC
It may be that this is directly caused by Bug 1129705 / Bug 1128780 so we should first fix the other two and then revisit this one.

Comment 3 Thomas Segismont 2014-08-29 12:47:55 UTC
(In reply to Heiko W. Rupp from comment #2)
> It may be that this is directly caused by Bug 1129705 / Bug 1128780 so we
> should first fix the other two and then revisit this one.

Should be fixed in ER01 if Heiko's assumption is correct

Comment 4 Filip Brychta 2014-09-03 13:15:55 UTC
Still visible in
Version :	
3.3.0.ER01.1
Build Number :	
9941660:f3aa7e7

This problem is probably caused by the following:
- according to server.log, the plugins are only partly updated: some are updated and some are not. There are many messages like the following, for different plugins, in server.log:
23:38:59,424 INFO  [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] (EJB default - 7) It appears the agent plugin [AgentPlugin [id=0, name=RHQAgent, md5=600c128c4115be9935c66251dd3518af]] in the database may be obsolete. If so, it will be updated soon by the version on the filesystem [/home/hudson/jon-server-3.3.0.ER01.1/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.12.0.JON330ER01.jar]

It says that the plugins should be updated soon, but they were not updated even after more than 12 hours. The JON server must be restarted to actually upgrade the rest of the plugins.

The above probably causes the following problems:
- imported resources are down
- some agents throw the following errors:
2014-09-02 11:39:24,254 INFO  [main] (org.rhq.core.pc.PluginContainer)- Initializing Plugin Container v4.12.0.JON330ER01...
2014-09-02 11:39:31,650 ERROR [main] (rhq.core.pc.plugin.PluginManager)- Error initializing plugin container
java.lang.IllegalArgumentException: Plugin [JMX] is required by plugins [[ActiveMQ]] but it does not exist in the dependency graph yet
	at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:329)
	at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:342)
	at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeploymentOrder(PluginDependencyGraph.java:245)
	at org.rhq.core.pc.plugin.PluginManager.<init>(PluginManager.java:169)
	at org.rhq.core.pc.PluginContainer.initialize(PluginContainer.java:290)
	at org.rhq.enterprise.agent.AgentMain.startPluginContainer(AgentMain.java:2031)
	at org.rhq.enterprise.agent.AgentMain.start(AgentMain.java:729)
	at org.rhq.enterprise.agent.AgentMain.main(AgentMain.java:448)
2014-09-02 11:39:31,660 FATAL [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.startup-error}The agent encountered an error during startup and must abort
java.lang.RuntimeException: Cannot initialize the plugin container
	at org.rhq.core.pc.plugin.PluginManager.<init>(PluginManager.java:191)
	at org.rhq.core.pc.PluginContainer.initialize(PluginContainer.java:290)
	at org.rhq.enterprise.agent.AgentMain.startPluginContainer(AgentMain.java:2031)
	at org.rhq.enterprise.agent.AgentMain.start(AgentMain.java:729)
	at org.rhq.enterprise.agent.AgentMain.main(AgentMain.java:448)
Caused by: java.lang.IllegalArgumentException: Plugin [JMX] is required by plugins [[ActiveMQ]] but it does not exist in the dependency graph yet
	at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:329)
	at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:342)
	at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeploymentOrder(PluginDependencyGraph.java:245)
	at org.rhq.core.pc.plugin.PluginManager.<init>(PluginManager.java:169)
	... 4 more
2014-09-02 11:39:31,661 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.shutting-down}Agent is being shut down..


Workaround:
1- restart the JON server - after the restart, server.log shows that the rest of the plugins were updated and no more "...If so, it will be updated soon.." messages were reported
2- restart all agents - after the restart, agent.log shows that the plugins were downloaded and everything seems to be ok


See attached complete logs.

Comment 5 Filip Brychta 2014-09-03 13:16:25 UTC
Created attachment 934090 [details]
complete logs

Comment 6 John Mazzitelli 2014-09-12 20:01:08 UTC
I just tried going from 3.2.0 -> 3.3 ER01.1 and didn't see any problems.

I'll sniff around to see what could possibly be wrong, but nothing on my box shows any issue, so it's going to be more difficult.

Comment 8 John Mazzitelli 2014-09-17 20:18:03 UTC
tried again.

I installed 3.2.0.GA with the 3.2.0.GA EAP plugin pack. Put the agent and a standalone EAP 6.3 in inventory.

Then I upgraded to 3.2.2 (and 3.2.2 EAP plugin pack).

Then I upgraded to 3.3 ER03 (and 3.3.ER3 EAP plugin pack).

The plugins always got successfully deployed and updated each step of the way. I did not see any errors. 

I did see lots of these (as expected):

14:38:25,527 INFO  [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] (pool-6-thread-1) Filesystem has a plugin [RHQAgent] at the file [/home/mazz/Desktop/jon33/33/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.12.0.JON330ER03.jar] which is different than where the DB thinks it should be [/home/mazz/Desktop/jon33/33/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.9.0.JON320GA.jar]

14:38:25,527 INFO  [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] (pool-6-thread-1) It appears the agent plugin [AgentPlugin [id=0, name=RHQAgent, md5=600c128c4115be9935c66251dd3518af]] in the database may be obsolete. If so, it will be updated soon by the version on the filesystem [/home/mazz/Desktop/jon33/33/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.12.0.JON330ER03.jar].

But all plugins got updated successfully.

Comment 9 Filip Brychta 2014-09-19 12:23:55 UTC
I reproduced it twice on ER3. Providing better repro steps.

I used the following setup:
- jon3.2.0 with all available plugins
- 4 remote agents: one monitors EAP5, the rest monitor EAP6 standalone or domain

I tried the following scenarios:
1 - copy ALL plugins to JON_HOME/plugins BEFORE upgrade -- issue is visible
2 - copy ALL plugins to JON_HOME/plugins AFTER upgrade -- successful upgrade without errors

The issue is not visible when using just the EAP plugin pack.

Here is a scenario which reproduced the issue 3/3:
1 - jon3.2.0 is running in set up described above (all plugins are installed)
2 - unzip jon-server-3.3.0.ER03.zip
3 - copy all available plugin jars to jon-server-3.3.0.ER03/plugins/
4 - edit rhq-server.properties and set rhq.autoinstall.server.admin.password to workaround bz 1128151 (this can be any string)
5 - stop jon3.2.0
6 - upgrade to jon3.3 - ./rhqctl upgrade --from-server-dir /home/hudson/jon-server-3.2.0.GA/
7 - start jon3.3

Probably relevant errors in server.log:
03:54:24,744 ERROR [org.rhq.enterprise.server.core.plugin.PluginDeploymentScanner] (pool-6-thread-1) Scan failed. Cause: java.lang.Exception:File [/home/hudson/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/teiid-rhq-plugin-2.1.0.redhat-1.jar] is not a valid jarfile -  it is either corrupted or file has not been fully written yet.



03:54:25,683 ERROR [org.jboss.as.ejb3.invocation] (pool-6-thread-1) JBAS014134: EJB Invocation failed on component ServerPluginManagerBean for method public abstract org.rhq.core.domain.plugin.ServerPlugin org.rhq.enterprise.server.plugin.ServerPluginManagerLocal.getServerPluginRelationships(org.rhq.core.domain.plugin.ServerPlugin): javax.ejb.EJBException: java.lang.NullPointerException



03:54:25,697 WARN  [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Failed to preload server plugin [WflyPatchBundleServerPlugin] from URL [file:/home/hudson/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-serverplugins/rhq-serverplugin-wfly-patch-bundle-4.12.0.JON330ER03.jar]: java.lang.RuntimeException: Failed to get plugin config/schedules from the database



03:54:26,535 WARN  [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Master server plugin container has been initialized but it detected some problems. Parts of the server may not operate correctly due to these errors.
03:54:26,536 WARN  [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Problem #1: java.lang.RuntimeException:Failed to get plugin config/schedules from the database -> javax.ejb.EJBException:java.lang.NullPointerException -> java.lang.NullPointerException:null
03:54:26,536 WARN  [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Problem #2: java.lang.RuntimeException:Failed to get plugin config/schedules from the database -> javax.ejb.EJBException:java.lang.NullPointerException -> java.lang.NullPointerException:null


See attached server.log for complete exceptions.

The attached logs contain agent.log and server.log from the run where plugins were copied before the upgrade, and agent-post.log and server-post.log from the run where plugins were copied after the upgrade. You can see that the second approach works correctly.

So at least a documentation note that the second approach is "safer" would be good.

Comment 10 Filip Brychta 2014-09-19 12:24:36 UTC
Created attachment 939244 [details]
complete logs #2

Comment 11 John Mazzitelli 2014-09-19 21:24:47 UTC
OK, I replicated. Must be something about those other plugins.

If I read the code right, if a plugin deployment fails, you need to restart the server to try again, because we cache the plugins that we "deployed", which means we never try to deploy them again.

org.rhq.enterprise.server.core.plugin.AgentPluginScanner.agentPluginScan()

calls agentPluginScanFilesystem(), takes the returned list of plugin files, and marks them as "needing to be deployed", which then happens later. But if a plugin fails to deploy at that later time, because we haven't cleared the cache, agentPluginScanFilesystem won't return those plugin files again and we'll never retry. It's arguable that we should not retry since it's probably going to fail again, but this BZ is saying that just restarting will successfully deploy them.

But the root of the problem remains - why did the plugin deployment fail initially? Fix that, and the issue with the cache is moot. I need to find out what plugin failed and why.
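
The caching behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the RHQ code: the class `ScanCacheSketch`, its methods, and the `forgetOnFailure` switch are hypothetical names invented for illustration, and the real scanner tracks more state than a set of file names. It only shows why a file that failed once is never offered again unless the cache entry is dropped (or the process restarts with an empty cache).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; not the actual AgentPluginScanner.
public class ScanCacheSketch {
    // Files already returned by a previous scan (stand-in for the scanner's cache).
    private final Set<String> seen = new HashSet<>();

    /** Returns only files that no earlier scan has returned, and remembers them. */
    public List<String> scan(List<String> filesOnDisk) {
        List<String> fresh = new ArrayList<>();
        for (String file : filesOnDisk) {
            if (seen.add(file)) { // add() is false when the file is already cached
                fresh.add(file);
            }
        }
        return fresh;
    }

    /**
     * Tries to deploy each file; the predicate returns false on failure.
     * With forgetOnFailure set, a failed file is dropped from the cache so
     * that the next scan returns it again.
     */
    public List<String> deploy(List<String> files, Predicate<String> deployer,
                               boolean forgetOnFailure) {
        List<String> failed = new ArrayList<>();
        for (String file : files) {
            if (!deployer.test(file)) {
                failed.add(file);
                if (forgetOnFailure) {
                    seen.remove(file);
                }
            }
        }
        return failed;
    }
}
```

With `forgetOnFailure` set to false, a second `scan` over the same directory returns nothing, which matches the "never retried until restart" behaviour. Whether retrying is desirable is a separate question, as noted above.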

Comment 12 John Mazzitelli 2014-09-22 17:08:53 UTC
Something looks fishy with the two Teiid plugins. I don't know what or why, but we both show errors when processing them. I am going to try to deploy just the data virtualization plugin pack (which has the two Teiid plugins) and see if that replicates the problem. It will be easy to replicate and debug if only a single plugin pack demonstrates the problem.

Comment 13 John Mazzitelli 2014-09-22 18:05:58 UTC
I think I found the crux of the problem.

Look at the Teiid plugins. First, the one from 3.2.0:

plugin name=Teiid
version=3.0.0
filename=teiid-rhq-plugin-1.0.0.Final-redhat-4.jar

OK, that's the OLD version. Now look at the new version (the one that comes with the 3.3 plugin pack):

plugin name=Teiid
version=2.0.1
filename=teiid-rhq-plugin-2.1.0.redhat-1.jar

The NEW one that came in the 3.3 plugin pack has a LOWER version number. Something must break because of this. Also look at the version and the file name - version 2.0.1 but file name shows 2.1.0 (not that that matters, but it is inconsistent and confusing).

I think when you restart the server, you get back the "old" 3.0.0 plugin, but I'm not sure.

We need someone to fix the Teiid plugin, I think. The NEW plugin that comes with JON 3.3 should not have a LOWER version than the older one that came with JON 3.2.
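
To make the version problem concrete, here is a small sketch of a dotted-version comparison. It is an assumption about how such a check might look, not the comparison the server actually performs. `PluginVersionSketch`, `compare`, and `isUpgrade` are hypothetical names, and the code only handles purely numeric versions such as 3.0.0 and 2.0.1.

```java
// Hypothetical illustration only; the server's real comparison logic may differ.
public class PluginVersionSketch {

    /** Compares two purely numeric dotted versions, e.g. "3.0.0" and "2.0.1". */
    public static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Missing trailing segments count as 0, so "3.0" equals "3.0.0".
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True only when the candidate is strictly newer than the deployed version. */
    public static boolean isUpgrade(String deployed, String candidate) {
        return compare(candidate, deployed) > 0;
    }
}
```

Under this sketch, `isUpgrade("3.0.0", "2.0.1")` is false, so a check of this kind would treat the Teiid plugin shipped in the 3.3 plugin pack as older than the one already in the database.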

Comment 14 John Mazzitelli 2014-09-22 18:32:50 UTC
Ted - can you look at the Teiid plugin? See my comment #13 specifically.

Comment 15 Van Halbert 2014-09-22 20:31:50 UTC
For data services / data virtualization,

EDS 5.x used version 2.x
DV 6.0 used version 3.0.0

so DV 6.1 (which uses the teiid-rhq 2.1.0 code) should be set to 3.1.0.

Comment 16 John Mazzitelli 2014-09-22 20:59:38 UTC
We'll need another BZ to address the Teiid plugin problem. For this BZ, I will introduce code that will at least allow the server to continue to process plugins even when this kind of thing happens (which today causes all plugin processing to abort).

AgentPluginScanner change to method registerAgentPlugins:

                     log.debug("Hot deploying agent plugin [" + di.url + "]...");
-                    this.agentPluginDeployer.pluginDetected(di);
+                    try {
+                        this.agentPluginDeployer.pluginDetected(di);
+                    } catch (Exception e) {
+                        log.error("Failed to process plugin at [" + di.url + "]. If it was obsolete and was deleted, this can be ignored. Otherwise, something is wrong with the plugin file and cannot be processed.", e);
+                    }
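
The effect of that change can be shown in isolation with a runnable sketch. It assumes nothing about the real classes: `IsolatedDeploySketch` and `processAll` are hypothetical names, and the deployer is an arbitrary callback rather than `agentPluginDeployer`. The point is only that catching the exception inside the loop lets the remaining plugins be processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; mirrors the per-plugin try/catch in the diff above.
public class IsolatedDeploySketch {

    /**
     * Passes every URL to the deployer. An exception from one plugin is
     * recorded and does not stop the remaining plugins from being processed.
     */
    public static List<String> processAll(List<String> urls, Consumer<String> deployer) {
        List<String> failed = new ArrayList<>();
        for (String url : urls) {
            try {
                deployer.accept(url);
            } catch (Exception e) {
                // The real change logs the error here; this sketch only records the URL.
                failed.add(url);
            }
        }
        return failed;
    }
}
```

Without the try/catch, the first exception would propagate out of the loop and abort processing of every plugin that comes after the bad one.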

Comment 17 John Mazzitelli 2014-09-22 21:03:41 UTC
master commit:

commit 4eaa0e0bdfd0b99c8bdf626a4074747f9e4603c0
Author: John Mazzitelli <mazz@redhat.com>
Date:   Mon Sep 22 17:02:45 2014 -0400

    BZ 1129729 - catch any exceptions if a plugin failed to be processed during detection

Comment 18 Van Halbert 2014-09-22 21:09:49 UTC
I've opened a DV BZ:  https://bugzilla.redhat.com/show_bug.cgi?id=1145341
to fix the teiid plugin.

Comment 19 Thomas Segismont 2014-09-23 15:00:48 UTC
Cherry-picked over to release/jon3.3.x

commit 9001f28f0124ee2d1cc7e3a7ba4ef2e4359af6b5
Author: John Mazzitelli <mazz@redhat.com>
Date:   Mon Sep 22 17:02:45 2014 -0400

    BZ 1129729 - catch any exceptions if a plugin failed to be processed during detection - this should allow the rest of the plugins to continue to be processed.
    
    (cherry picked from commit 4eaa0e0bdfd0b99c8bdf626a4074747f9e4603c0)
    Signed-off-by: Thomas Segismont <tsegismo@redhat.com>

Comment 20 Simeon Pinder 2014-10-01 21:32:58 UTC
Moving to ON_QA as available for test with build:
https://brewweb.devel.redhat.com/buildinfo?buildID=388959

Comment 21 Filip Brychta 2014-10-08 10:45:22 UTC
Verified on
Version :	
3.3.0.ER04
Build Number :	
99d2107:d7c537e

