Bug 1129729
| Summary: | Plugins are not updated after upgrade from jon3.2.2 to jon3.3.dr01 | | |
|---|---|---|---|
| Product: | [JBoss] JBoss Operations Network | Reporter: | Filip Brychta <fbrychta> |
| Component: | Upgrade | Assignee: | John Mazzitelli <mazz> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Filip Brychta <fbrychta> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | JON 3.3.0 | CC: | hrupp, jshaughn, tejones, tsegismo, vhalbert |
| Target Milestone: | ER04 | Keywords: | Reopened |
| Target Release: | JON 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-12-11 14:00:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Filip Brychta
2014-08-13 14:38:13 UTC
Created attachment 926469 [details]
agent.log
It may be that this is directly caused by Bug 1129705 / Bug 1128780 so we should first fix the other two and then revisit this one.
(In reply to Heiko W. Rupp from comment #2)
> It may be that this is directly caused by Bug 1129705 / Bug 1128780 so we
> should first fix the other two and then revisit this one.
Should be fixed in ER01 if Heiko's assumption is correct.
Still visible in
Version: 3.3.0.ER01.1
Build Number: 9941660:f3aa7e7
This problem is probably caused by the following:
- according to server.log, the plugins are only partly updated: some are updated and some are not. server.log contains many messages like the following, for different plugins:
23:38:59,424 INFO [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] (EJB default - 7) It appears the agent plugin [AgentPlugin [id=0, name=RHQAgent, md5=600c128c4115be9935c66251dd3518af]] in the database may be obsolete. If so, it will be updated soon by the version on the filesystem [/home/hudson/jon-server-3.3.0.ER01.1/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.12.0.JON330ER01.jar]
The message says that the plugins should be updated soon, but they were not updated even after more than 12 hours. The JON server must be restarted to actually upgrade the rest of the plugins.
This probably causes the following problems:
- imported resources are down
- some agents throw the following errors:
2014-09-02 11:39:24,254 INFO [main] (org.rhq.core.pc.PluginContainer)- Initializing Plugin Container v4.12.0.JON330ER01...
2014-09-02 11:39:31,650 ERROR [main] (rhq.core.pc.plugin.PluginManager)- Error initializing plugin container
java.lang.IllegalArgumentException: Plugin [JMX] is required by plugins [[ActiveMQ]] but it does not exist in the dependency graph yet
at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:329)
at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:342)
at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeploymentOrder(PluginDependencyGraph.java:245)
at org.rhq.core.pc.plugin.PluginManager.<init>(PluginManager.java:169)
at org.rhq.core.pc.PluginContainer.initialize(PluginContainer.java:290)
at org.rhq.enterprise.agent.AgentMain.startPluginContainer(AgentMain.java:2031)
at org.rhq.enterprise.agent.AgentMain.start(AgentMain.java:729)
at org.rhq.enterprise.agent.AgentMain.main(AgentMain.java:448)
2014-09-02 11:39:31,660 FATAL [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.startup-error}The agent encountered an error during startup and must abort
java.lang.RuntimeException: Cannot initialize the plugin container
at org.rhq.core.pc.plugin.PluginManager.<init>(PluginManager.java:191)
at org.rhq.core.pc.PluginContainer.initialize(PluginContainer.java:290)
at org.rhq.enterprise.agent.AgentMain.startPluginContainer(AgentMain.java:2031)
at org.rhq.enterprise.agent.AgentMain.start(AgentMain.java:729)
at org.rhq.enterprise.agent.AgentMain.main(AgentMain.java:448)
Caused by: java.lang.IllegalArgumentException: Plugin [JMX] is required by plugins [[ActiveMQ]] but it does not exist in the dependency graph yet
at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:329)
at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeepDependencies(PluginDependencyGraph.java:342)
at org.rhq.core.clientapi.agent.metadata.PluginDependencyGraph.getDeploymentOrder(PluginDependencyGraph.java:245)
at org.rhq.core.pc.plugin.PluginManager.<init>(PluginManager.java:169)
... 4 more
2014-09-02 11:39:31,661 INFO [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.shutting-down}Agent is being shut down..
Workaround:
1- restart the JON server - after the restart, server.log shows that the rest of the plugins were updated and no more "...If so, it will be updated soon.." messages were reported
2- restart all agents - after the restart, agent.log shows that the plugins were downloaded and everything seems to be OK
See the attached complete logs.
Created attachment 934090 [details]
complete logs
I just tried going from 3.2.0 -> 3.3 ER01.1 and didn't see any problems. I'll sniff around to see what possibly could be wrong, but nothing on my box shows any issue so it's going to be more difficult.
Tried again. I installed 3.2.0.GA with the 3.2.0.GA EAP plugin pack. Put the agent and a standalone EAP 6.3 in inventory. Then I upgraded to 3.2.2 (and 3.2.2 EAP plugin pack). Then I upgraded to 3.3 ER03 (and 3.3.ER3 EAP plugin pack). The plugins always got successfully deployed and updated each step of the way. I did not see any errors. I did see lots of these (as expected):
14:38:25,527 INFO [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] (pool-6-thread-1) Filesystem has a plugin [RHQAgent] at the file [/home/mazz/Desktop/jon33/33/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.12.0.JON330ER03.jar] which is different than where the DB thinks it should be [/home/mazz/Desktop/jon33/33/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.9.0.JON320GA.jar]
14:38:25,527 INFO [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] (pool-6-thread-1) It appears the agent plugin [AgentPlugin [id=0, name=RHQAgent, md5=600c128c4115be9935c66251dd3518af]] in the database may be obsolete. If so, it will be updated soon by the version on the filesystem [/home/mazz/Desktop/jon33/33/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-agent-plugin-4.12.0.JON330ER03.jar].
But all plugins got updated successfully.
I reproduced it twice on ER3. Providing better repro steps.
I used the following setup:
- jon3.2.0 with all available plugins
- 4 remote agents, one is monitoring EAP5, the rest monitor EAP6 standalone or domain
I tried the following scenarios:
1 - copy ALL plugins to JON_HOME/plugins BEFORE upgrade -- issue is visible
2 - copy ALL plugins to JON_HOME/plugins AFTER upgrade -- successful upgrade without errors
The issue is not visible when using just the EAP plugin pack.
Here is a scenario which reproduced the issue 3/3:
1 - jon3.2.0 is running in the setup described above (all plugins are installed)
2 - unzip jon-server-3.3.0.ER03.zip
3 - copy all available plugin jars to jon-server-3.3.0.ER03/plugins/
4 - edit rhq-server.properties and set rhq.autoinstall.server.admin.password to work around bz 1128151 (this can be any string)
5 - stop jon3.2.0
6 - upgrade to jon3.3 - ./rhqctl upgrade --from-server-dir /home/hudson/jon-server-3.2.0.GA/
7 - start jon3.3
Probably relevant errors in server.log:
03:54:24,744 ERROR [org.rhq.enterprise.server.core.plugin.PluginDeploymentScanner] (pool-6-thread-1) Scan failed. Cause: java.lang.Exception:File [/home/hudson/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/teiid-rhq-plugin-2.1.0.redhat-1.jar] is not a valid jarfile - it is either corrupted or file has not been fully written yet.
03:54:25,683 ERROR [org.jboss.as.ejb3.invocation] (pool-6-thread-1) JBAS014134: EJB Invocation failed on component ServerPluginManagerBean for method public abstract org.rhq.core.domain.plugin.ServerPlugin org.rhq.enterprise.server.plugin.ServerPluginManagerLocal.getServerPluginRelationships(org.rhq.core.domain.plugin.ServerPlugin): javax.ejb.EJBException: java.lang.NullPointerException
03:54:25,697 WARN [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Failed to preload server plugin [WflyPatchBundleServerPlugin] from URL [file:/home/hudson/jon-server-3.3.0.ER03/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-serverplugins/rhq-serverplugin-wfly-patch-bundle-4.12.0.JON330ER03.jar]: java.lang.RuntimeException: Failed to get plugin config/schedules from the database
03:54:26,535 WARN [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Master server plugin container has been initialized but it detected some problems. Parts of the server may not operate correctly due to these errors.
03:54:26,536 WARN [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Problem #1: java.lang.RuntimeException:Failed to get plugin config/schedules from the database -> javax.ejb.EJBException:java.lang.NullPointerException -> java.lang.NullPointerException:null
03:54:26,536 WARN [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] (pool-6-thread-1) Problem #2: java.lang.RuntimeException:Failed to get plugin config/schedules from the database -> javax.ejb.EJBException:java.lang.NullPointerException -> java.lang.NullPointerException:null
See the attached server.log for the complete exceptions. The attached logs contain agent.log and server.log, which are from the run where the plugins were copied before the upgrade, and agent-post.log and server-post.log, which are from the run where the plugins were copied after the upgrade. You can see that the second approach works correctly.
So at least a documentation note saying that the second approach is "safer" would be good.
Created attachment 939244 [details]
complete logs #2
OK, I replicated. Must be something about those other plugins.
If I read the code right, if a plugin deployment fails, you need to restart the server to try again because we cache the plugins that we "deployed", which means we never try to deploy them again. org.rhq.enterprise.server.core.plugin.AgentPluginScanner.agentPluginScan() calls agentPluginScanFilesystem(), takes the returned list of plugin files, and marks them as "needing to be deployed", which then happens later. But if a plugin fails to deploy at that later time, because we haven't cleared the cache, agentPluginScanFilesystem won't return those plugin files again and we'll never retry. It's arguable that we should not retry since it's probably going to fail again, but this BZ is saying just restarting will successfully deploy them.
But the root of the problem remains - why did the plugin deployment fail initially? Fix that, and the issue with the cache is moot. I need to find out what plugin failed and why.
Something looks fishy with the two Teiid plugins. I don't know what or why, but we both show errors when processing them. I am going to try to just deploy the data virtualization plugin pack (which has the two Teiid plugins) and see if that replicates the problem. Will be easy to replicate and debug if only a single plugin pack demonstrates the problem.
I think I found the crux of the problem. Look at the Teiid plugins. First, the one from 3.2.0:
plugin name=Teiid version=3.0.0 filename=teiid-rhq-plugin-1.0.0.Final-redhat-4.jar
OK, that's the OLD version. Now look at the new version (the one that comes with the 3.3 plugin pack):
plugin name=Teiid version=2.0.1 filename=teiid-rhq-plugin-2.1.0.redhat-1.jar
The NEW one that came in the 3.3 plugin pack has a LOWER version number. Something must break because of this. Also look at the version and the file name - version 2.0.1 but file name shows 2.1.0 (not that that matters, but it is inconsistent and confusing).
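To make the version problem concrete, here is a minimal sketch of a plain numeric version comparison. It is not the RHQ comparison code (this bug does not show that code); the class and method names are hypothetical, and it assumes purely numeric, dot-separated versions, so it would not handle strings such as 4.12.0.JON330ER01.

```java
import java.util.Arrays;

// Hypothetical helper, not part of RHQ: compares dot-separated numeric versions.
public class PluginVersionCheck {

    /** Splits "3.0.0" into {3, 0, 0}. Assumes every segment is a plain integer. */
    static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Negative, zero or positive when a is lower than, equal to or higher than b. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0; // missing segments count as 0
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True when the candidate plugin has a lower version than the deployed one. */
    static boolean looksLikeDowngrade(String deployedVersion, String candidateVersion) {
        return compare(candidateVersion, deployedVersion) < 0;
    }

    public static void main(String[] args) {
        // Teiid versions quoted above: 3.0.0 shipped with 3.2.0, 2.0.1 ships with the 3.3 pack.
        System.out.println(looksLikeDowngrade("3.0.0", "2.0.1")); // prints "true"
        // Any version above 3.0.0 (3.1.0 is only an illustration) would not look like a downgrade.
        System.out.println(looksLikeDowngrade("3.0.0", "3.1.0")); // prints "false"
    }
}
```

Under this assumption the plugin from the 3.3 pack looks older than the one already in the database, which would be consistent with it being treated as obsolete.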
I think when you restart the server, you get back the "old" 3.0.0 plugin, but I'm not sure. We need someone to fix the Teiid plugin, I think. The NEW plugin that comes with JON 3.3 should not have a LOWER version than the older one that came with JON 3.2.
Ted - can you look at the Teiid plugin? See my comment #13 specifically.
For data services / data virtualization, EDS 5.x used version 2.x and DV 6.0 used version 3.0.0, so for DV 6.1 (which is using teiid-rhq 2.1.0 code) the version should be set to 3.1.0.
We'll need another BZ to address the Teiid plugin problem. For this BZ, I will introduce code that will at least allow the server to continue to process plugins even when this kind of thing happens (which today causes all plugin processing to abort).
AgentPluginScanner change to method registerAgentPlugins:
log.debug("Hot deploying agent plugin [" + di.url + "]...");
- this.agentPluginDeployer.pluginDetected(di);
+ try {
+ this.agentPluginDeployer.pluginDetected(di);
+ } catch (Exception e) {
+ log.error("Failed to process plugin at [" + di.url + "]. If it was obsolete and was deleted, this can be ignored. Otherwise, something is wrong with the plugin file and cannot be processed.", e);
+ }
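For anyone reading this later, here is a rough standalone sketch of the two behaviours discussed in this bug: the scan remembers files it has already returned, and the change above isolates each plugin in its own try/catch. This is not the RHQ source. ScannerSketch, Deployer and the file names are hypothetical stand-ins, and the cache is reduced to a simple set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the scanner behaviour; not the RHQ AgentPluginScanner.
public class ScannerSketch {

    /** Stand-in for the deployer that is notified about each detected plugin. */
    interface Deployer {
        void pluginDetected(String pluginFile) throws Exception;
    }

    private final Set<String> alreadySeen = new HashSet<>(); // simplified scan cache
    private final Deployer deployer;
    final List<String> failed = new ArrayList<>();

    ScannerSketch(Deployer deployer) {
        this.deployer = deployer;
    }

    /** Returns only files that no earlier scan has returned, and remembers them. */
    List<String> scanFilesystem(List<String> filesOnDisk) {
        List<String> fresh = new ArrayList<>();
        for (String file : filesOnDisk) {
            if (alreadySeen.add(file)) {
                fresh.add(file);
            }
        }
        return fresh;
    }

    /** Handles each file separately so that one bad plugin cannot abort the others. */
    int registerPlugins(List<String> files) {
        int processed = 0;
        for (String file : files) {
            try {
                deployer.pluginDetected(file);
                processed++;
            } catch (Exception e) {
                failed.add(file);
                System.err.println("Failed to process plugin at [" + file + "]: " + e.getMessage());
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> disk = Arrays.asList("a-plugin.jar", "broken-plugin.jar", "c-plugin.jar");
        ScannerSketch scanner = new ScannerSketch(file -> {
            if (file.startsWith("broken")) {
                throw new Exception("not a valid jarfile");
            }
        });
        // The plugins before and after the broken one are still processed.
        System.out.println(scanner.registerPlugins(scanner.scanFilesystem(disk))); // prints "2"
        // A second scan returns nothing, so the failed file is not retried in this run.
        System.out.println(scanner.scanFilesystem(disk).size()); // prints "0"
    }
}
```

In this model the set lives only in memory, so a new instance (the equivalent of a server restart) would pick the failed file up again, which matches the workaround described earlier.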
master commit:
commit 4eaa0e0bdfd0b99c8bdf626a4074747f9e4603c0
Author: John Mazzitelli <mazz>
Date: Mon Sep 22 17:02:45 2014 -0400
BZ 1129729 - catch any exceptions if a plugin failed to be processed during detection
I've opened a DV BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1145341 to fix the Teiid plugin.
Cherry-picked over to release/jon3.3.x
commit 9001f28f0124ee2d1cc7e3a7ba4ef2e4359af6b5
Author: John Mazzitelli <mazz>
Date: Mon Sep 22 17:02:45 2014 -0400
BZ 1129729 - catch any exceptions if a plugin failed to be processed during detection - this should allow the rest of the plugins to continue to be processed.
(cherry picked from commit 4eaa0e0bdfd0b99c8bdf626a4074747f9e4603c0)
Signed-off-by: Thomas Segismont <tsegismo>
Moving to ON_QA as available for test with build: https://brewweb.devel.redhat.com/buildinfo?buildID=388959
Verified on
Version: 3.3.0.ER04
Build Number: 99d2107:d7c537e