Description of problem: I have an issue when ugprading JON 3.1 to JON 3.1.1 Version-Release number of selected component (if applicable): JON 3.1.0 + JON 3.1.1.ER2 How reproducible:always Steps to Reproduce: 1. Install JON 3.0 with soa & eap pluginpacks 2. Upgrade to JON 3.1.1.ER2 2.1 unzip JON 3.1.1.ER2 to separate folder 2.2 copy 3.1.1.ER2 pluginpacks to /plugins directory 2.3 shutdown JON 3.1.0 and start JON 3.1.1 2.4 upgrade 3. check Administration-> Agent plugins Actual results: JBossAS7 plugin is still in version 3.1.0 In server log, there is interesting line repeated a few times INFO [org.rhq.enterprise.server.core.plugin.ProductPluginDeployer] File [/home/hudson/jon-server-3.1.1.ER2/jbossas/server/default/deplo y/rhq.ear/rhq-downloads/rhq-plugins/soa-p-5.3.0.GA-rhq-jbossesb-plugin-as4.jar] is not a valid jarfile - the file may not have been fully written yet. Cause: j ava.util.zip.ZipException: error in opening zip file It looks like this error prevents upgrading all other plugins that have been picked from /plugins directory. I've reproduced it twice in a row with SOA pluginpack installed in 3.1.0 and new plugins available in /plugins in 3.1.1 folder. I've also tried a scenario without installing/upgrading SOA pluginpack, and I didn't reproduce this BZ => all eap plugins got correctly upgraded. Expected results: all plugins (including pluginpack plugins got upgraded) Additional info: Workaround: all you need to do is restart JON 3.1.1.ER2 server and then all plugins get upgraded. - well I cannot observe it for soa-p-5.3.0.GA-rhq-jbossesb-plugin-as4.jar because it's version was not bumped. Maybe this line in log means that this jar was deployed ... *Discovered agent plugin [JBossESB]*
Created attachment 604605 [details] 3.1.1 server.log
this is not a regression, I had similar issue when upgrading from 3.0.0 to 3.0.1 (not sure what plugin jar was causing it)
If you do not apply 'workaround' you are not going to get plugins updated anymore - I've tried copying them to /plugins a few times and then pressing 'Scan for Updates' in UI, I've also uploaded some plugin (AS7) manually with no succcess. Only server restart fixed that.
This could be a cause of bug 825260 the same exception was found in logs
Assigning to mazz to investigate.
Libor - can you run a test for me? In the soa-p plugin, can you remove these two files from the jar and try again? 2826 Thu Jul 19 21:04:44 EDT 2012 META-INF/JBOSSCOD.RSA 1565 Thu Jul 19 21:04:44 EDT 2012 META-INF/JBOSSCOD.SF I have a feeling this problem is due to the fact the soa-p plugin is signed.
FYI: I was just able to upgrade from 3.1.0 to 3.1.1.ER2 (with SOA-P plugin pack) without errors. I'll try again, this come going from 3.0 to 3.1.1.ER2 (but I doubt it 3.0 vs 3.1 matters)
(In reply to comment #7) > FYI: I was just able to upgrade from 3.1.0 to 3.1.1.ER2 (with SOA-P plugin > pack) without errors. I'll try again, this come going from 3.0 to 3.1.1.ER2 > (but I doubt it 3.0 vs 3.1 matters) Never mind- I looked at step 1 in repo steps (where it says 3.0) but I think that's a typo. Looks like this repro'ed for libor started with 3.1.0.
my repo steps: --start with a clean db-- 1) unzip 3.1.0.GA 2) put the 3.1.0 soa-p plugin pack jars in the /plugins dir 3) start 3.1.0.GA, complete the installer - make sure the plugins are deployed OK 4) shutdown 3.1.0.GA 5) unzip 3.1.1.ER2 in a separate location 6) put the 3.1.1.ER2 soa-p plugin pack jars in the /plugins dir 7) start 3.1.1.ER2, complete the installer I did not see any errors in the log. Things appear to be successful. Will try again.
Len ... the upgrade of the SOA plugins ... I assume this is a testcase your team should own. Can you verify the SOA plugin upgrade behavior with JON 3.1.1? Thanks.
email sent to Len 8/15 ... Hi Len, Just FYI .... here's a BZ related to upgrade of SOA Plugins with JON 3.1.1. https://bugzilla.redhat.com/show_bug.cgi?id=848390 Libor reported an issue, that Mazz could not reproduce. The upgrade testing of the SOA plugin with JON 3.1.1 ... I assume that is a test your team performs. If your team has performed this test with JON 3.1.1 ... can you comment on that BZ? If your team has not performed upgrade testing ... well ... that is a testcase you should add for sure. Thanks! Mike
Created attachment 604904 [details] rhq-server.properties
Created attachment 604905 [details] 3.1.1 server.log - plugins not updated, no ZipException
Libor, in your rhq-server.properties you have this: rhq.autoinstall.database=overwrite This is not applicable for upgrading servers. This "overwrite" option tells the server to completely overwrite the database by dropping tables and recreating them. Instead, you should set that value to auto: rhq.autoinstall.database=auto This retains the database content so you don't lose any data (including your old plugin content that you want to update). Using overwrite, you aren't really testing the plugin update feature. Now, this said, using "overwrite" should be allowed when doing auto-install - so this could be a bug still. But I don't think its a blocker for 3.1.1 release. I say it isn't a blocker because this isn't a bug in the upgrade steps - people will use the "Keep existing data" option in the installer when upgrading (that is "auto" in the .properties file) - they will not use "overwrite". So it is not correct to say this problem causes "upgrading from 3.1 to 3.1.1 to fail" because you aren't doing a "true" upgrade. We should write a BZ (or reuse this one) to indicate that "overwrite" appears to be causing problems when installing JON over an existing schema (but where you want that schema to be completely blown away). If this is truly a bug (which your testing appears to make the case), the workaround in that case is to manually drop your DB schema outside of JON - then do your installation from there. Note that this bug probably doesn't appear only in auto-install mode - I bet if you did the same things, but instead used the web installer and selected the "Overwrite" drop down item, you'd see the same thing. But again, there is a workaround for that - just manually drop your DB schema and follow the JON install procedures from there.
Thanks mazz, I'll update my rhq-server.properties. I always clean up my db prior installing. Maybe I wasn't clear .. but .. I use my rhq-server.properties only to install JON 3.1.0 . I upgrade to JON 3.1.1 manually (I get zip unpacked, so I have default rhq-server.properties) and I select "keep data" when upgrading. But when I upgrade I always select my server name instead of 'New Server' which could be kind of related to 3.1.0 rhq-server.properties values. Is it related? Anyway, I do not loose my data after upgrade, because my inventory is untouched. I still think there's a bug, but it seems to be odd, that it is reproduced only when i upgrade a JON instance that was installed with attached rhq-server.properties file. I do not consider this as blocker too.
Reproduced when upgrading from 3.0.1 to 3.1.0 with oracle. Scenario: 1- JON 3.0.1 + all plugins(except tech-preview) installed and running 2- RHQ agent and eap5.1.2 imported to inventory 3- stop JON 3.0.1 server 4- unzip JON 3.1.0 5- cp all jar files from all(except tech-preview) 3.1.0 plugins to $JON3.1.0_HOME/plugins 6- start JON 3.1.0 server exception in server log: 2012-08-28 10:31:36,473 ERROR [org.rhq.enterprise.server.core.plugin.PluginDeploymentScanner] Scan failed. Cause: java.lang.Exception:File [/home/jon/jon-server-3.1.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/hornetq-jopr-plugin-2.0.0.Final.jar] is not a valid jarfile - it is either corrupted or file has not been fully written yet.
workaroud: restart RHQ server once more after installation
I finally saw this happen. I upgraded from 3.0.1 to 3.1.0 (i.e. basically what comment #19 said, only I used postgres not oracle - the DB backend doesn't matter). When I was unable to reproduce, I did not have an EAP instance in inventory - I am not sure content of inventory matters though. Also, when I was unable to reproduce, I only had the EAP and SOA-P plugin packs installed. This time I put the following in my /plugins directory: 3.0.1: BRMS, Dataservices, EAP, EPP, SOA 3.1.0: BRMS, Dataservices, EAP, EPP, SOA, JDG I have no idea what the problem is. I can confirm that I also saw the problem go away when I simply shutdown and restarted the server. So, that's the workaround for now.
Pushing this to jon312. Eng. were finally able to reproduce and confirm the workaround of restarting the JON server is effective. This appears not to be a regression. I'll clone this into a docs bug for 311 to document the workaround.
I have no idea if this is relevent - but... I recently got this information via email about our plugin deployment controller stuff - this may help workaround or fix the issue: > I just noticed the Javadoc on the constructor for the controller. It > specifically states that the services must already be ordered before > arriving. That makes sense. ; ) > > By the way, a nifty little library you might want to check out is: > http://jgrapht.org/ > > It allows you to express dependencies between objects as directed > graphs, and has methods to allow you to iterate those objects as > topologically sorted lists and even detects cycles in the graph. > Quite handy. > > And it looks like the project has been around for the better part of > a decade. Wish we had known about it before writing the circular > dependency checker and topological sorting routines from scratch. > > > In the executeServices() method of the Controller, it adds all > > threads to the pool and the Javadoc states that it allows them to > > block either via the pool size or the service dependencies. > > However, if you have a complex enough dependency graph, it could > > be the case that the N services that get added to the pool first > > are ALL waiting on services that are not yet in the pool. In this > > case, the program will wait indefinitely for the dependent > > services which will never be submitted to the pool. > > > > The fix is pretty simple though - implement an iterator over the > > dependency graph where it walks services in the order of > > execution, respecting dependencies. Then submit the jobs to the > > pool in that order. Under the covers the > > ExecutorService.newFixedThreadPool(int) uses a LinkedBlockingQueue > > for work that it can't process, which will guarantee that the > > order you submitted jobs will be the order they are handed off to > > an available thread in the pool.
I suspect the problem is the file copy that occurs from <RHQ_SERVER_HOME>/plugins/ to <RHQ_SERVER_HOME>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/. This is done in a separate thread then the plug-in update itself meaning that in some cases, the timing will result in a file being partially copied to the destination when the scan occurs. The more plug-ins in the plugins directory and the bigger they are, the more likely this is to occur.
(In reply to comment #25) > I suspect the problem is the file copy that occurs from > <RHQ_SERVER_HOME>/plugins/ to > <RHQ_SERVER_HOME>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq- > plugins/. This is done in a separate thread then the plug-in update itself > meaning that in some cases, the timing will result in a file being partially > copied to the destination when the scan occurs. The more plug-ins in the > plugins directory and the bigger they are, the more likely this is to occur. We actually have code in place to "workaround" that by retrying the read. We retry like 5 times, so I don't think this is the cause.
The file takes several seconds to copy? What is the delay between re-try attempts?
(In reply to comment #27) > The file takes several seconds to copy? What is the delay between re-try > attempts? Sorry... that should have been: (In reply to comment #26) > (In reply to comment #25) > > I suspect the problem is the file copy that occurs from > > <RHQ_SERVER_HOME>/plugins/ to > > <RHQ_SERVER_HOME>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq- > > plugins/. This is done in a separate thread then the plug-in update itself > > meaning that in some cases, the timing will result in a file being partially > > copied to the destination when the scan occurs. The more plug-ins in the > > plugins directory and the bigger they are, the more likely this is to occur. > > We actually have code in place to "workaround" that by retrying the read. We > retry like 5 times, so I don't think this is the cause. The file takes several seconds to copy. What is the delay between re-try attempts?
(In reply to comment #28) > (In reply to comment #27) > > The file takes several seconds to copy? What is the delay between re-try > > attempts? 5 seconds between retries, 4 retries, total of 20 seconds: From org.rhq.enterprise.server.core.plugin.ProductPluginDeployer: private void ensureDeploymentIsValid(File pluginFile) throws Exception { // try a few times (sleeping between retries) // if the zip file still isn't valid, its probably corrupted and not simply due to the file still being written out int retries = 4; while (!isDeploymentValidZipFile(pluginFile)) { if (--retries <= 0) { throw new Exception("File [" + pluginFile + "] is not a valid jarfile - " + " it is either corrupted or file has not been fully written yet."); } try { Thread.sleep(5000L); } catch (InterruptedException e) { break; } } return; }
Swapping priority and severity settings, to match what I think Filip was trying to specify.
Moving to JON BZ product for triage.
(in reply to comment 29) The ensureDeploymentIsValid calls isDeploymentValidZipFile. The isDeploymentValidZipFile method returns true if it is able to read an entry from the zip/jar file; so, it is entirely possible for this method to return true when the file has not been fully copied. I cannot say if this is the cause of the problems originally encountered, but it is definitely a possibility. We do not require anything in the jar manifest or in the plugin descriptor that we could use to determine when the file has been fully copied. One possibility might be to use a file marker. When ProductPluginDeployer detects a new plugin, it would only start the process of registering the plugin when no marker file is present.
I think the normal way this is done is by simply copying the file with an extension that will prevent it from being seen by the plug-in scanner. For example: mv plugins/my-plugin.jar jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/my-plugin.jar.part mv jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/my-plugin.jar.part jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/my-plugin.jar On Linux and Windows, such a command will simply result in the file being moved and once it is in its new location, it gets renamed. Therefore, only once it is finished being copied will it have the .jar extension.
Applied the instructions in comment 19 for a JON3.1.2 -> JON3.2 migration. Could not reproduce. There has been some changes in the way the plugin scanner is triggered on server startup. Libor can you also check that the JON3.1.2 -> JON3.2 migration is ok?
Same here, I've just upgraded JON 3.1.2 to JON 3.2.ER3 applying instructions in comment 19.
Thanks Libor so moving to MODIFIED status ER3 so that it gets VERIFIED status ASAP :)
previously verifed on ER3