848390 – When upgrading from JON 3.1.0 to 3.1.1 plugins from pluginpacks are not upgraded

Bug 848390 - When upgrading from JON 3.1.0 to 3.1.1 plugins from pluginpacks are not upgraded

Summary: When upgrading from JON 3.1.0 to 3.1.1 plugins from pluginpacks are not upgraded

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Installer
Sub Component:
Version:	JON 3.1.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	ER03
Target Release:	JON 3.2.0
Assignee:	Thomas Segismont
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	854349
TreeView+	depends on / blocked

Reported:	2012-08-15 13:25 UTC by Libor Zoubek
Modified:	2015-11-02 00:43 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Clone Of:
Clones:	854349 (view as bug list)
Environment:
Last Closed:
Type:	Bug
Embargoed:

Attachments	(Terms of Use)
3.1.1 server.log (101.94 KB, application/octet-stream) 2012-08-15 13:26 UTC, Libor Zoubek	no flags	Details
rhq-server.properties (8.17 KB, application/octet-stream) 2012-08-16 11:42 UTC, Libor Zoubek	no flags	Details
3.1.1 server.log - plugins not updated, no ZipException (118.18 KB, application/octet-stream) 2012-08-16 11:43 UTC, Libor Zoubek	no flags	Details
View All

Description Libor Zoubek 2012-08-15 13:25:42 UTC

Description of problem: I have an issue when ugprading JON 3.1 to JON 3.1.1

Version-Release number of selected component (if applicable):
JON 3.1.0 + JON 3.1.1.ER2

How reproducible:always

Steps to Reproduce:
1. Install JON 3.0 with soa & eap pluginpacks
2. Upgrade to JON 3.1.1.ER2
2.1 unzip JON 3.1.1.ER2 to separate folder
2.2 copy 3.1.1.ER2 pluginpacks to /plugins directory
2.3 shutdown JON 3.1.0 and start JON 3.1.1
2.4 upgrade
3. check Administration-> Agent plugins

Actual results: JBossAS7 plugin is still in version 3.1.0

In server log, there is interesting line repeated a few times

INFO [org.rhq.enterprise.server.core.plugin.ProductPluginDeployer] File [/home/hudson/jon-server-3.1.1.ER2/jbossas/server/default/deplo
y/rhq.ear/rhq-downloads/rhq-plugins/soa-p-5.3.0.GA-rhq-jbossesb-plugin-as4.jar] is not a valid jarfile - the file may not have been fully written yet. Cause: j
ava.util.zip.ZipException: error in opening zip file

It looks like this error prevents upgrading all other plugins that have been picked from /plugins directory.

I've reproduced it twice in a row with SOA pluginpack installed in 3.1.0 and new plugins available in /plugins in 3.1.1 folder.

I've also tried a scenario without installing/upgrading SOA pluginpack, and I didn't reproduce this BZ => all eap plugins got correctly upgraded.

Expected results: all plugins (including pluginpack plugins got upgraded)

Additional info:

Workaround: all you need to do is restart JON 3.1.1.ER2 server and then all plugins get upgraded. - well I cannot observe it for soa-p-5.3.0.GA-rhq-jbossesb-plugin-as4.jar because it's version was not bumped. Maybe this line in log means that this jar was deployed ...

*Discovered agent plugin [JBossESB]*

Comment 1 Libor Zoubek 2012-08-15 13:26:40 UTC

Created attachment 604605 [details]
3.1.1 server.log

Comment 2 Libor Zoubek 2012-08-15 13:27:34 UTC

this is not a regression, I had similar issue when upgrading from 3.0.0 to 3.0.1 (not sure what plugin jar was causing it)

Comment 3 Libor Zoubek 2012-08-15 13:30:17 UTC

If you do not apply 'workaround' you are not going to get plugins updated anymore - I've tried copying them to /plugins a few times and then pressing 'Scan for Updates' in UI, I've also uploaded some plugin (AS7) manually with no succcess. Only server restart fixed that.

Comment 4 Filip Brychta 2012-08-15 13:44:23 UTC

This could be a cause of bug 825260 the same exception was found in logs

Comment 5 Charles Crouch 2012-08-15 19:51:00 UTC

Assigning to mazz to investigate.

Comment 6 John Mazzitelli 2012-08-15 21:05:30 UTC

Libor - can you run a test for me? In the soa-p plugin, can you remove these two files from the jar and try again?

2826 Thu Jul 19 21:04:44 EDT 2012 META-INF/JBOSSCOD.RSA
1565 Thu Jul 19 21:04:44 EDT 2012 META-INF/JBOSSCOD.SF

I have a feeling this problem is due to the fact the soa-p plugin is signed.

Comment 7 John Mazzitelli 2012-08-15 21:41:35 UTC

FYI: I was just able to upgrade from 3.1.0 to 3.1.1.ER2 (with SOA-P plugin pack) without errors. I'll try again, this come going from 3.0 to 3.1.1.ER2 (but I doubt it 3.0 vs 3.1 matters)

Comment 8 John Mazzitelli 2012-08-15 21:48:27 UTC

(In reply to comment #7)
> FYI: I was just able to upgrade from 3.1.0 to 3.1.1.ER2 (with SOA-P plugin
> pack) without errors. I'll try again, this come going from 3.0 to 3.1.1.ER2
> (but I doubt it 3.0 vs 3.1 matters)

Never mind-  I looked at step 1 in repo steps (where it says 3.0) but I think that's a typo. Looks like this repro'ed for libor started with 3.1.0.

Comment 9 John Mazzitelli 2012-08-15 21:52:03 UTC

my repo steps:

--start with a clean db--

1) unzip 3.1.0.GA
2) put the 3.1.0 soa-p plugin pack jars in the /plugins dir
3) start 3.1.0.GA, complete the installer - make sure the plugins are deployed OK
4) shutdown 3.1.0.GA
5) unzip 3.1.1.ER2 in a separate location
6) put the 3.1.1.ER2 soa-p plugin pack jars in the /plugins dir
7) start 3.1.1.ER2, complete the installer

I did not see any errors in the log. Things appear to be successful.

Will try again.

Comment 10 Mike Foley 2012-08-15 22:06:11 UTC

Len ... the upgrade of the SOA plugins ... I assume this is a testcase your team should own.  Can you verify the SOA plugin upgrade behavior with JON 3.1.1?  Thanks.

Comment 11 Mike Foley 2012-08-15 22:23:05 UTC

email sent to Len 8/15 ...

Hi Len,

Just FYI .... here's a BZ related to upgrade of SOA Plugins with JON 3.1.1.

https://bugzilla.redhat.com/show_bug.cgi?id=848390

Libor reported an issue, that Mazz could not reproduce.  The upgrade testing of the SOA plugin with JON 3.1.1 ... I assume that is a test your team performs.    If your team has performed this test with JON 3.1.1 ... can you comment on that BZ?  If your team has not performed upgrade testing ... well ... that is a testcase you should add for sure.

Thanks!

Mike

Comment 15 Libor Zoubek 2012-08-16 11:42:34 UTC

Created attachment 604904 [details]
rhq-server.properties

Comment 16 Libor Zoubek 2012-08-16 11:43:14 UTC

Created attachment 604905 [details]
3.1.1 server.log - plugins not updated, no ZipException

Comment 17 John Mazzitelli 2012-08-16 14:00:28 UTC

Libor, in your rhq-server.properties you have this:

rhq.autoinstall.database=overwrite

This is not applicable for upgrading servers. This "overwrite" option tells the server to completely overwrite the database by dropping tables and recreating them.

Instead, you should set that value to auto:

rhq.autoinstall.database=auto

This retains the database content so you don't lose any data (including your old plugin content that you want to update).

Using overwrite, you aren't really testing the plugin update feature.

Now, this said, using "overwrite" should be allowed when doing auto-install - so this could be a bug still. But I don't think its a blocker for 3.1.1 release. I say it isn't a blocker because this isn't a bug in the upgrade steps - people will use the "Keep existing data" option in the installer when upgrading (that is "auto" in the .properties file) - they will not use "overwrite". So it is not correct to say this problem causes "upgrading from 3.1 to 3.1.1 to fail" because you aren't doing a "true" upgrade.

We should write a BZ (or reuse this one) to indicate that "overwrite" appears to be causing problems when installing JON over an existing schema (but where you want that schema to be completely blown away). If this is truly a bug (which your testing appears to make the case), the workaround in that case is to manually drop your DB schema outside of JON - then do your installation from there.

Note that this bug probably doesn't appear only in auto-install mode - I bet if you did the same things, but instead used the web installer and selected the "Overwrite" drop down item, you'd see the same thing. But again, there is a workaround for that - just manually drop your DB schema and follow the JON install procedures from there.

Comment 18 Libor Zoubek 2012-08-16 15:14:39 UTC

Thanks mazz, I'll update my rhq-server.properties. 

I always clean up my db prior installing.

Maybe I wasn't clear .. but .. I use my rhq-server.properties only to install JON 3.1.0 . I upgrade to JON 3.1.1 manually (I get zip unpacked, so I have default rhq-server.properties) and I select "keep data" when upgrading. But when I upgrade I always select my server name instead of 'New Server' which could be kind of related to 3.1.0 rhq-server.properties values. Is it related? Anyway, I do not loose my data after upgrade, because my inventory is untouched.

I still think there's a bug, but it seems to be odd, that it is reproduced only when i upgrade a JON instance that was installed with attached rhq-server.properties file. I do not consider this as blocker too.

Comment 19 Filip Brychta 2012-08-28 09:56:52 UTC

Reproduced when upgrading from 3.0.1 to 3.1.0 with oracle.
Scenario:
1- JON 3.0.1 + all plugins(except tech-preview) installed and running
2- RHQ agent and eap5.1.2 imported to inventory
3- stop JON 3.0.1 server
4- unzip JON 3.1.0
5- cp all jar files from all(except tech-preview) 3.1.0 plugins to $JON3.1.0_HOME/plugins
6- start JON 3.1.0 server

exception in server log:
2012-08-28 10:31:36,473 ERROR [org.rhq.enterprise.server.core.plugin.PluginDeploymentScanner] Scan failed. Cause: java.lang.Exception:File [/home/jon/jon-server-3.1.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/hornetq-jopr-plugin-2.0.0.Final.jar] is not a valid jarfile -  it is either corrupted or file has not been fully written yet.

Comment 21 Filip Brychta 2012-08-31 06:53:19 UTC

workaroud: restart RHQ server once more after installation

Comment 22 John Mazzitelli 2012-09-04 16:51:55 UTC

I finally saw this happen. I upgraded from 3.0.1 to 3.1.0 (i.e. basically what comment #19 said, only I used postgres not oracle - the DB backend doesn't matter).

When I was unable to reproduce, I did not have an EAP instance in inventory - I am not sure content of inventory matters though. Also, when I was unable to reproduce, I only had the EAP and SOA-P plugin packs installed. This time I put the following in my /plugins directory:

3.0.1: BRMS, Dataservices, EAP, EPP, SOA
3.1.0: BRMS, Dataservices, EAP, EPP, SOA, JDG

I have no idea what the problem is. I can confirm that I also saw the problem go away when I simply shutdown and restarted the server. So, that's the workaround for now.

Comment 23 Charles Crouch 2012-09-04 17:22:46 UTC

Pushing this to jon312. Eng. were finally able to reproduce and confirm the workaround of restarting the JON server is effective. This appears not to be a regression. I'll clone this into a docs bug for 311 to document the workaround.

Comment 24 John Mazzitelli 2012-09-13 19:53:28 UTC

I have no idea if this is relevent - but... I recently got this information via email about our plugin deployment controller stuff - this may help workaround or fix the issue:

> I just noticed the Javadoc on the constructor for the controller.  It
> specifically states that the services must already be ordered before
> arriving.  That makes sense.  ; )
> 
> By the way, a nifty little library you might want to check out is:
> http://jgrapht.org/
> 
> It allows you to express dependencies between objects as directed
> graphs, and has methods to allow you to iterate those objects as
> topologically sorted lists and even detects cycles in the graph.
>  Quite handy.
> 
> And it looks like the project has been around for the better part of
> a decade.  Wish we had known about it before writing the circular
> dependency checker and topological sorting routines from scratch.
> 
> > In the executeServices() method of the Controller, it adds all
> > threads to the pool and the Javadoc states that it allows them to
> > block either via the pool size or the service dependencies.
> >  However, if you have a complex enough dependency graph, it could
> > be the case that the N services that get added to the pool first
> > are ALL waiting on services that are not yet in the pool.  In this
> > case, the program will wait indefinitely for the dependent
> > services which will never be submitted to the pool.
> > 
> > The fix is pretty simple though - implement an iterator over the
> > dependency graph where it walks services in the order of
> > execution, respecting dependencies.  Then submit the jobs to the
> > pool in that order.  Under the covers the
> > ExecutorService.newFixedThreadPool(int) uses a LinkedBlockingQueue
> > for work that it can't process, which will guarantee that the
> > order you submitted jobs will be the order they are handed off to
> > an available thread in the pool.

Comment 25 Larry O'Leary 2012-09-24 22:53:28 UTC

I suspect the problem is the file copy that occurs from <RHQ_SERVER_HOME>/plugins/ to <RHQ_SERVER_HOME>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/. This is done in a separate thread then the plug-in update itself meaning that in some cases, the timing will result in a file being partially copied to the destination when the scan occurs. The more plug-ins in the plugins directory and the bigger they are, the more likely this is to occur.

Comment 26 John Mazzitelli 2012-09-24 23:01:29 UTC

(In reply to comment #25)
> I suspect the problem is the file copy that occurs from
> <RHQ_SERVER_HOME>/plugins/ to
> <RHQ_SERVER_HOME>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-
> plugins/. This is done in a separate thread then the plug-in update itself
> meaning that in some cases, the timing will result in a file being partially
> copied to the destination when the scan occurs. The more plug-ins in the
> plugins directory and the bigger they are, the more likely this is to occur.

We actually have code in place to "workaround" that by retrying the read. We retry like 5 times, so I don't think this is the cause.

Comment 27 Larry O'Leary 2012-09-24 23:07:31 UTC

The file takes several seconds to copy? What is the delay between re-try attempts?

Comment 28 Larry O'Leary 2012-09-24 23:08:26 UTC

(In reply to comment #27)
> The file takes several seconds to copy? What is the delay between re-try
> attempts?

Sorry... that should have been:

(In reply to comment #26)
> (In reply to comment #25)
> > I suspect the problem is the file copy that occurs from
> > <RHQ_SERVER_HOME>/plugins/ to
> > <RHQ_SERVER_HOME>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-
> > plugins/. This is done in a separate thread then the plug-in update itself
> > meaning that in some cases, the timing will result in a file being partially
> > copied to the destination when the scan occurs. The more plug-ins in the
> > plugins directory and the bigger they are, the more likely this is to occur.
> 
> We actually have code in place to "workaround" that by retrying the read. We
> retry like 5 times, so I don't think this is the cause.

The file takes several seconds to copy. What is the delay between re-try attempts?

Comment 29 John Mazzitelli 2012-09-24 23:17:21 UTC

(In reply to comment #28)
> (In reply to comment #27)
> > The file takes several seconds to copy? What is the delay between re-try
> > attempts?

5 seconds between retries, 4 retries, total of 20 seconds:

From org.rhq.enterprise.server.core.plugin.ProductPluginDeployer:

    private void ensureDeploymentIsValid(File pluginFile) throws Exception {

        // try a few times (sleeping between retries)
        // if the zip file still isn't valid, its probably corrupted and not simply due to the file still being written out
        int retries = 4;
        while (!isDeploymentValidZipFile(pluginFile)) {
            if (--retries <= 0) {
                throw new Exception("File [" + pluginFile + "] is not a valid jarfile - "
                    + " it is either corrupted or file has not been fully written yet.");
            }
            try {
                Thread.sleep(5000L);
            } catch (InterruptedException e) {
                break;
            }
        }
        return;
    }

Comment 30 Charles Crouch 2012-11-19 18:20:50 UTC

Swapping priority and severity settings, to match what I think Filip was trying to specify.

Comment 31 Charles Crouch 2012-11-19 18:22:26 UTC

Moving to JON BZ product for triage.

Comment 32 John Sanda 2012-11-20 16:14:42 UTC

(in reply to comment 29)

The ensureDeploymentIsValid calls isDeploymentValidZipFile. The isDeploymentValidZipFile method returns true if it is able to read an entry from the zip/jar file; so, it is entirely possible for this method to return true when the file has not been fully copied.

I cannot say if this is the cause of the problems originally encountered, but it is definitely a possibility. We do not require anything in the jar manifest or in the plugin descriptor that we could use to determine when the file has been fully copied. One possibility might be to use a file marker. When ProductPluginDeployer detects a new plugin, it would only start the process of registering the plugin when no marker file is present.

Comment 33 Larry O'Leary 2012-11-20 18:12:30 UTC

I think the normal way this is done is by simply copying the file with an extension that will prevent it from being seen by the plug-in scanner. For example:

mv plugins/my-plugin.jar jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/my-plugin.jar.part

mv jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/my-plugin.jar.part jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/my-plugin.jar



On Linux and Windows, such a command will simply result in the file being moved and once it is in its new location, it gets renamed. Therefore, only once it is finished being copied will it have the .jar extension.

Comment 34 Thomas Segismont 2013-10-04 15:27:52 UTC

Applied the instructions in comment 19 for a JON3.1.2 -> JON3.2 migration. Could not reproduce.

There has been some changes in the way the plugin scanner is triggered on server startup.

Libor can you also check that the JON3.1.2 -> JON3.2 migration is ok?

Comment 35 Libor Zoubek 2013-10-08 09:36:34 UTC

Same here, I've just upgraded JON 3.1.2 to JON 3.2.ER3 applying instructions in comment 19.

Comment 36 Thomas Segismont 2013-10-10 14:44:43 UTC

Thanks Libor so moving to MODIFIED status ER3 so that it gets VERIFIED status ASAP :)

Comment 39 Libor Zoubek 2013-10-10 15:26:19 UTC

previously verifed on ER3

Note You need to log in before you can comment on or make changes to this bug.