Bug 596704

Summary: mergeDiscoveredPackages generated ORA-00001: unique constraint (JON.RHQ_PACKAGE_VERSION_IDX) violation
Product: [Other] RHQ Project Reporter: Rajan Timaniya <rtimaniy>
Component: Core ServerAssignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact: Rajan Timaniya <rtimaniy>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.0.0CC: hrupp, spinder
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 2.4 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-08-12 16:49:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 565628, 591531    
Attachments:
Description Flags
server log
none
server log none

Description Rajan Timaniya 2010-05-27 10:50:34 UTC
Created attachment 417193 [details]
server log

Description of problem:
Upgrade JON 2.3.1 to JON 2.4 (JON build #169) with Oracle 10g gives exception -
"ORA-00001: unique constraint (JON.RHQ_PACKAGE_VERSION_IDX) violated" in server log.

2010-05-27 12:02:58,877 WARN  [org.rhq.enterprise.server.core.CoreServerServiceImpl] Agent [10.65.193.1][1.3.1.GA(5295)] would like to connect to this server but it is not supported
2010-05-27 12:02:59,846 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost].[/jboss-remoting-servlet-invoker].[ServerInvokerServlet]] Servlet.service() for servlet ServerInvokerServlet threw exception
java.io.InvalidClassException: org.rhq.core.domain.measurement.MeasurementReport; Serializable incompatible with Externalizable
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:579)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1583)
	...

2010-05-27 12:09:25,680 INFO  [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] Filesystem has a plugin [JBossESB] at the file [/NotBackedUp/install/jon170/jon-server-2.4.0-SNAPSHOT/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-soa-plugin-SOA.4.3.0.GA_CP02.jar] which is different than where the DB thinks it should be [/NotBackedUp/install/jon170/jon-server-2.4.0-SNAPSHOT/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-plugin-SOA.4.3.0.GA_CP02.jar]
2010-05-27 12:09:54,380 WARN  [org.hibernate.util.JDBCExceptionReporter] SQL Error: 1, SQLState: 23000
2010-05-27 12:09:54,380 ERROR [org.hibernate.util.JDBCExceptionReporter] ORA-00001: unique constraint (JON.RHQ_PACKAGE_VERSION_IDX) violated

2010-05-27 12:09:54,380 WARN  [org.hibernate.util.JDBCExceptionReporter] SQL Error: 1, SQLState: 23000
2010-05-27 12:09:54,380 ERROR [org.hibernate.util.JDBCExceptionReporter] ORA-00001: unique constraint (JON.RHQ_PACKAGE_VERSION_IDX) violated

2010-05-27 12:09:54,380 ERROR [org.hibernate.event.def.AbstractFlushingEventListener] Could not synchronize database state with session
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
	at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:71)
	at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
	...


Version-Release number of selected component (if applicable):
JON build #170 - Revision:  10621
http://hudson-qe.rhq.rdu.redhat.com:8080/job/jon/170/

How reproducible:
Always

Steps to Reproduce:
1) Install JON 2.3.1 with Oracle 10g
2) Install one more JON 2.3.1 server with same database  (High Availability)
3) Install/configure 3 rhq-agests
4) Perform operations on JON 2.3.1
 (i) Create alerts for platform, JBoss AS4, RHQ Agent, Compatible group (alert
condition is Free Memory > 0.1 OR Total Memory < 999999)
 (ii) Create groups (Mix, Compatible, Group defination)
 (iii)Content repository syncronization (for JBoss Patch)
 (iv) Create Roles and Users
 (v) Change INVENTORY -> CONNECTION of JBoss AS4 
 (vi)Change configuration values of 'RHQ Agent' and create 'History'
 (vii)Perform LDAP settings
 (viii)Create 'SCHEDULES' for 'View Process List' on platform
5) Upgrade JON 2.3.1 to JON 2.4 (JON build #170) - Complete installation process
6) Log-in to JON 2.4 and verify the data which were generate before upgrade (on JON 2.3.1)
7) Check server log for any database error/exception

Note:I have upgraded JON2.4 at diff. location then JON2.3.1, (as both have diff. resource key) so i have imported 'JBossAS Server'.
  
Actual results:
Upgrade JON 2.3.1 to JON 2.4 with Oracle 10g gives error/exception -
"ORA-00001: unique constraint (JON.RHQ_PACKAGE_VERSION_IDX) violated" on screen.

Expected results:
Upgrade JON 2.3.1 to JON 2.4 should upgrade with all its data without any
error/exception.

Additional info:
Please refer attached server log.

Comment 1 Rajan Timaniya 2010-05-31 06:14:37 UTC
Description of problem:
Upgrade JON 2.3.1 to JON 2.4 (JON build #177) with postgres 8.4 gives exception -
ERROR: duplicate key value violates unique constraint "rhq_package_version_idx" in server log.

2010-05-31 11:17:33,151 ERROR [org.hibernate.util.JDBCExceptionReporter] Batch entry 0 insert into RHQ_PACKAGE_VERSION (PACKAGE_ID, DISPLAY_NAME, SHORT_DESCRIPTION, LONG_DESCRIPTION, VERSION, DISPLAY_VERSION, ARCHITECTURE_ID, FILE_NAME, FILE_SIZE, FILE_MD5, FILE_SHA256, FILE_CREATION_TIME, LICENSE_NAME, LICENSE_VERSION, METADATA, CONFIG_ID, PACKAGE_BITS_ID, ID) values ('10001', NULL, NULL, NULL, '1.1.1.GA', NULL, '1', 'jbossws-spi.jar', '107896', NULL, NULL, '1275282346000', NULL, NULL, NULL, '13118', NULL, '10101') was aborted.  Call getNextException to see the cause.
2010-05-31 11:17:33,151 WARN  [org.hibernate.util.JDBCExceptionReporter] SQL Error: 0, SQLState: 23505
2010-05-31 11:17:33,151 ERROR [org.hibernate.util.JDBCExceptionReporter] ERROR: duplicate key value violates unique constraint "rhq_package_version_idx"
2010-05-31 11:17:33,151 ERROR [org.hibernate.event.def.AbstractFlushingEventListener] Could not synchronize database state with session
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
	at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:71)
	at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:254)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:237)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:141)

Version-Release number of selected component (if applicable):
JON build #177 - Revision:  10621
http://hudson-qe.rhq.rdu.redhat.com:8080/job/jon/177/

How reproducible:
Always

1) Install JON 2.3.1 on postgres 8.4
2) Create some alerts
   (i) Alert:
       Alert -1: Created on platform, the alert condition is Free Memory > 0.1
       Alert -2: Created on JBoss AS server, the alert condition is Free Memory
> 0.1
       Alert -3: Created on RHQ Agent, the alert condition is Free Memory > 0.1
3) Upgrade JON 2.3.1 to JON 2.4 (same database postgres 8.4)
4) Login to JON 2.4 and select platform or (any) resource
5) Click on already created alert

Actual results:
Upgrade JON 2.3.1 to JON 2.4 with Oracle 10g gives error/exception -
ERROR: duplicate key value violates unique constraint "rhq_package_version_idx" in server log

Expected results:
Upgrade JON 2.3.1 to JON 2.4 should upgrade with all its data without any
error/exception.

Comment 2 Rajan Timaniya 2010-05-31 06:16:00 UTC
Created attachment 418170 [details]
server log

Comment 3 Jay Shaughnessy 2010-06-02 13:42:44 UTC
I've updated the title to indicate that this is not limited to Oracle and is not only the product of an upgrade scenario.

It is the result of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=589173

That fix needs to be reviewed as part of resolving this issue... reviewing now...

Comment 4 Jay Shaughnessy 2010-06-02 13:47:30 UTC
*** Bug 598115 has been marked as a duplicate of this bug. ***

Comment 5 Jay Shaughnessy 2010-06-02 22:28:01 UTC
I'm marking this as fixed in 8c3c04155de8b00658bc712cc2e4d064ad077f09

This commit also attempts to further resolve 589173.

Comment 6 Simeon Pinder 2010-06-03 13:17:15 UTC
Now that we're applying packageVersion information to *.jar elements as well, we need to figure out a way to handle versioning for commonly used jar components(like common-collections.jar) that can be incorrectly named/versioned across different versions of AS instances.

Comment 7 Jay Shaughnessy 2010-06-04 14:37:32 UTC
Here is a lengthy summary of of the issue here, related very much to the work in 589173, and the solution:

Starting in 2.4, with the initial work for 589173, when uploading a new package version the SHA256 is calculated and set on the PackageVersion record in the db.  This is currently the only time this value is set in the db.

Agent side, when performing discovery, newly discovered  packages (meaning, those not previously known), of certain packageTypes, get a SHA256 set on the discovered package info.

The SHA256 is the result of work for 589173 to identify a package that could have multiple package versions. Typically, this is for content backed packages resulting from upload in Create New and/or update via the Content tab.  Since different versions of a package  (e.g. a .war file) do not have discernible version or architecture info it is not possible for discovery to determine the version and architecture, and therefore identify an existing packageVersion via version/arch.  Today the code actually sets these values to "1.0" and "noarch" as placeholders.  This may often match an initial version but it's not exactly robust.  This is the purpose of the SHA256, as a mechanism to match a discovered package to a specific PackageVersion as known in the db by binary comparison.

The reason we are getting the exception in this BZ is due to a problem in the intitial 589173 impl that assumes the SHA256 is passed in from Agent discovery for *all* discovered packages.  This is not true, it is only passed in for newly discovered packages.  The SHA256 match fails and therefore it tries to create the PackageVersion.  This fails because in fact there very well may be a PackageVersion with the same package,version and arch, generating the IDX violation.  There is a unique index on Package+Version+Arch and this indexing is valid.

There is also an issue when the SHA256 is passed in but the db PackageVersion does not have a SHA256 set.  This is the upgrade scenario.  For upgrades no PackageVersion entries in the db will have a SHA256 set on the record.  But all packages will be treated as newly discovered.  A similar scenario is possible after an agent start with --clean or --purgedata (removal of inventory.dat).

(As an aside,  there was a bug in the SHA256 query, even when it is passed in for new packages.)

So, the question becomes how do we further fix 589173, thus eliminating the problem in this issue as well as the others mentioned in this description?

Without veering away totally from the SHA256 approach, we can enhance the mechanism further:

0) First off, SHA256 generation for discovered packages needs to move to the plugins and out of the plugin container (ContentManager.java). The plugins should be required to generate a SHA256 for any PackageVersion for Packages that may have multiple versions. This is most likely just about any Package so basically, if it is possible to generate the sha256 it will be done. (Note, if this turns out to not be performant enough we can probably leverage previously generated sha256 info held in the ResourceContainer's installedpackage info.)

1) For discovered packages supplying a SHA256 and creating a PackageVersion in the db, the SHA256 should be stored on create.  So, not just on uploaded files.  Basically, when possible we want to favor binary comparison for PackageVersion identification.

2) The query for PackageVersion existence should require the SHA256 if it is set in the db, but otherwise fall back to the version/arch match.  In this way we favor binary comparison but for PV's that do not have, or get, sha256 values (e.g. upgrade scenario, exploded wars, etc) we maintain the same matching we had previously.

The initial changes were made as described above, a couple of revisions folloed due to the scenarios below:

Scenario 1: We discover a (foo.jar, 1.0, noarch,12345).  The version and architecture are assigned by default, as usual.  The 12345 represents the sha256.   Then, foo.jar gets updated outside of our content mechanism, someone drops a new jar in manually or maybe it was part of a larger entity that itself was replaced.  It gets rediscovered and wham, same name, default 1.0 version and default noarch architecture.  But sha256 is now 67890.  The previous PackageVersion is presumably still in the db, the new discovered package comes in, the sha256 match fails, the version/arch check is not performed.  So, correctly, we find that this is a new PackageVersion but the version info passed in is bogus and we get the constraint violation.

result1: The plugin is unable to know what the version really should be so it sends in 1.0 each time.  The database is now smarter than the plugin because it can pair a version number to a sha256.  The solution is to improve the version string sent in for a discovered package.  Instead of "1.0" set the discovered package's version to, in order of preference:
  1) The manifest's implementation version
  2) sha256 
  3) 0

This is actually a much more realistic version number. If present the manifest version will correspond to the actual version of the jar/war/ear. And it is visually appealing. If that is not there then the sha256 is a very accurate version, basically equating version with the binary signature.  Failing a manifest version and the ability to generate a sha256 (exploded archive) then we have to go back to the original default approach. We'll use 0 as opposed to 1.0 to indicate less specificity, basically "unknown".  

So, the basic logic is (using upgrade scenario): In that case we may have a (foo.jar, 1.0, noarch, null) in the db.  We would then discover and send in maybe (foo.jar, 2.3.0, noarch,12345) or (foo.jar, 12345, noarch,12345).  The former using manifest version, the latter using sha256 for the version.  This would ignore a sha256 match because the db has no sha256.  It would fail the version match because 1.0 != 2.3.0 or 12345.  So, it would generate a new PV.  This is not really a problem, it means that from that point on we'd be switching to a sha256 mechanism for the legacy package. The installed package list for the resource would be updated accordingly to the new PV.


Scenario 2: We can't count on the manifest's version to be unique for different binaries. We found that commons-collections.jar could have the same version (3.1) in the manifest yet be a different binary.  This is probably not that uncommon and we should not rely on perfect manifest maintenance when new binaries are generated. Build number, date, etc may actually indicate version uniqueness, or possibly it's just not maintained at all.  We still want to use the manifest version if it is there, because it is the best human readable version that reflects reality for the package. But, we need uniqueness for differing binaries. So, the solution is to combine the manifest version and the sha256 when possible. So,
the new preference is:
  1) "ManifestVersion [sha256=<the actual sha256>]"
  2) "[sha256=<the actual sha256>]"
  3) "ManifestVersion"
  4) 0
  
This is the current solution.

Comment 8 Jay Shaughnessy 2010-06-04 18:34:26 UTC
Current solution seems solid.  

good as of commit 1bf6d42609a35e58f5e9ae5c2f562b68dac71bc9

A couple of notes.  The dbsetup or upgrade must be performed after this commit.  A db from a prior release/build may not be valid for testing.

Also, in an upgrade scenario you will may see installed packages change, with updated version strings.  This may not happen immediately but rather after package discovery.  This can be forced with agent restart, otherwise it happens intermittently, minimally on the hour, maybe even once per day, not sure.

The best testing would involve inventory of multiple app servers, some ear/war files that are uploaded via create new and/or updated via Content tab.  Inspection of installed jar files.  Also, uninventory/reinventory.

finally, re-verify 589173.

Comment 9 Rajan Timaniya 2010-06-07 11:55:33 UTC
Verified on JON 2.4 Beta1

Steps
1) Install JON 2.3.1 with Oracle 10g (High Availability - 2 servers and 3 agents)
2) Perform operations on JON 2.3.1
 (i) Create alerts for platform, JBoss AS4, RHQ Agent, Compatible group (alert
condition is Free Memory > 0.1 OR Total Memory > 0.1)
 (ii) Create groups (Mix, Compatible, Group defination)
 (iii)Content repository syncronization (for JBoss Patch)
 (iv) Create Roles and Users
 (v) Change INVENTORY -> CONNECTION of JBoss AS4 
 (vi)Change configuration values of 'RHQ Agent' and create 'History'
 (vii)Perform LDAP settings
 (viii)Create 'SCHEDULES' for 'View Process List' on platform
 (ix) Enabled JBossAS events
3) Stop/Down all JON2.3.1 servers
4) Keep all agents running
5) Upgrade one server from JON2.3.1 to JON 2.4 Beta1 (with same database -
Oracle 10g)
6) Please refer server log

There isn't observed error/exception:
ERROR: duplicate key value violates unique constraint "rhq_package_version_idx"
in server log

Comment 10 Corey Welton 2010-08-12 16:49:30 UTC
Mass-closure of verified bugs against JON.