Bug 905632

Summary: removing large numbers of plugin types cause foreign key constraint violations
Product: [Other] RHQ Project Reporter: Simeon Pinder <spinder>
Component: Core ServerAssignee: Simeon Pinder <spinder>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: high    
Version: 4.5CC: hrupp
Target Milestone: ---   
Target Release: RHQ 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-01 10:06:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Simeon Pinder 2013-01-29 19:48:00 UTC
Description of problem: During plugin type deletion, foreign key constraint violations occur in the server log.  With a small amount(50~) of resources/services these violations do not occur.

Version-Release number of selected component (if applicable):
4.5.1/master.

How reproducible:
always.

Steps to Reproduce:
1. Setup perf test plugin to load a large number of instances a specific server/service that is to be removed later.
2. Import into inventory and create two groups i)recursive group with platform, ii)Dynagroup including only the type for later removal.
3. Comment out the service from the perftest plugin, rebuild and redeploy the plugin.
4. Monitor the RHQ server log for constraint violations listed below.
  
Actual results:
Constratraint violations in server log.

Expected results:
No constraint violations and types successfully removed.

Additional info:
############# Constraint violations
05:44:07,029 WARN  [org.hibernate.hql.internal.ast.QueryTranslatorImpl] (http--0.0.0.0-7080-2) HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!
...
05:47:29,969 WARN  [org.jboss.as.ejb3] (EJB default - 3) JBAS014143: Timer a58c58da-769c-4fb8-a8b8-dafb09387054 is still active, skipping overlapping scheduled execution at: Fri Jan 25 05:47:29 EST 2013
05:47:40,038 WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) SQL Error: 0, SQLState: 23503
05:47:40,039 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) Batch entry 0 delete from RHQ_RESOURCE where ID=12220 was aborted.  Call getNextException to see the cause.
05:47:40,040 WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) SQL Error: 0, SQLState: 23503
05:47:40,040 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) ERROR: update or delete on table "rhq_resource" violates foreign key constraint "rhq_resource_group_res_imp_map_resource_id_fkey" on table "rhq_resource_group_res_imp_map"
  Detail: Key (id)=(12220) is still referenced from table "rhq_resource_group_res_imp_map".
05:47:40,043 ERROR [org.hibernate.engine.jdbc.batch.internal.BatchingBatch] (EJB default - 4) HHH000315: Exception executing batch [Batch entry 0 delete from RHQ_RESOURCE where ID=12220 was aborted.  Call getNextException to see the cause.]
05:47:40,053 ERROR [org.jboss.ejb3.invocation] (EJB default - 4) JBAS014134: EJB Invocation failed on component ResourceMetadataManagerBean for method public abstract void org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerLocal.completeRemoveResourceType(org.rhq.core.domain.auth.Subject,org.rhq.core.domain.resource.ResourceType): javax.ejb.EJBException: javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: Batch entry 0 delete from RHQ_RESOURCE where ID=12220 was aborted.  Call getNextException to see the cause.
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInOurTx(CMTTxInterceptor.java:166) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInOurTx(CMTTxInterceptor.java:230) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:304) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]

################  If you delete the groups above then the following shows up
9:40:28,733 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 6) Batch entry 0 delete from RHQ_RESOURCE where ID=12619 was aborted.  Call getNextException to see the cause.
09:40:28,734 WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 6) SQL Error: 0, SQLState: 23503
09:40:28,734 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 6) ERROR: update or delete on table "rhq_resource" violates foreign key constraint "rhq_availability_resource_id_fkey" on table "rhq_availability"
  Detail: Key (id)=(12619) is still referenced from table "rhq_availability".
09:40:28,735 ERROR [org.hibernate.engine.jdbc.batch.internal.BatchingBatch] (EJB default - 6) HHH000315: Exception executing batch [Batch entry 0 delete from RHQ_RESOURCE where ID=12619 was aborted.  Call getNextException to see the cause.]
09:40:28,742 ERROR [org.jboss.ejb3.invocation] (EJB default - 6) JBAS014134: EJB Invocation failed on component ResourceMetadataManagerBean for method public abstract void org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerLocal.completeRemoveResourceType(org.rhq.core.domain.auth.Subject,org.rhq.core.domain.resource.ResourceType): javax.ejb.EJBException: javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: Batch entry 0 delete from RHQ_RESOURCE where ID=12619 was aborted.  Call getNextException to see the cause.

Comment 1 Simeon Pinder 2013-01-29 20:34:08 UTC
Finally figured out the problem here after looking through this code for hours.  The ResourceCriteria query used during type deletion was not using unlimited page size but defaulting to only return 200 results.  This means that for a specific type if there were more than 200 instances that constraint violations would occur because the type removal logic must first:
i)remove all resources/instances of that type along we removing relationships/etc
ii)then finally remove the type

After the first 200 resources were actually removed, then constraint violation occurred because not all instances all the type had been removed as assumed by type removal logic.  This will likely happen in larger installations when they attempt to remove a plugin from their management domain.  As it's based on the number of resources returned via query then this error would occur over time with any moderately sized deployment. 

The result of the bug is that only the first 200 resources are removed and the plugin/type is not actually removed.  Interestingly enough a work around for smaller installs is to continue to update the timestamp on the offending plugin and redeploy to clear out the offending types in 200 size increments.

Wonder how many other places this subtle criteria default may be biting us.

Comment 2 Simeon Pinder 2013-01-29 20:39:05 UTC
This is fixed with commit: 9e70606deeb4654ae6117 to master

Moving this to ON_QA.

Comment 3 Charles Crouch 2013-01-29 20:49:47 UTC
FYI no indication this is a recent regression.

Comment 4 Simeon Pinder 2013-01-31 12:00:20 UTC
Improved on earlier fix to instead chunk through the available pages so as to defend against excessive memory consumption in large deployments.  This is applied with commit : 4b1de4c61a8a5118d  to master. 

Leaving this ON_QA.

If this is cherrypicked to other branches, both commits need to be applied in order.

Comment 5 Heiko W. Rupp 2013-09-01 10:06:05 UTC
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.