Description of problem: During plugin type deletion, foreign key constraint violations occur in the server log. With a small amount(50~) of resources/services these violations do not occur. Version-Release number of selected component (if applicable): 4.5.1/master. How reproducible: always. Steps to Reproduce: 1. Setup perf test plugin to load a large number of instances a specific server/service that is to be removed later. 2. Import into inventory and create two groups i)recursive group with platform, ii)Dynagroup including only the type for later removal. 3. Comment out the service from the perftest plugin, rebuild and redeploy the plugin. 4. Monitor the RHQ server log for constraint violations listed below. Actual results: Constratraint violations in server log. Expected results: No constraint violations and types successfully removed. Additional info: ############# Constraint violations 05:44:07,029 WARN [org.hibernate.hql.internal.ast.QueryTranslatorImpl] (http--0.0.0.0-7080-2) HHH000104: firstResult/maxResults specified with collection fetch; applying in memory! ... 05:47:29,969 WARN [org.jboss.as.ejb3] (EJB default - 3) JBAS014143: Timer a58c58da-769c-4fb8-a8b8-dafb09387054 is still active, skipping overlapping scheduled execution at: Fri Jan 25 05:47:29 EST 2013 05:47:40,038 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) SQL Error: 0, SQLState: 23503 05:47:40,039 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) Batch entry 0 delete from RHQ_RESOURCE where ID=12220 was aborted. Call getNextException to see the cause. 05:47:40,040 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) SQL Error: 0, SQLState: 23503 05:47:40,040 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 4) ERROR: update or delete on table "rhq_resource" violates foreign key constraint "rhq_resource_group_res_imp_map_resource_id_fkey" on table "rhq_resource_group_res_imp_map" Detail: Key (id)=(12220) is still referenced from table "rhq_resource_group_res_imp_map". 05:47:40,043 ERROR [org.hibernate.engine.jdbc.batch.internal.BatchingBatch] (EJB default - 4) HHH000315: Exception executing batch [Batch entry 0 delete from RHQ_RESOURCE where ID=12220 was aborted. Call getNextException to see the cause.] 05:47:40,053 ERROR [org.jboss.ejb3.invocation] (EJB default - 4) JBAS014134: EJB Invocation failed on component ResourceMetadataManagerBean for method public abstract void org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerLocal.completeRemoveResourceType(org.rhq.core.domain.auth.Subject,org.rhq.core.domain.resource.ResourceType): javax.ejb.EJBException: javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: Batch entry 0 delete from RHQ_RESOURCE where ID=12220 was aborted. Call getNextException to see the cause. at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInOurTx(CMTTxInterceptor.java:166) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final] at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInOurTx(CMTTxInterceptor.java:230) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final] at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:304) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final] at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final] at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final] at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final] at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final] at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final] ################ If you delete the groups above then the following shows up 9:40:28,733 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 6) Batch entry 0 delete from RHQ_RESOURCE where ID=12619 was aborted. Call getNextException to see the cause. 09:40:28,734 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 6) SQL Error: 0, SQLState: 23503 09:40:28,734 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (EJB default - 6) ERROR: update or delete on table "rhq_resource" violates foreign key constraint "rhq_availability_resource_id_fkey" on table "rhq_availability" Detail: Key (id)=(12619) is still referenced from table "rhq_availability". 09:40:28,735 ERROR [org.hibernate.engine.jdbc.batch.internal.BatchingBatch] (EJB default - 6) HHH000315: Exception executing batch [Batch entry 0 delete from RHQ_RESOURCE where ID=12619 was aborted. Call getNextException to see the cause.] 09:40:28,742 ERROR [org.jboss.ejb3.invocation] (EJB default - 6) JBAS014134: EJB Invocation failed on component ResourceMetadataManagerBean for method public abstract void org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerLocal.completeRemoveResourceType(org.rhq.core.domain.auth.Subject,org.rhq.core.domain.resource.ResourceType): javax.ejb.EJBException: javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: Batch entry 0 delete from RHQ_RESOURCE where ID=12619 was aborted. Call getNextException to see the cause.
Finally figured out the problem here after looking through this code for hours. The ResourceCriteria query used during type deletion was not using unlimited page size but defaulting to only return 200 results. This means that for a specific type if there were more than 200 instances that constraint violations would occur because the type removal logic must first: i)remove all resources/instances of that type along we removing relationships/etc ii)then finally remove the type After the first 200 resources were actually removed, then constraint violation occurred because not all instances all the type had been removed as assumed by type removal logic. This will likely happen in larger installations when they attempt to remove a plugin from their management domain. As it's based on the number of resources returned via query then this error would occur over time with any moderately sized deployment. The result of the bug is that only the first 200 resources are removed and the plugin/type is not actually removed. Interestingly enough a work around for smaller installs is to continue to update the timestamp on the offending plugin and redeploy to clear out the offending types in 200 size increments. Wonder how many other places this subtle criteria default may be biting us.
This is fixed with commit: 9e70606deeb4654ae6117 to master Moving this to ON_QA.
FYI no indication this is a recent regression.
Improved on earlier fix to instead chunk through the available pages so as to defend against excessive memory consumption in large deployments. This is applied with commit : 4b1de4c61a8a5118d to master. Leaving this ON_QA. If this is cherrypicked to other branches, both commits need to be applied in order.
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.