Bug 1120417

Summary: Break up transaction of updatePluginConfigurationDefinition into smaller pieces
Product: [Other] RHQ Project
Reporter: Elias Ross <genman>
Component: Installer
Assignee: Jay Shaughnessy <jshaughn>
Status: ON_QA
Severity: urgent
Priority: high
Version: 4.9
CC: hrupp, jshaughn
Target Milestone: GA   
Target Release: RHQ 4.13   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Type: Bug

Description Elias Ross 2014-07-16 21:32:23 UTC
Description of problem:

The following transaction can take a long, long time:

public class PluginConfigurationMetadataManagerBean implements PluginConfigurationMetadataManagerLocal {

    public void updatePluginConfigurationDefinition(ResourceType existingType, ResourceType newType) {

...

                    //Use CriteriaQuery to automatically chunk/page through criteria query results
                    CriteriaQueryExecutor<Resource, ResourceCriteria> queryExecutor = new CriteriaQueryExecutor<Resource, ResourceCriteria>() {
                        @Override
                        public PageList<Resource> execute(ResourceCriteria criteria) {
                            return resourceMgr.findResourcesByCriteria(overlord, criteria);
                        }
                    };

                    CriteriaQuery<Resource, ResourceCriteria> resources = new CriteriaQuery<Resource, ResourceCriteria>(
                        criteria, queryExecutor);

                    for (Resource resource : resources) {
                        updateResourcePluginConfiguration(resource, updateReport);
                    }

If there are thousands of resources, the loop over updateResourcePluginConfiguration runs inside that single transaction and can take a long, long time (more than 10 minutes).
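One way to break the loop above into smaller pieces is to hand the resources to the update code in fixed-size batches, so that each batch can be committed on its own. The following is a minimal sketch of the batching step only, not the change that was made in RHQ. The class `BatchedUpdateSketch` and the method `forEachBatch` are hypothetical names; in the server, the per-batch callback would presumably be a call through an EJB proxy to a method that starts its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not part of the RHQ code base. */
final class BatchedUpdateSketch {

    /**
     * Splits items into consecutive batches of at most batchSize elements and
     * passes each batch to perBatch. Returns the number of batches processed.
     * Assumption: the caller makes perBatch run in its own transaction.
     */
    static <T> int forEachBatch(List<T> items, int batchSize, Consumer<List<T>> perBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the slice so the callback does not keep a view of the full list.
            perBatch.accept(new ArrayList<>(items.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```

With this shape, a failure or timeout in one batch only rolls back that batch's work, at the cost of the overall update no longer being atomic.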

Version-Release number of selected component (if applicable): 4.12


How reproducible: Depending on number of resources


Steps to Reproduce:
1. Create 10,000+ resources with config
2. Try to update resource type

Actual results: Transaction hangs


Expected results: Transaction completes



Additional info:

Comment 1 Jay Shaughnessy 2014-07-17 14:31:16 UTC
I think in general we are in timeout danger when updating plugin types at scale. We currently try to update all of the types in one Tx, which for something like the AS7 plugin and its 200+ types could be an issue even with the extended, 30 minute timeout we apply on the outer Tx. We may want to look at performing one type update per Tx. At least that way, if one fails, we may have made progress towards updating the plugin overall.

Furthermore, we could apply the extended timeout to each type.  And past that we can still optimize as suggested above, using nested transactions as necessary, to break up the work and reduce the rollback logging for any single Tx.

Looking to see how difficult this re-working may be...
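
The "one type update per Tx" idea can be sketched as a driver loop that runs each type's update as a separate unit of work and records which ones failed. This is an illustration under assumptions, not the committed code: `PerTypeUpdateSketch` and `updateEach` are hypothetical names, type names stand in for ResourceType objects, and the sketch chooses to continue after a failure, which the comment above does not specify. In an EJB container the callback would be a call through the bean's proxy to a method marked `REQUIRES_NEW`, with the extended timeout applied to that method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical driver for illustration; not part of the RHQ code base. */
final class PerTypeUpdateSketch {

    /**
     * Invokes updateOneType once per type name and returns the names whose
     * update threw. Assumption: updateOneType runs in its own transaction,
     * so a failure rolls back only that type's work.
     */
    static List<String> updateEach(List<String> typeNames, Consumer<String> updateOneType) {
        List<String> failed = new ArrayList<>();
        for (String name : typeNames) {
            try {
                updateOneType.accept(name);
            } catch (RuntimeException e) {
                // Types already updated stay committed; remember this one for reporting.
                failed.add(name);
            }
        }
        return failed;
    }
}
```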

Comment 2 Jay Shaughnessy 2014-07-18 17:29:33 UTC
Elias, thanks for reporting this, please feel free to review the commit.


master commit 26e5712b8cefc7601f6ee95091922de667ee3752
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Jul 18 13:25:47 2014 -0400

    Another round of scalability enhancements for updating plugin metadata.  In
    the past we broke the update of each Plugin into its own Tx.  Later we split
    registering types and removing [obsolete] types into separate Tx and applied
    a 30 minute timeout to the type registration.  With this pass we now update
    each type in its own Tx and allow up to 30 minutes per type.  This can be
    necessary if updating plugin configurations for a large existing resource
    population.

Comment 3 Jay Shaughnessy 2014-07-25 22:29:20 UTC
The recent changes brought to the surface a few other things, leading to these commits.

master commit 001a3d1ed50713172774c4ae94effd34523aa9ca
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Jul 25 18:19:04 2014 -0400

    Another pass here given some oracle test failures in the CI env.
    - Fix an issue with PropertyDefinitionSimple.removeEnumeratedValues.
      An unexpected problem brought out, I guess, by the Tx reworking,
      must be careful not to replace hibernate proxy dealing with
      orphanRemoval.
    - remove unnecessary REQUIRES_NEW that could lead to locking issues
    - remove some dead code
    - start shortening xxxInNewTransaction to xxxInNewTx, purely for selfish
      reasons.


master commit 0e8b7ed217d5ec91ac1313e5fc87829c330b4ca9
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Jul 25 18:22:44 2014 -0400

    In a recent commit for [1120417] (and the resulting oracle test failures) we
    added protection against Hibernate errors related to detached sets when
    orphanRemoval=true.  This applies similar changes to entity classes
    outside of the BZ work.
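
The orphanRemoval protection mentioned in these commits usually comes down to mutating the mapped collection in place instead of assigning a new collection instance, since Hibernate tracks the original collection wrapper and complains if an orphan-removal collection is no longer referenced by its owner. Below is a minimal, Hibernate-free sketch of that setter pattern. `DefinitionSketch` is a hypothetical stand-in for an entity such as PropertyDefinitionSimple, and it holds plain strings rather than mapped child entities; the actual RHQ code may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an entity; not the RHQ class. */
final class DefinitionSketch {

    // In a real entity this field would carry a mapping with orphanRemoval=true
    // and would not be final; final here only makes the intent explicit.
    private final List<String> enumeratedValues = new ArrayList<>();

    List<String> getEnumeratedValues() {
        return enumeratedValues;
    }

    /** Replaces the contents while keeping the same collection instance. */
    void setEnumeratedValues(List<String> newValues) {
        if (newValues == enumeratedValues) {
            return; // same instance passed back in; clearing it would lose the data
        }
        enumeratedValues.clear();
        if (newValues != null) {
            enumeratedValues.addAll(newValues);
        }
    }
}
```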