Description of problem: The JBoss ON server will accumulate 1000s of rows in RHQ_CONFIG and RHQ_CONFIG_PROPERTIES which do not seem to be referenced by a resource or other dependent object. This results in poor database performance and system degradation over time. Version-Release number of selected component (if applicable): 3.1, 3.2, 3.3 How reproducible: Always, over time. Steps to Reproduce: 1. Install, configuration, and start a JBoss ON system. 2. Allow the system to run for a few hours/days/weeks/months Actual results: Over time, the RHQ_CONFIG table grows in size and the ID column of some of the rows are no longer referenced. Some of these RHQ_CONFIG entries do not have corresponding RHQ_CONFIG_PROPERTY entries. Expected results: All RHQ_CONFIG rows should be referenced by one or more resources, operations, or other JBoss ON components/subsystems. Additional info: The following SQL query can be used to determine that the RHQ_CONFIG row is not referenced: SELECT RHQ_CONFIG.ID FROM RHQ_CONFIG LEFT JOIN RHQ_CONFIG_PROPERTY ON RHQ_CONFIG.ID = RHQ_CONFIG_PROPERTY.CONFIGURATION_ID WHERE NOT EXISTS ( SELECT RHQ_ALERT_NOTIFICATION.SENDER_CONFIG_ID FROM RHQ_ALERT_NOTIFICATION WHERE RHQ_ALERT_NOTIFICATION.SENDER_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_ALERT_NOTIFICATION.EXTRA_CONFIG_ID FROM RHQ_ALERT_NOTIFICATION WHERE RHQ_ALERT_NOTIFICATION.EXTRA_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_BUNDLE_DEPLOYMENT.CONFIG_ID FROM RHQ_BUNDLE_DEPLOYMENT WHERE RHQ_BUNDLE_DEPLOYMENT.CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_CONFIG_GROUP_UPDATE.CONFIGURATION_ID FROM RHQ_CONFIG_GROUP_UPDATE WHERE RHQ_CONFIG_GROUP_UPDATE.CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_CONFIG_TEMPLATE.CONFIG_ID FROM RHQ_CONFIG_TEMPLATE WHERE RHQ_CONFIG_TEMPLATE.CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_CONFIG_UPDATE.CONFIGURATION_ID FROM RHQ_CONFIG_UPDATE WHERE RHQ_CONFIG_UPDATE.CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_CONTENT_SOURCE.CONFIGURATION_ID FROM RHQ_CONTENT_SOURCE WHERE RHQ_CONTENT_SOURCE.CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_CREATE_RES_HIST.CONFIGURATION_ID FROM RHQ_CREATE_RES_HIST WHERE RHQ_CREATE_RES_HIST.CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_DASHBOARD.CONFIGURATION_ID FROM RHQ_DASHBOARD WHERE RHQ_DASHBOARD.CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_DASHBOARD_PORTLET.CONFIGURATION_ID FROM RHQ_DASHBOARD_PORTLET WHERE RHQ_DASHBOARD_PORTLET.CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_DRIFT_DEFINITION.CONFIG_ID FROM RHQ_DRIFT_DEFINITION WHERE RHQ_DRIFT_DEFINITION.CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_DRIFT_DEF_TEMPLATE.CONFIG_ID FROM RHQ_DRIFT_DEF_TEMPLATE WHERE RHQ_DRIFT_DEF_TEMPLATE.CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_INSTALLED_PKG_HIST.DEPLOYMENT_CONFIG_ID FROM RHQ_INSTALLED_PKG_HIST WHERE RHQ_INSTALLED_PKG_HIST.DEPLOYMENT_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_OPERATION_HISTORY.PARAMETERS_CONFIG_ID FROM RHQ_OPERATION_HISTORY WHERE RHQ_OPERATION_HISTORY.PARAMETERS_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_OPERATION_HISTORY.RESULTS_CONFIG_ID FROM RHQ_OPERATION_HISTORY WHERE RHQ_OPERATION_HISTORY.RESULTS_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_PACKAGE_VERSION.CONFIG_ID FROM RHQ_PACKAGE_VERSION WHERE RHQ_PACKAGE_VERSION.CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_PLUGIN.PLUGIN_CONFIG_ID FROM RHQ_PLUGIN WHERE RHQ_PLUGIN.PLUGIN_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_PLUGIN.JOBS_CONFIG_ID FROM RHQ_PLUGIN WHERE RHQ_PLUGIN.JOBS_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_RAW_CONFIG.CONFIG_ID FROM RHQ_RAW_CONFIG WHERE RHQ_RAW_CONFIG.CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_RESOURCE_TYPE.BUNDLE_CONFIG_ID FROM RHQ_RESOURCE_TYPE WHERE RHQ_RESOURCE_TYPE.BUNDLE_CONFIG_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_RESOURCE.RES_CONFIGURATION_ID FROM RHQ_RESOURCE WHERE RHQ_RESOURCE.RES_CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_RESOURCE.PLUGIN_CONFIGURATION_ID FROM RHQ_RESOURCE WHERE RHQ_RESOURCE.PLUGIN_CONFIGURATION_ID = RHQ_CONFIG.ID ) AND NOT EXISTS ( SELECT RHQ_SUBJECT.CONFIGURATION_ID FROM RHQ_SUBJECT WHERE RHQ_SUBJECT.CONFIGURATION_ID = RHQ_CONFIG.ID ); The rows which include a NULL RHQ_CONFIG_PROPERTY.ID value have no configuration properties. The ones that doe have configuration properties seem to be drift related. Perhaps drift or resource importing is creating these unused configurations and properties? In testing the counts are in the 10s or 100s of rows. But in production these counts are in the millions or hundreds of millions of rows.
This may be the same issue as upstream 534320. Added to See Also as I am not 100% confident they are the same at this time.
I spent too much time on this but I've created a PR: https://github.com/rhq-project/rhq/pull/257 This has a few different fixes in it, as described in the PR/commit comment. I'm not sure it solves the PR but it does stop some leaking config in the Drift area, and perhaps some other entity orphaning. Tagging Larry to see if how he may want to proceed...
@Jay, this does take care of the bulk of the issue. These drift property entries add up. Not sure what the empty RHQ_CONFIG rows are from though. Is there a reason that could or should happen? When I say "empty RHQ_CONFIG rows" what I am referring to are entries in RHQ_CONFIG which do not have any corresponding entries in RHQ_CONFIG_PROPERTY and are not referenced by any key relationships.
@Larry, I don't have an answer for lone rhq_config rows. An empty config is certainly possible to create, and apparently it happens. The commit includes a few fixes outside of drift, I have no idea what impact those may have. Also, I can say that a full run of server i-tests does still leave some config in the DB (not drift related). I don't know whether those are a problem or just not cleaned up well in the tests. Given the difficulty in tracking down the root cause, I'm not planning on pursuing this at this time.
I think we should proceed with the fixes you propose. This of course assumes that my understanding of the changes are correct and the risk is relatively low.
I think the risk is low, if the i-tests all pass I think we're good. I didn't see any relevant failures in my run. Of course this does not address the existing rows for customer databases.
PR 257 got merged some time ago. I'm leaving the commits for reference. commit 835cca57581b08e6d2e2f9ff36b9eb8b95071cc4 Merge: 8ccd6ac 8ae363d Author: Michael Burman <yak> Date: Tue May 31 16:27:09 2016 +0300 Merge pull request #257 from jshaughn/bz1323332 Bug 1323332 - Orphaned configuration left in database commit 8ae363d095c697a7521fb9ba17a6432ff34d4532 Author: jshaughn <jshaughn.home> Date: Wed May 25 12:05:57 2016 -0400 Bug 1323332 - Orphaned configuration left in database We knew from observation that drift-related configs were being orphaned in the DB. Running some i-tests showed backing configs for drift definitions being orphaned. This addresses a few different causes: - improper use of EntityManager.getReference() - This method is basically an optimized version of EntityManager.find(), with the difference being that it is not required to go to the DB, and instead basically wraps the given ID in a proxy for the desired class. It should generally not be used for reasons outside of using the proxy to set FK references in other entities, or to explicitly call setters on the proxy in some circumstances. It should not be used for an existence check, a way to perform subsequent gets on the proxy, or passed to EntityManager.remove(). We were violating all of these rules in various places. The calls to EM.remove were the cause of some orphaned config because the getReference proxy did not provide the proxy necessary for EM.remove to provide the cascade delete. - improper update of a DriftDefinition or DriftDefinitionTemplate - When updating these entities we were not removing the old backing config, or backing drift definition for a template, when replacing it with the new one. - improper test cleanup - this would not affect production. We were directly deleting DriftDefinition and DriftDefinitionTemplate, so the cascade delete was not happening. By changing the cleanup code to use EM.remove() the configs are now cleaned up properly. Note that this commit does not guarantee to solve this BZ entirely or even partially, but it is guaranteed to stop some orphan issues, as proven in testing. Also, it includes analogous changes outside of the drift domain.
Setting to POST as this is in the master
Moving to ON_QA as available for test with build: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=534002 http://download.eng.bos.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/110/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip maps to ER01 build of JON 3.3.8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0285.html