Bug 1230411
| Summary: | Storage node results in 1000s of configuration changes filling up the database | | |
| --- | --- | --- | --- |
| Product: | [JBoss] JBoss Operations Network | Reporter: | dsteigne |
| Component: | Plugin -- Other, Storage Node | Assignee: | Libor Zoubek <lzoubek> |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Kondkar <skondkar> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | JON 3.3.2 | CC: | fbrychta, loleary, lzoubek, skondkar, spinder, theute |
| Target Milestone: | ER02 | Keywords: | Triaged |
| Target Release: | JON 3.3.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-10-28 14:36:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
dsteigne
2015-06-10 20:29:38 UTC
To be clear, the issue here is the configuration properties that exist in the storage node plug-in. For example, for whatever reason, we include the tokens assigned to the node as a configuration property. This property changes constantly, which triggers configuration change detection and results in hundreds of rows of data being added to the RHQ_CONFIG_PROPERTY table. It is not clear why this is a configuration property. It is read-only, so it is not a configuration option; perhaps it should be a trait. There may be other properties that cause this behavior too. From the looks of it, most of the configuration is read-only and is therefore not really configuration.

After having the customer look at their "Recent Operations", it turned out they were running snapshots of their 3 storage nodes every hour. This is what was causing so many rows in the rhq_config_property table.

To be clear, comment 2 only highlights one way of triggering this bug. In this case, the hourly snapshot was causing a flush which in turn resulted in the configuration changes. The run state of the node should not alter the configuration. Configuration consists of things that can be changed, tweaked, or "configured", while tokens, state, status, number of loaded classes, etc. are all "states" that can be treated either as a numeric measurement or as an identifiable trait.

I tried to reproduce this issue and I agree with Larry that we must move some configuration properties to traits. I identified the suspect resource type as Storage Service, with the config properties Load Map, Ownership, and Token to Endpoint Map:

- Load Map: a Map<storage host, amount of stored data>. Changes *all the time* because we're still storing new metrics over time. I'll remove this config property and create a "Load" trait which will only output the Load value for the current host.
- Ownership: usually changes after snapshot/repair.
- Token to Endpoint Map: changes after snapshot/repair. Not sure what this is good for; maybe it can be removed. I don't see a good way of representing this as a trait metric.

Also, MessagingService contains lots of read-only config properties (usually counters). I'll redo those config properties as metrics.

(In reply to Libor Zoubek from comment #4)
> Ownership - usually changes after snapshot/repair
> Token to Endpoint Map - changes after snapshot/repair - not sure what is
> this good for, maybe it can be removed. I don't see a good way of
> representing this as a trait metric.

I am not sure what this data is good for either. Perhaps John S can comment? Moving this into operations could be a better solution than a trait. Of course, that assumes the return data will not exceed 2000 characters.

The 2000 character limit applies only to a simple operation result (where we store everything in one text field). The result of this operation would be a list of small maps, so I think we're OK here.
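As an illustration of the "list of small maps" point, here is a minimal, hypothetical sketch (not the actual plugin code from the commits below) of how an RHQ plugin component could expose the token-to-endpoint mapping as an operation that returns structured (complex) results. The operation name, class name, and the `fetchTokenToEndpointMap()` helper are assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;

import org.rhq.core.domain.configuration.Configuration;
import org.rhq.core.domain.configuration.PropertyList;
import org.rhq.core.domain.configuration.PropertyMap;
import org.rhq.core.domain.configuration.PropertySimple;
import org.rhq.core.pluginapi.operation.OperationFacet;
import org.rhq.core.pluginapi.operation.OperationResult;

/**
 * Hypothetical sketch: return the token-to-endpoint mapping as a complex
 * operation result (a list of small maps) instead of a single long string.
 */
public class TokenMapOperationsSketch implements OperationFacet {

    public OperationResult invokeOperation(String name, Configuration parameters) throws Exception {
        if ("viewTokenToEndpointMap".equals(name)) { // illustrative operation name
            OperationResult result = new OperationResult();
            PropertyList rows = new PropertyList("tokenToEndpointMap");
            // fetchTokenToEndpointMap() stands in for the JMX read the real
            // component would perform against the storage node.
            for (Map.Entry<String, String> entry : fetchTokenToEndpointMap().entrySet()) {
                PropertyMap row = new PropertyMap("entry");
                row.put(new PropertySimple("token", entry.getKey()));
                row.put(new PropertySimple("endpoint", entry.getValue()));
                rows.add(row);
            }
            result.getComplexResults().put(rows);
            return result;
        }
        throw new UnsupportedOperationException(name);
    }

    private Map<String, String> fetchTokenToEndpointMap() {
        return Collections.emptyMap(); // placeholder for the real lookup
    }
}
```

Because the data comes back through `getComplexResults()` as structured properties rather than one text field, the 2000 character limit on simple operation results does not come into play.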
branch: master
link: https://github.com/rhq-project/rhq/commit/6ad035892
time: 2015-09-30 15:39:47 +0200
commit: 6ad035892c83059494469072415259e180df1ecc
author: Libor Zoubek - lzoubek
message: Bug 1230411 - Storage node results in 1000s of configuration changes filling up the database

Several configuration properties were transformed to metrics or operations.

StorageService resource type: the LoadMap and Ownership config properties were removed (metrics already existed); we're "losing" information about other nodes, which can still be obtained on those node resources. The TokenToEndpointMap property was removed and transformed into an operation called "View Token To Endpoint Map".

MessagingService resource type: DroppedMessages and RecentlyDroppedMessages were transformed to operations; all other config properties were transformed to metrics (no properties left).

branch: release/jon3.3.x
link: https://github.com/rhq-project/rhq/commit/59be2e606
time: 2015-09-30 19:33:16 +0200
commit: 59be2e6068a17799fb430332a2285222ff83bd36
author: Libor Zoubek - lzoubek
message: Bug 1230411 - Storage node results in 1000s of configuration changes filling up the database

Same change as above (cherry picked from commit 6ad035892c83059494469072415259e180df1ecc).
Signed-off-by: Libor Zoubek <lzoubek>
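To make the "config properties transformed to metrics" part of the commits above concrete, here is a rough, hypothetical sketch of how read-only, frequently changing values can be reported through the RHQ `MeasurementFacet` as metrics or traits instead of being declared as resource-configuration properties, so they never feed configuration change detection. The class name, the metric/trait names, and the `readLoadForThisHost()`/`readOwnershipForThisHost()` helpers are invented for the example and are not taken from the actual plugin source.

```java
import java.util.Set;

import org.rhq.core.domain.measurement.MeasurementDataNumeric;
import org.rhq.core.domain.measurement.MeasurementDataTrait;
import org.rhq.core.domain.measurement.MeasurementReport;
import org.rhq.core.domain.measurement.MeasurementScheduleRequest;
import org.rhq.core.pluginapi.measurement.MeasurementFacet;

/**
 * Hypothetical sketch: report read-only storage node values as measurement
 * data (numeric metrics or traits) rather than configuration properties.
 */
public class StorageServiceComponentSketch implements MeasurementFacet {

    public void getValues(MeasurementReport report, Set<MeasurementScheduleRequest> requests) throws Exception {
        for (MeasurementScheduleRequest request : requests) {
            if ("Load".equals(request.getName())) {
                // Numeric measurement: stored as time-series metric data,
                // never as rows in RHQ_CONFIG_PROPERTY.
                report.addData(new MeasurementDataNumeric(request, readLoadForThisHost()));
            } else if ("Ownership".equals(request.getName())) {
                // A value that changes only occasionally fits a trait; the
                // server records it only when the reported string changes.
                report.addData(new MeasurementDataTrait(request, readOwnershipForThisHost()));
            }
        }
    }

    private double readLoadForThisHost() { return 0.0; }          // placeholder for the real JMX read
    private String readOwnershipForThisHost() { return "0.33"; }  // placeholder for the real JMX read
}
```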
Moving to ON_QA as available to test with the following build: https://brewweb.devel.redhat.com/buildinfo?buildID=460382

*Note: jon-server-patch-3.3.0.GA.zip maps to the ER01 build of jon-server-3.3.0.GA-update-04.zip.

Moving target milestone to ER02 to retest after the latest Cassandra changes.

Moving to ON_QA as available to test with the following build: https://brewweb.devel.redhat.com//buildinfo?buildID=461043

*Note: jon-server-patch-3.3.0.GA.zip maps to the ER02 build of jon-server-3.3.0.GA-update-04.zip.

Tested applying the patch to JBoss ON 3.3 GA both before installation and after installation.

1) When the patch is applied in advance:

Storage Node->Database Management Services->Storage Services no longer has the configs 'Load Map', 'Ownership' and 'Token To Endpoint Map'. Verified that 'Token To Endpoint Map' is now an operation, "View Token To Endpoint Map".

Storage Node->Network Services->Messaging Service does not have any configurations. The previous configs 'Command Pending Tasks', 'Command Completed Tasks', 'Response Pending Tasks', 'Response Completed Tasks', 'Timeouts Per Host' and 'Recent Timeouts Per Host' are now listed as metrics, and the previous configs 'Dropped Messages' and 'Recently Dropped Messages' are now the operations "List Dropped Messages" and "List Recently Dropped Messages".

Verified that the following operations are working:
- View Token To Endpoint Map
- List Dropped Messages
- List Recently Dropped Messages

The operation 'Take Snapshot' results in an increase of around 10 rows in the rhq_config_property table.

2) Applying the patch to an already installed JBoss ON 3.3 GA:

Tested applying the patch to an already installed JBoss ON 3.3 GA; I am facing the same result as explained in Bug# 1272528. The storage node availability is down in the UI.

Steps to reproduce:
- Install and start JBoss ON 3.3 GA
- Stop with rhqctl stop
- Apply the ER02 patch (3.3.0.GA Update 04)
- Merge 'cassandra-jvm.properties.new' into 'cassandra-jvm.properties': cp ./jon-server-3.3.0.GA/rhq-storage/conf/cassandra-jvm.properties.new ./jon-server-3.3.0.GA/rhq-storage/conf/cassandra-jvm.properties
- rhqctl start
- The storage node availability is down in the UI.

Marking as ASSIGNED for the storage node availability issue seen when applying the patch to an already installed JBoss ON 3.3 GA.

Thanks Simeon, it works after running the operation 'Update All Plugins' at the agent, or 'Update Plugins on Agents' in the Administration page->Agent Plugins. The storage node is up and the test passes.

Did some more testing as below:

1) Patch applied on JBoss ON 3.3 GA which is installed but not started:
- ./rhqctl install
- Apply 3.3.0.GA Update 04
- Replace the JMX_OPTS value in cassandra-jvm.properties with JMX_OPTS="-Dcassandra.jmx.local.port=${jmx_port}" (the value of JMX_OPTS in cassandra-jvm.properties.new is JMX_OPTS="-Dcassandra.jmx.local.port=${jmx_port}")
- ./rhqctl start

The storage node is up and the test passes.

2) Patch applied on JBoss ON 3.3 GA which is installed, started and then stopped:
- ./rhqctl install
- ./rhqctl start
- Import the resources into inventory
- ./rhqctl stop
- Apply 3.3.0.GA Update 04
- Replace the JMX_OPTS value in cassandra-jvm.properties with JMX_OPTS="-Dcassandra.jmx.local.port=${jmx_port}" (the value of JMX_OPTS in cassandra-jvm.properties.new is JMX_OPTS="-Dcassandra.jmx.local.port=${jmx_port}")
- ./rhqctl start
- The storage node availability was down
- Execute the operation 'Update All Plugins' at the agent, or 'Update Plugins on Agents' in the Administration page->Agent Plugins

The storage node is up and the test passes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1947.html