Bug 1559440
Summary: | Hawkular Metrics crashes with OutOfMemoryError under moderate load | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | John Sanda <jsanda> | |
Component: | Hawkular | Assignee: | John Sanda <jsanda> | |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 3.9.0 | CC: | aos-bugs, jsanda | |
Target Milestone: | --- | |||
Target Release: | 3.10.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1559443 | Environment: | ||
Last Closed: | 2018-07-30 19:10:48 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1558677, 1559443, 1559448, 1559450 |
Description
John Sanda
2018-03-22 14:47:31 UTC
@John We don't want to let metrics run for 7 days, that will take too long to verify it, is there one better way to test this defect? And could you share the details about DeleteExpiredMetrics job? Thanks

For this bug we altogether removed the job so it no longer executes and is no longer in the code base. There are a few things that can be checked to verify that the removal is complete.

1) Verify that the job is not scheduled

$ oc -n openshift-infra exec <any_cassandra_pod> -- cqlsh --ssl -e "select * from hawkular_metrics.scheduled_jobs_idx" | grep DELETE_EXPIRED_METRICS

No matches should be returned.

2) Verify that the metrics_expiration_idx table has been dropped

$ oc -n openshift-infra exec <any_cassandra_pod> -- cqlsh --ssl -e "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'" | grep metrics_expiration_idx

No matches should be returned

3) Verify that the job configuration has been removed from cassandra

$ oc -n openshift-infra exec <any_cassandra_pod> -- cqlsh --ssl -e "select * from hawkular_metrics.sys_config where config_id = 'org.hawkular.metrics.jobs.DELETE_EXPIRED_METRICS'"

This should return an empty result set

(In reply to John Sanda from comment #3)
From your comments, I think these steps are enough to verify this defect, and we don't need the metrics run for a few days to verify it, since DeleteExpiredMetrics is already dropped from code. Am I right?

(In reply to Junqi Zhao from comment #4)
Yes, that is correct.

Verification steps please see Comment 3, DeleteExpiredMetrics job is already dropped from code

metrics-cassandra/images/v3.10.0-0.47.0.0
metrics-schema-installer/images/v3.10.0-0.47.0.0
metrics-hawkular-metrics/images/v3.10.0-0.47.0.0
metrics-hawkular-metrics/images/v3.10.0-0.47.0.0

# openshift version
openshift v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816
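
For anyone re-running the verification from comment 3, the three checks could be wrapped in a small script along these lines. This is only a sketch: the oc and cqlsh invocations and the queries are copied from comment 3, but the helper names (run_cql, expect_absent, verify_removal), the POD and NAMESPACE variables, and the PASS/FAIL reporting are assumptions made for illustration. For the third check it assumes that a non-empty result set would show the config_id value in the cqlsh output, so it treats the absence of that string as an empty result.

```shell
# Sketch only: helper names and the POD/NAMESPACE variables are hypothetical,
# the queries themselves come from comment 3 of this bug.
NAMESPACE="${NAMESPACE:-openshift-infra}"

# Run one CQL statement inside the Cassandra pod named by $POD.
run_cql() {
  oc -n "$NAMESPACE" exec "$POD" -- cqlsh --ssl -e "$1"
}

# expect_absent <description> <cql> <string>
# Passes when the fixed string does not appear in the query output.
expect_absent() {
  ea_desc="$1"; ea_cql="$2"; ea_pattern="$3"
  if ! ea_out="$(run_cql "$ea_cql")"; then
    echo "ERROR: $ea_desc (query failed)"
    return 2
  fi
  if printf '%s\n' "$ea_out" | grep -qF -- "$ea_pattern"; then
    echo "FAIL: $ea_desc"
    return 1
  fi
  echo "PASS: $ea_desc"
}

# Run all three checks; return 0 only if every one of them passes.
verify_removal() {
  if [ -z "${POD:-}" ]; then
    echo "ERROR: set POD to the name of any Cassandra pod"
    return 2
  fi
  vr_rc=0
  # 1) The job is no longer scheduled.
  expect_absent "job is not scheduled" \
    "select * from hawkular_metrics.scheduled_jobs_idx" \
    "DELETE_EXPIRED_METRICS" || vr_rc=1
  # 2) The metrics_expiration_idx table has been dropped.
  expect_absent "metrics_expiration_idx table is dropped" \
    "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'" \
    "metrics_expiration_idx" || vr_rc=1
  # 3) The job configuration row is gone (assumption: a returned row
  #    would contain the config_id string in the output).
  expect_absent "job configuration is removed" \
    "select * from hawkular_metrics.sys_config where config_id = 'org.hawkular.metrics.jobs.DELETE_EXPIRED_METRICS'" \
    "org.hawkular.metrics.jobs.DELETE_EXPIRED_METRICS" || vr_rc=1
  return "$vr_rc"
}
```

After sourcing the file, set POD to any Cassandra pod and call verify_removal; a non-zero exit status means at least one check failed or a query could not be run. Unlike the bare grep pipelines in comment 3, a failed oc or cqlsh call is reported as ERROR rather than silently looking like "no matches".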