Bug 1571045
| Summary: | Large number of data_temp tables cause request timeouts and other performance problems | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | John Sanda <jsanda> | ||||
| Component: | Hawkular | Assignee: | John Sanda <jsanda> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.7.0 | CC: | aos-bugs, jmalde, juzhao | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 3.7.z | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 1570981 | Environment: | |||||
| Last Closed: | 2018-06-07 08:40:56 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1570981 | ||||||
| Bug Blocks: | 1570982 | ||||||
| Attachments: |
|
||||||
|
Description
John Sanda
2018-04-24 02:18:37 UTC
The number temp file is growing after running metrics for one day, not between 12-14, it is 22 now
Attached metrics logs, but the logs are short, can see the full logs
# oc -n openshift-infra exec hawkular-cassandra-1-b8cww -- ls -alR /cassandra_data/ | grep -i temp
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 09:00 data_temp_2018052806-80073da0623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 11:00 data_temp_2018052808-7fd99df0623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 13:00 data_temp_2018052810-80352b70623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 15:00 data_temp_2018052812-c946eb50623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 17:00 data_temp_2018052814-c8bd9710623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 19:00 data_temp_2018052816-c83469e0623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 21:00 data_temp_2018052818-c8eb84e0623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 23:00 data_temp_2018052820-c9a2c6f0623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 29 01:00 data_temp_2018052822-c974d920623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 258 May 29 01:00 data_temp_2018052900-c9192490623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 06:30 data_temp_2018052902-c88fd050623f11e8a7c0db96bf8c5620
drwxr-sr-x. 4 1000040000 1000040000 38 May 28 06:30 data_temp_2018052904-c86230a0623f11e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 08:24 data_temp_2018052906-8bbac8e0625011e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 10:24 data_temp_2018052908-4f443520626111e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 12:24 data_temp_2018052910-12ce1690627211e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 14:24 data_temp_2018052912-d65538e0628211e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 16:24 data_temp_2018052914-99de2ff0629311e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 18:24 data_temp_2018052916-5d6663b062a411e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 20:24 data_temp_2018052918-20efa8e062b511e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 28 22:24 data_temp_2018052920-e47a74b062c511e8a7c0db96bf8c5620
drwxr-sr-x. 3 1000040000 1000040000 21 May 29 00:24 data_temp_2018052922-a800fac062d611e8a7c0db96bf8c5620
# oc -n openshift-infra exec hawkular-cassandra-1-b8cww -- du -sh /cassandra_data/data/hawkular_metrics
7.5M /cassandra_data/data/hawkular_metrics
# oc -n openshift-infra exec hawkular-cassandra-1-b8cww -- cqlsh --ssl -e "select * from hawkular_metrics.scheduled_jobs_idx"
time_slice | job_id | job_name | job_params | job_type | status | trigger
--------------------------+--------------------------------------+----------------------+------------+----------------------+--------+--------------------------------------------------------------------------------------------------------------------
2018-05-29 02:24:00+0000 | 9debdbfc-773b-43b1-b271-743311a1d083 | TEMP_TABLE_CREATOR | {} | TEMP_TABLE_CREATOR | null | {type: 1, trigger_time: 1527560640000, delay: 60000, interval: 7200000, repeat_count: null, execution_count: null}
2018-05-29 03:00:00+0000 | 4f69448b-b3a4-49e7-a77b-5261cd108fba | TEMP_DATA_COMPRESSOR | {} | TEMP_DATA_COMPRESSOR | null | {type: 1, trigger_time: 1527562800000, delay: 60000, interval: 7200000, repeat_count: null, execution_count: null}
(2 rows)
# oc -n openshift-infra exec hawkular-cassandra-1-b8cww -- cqlsh --ssl -e "select * from hawkular_metrics.finished_jobs_idx"
time_slice | job_id
------------+--------
(0 rows)
metrics version is v3.7.51-1
Created attachment 1445007 [details]
metrics pods log
Junqi, When Cassandra drop a table the corresponding directory on disk does not get deleted. By default Cassandra creates a snapshot of the table before dropping it. We have set the auto_snapshot property in cassandra.yaml (in cassandra pod) to false so snapshots should not be generated by default. You will need to run the following query to determine the number of tables: $ oc -n openshift-infra exec <cassandra pod> -- cqlsh --ssl -e "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'" My apologies for not pointing this out sooner. There are not two many data_temp tables after metrics running for one day,
metrics version is v3.7.51-1
# oc -n openshift-infra exec hawkular-cassandra-1-8znt6 -- cqlsh --ssl -e "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'"
table_name
----------------------
active_time_slices
cassalog
data
data_0
data_compressed
data_temp_2018053000
data_temp_2018053002
data_temp_2018053004
data_temp_2018053006
data_temp_2018053008
data_temp_2018053010
data_temp_2018053012
data_temp_2018053014
data_temp_2018053016
data_temp_2018053018
data_temp_2018053020
finished_jobs_idx
jobs
leases
locks
metrics_idx
metrics_tags_idx
retentions_idx
scheduled_jobs_idx
sys_config
tasks
tenants
tenants_by_time
(28 rows)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1798 |