Bug 959593
Summary: | Alert history and recent alerts views are unavailable and timeout when a large number of alerts exist | ||
---|---|---|---|
Product: | [JBoss] JBoss Operations Network | Reporter: | Larry O'Leary <loleary> |
Component: | UI | Assignee: | Larry O'Leary <loleary> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | JON 3.1.2 | CC: | hrupp, jshaughn, loleary, mazz, myarboro |
Target Milestone: | ER03 | Keywords: | TestCaseNeeded |
Target Release: | JON 3.2.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 720826 | Environment: | |
Last Closed: | Type: | Bug | |
Bug Depends On: | 720826 | ||
Bug Blocks: |
Description
Larry O'Leary
2013-05-03 22:11:25 UTC
I have a feeling this may be fundamentally due to the problem in Bug 620603, recently fixed by Lukas. That bug caused all rows to be loaded into memory in order to perform a sort.

As for indexing, there should already be an index on ctime. *** If a db is lacking this index then there must be a problem with upgrade, because it exists for newly created dbs. *** This index really MUST exist for perf. It is used for the recent alerts portlet, the alerts report and, I believe, the resource/group alert views, all of which are sorted ctime desc by default.

There is also an index on alert_definition_id, although I somewhat question this one due to the fairly low cardinality of values. But it does support the FK to the resource, so perhaps it is useful even if bloated. It would require testing. ID is the primary key, and is therefore indexed [unique] by default.

Indexes generate a lot of overhead on insert/update/delete, so over-indexing can be a problem as well. I would not recommend further indexing without perf proof of the necessity. But I would make sure that we have the expected indexes in place for upgraded dbs.

As a follow-up, if we suspect that some dbs may be lacking the proper indexing we could add logic to recreate the indexes in the next db-upgrade.

Larry, is this something you think is needed?

I only raised the issue about CTIME because I didn't see the index when I reviewed the table metadata for RHQ_ALERT. I have taken another look and now see it as RHQ_ALERT_IDX_TIME. Not sure why I didn't see this the first time.

(In reply to Jay Shaughnessy from comment #3)
> As a follow-up, if we suspect that some dbs may be lacking the proper
> indexing we could add logic to recreate the indexes in the next db-upgrade.
>
> Larry, is this something you think is needed?

No. Everything looks intact.
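For anyone who wants to repeat the index check on an upgraded database, a minimal sketch follows. It assumes a PostgreSQL backend and a database named rhq; the helper names alert_index_sql and recent_alerts_explain_sql and the default page size of 50 are hypothetical and not part of JBoss ON. The first helper prints a query against the standard pg_indexes catalog view, the second prints an EXPLAIN for a bounded, newest-first query so the plan can be inspected for use of the ctime index.

```shell
# Hypothetical helpers for illustration only; not part of JBoss ON or RHQ.

# Print SQL that lists every index defined on the rhq_alert table.
# pg_indexes is a standard PostgreSQL catalog view.
alert_index_sql() {
  echo "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'rhq_alert';"
}

# Print SQL that asks the planner how it would run a bounded,
# newest-first query. $1 = page size (assumed default: 50).
recent_alerts_explain_sql() {
  limit="${1:-50}"
  echo "EXPLAIN SELECT id, ctime FROM rhq_alert ORDER BY ctime DESC LIMIT ${limit};"
}

# Usage (needs a reachable database; "rhq" is an assumed database name):
#   psql -d rhq -c "$(alert_index_sql)"
#   psql -d rhq -c "$(recent_alerts_explain_sql 50)"
```

If the index on ctime is missing, the first query will simply not list it. The EXPLAIN output is only a planner estimate and depends on the PostgreSQL version and table statistics.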
As for indexing the other columns, it appears that the improvement seen by my test is actually 3 seconds regardless of the total query execution time without the index. I can only assume that the improvement is due to index caching being done by the database itself and is not significant enough to warrant any changes.

I completely agree with your comment 1 and think that the lack of a LIMIT is the issue here. I will re-test with an alpha build of 3.2 to see if the work done in bug 620603 resolves this.

Moving this to ON_QA so that a test case can be captured.

I have retested this against 3.2.0.ER3 and the fix for bug 620603 fixes this issue. The test case should be captured so that this is part of automated testing or UI testing to ensure that a very large number of alerts can be handled/rendered in the UI.

Here are the steps I used to test in 3.2:

1. Configure JBoss ON server with 2GB max heap.
2. Start JBoss ON system.
3. Import RHQ Agent resource into inventory.
4. Create an alert definition that will get fired off 200,000 times.

To make this testable without waiting for 200,000 alert conditions to occur, the alerts can be simulated. The following Linux shell commands produce two SQL files: alertDef.sql, which contains the alert definition for the platform resource (id 10001), and alerts.sql, which contains <_numOfAlerts> alerts based on that definition:

    # Print the current time in milliseconds since the epoch.
    function printLogTimestamp_ms() {
        timestamp=$(date +"%s|%N")
        # Force base 10 so nanosecond values with leading zeros are not parsed as octal.
        ms=$(( 10#${timestamp#*|} / 1000000 ))
        timestamp_ms=$(( (${timestamp%|*} * 1000) + ${ms} ))
        echo -n "${timestamp_ms}"
    }

    _resourceId=10001
    # The first alert is 8 hours in the past; its condition log entry is 10 minutes earlier.
    _alertTime=$(( $(printLogTimestamp_ms)-28800000 ))
    _alertCondTime=$(( ${_alertTime}-600000 ))
    _alertId=10100
    _alertDefId=11123
    _alertCondId=32111
    _numOfAlerts=200000

    # Alert definition and its single condition.
    echo "INSERT INTO rhq_alert_definition (id, name, ctime, mtime, priority, resource_id, enabled, required, recovery_id, will_recover, notify_filtered, control_filtered, deleted, read_only, dampening_category, dampening_value, dampening_period) VALUES (${_alertDefId}, 'Test Alert Def 01', 1367353530092, 1367353530092, 'MEDIUM', ${_resourceId}, true, 0, 0, false, false, false, false, false, 0, 0, 0);" >alertDef.sql
    echo "INSERT INTO rhq_alert_condition (id, type, name, option_status, alert_definition_id) VALUES (${_alertCondId}, 'CONTROL', 'viewProcessList', 'SUCCESS', ${_alertDefId});" >>alertDef.sql

    # COPY headers for the alert rows and the condition log rows.
    echo "COPY rhq_alert (id, alert_definition_id, ctime, recovery_id, will_recover, ack_time, ack_subject) FROM stdin;" >alertsTmp.sql
    echo "COPY rhq_alert_condition_log (id, ctime, alert_id, condition_id, value) FROM stdin;" >alert_conditionsTmp.sql

    # One alert row and one condition log row per iteration, spaced 1 second apart.
    for (( i=1; i<=_numOfAlerts; i++ )); do
        echo "${_alertId}"$'\t'"${_alertDefId}"$'\t'"${_alertTime}"$'\t'"0"$'\t'"f"$'\t'"-1"$'\t'"\N" >>alertsTmp.sql
        echo "${_alertId}"$'\t'"${_alertCondTime}"$'\t'"${_alertId}"$'\t'"${_alertCondId}"$'\t'"Success" >>alert_conditionsTmp.sql
        (( _alertTime += 1000 ))
        (( _alertCondTime += 1000 ))
        (( _alertId++ ))
    done

    # Terminate each COPY data section.
    echo "\." >>alertsTmp.sql
    echo "" >>alertsTmp.sql
    echo "\." >>alert_conditionsTmp.sql
    echo "" >>alert_conditionsTmp.sql

    cat alertsTmp.sql alert_conditionsTmp.sql >alerts.sql
    rm alertsTmp.sql alert_conditionsTmp.sql

The two files can be imported using psql:

    psql -d rhq -f alertDef.sql
    psql -d rhq -f alerts.sql

5. Log in to the JBoss ON UI.
6. From the Dashboard page that contains the recent alerts portlet, perform a page refresh 5 times.
7. Navigate to the alerts history page for the platform resource.
8. Verify total rows is 200000.
9. Perform page refresh 5 times.

In the end, the goal is to get at least 200,000 alerts displayed in the UI over several page reloads (within a 30 second time span) to confirm that such actions do not cause an out-of-memory condition on the server or a timeout in the UI.
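Before importing, it may be worth confirming that the generated file holds the expected number of rows. The sketch below assumes the file layout produced by the script in step 4 (a COPY header line, tab-separated data rows, then a `\.` terminator per section); count_copy_rows is a hypothetical helper name and not part of JBoss ON or psql.

```shell
# Hypothetical helper for illustration only.
# Count the tab-separated data rows in each COPY section of a file.
# Prints one "<table> <rows>" line per section.
count_copy_rows() {
  awk '
    # Section header: remember the table name and reset the counter.
    /^COPY / { table = $2; n = 0; next }
    # End-of-data marker (backslash followed by a dot): report and close the section.
    /^\\\.$/ { if (table != "") print table, n; table = ""; next }
    # Data row inside an open section.
    table != "" && /\t/ { n++ }
  ' "$1"
}

# Usage, assuming alerts.sql was generated with _numOfAlerts=200000:
#   count_copy_rows alerts.sql
# Each of the two sections should then report 200000 rows.
#
# After the import, the row count can also be read from the database
# ("rhq" is an assumed database name; alerts that already existed are included):
#   psql -d rhq -At -c "SELECT count(*) FROM rhq_alert;"
```

Counting per section rather than per file makes a mismatch between the alert rows and their condition log rows easy to spot.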