Bug 1465825
Summary: | Dashboard: can't see utilization squares (for cluster: CPU, memory and storage) | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [oVirt] ovirt-engine-dwh | Reporter: | Yaniv Kaul <ykaul> | ||||||
Component: | ETL | Assignee: | Shirly Radco <sradco> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Pavel Novotny <pnovotny> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 4.1.0 | CC: | bugs, cjg9411, lsvaty, lveyde, mgoldboi, oourfali, sradco, ykaul, ylavi | ||||||
Target Milestone: | ovirt-4.1.7 | Flags: | rule-engine: ovirt-4.1+, mgoldboi: planning_ack+, lsvaty: testing_ack+ | ||||||
Target Release: | 4.1.8 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | ovirt-engine-dwh-4.1.8 | Doc Type: | Bug Fix | ||||||
Doc Text: |
Cause:
The lastHourAgg value was set incorrectly.
Consequence:
The hourly aggregation ran on hours that had not been collected yet, which resulted in empty hours.
Daily aggregations were also affected, because they are built from the hourly aggregations and therefore aggregated empty hours.
Fix:
Added a fix that prevents hourly aggregation if lastHourAgg is not valid. In that case the hourly aggregation waits for an hour before it tries to aggregate the same hour again.
Result:
Hourly aggregation runs on the required time period, and if lastHourAgg is for some reason invalid, the hourly aggregation fails with a warning.
|
Story Points: | --- | ||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2017-11-13 12:28:13 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | Metrics | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1503090 | ||||||||
Attachments: |
|
Also, there is no status information for the clusters on the status cards. The different statuses should all sum to the total count of objects in the card's title. The dashboard_data JSON pull does not include status info for clusters, and the heatMapData is empty. Looks like the problem is in the DashboardDataServlet/DB tier.

My mistake on the N/A on the cluster status card - that is expected. The heat maps are not being populated because no data is being sent from the server. No errors are happening in the SQL queries used for the heatmap data. I looked on the TLV server and this is in the dwhd log (/var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log):

2017-06-29 05:51:15|Mu5B5g|QaRG6l|mHAkli|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704

The message repeats just about every minute from the beginning of the log file on 2017-06-11. I lack additional credentials to triage the RHEV.TLV setup further.

Hourly aggregation stopped at 2017-06-13 18:00:00+03. I see that dwh was restarted at that time. The log errors indicate that there are issues with the engine db connection: the heartbeat does not update every 15 seconds as required. I restarted the service and will check whether the hourly job starts aggregating again.

Moving to metrics (no more DWH oVirt team?) and assigning to Shirly. Shirly, please decide whether to close, or whether there is an issue that requires investigating.

Is that on track to 4.1.5?

(In reply to Yaniv Kaul from comment #5)
> Is that on track to 4.1.5?

Is that on track for 4.1.6?

(In reply to Yaniv Kaul from comment #6)
> Is that on track for 4.1.6?

Ping?

(In reply to Yaniv Kaul from comment #7)
> Ping?

Moved to 4.1.7...

(In reply to Yaniv Kaul from comment #8)
> Moved to 4.1.7...

This bug recurs once in a while for customers.
The lastHourAgg timestamp is set to an hour that is a few hours ahead of the current time.
This causes the dwh to try to aggregate hours that don't yet have samples, and the daily aggregation to aggregate empty hours.

I could not locate the issue.

I know that in rhev-tlv there was a power outage that caused this. I tried to reproduce it but could not.

I can try to create a workaround for it by comparing the timestamp we plan to write with the current time before updating the db, and not updating if the timestamp is not before the current hour.

Please let me know if this is acceptable.

(In reply to Shirly Radco from comment #9)
> I can try to create a workaround for it by comparing the timestamp we plan
> to update to current time before updating the db and not update if it is not
> before the current hour.
>
> Please let me know if this is acceptable.

Yes, unless it causes a major performance issue.
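
For readers following the thread, the workaround proposed in comment #9 (only advance lastHourAgg when the new value is before the current hour) could look roughly like the Python sketch below. This is an illustration only: the function names are hypothetical, and it makes no claim about how the actual fix in ovirt-engine-dwh-4.1.8 is implemented. It assumes timezone-aware timestamps.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("dwh_sketch")  # hypothetical logger name


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds so timestamps compare by hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def is_valid_last_hour_agg(candidate: datetime, now: datetime) -> bool:
    """A candidate is valid only if its hour is strictly before the current hour."""
    return truncate_to_hour(candidate) < truncate_to_hour(now)


def next_last_hour_agg(current: datetime, candidate: datetime, now: datetime) -> datetime:
    """Return the value to store: the candidate if valid, else the old value.

    Keeping the old value means the caller retries the same hour on its
    next run instead of aggregating hours that have no samples yet.
    """
    if is_valid_last_hour_agg(candidate, now):
        return candidate
    log.warning("lastHourAgg candidate %s is not before the current hour; skipping update", candidate)
    return current


if __name__ == "__main__":
    now = datetime(2017, 6, 13, 18, 30, tzinfo=timezone.utc)
    stored = datetime(2017, 6, 13, 16, 0, tzinfo=timezone.utc)
    ahead = datetime(2017, 6, 13, 21, 0, tzinfo=timezone.utc)
    # The candidate is ahead of the current hour, so the stored value is kept.
    print(next_last_hour_agg(stored, ahead, now))
```

The check is a single timestamp comparison per update, so it should not add measurable load, which is the concern raised in the reply above.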
Verified in
ovirt-engine-4.1.7.4-0.1.el7.noarch
ovirt-engine-dwh-4.1.8-1.el7ev.noarch

I used a freshly installed engine with a host, storage and a VM. After an hour, the utilization heatmap blocks appeared. See the attached screenshot.

Created attachment 1343241 [details]
screen: utilization heatmap blocks visible
|
Created attachment 1292606 [details]
Firefox

Description of problem:
On my F25 laptop, with Wayland or Gnome Classic, with Firefox (54.0 (64-bit)) or Chrome (60.0.3112.40 (Official Build) beta (64-bit)), I can't see the bottom squares for utilization.

Version-Release number of selected component (if applicable):
4.1.2.1-0.1.el7

How reproducible:
Always. I'm testing against the internal RHEV.TLV setup.