Bug 1465825 - Dashboard: can't see utilization squares (for cluster: CPU, memory and storage)
Summary: Dashboard: can't see utilization squares (for cluster: CPU, memory and storage)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine-dwh
Classification: oVirt
Component: ETL
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.1.7
Target Release: 4.1.8
Assignee: Shirly Radco
QA Contact: Pavel Novotny
URL:
Whiteboard:
Depends On:
Blocks: 1503090
 
Reported: 2017-06-28 09:55 UTC by Yaniv Kaul
Modified: 2017-11-13 12:28 UTC (History)

Fixed In Version: ovirt-engine-dwh-4.1.8
Doc Type: Bug Fix
Doc Text:
Cause: The lastHourAgg value was set incorrectly.
Consequence: Hourly aggregation ran on hours that had not been collected yet, which produced empty hours. Daily aggregation was affected as well, because it aggregated those empty hours.
Fix: Hourly aggregation is now skipped if lastHourAgg is not valid; it waits an hour before trying to aggregate the same hour again.
Result: Hourly aggregation runs on the required time period. If lastHourAgg is invalid for some reason, the hourly aggregation fails with a warning.
Clone Of:
Environment:
Last Closed: 2017-11-13 12:28:13 UTC
oVirt Team: Metrics
Embargoed:
rule-engine: ovirt-4.1+
mgoldboi: planning_ack+
lsvaty: testing_ack+


Attachments
Firefox (164.12 KB, image/png)
2017-06-28 09:55 UTC, Yaniv Kaul
screen: utilization heatmap blocks visible (117.90 KB, image/png)
2017-10-25 13:09 UTC, Pavel Novotny


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 82321 0 master MERGED history: validate lastHourAgg value 2021-01-29 03:02:12 UTC
oVirt gerrit 82416 0 ovirt-engine-dwh-4.1 MERGED history: validate lastHourAgg value 2021-01-29 03:02:12 UTC

Description Yaniv Kaul 2017-06-28 09:55:40 UTC
Created attachment 1292606 [details]
Firefox

Description of problem:
On my F25 laptop, with Wayland or Gnome Classic, and with Firefox 54.0 (64-bit) or Chrome 60.0.3112.40 (Official Build) beta (64-bit), I can't see the bottom squares for utilization.


Version-Release number of selected component (if applicable):
4.1.2.1-0.1.el7

How reproducible:
Always. I'm testing against the internal RHEV.TLV setup.

Comment 1 Scott Dickerson 2017-06-28 16:11:50 UTC
Also, there is no status information for the clusters on the status cards.  The different statuses should all sum to the total count of objects in the card's title.

The dashboard_data JSON pull does not include status info for clusters, and the heatMapData is empty.  Looks like the problem is in the DashboardDataServlet/DB tier.

Comment 2 Scott Dickerson 2017-06-29 03:00:22 UTC
My mistake on the N/A on the cluster status card - that is expected.

The heat maps are not being populated because no data is being sent from the server.  No errors are happening in the SQL queries used for the heatmap data.  I looked on the TLV server and this is in the dwhd log (/var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log):

2017-06-29 05:51:15|Mu5B5g|QaRG6l|mHAkli|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704

The message repeats just about every minute from the beginning of the log file on 2017-06-11.
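
To count how often this warning occurs, a log line like the one above can be split on `|`. The following is only a sketch: the field positions are an assumption (they appear to match Talend's log-catcher layout of moment, pid, root_pid, father_pid, project, job, context, priority, type, origin, message, code), and `parse_dwh_log_line` and `count_sampling_warnings` are hypothetical helper names, not part of ovirt-engine-dwh.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# Assumed number of pipe-separated fields in one dwhd log line.
EXPECTED_FIELDS = 12


class DwhLogEntry(NamedTuple):
    moment: datetime  # field 0: timestamp of the log record
    job: str          # field 5: name of the ETL job (assumed position)
    level: str        # field 8: record type, e.g. "tWarn" (assumed position)
    message: str      # field 10: human-readable message (assumed position)


def parse_dwh_log_line(line: str) -> Optional[DwhLogEntry]:
    """Parse one pipe-delimited log line; return None if it does not fit."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != EXPECTED_FIELDS:
        return None
    try:
        moment = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return DwhLogEntry(moment, parts[5], parts[8], parts[10])


def count_sampling_warnings(lines: Iterable[str]) -> int:
    """Count warnings emitted by SampleTimeKeepingJob."""
    count = 0
    for line in lines:
        entry = parse_dwh_log_line(line)
        if entry and entry.job == "SampleTimeKeepingJob" and entry.level == "tWarn":
            count += 1
    return count
```

Passing an open log file to `count_sampling_warnings` would give the number of such warnings; lines that do not have the assumed layout are ignored.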

I lack additional credentials to triage the RHEV.TLV setup further.

Comment 3 Shirly Radco 2017-06-29 06:40:01 UTC
Hourly aggregation stopped at 2017-06-13 18:00:00+03.
I see that dwh was restarted at that time.

The log errors indicate that there are issues with the engine db connection:
the heartbeat is not being updated every 15 seconds as required.
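
A staleness check for such a heartbeat could look roughly like the sketch below. Only the 15-second interval comes from this report; the function name, the way the last heartbeat time is obtained, and the grace factor are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Interval stated in this report: the heartbeat should update every 15 seconds.
HEARTBEAT_INTERVAL = timedelta(seconds=15)


def heartbeat_is_stale(last_heartbeat: datetime, now: datetime, grace: int = 2) -> bool:
    """Return True if the last heartbeat is older than `grace` intervals.

    `grace` is a hypothetical tolerance so that one slightly late update
    is not reported as a failure.
    """
    return now - last_heartbeat > HEARTBEAT_INTERVAL * grace
```

With the default grace factor, a heartbeat that is 20 seconds old is still accepted, while one that is a minute old is reported as stale.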

I restarted the service and will check whether the hourly job starts aggregating again.

Comment 4 Oved Ourfali 2017-06-29 12:13:29 UTC
Moving to metrics (no more DWH oVirt team?) and assigning to Shirly.
Shirly, please decide whether to close this, or whether there is an issue that requires investigating.

Comment 5 Yaniv Kaul 2017-08-06 07:09:35 UTC
Is that on track to 4.1.5?

Comment 6 Yaniv Kaul 2017-09-05 09:56:53 UTC
(In reply to Yaniv Kaul from comment #5)
> Is that on track to 4.1.5?

Is that on track for 4.1.6?

Comment 7 Yaniv Kaul 2017-09-07 12:10:23 UTC
(In reply to Yaniv Kaul from comment #6)
> Is that on track for 4.1.6?

Ping?

Comment 8 Yaniv Kaul 2017-09-10 09:24:42 UTC
(In reply to Yaniv Kaul from comment #7)
> Ping?

Moved to 4.1.7...

Comment 9 Shirly Radco 2017-09-27 15:23:33 UTC
(In reply to Yaniv Kaul from comment #8)
> Moved to 4.1.7...

This bug recurs once in a while for customers.
The lastHourAgg timestamp gets set to an hour that is a few hours ahead of the current time.
This causes the dwh to try to aggregate hours that don't have samples yet, and the daily aggregation to aggregate empty hours.

I could not locate the root cause.

I know that in rhev-tlv a power outage caused this. I tried to reproduce it but could not.

I can try to create a workaround: before updating the db, compare the timestamp we plan to write with the current time, and skip the update if the timestamp is not before the current hour.

Please let me know if this is acceptable.
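
The check proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged patch: the function names are hypothetical, timestamps are assumed to be timezone-aware, and "keep the stored value" is one possible reading of "not update".

```python
from datetime import datetime, timezone


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


def is_before_current_hour(candidate: datetime, now: datetime) -> bool:
    """True only if the candidate hour lies strictly before the current hour."""
    return truncate_to_hour(candidate) < truncate_to_hour(now)


def choose_last_hour_agg(stored: datetime, candidate: datetime, now: datetime) -> datetime:
    """Return the value to persist as lastHourAgg.

    The candidate is accepted only if it is before the current hour;
    otherwise the previously stored value is kept (assumed behavior).
    """
    return candidate if is_before_current_hour(candidate, now) else stored
```

The comparison is a constant-time operation on two timestamps, so a check of this kind should add no measurable cost to the hourly job.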

Comment 10 Yaniv Kaul 2017-09-27 15:30:39 UTC
(In reply to Shirly Radco from comment #9)

Yes, unless it causes a major performance issue.

Comment 11 Pavel Novotny 2017-10-25 13:08:28 UTC
Verified in
ovirt-engine-4.1.7.4-0.1.el7.noarch
ovirt-engine-dwh-4.1.8-1.el7ev.noarch

I used a freshly installed engine with a host, storage and a VM.
After an hour, the utilization heatmap blocks appeared.
See the attached screenshot.

Comment 12 Pavel Novotny 2017-10-25 13:09:30 UTC
Created attachment 1343241 [details]
screen: utilization heatmap blocks visible

