Bug 1465825 - Dashboard: can't see utilization squares (for cluster: CPU, memory and storage)
Status: ON_QA
Product: ovirt-engine-dwh
Classification: oVirt
Component: ETL
Version: 4.1.0
Hardware: Unspecified Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.1.7
Target Release: 4.1.8
Assigned To: Shirly Radco
QA Contact: Pavel Novotny
Depends On:
Blocks: 1503090
Reported: 2017-06-28 05:55 EDT by Yaniv Kaul
Modified: 2017-10-21 02:17 EDT

See Also:
Fixed In Version: ovirt-engine-dwh-4.1.8
Doc Type: Bug Fix
Doc Text:
Cause: The lastHourAgg value was set incorrectly.
Consequence: The hourly aggregation ran on hours that had not been collected yet, which resulted in empty hours. Daily aggregations were also affected, because they aggregated those empty hours.
Fix: Added a check that prevents hourly aggregation when lastHourAgg is not valid; the hourly aggregation then waits an hour before trying to aggregate the same hour again.
Result: Hourly aggregations run on the required time period, and if lastHourAgg is invalid for some reason, the hourly aggregation fails with a warning.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.1+
mgoldboi: planning_ack+
lsvaty: testing_ack+


Attachments
Firefox (164.12 KB, image/png)
2017-06-28 05:55 EDT, Yaniv Kaul


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 82321 | master | MERGED | history: validate lastHourAgg value | 2017-10-01 06:18 EDT
oVirt gerrit | 82416 | ovirt-engine-dwh-4.1 | MERGED | history: validate lastHourAgg value | 2017-10-01 06:18 EDT

Description Yaniv Kaul 2017-06-28 05:55:40 EDT
Created attachment 1292606
Firefox

Description of problem:
On my F25 laptop, with Wayland or Gnome Classic, with Firefox (54.0 (64-bit)) or Chrome (60.0.3112.40 (Official Build) beta (64-bit)), I can't see the utilization squares at the bottom of the dashboard.


Version-Release number of selected component (if applicable):
4.1.2.1-0.1.el7

How reproducible:
Always, I'm testing against internal RHEV.TLV setup.
Comment 1 Scott Dickerson 2017-06-28 12:11:50 EDT
Also, there is no status information for the clusters on the status cards.  The different statuses should all sum to the total count of objects in the card's title.

The dashboard_data JSON pull does not include status info for clusters, and the heatMapData is empty. It looks like the problem is in the DashboardDataServlet/DB tier.
Comment 2 Scott Dickerson 2017-06-28 23:00:22 EDT
My mistake on the N/A on the cluster status card - that is expected.

The heat maps are not being populated because no data is being sent from the server. The SQL queries used for the heatmap data run without errors. I looked on the TLV server and found this in the dwhd log (/var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log):

2017-06-29 05:51:15|Mu5B5g|QaRG6l|mHAkli|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704

The message repeats just about every minute from the beginning of the log file on 2017-06-11.
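
To check how often the warning repeats, the pipe-delimited log line above can be split and counted per day. This is only a rough sketch: the field names in `FIELDS` are illustrative guesses based on the single sample line, not an official schema, and it assumes the message text never contains a `|` character.

```python
from collections import Counter
from datetime import datetime

# Hypothetical field names, inferred from the one sample line in this bug.
FIELDS = ["timestamp", "id1", "id2", "id3", "project", "job",
          "context", "priority", "type", "origin", "message", "code"]


def parse_dwhd_line(line):
    """Split one log line into a dict, or return None if it does not match."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    try:
        record["timestamp"] = datetime.strptime(
            record["timestamp"], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return record


def warnings_per_day(lines, job="SampleTimeKeepingJob"):
    """Count tWarn entries of the given job, keyed by ISO date."""
    counts = Counter()
    for line in lines:
        rec = parse_dwhd_line(line)
        if rec and rec["job"] == job and rec["type"] == "tWarn":
            counts[rec["timestamp"].date().isoformat()] += 1
    return counts
```

Feeding the open log file to `warnings_per_day` gives one count per day; roughly 1440 per day would match "about every minute".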

I lack additional credentials to triage the RHEV.TLV setup further.
Comment 3 Shirly Radco 2017-06-29 02:40:01 EDT
Hourly aggregation stopped at 2017-06-13 18:00:00+03.
I see that dwh was restarted at that time.

The log errors indicate that there are issues with the engine DB connection:
the heartbeat is not updated every 15 seconds as required.

I restarted the service and will check if hourly job starts aggregating again.
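
A heartbeat staleness check along these lines might look like the sketch below. The 15-second interval comes from this comment; the function name and the `missed_beats` tolerance are assumptions made for illustration and are not taken from the DWH code.

```python
from datetime import datetime, timedelta, timezone

# Expected heartbeat period, as stated in the comment above.
HEARTBEAT_INTERVAL = timedelta(seconds=15)


def heartbeat_is_stale(last_heartbeat, now, missed_beats=3):
    """Return True if more than `missed_beats` intervals passed without an update.

    `missed_beats` is an assumed tolerance, not a value from the DWH code.
    """
    return now - last_heartbeat > HEARTBEAT_INTERVAL * missed_beats


last = datetime(2017, 6, 29, 5, 51, 0, tzinfo=timezone.utc)
print(heartbeat_is_stale(last, last + timedelta(seconds=30)))  # within tolerance
print(heartbeat_is_stale(last, last + timedelta(minutes=1)))   # past tolerance
```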
Comment 4 Oved Ourfali 2017-06-29 08:13:29 EDT
Moving to metrics (no more DWH oVirt team?) and assigning to Shirly.
Shirly, please decide whether to close this, or whether there is an issue that requires investigating.
Comment 5 Yaniv Kaul 2017-08-06 03:09:35 EDT
Is that on track to 4.1.5?
Comment 6 Yaniv Kaul 2017-09-05 05:56:53 EDT
(In reply to Yaniv Kaul from comment #5)
> Is that on track to 4.1.5?

Is that on track for 4.1.6?
Comment 7 Yaniv Kaul 2017-09-07 08:10:23 EDT
(In reply to Yaniv Kaul from comment #6)
> Is that on track for 4.1.6?

Ping?
Comment 8 Yaniv Kaul 2017-09-10 05:24:42 EDT
(In reply to Yaniv Kaul from comment #7)
> Ping?

Moved to 4.1.7...
Comment 9 Shirly Radco 2017-09-27 11:23:33 EDT
(In reply to Yaniv Kaul from comment #8)
> Moved to 4.1.7...

This bug recurs once in a while for customers.
The lastHourAgg timestamp gets set to an hour that is a few hours ahead of the current time.
This causes the dwh to try to aggregate hours that don't yet have samples, and the daily aggregation to aggregate empty hours.

I could not locate the issue.

I know that in rhev-tlv there was a power outage that caused this. I tried to reproduce but could not.

As a workaround, I can compare the timestamp we plan to write with the current time before updating the DB, and skip the update unless the timestamp is before the current hour.

Please let me know if this is acceptable.
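
The proposed check could be sketched roughly as follows. This is not the merged patch: the function names are hypothetical, and it assumes lastHourAgg is handled as a timezone-aware datetime that marks the start of the last aggregated hour.

```python
from datetime import datetime, timezone


def truncate_to_hour(ts):
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


def is_valid_last_hour_agg(candidate, now):
    """True only if `candidate` is strictly before the start of the current hour."""
    return candidate < truncate_to_hour(now)


def next_last_hour_agg(current_value, candidate, now):
    """Pick the value to store: the candidate if valid, else keep the old value."""
    if is_valid_last_hour_agg(candidate, now):
        return candidate
    return current_value


now = datetime(2017, 9, 27, 15, 23, 33, tzinfo=timezone.utc)
stored = datetime(2017, 9, 27, 13, 0, tzinfo=timezone.utc)
future = datetime(2017, 9, 27, 18, 0, tzinfo=timezone.utc)
print(next_last_hour_agg(stored, future, now) == stored)  # True: update skipped
```

The comparison is one datetime operation per hourly run, so it should not add a measurable cost.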
Comment 10 Yaniv Kaul 2017-09-27 11:30:39 EDT
(In reply to Shirly Radco from comment #9)
> I can try to create a workaround for it by comparing the timestamp we plan
> to update to current time before updating the db and not update if it is not
> before the current hour.
> 
> Please let me know if this is acceptable.

Yes, unless it causes a major performance issue.
