Bug 1073529 - [DWH] - ETL service sampling has encountered an error. Please consult the service log for more details.
Summary: [DWH] - ETL service sampling has encountered an error. Please consult the ser...
Keywords:
Status: CLOSED DUPLICATE of bug 1076902
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-dwh
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.4.0
Assignee: Yaniv Lavi
QA Contact: Barak Dagan
URL:
Whiteboard: infra
Depends On:
Blocks:
Reported: 2014-03-06 15:47 UTC by Barak Dagan
Modified: 2023-09-14 02:04 UTC

Fixed In Version:
Doc Type: Release Note
Doc Text:
We added the host shared memory (MB) value in 3.4 and mistakenly defined it as a percentage. This means that if the shared memory value is higher than a short can hold, DWH fails. This has been corrected, and the fix will be included in the following release.
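
To illustrate the Doc Text, here is a minimal sketch of why a field sized for a percentage breaks once it holds a memory size in MB. It assumes that "short" means a signed 16-bit integer (as in Java, or PostgreSQL's smallint); the function name is hypothetical and is not taken from the DWH code.

```python
# Assumption: "short" is a signed 16-bit integer (Java short / PostgreSQL smallint).
SHORT_MIN = -(2 ** 15)      # -32768
SHORT_MAX = 2 ** 15 - 1     # 32767


def fits_in_short(value):
    """Return True if value can be stored in a signed 16-bit integer."""
    return SHORT_MIN <= value <= SHORT_MAX


# A percentage always fits; a memory size expressed in MB may not.
print(fits_in_short(100))        # a percent value
print(fits_in_short(48 * 1024))  # 48 GiB expressed in MB
```

Under that assumption, any host reporting 32 GiB or more of shared memory would exceed the range.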
Clone Of:
Environment:
Last Closed: 2014-03-20 17:18:52 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
engine + dwh logs (25.76 KB, application/x-compressed-tar)
2014-03-06 15:47 UTC, Barak Dagan
engine + server + dwh logs (78.43 KB, application/x-compressed-tar)
2014-03-13 11:11 UTC, Barak Dagan

Description Barak Dagan 2014-03-06 15:47:33 UTC
Created attachment 871532
engine + dwh logs

Description of problem:
That message appears in the events monitor.
The DWH log shows:

2014-03-06 16:55:13|n582Uz|pv5kxh|KaE7QY|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704


Version-Release number of selected component (if applicable):
av2.1

How reproducible:
100%

Steps to Reproduce:
1. install engine (remote DB)
2. Create DC + 5 running VMs
3. Install dwh + reports (remote DB)

Actual results:
error in the log

Expected results:


Additional info:

Comment 1 Yaniv Lavi 2014-03-06 17:02:26 UTC
This happens when the engine is down for any reason. Is that the case here?



Yaniv

Comment 2 Barak Dagan 2014-03-06 17:12:01 UTC
"That message appears in the events monitor." - which is part of the engine UI, so my answer is no, this is not the case.

Comment 3 Yaniv Lavi 2014-03-06 17:19:18 UTC
(In reply to Barak Dagan from comment #2)
> "That message appears in the events monitor." - which is part of engine UI,
> so my answer is no - this is not the case.

How many times does this appear?



Yaniv

Comment 4 Barak Dagan 2014-03-09 14:14:29 UTC
(In reply to Yaniv Dary from comment #3)
> How many times does this appear?

Once, as far as I was aware.

Comment 5 Yaniv Lavi 2014-03-09 15:37:01 UTC
(In reply to Barak Dagan from comment #4)
> Once, as far as I was aware.

Then this is by design.
Any shutdown or startup of the engine causes the stats to not be updated. That triggers this non-blocking warning, which stops once the engine is back up.

Comment 6 Barak Dagan 2014-03-12 17:45:46 UTC
(In reply to Yaniv Dary from comment #5)
> Then this is by design.
> Any shutdown or startup of the engine causes the stats to not be updated.
> That triggers this non-blocking warning, which stops once the engine is
> back up.

I hit that issue again. Shutting down the engine produces this warning message, which continues (for at least 2 hours) after the engine is up again.

Comment 7 Yaniv Lavi 2014-03-12 17:48:25 UTC
Can you please look into this?


Yaniv

Comment 8 Martin Perina 2014-03-13 08:30:22 UTC
Barak, I didn't find any error in the logs. I'm pretty sure that if the DWH HeartBeat service weren't able to write to the database, there would be an error in the log.
So most probably the error is in the DWH process that tries to sync data with the remote DB.

Could you please also post server.log, in case it contains an error?

Anyway, if this problem happens again, could you please connect to the ovirt-engine database and attach the results of the following SQL:

  select * from dwh_history_timekeeping

That way we can be sure whether the engine side is working fine.

Comment 9 Barak Dagan 2014-03-13 11:11:43 UTC
Created attachment 873922
engine + server + dwh logs

Comment 10 Barak Dagan 2014-03-13 11:14:31 UTC
engine=# select * from dwh_history_timekeeping;
      var_name       | var_value |         var_datetime          
---------------------+-----------+-------------------------------
 lastSampling        |           | 2014-03-13 14:54:51.33+02
 lastSync            |           | 2014-03-13 14:53:51+02
 lastFullHostCheck   |           | 2014-03-13 14:53:51+02
 lastErrorSent       |           | 2014-03-13 13:10:46.015+02
 timesFailed         | 18        | 
 heartBeat           |           | 2014-03-13 13:13:30.279+02
 DwhCurrentlyRunning | 1         | 
 lastOsinfoSync      |           | 2014-03-12 20:25:02.797+02
 lastOsinfoUpdate    |           | 2014-03-12 20:25:02.797321+02
(9 rows)

Comment 11 Martin Perina 2014-03-13 11:55:27 UTC
(In reply to Barak Dagan from comment #10)
> engine=# select * from dwh_history_timekeeping;
>       var_name       | var_value |         var_datetime          
> ---------------------+-----------+-------------------------------
>  lastSampling        |           | 2014-03-13 14:54:51.33+02
>  lastSync            |           | 2014-03-13 14:53:51+02
>  lastFullHostCheck   |           | 2014-03-13 14:53:51+02
>  lastErrorSent       |           | 2014-03-13 13:10:46.015+02
>  timesFailed         | 18        | 
>  heartBeat           |           | 2014-03-13 13:13:30.279+02
>  DwhCurrentlyRunning | 1         | 
>  lastOsinfoSync      |           | 2014-03-12 20:25:02.797+02
>  lastOsinfoUpdate    |           | 2014-03-12 20:25:02.797321+02
> (9 rows)

Well, the "heartBeat" variable is updated every 30 seconds to notify DWH that the engine is alive. Unfortunately, ovirt-engine-dwh.log ends at 2014-03-13 13:07:25. Barak, did this error message also appear around 2014-03-13 13:13?
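
As a rough way to judge whether a heartBeat value from the table is fresh, the sketch below compares it with the moment the query was run. It is not DWH code: the function names and the tolerance of three missed 30-second beats are assumptions, and the query time is approximated by the posting time of comment 10.

```python
from datetime import datetime, timedelta, timezone


def parse_pg_timestamp(text):
    """Parse a psql timestamp such as '2014-03-13 13:13:30.279+02'."""
    text = text.strip()
    # psql prints a bare hour offset ("+02"); strptime's %z expects "+0200".
    if len(text) >= 3 and text[-3] in "+-":
        text += "00"
    fmt = "%Y-%m-%d %H:%M:%S.%f%z" if "." in text else "%Y-%m-%d %H:%M:%S%z"
    return datetime.strptime(text, fmt)


def is_stale(age, interval=timedelta(seconds=30), missed_beats=3):
    """Treat the heartbeat as stale after several missed intervals (assumed: 3)."""
    return age > interval * missed_beats


heartbeat = parse_pg_timestamp("2014-03-13 13:13:30.279+02")
# Assumption: the query ran at about the time comment 10 was posted (UTC).
queried_at = datetime(2014, 3, 13, 11, 14, 31, tzinfo=timezone.utc)
print(is_stale(queried_at - heartbeat))
```

The comparison only makes sense when both values are taken at the same moment and in the same time zone; comparing a heartBeat snapshot with log lines written hours later will always look stale.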

Comment 12 Barak Dagan 2014-03-13 13:36:08 UTC
# tail -f /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log 
2014-03-13 15:17:31|C1QqeO|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:19:06|Z5hKSq|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:20:46|Veams5|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:22:31|mc1gzo|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:24:21|oZ7MGf|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:26:16|LNM5Tb|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:28:16|qQwHBs|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:30:21|sV1o13|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:32:31|GbZahj|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:34:46|2StNDs|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
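
For anyone triaging a log like this, the following sketch splits the pipe-delimited lines and measures the time between consecutive warnings. The field names are hypothetical labels chosen for illustration (the log itself shows only values), and the sample holds just the first three lines of the excerpt above.

```python
from datetime import datetime

# Hypothetical field names; the log only shows the values.
FIELDS = ("timestamp", "id_1", "id_2", "id_3", "project", "job",
          "context", "priority", "type", "origin", "message", "code")


def parse_dwh_log_line(line):
    """Split one pipe-delimited log line into a dict keyed by FIELDS."""
    parts = line.rstrip("\n").split("|")
    if len(parts) < len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    # Rejoin the middle in case the message itself contains a "|".
    head, message, code = parts[:10], "|".join(parts[10:-1]), parts[-1]
    record = dict(zip(FIELDS, head + [message, code]))
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    return record


def warning_gaps(lines):
    """Return the gaps, in whole seconds, between consecutive log lines."""
    times = [parse_dwh_log_line(l)["timestamp"] for l in lines if l.strip()]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


MESSAGE = ("Can not sample data, oVirt Engine is not updating the statistics. "
           "Please check your oVirt Engine status.")
TAIL = "|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|"
SAMPLE = [
    "2014-03-13 15:17:31|C1QqeO" + TAIL + MESSAGE + "|9704",
    "2014-03-13 15:19:06|Z5hKSq" + TAIL + MESSAGE + "|9704",
    "2014-03-13 15:20:46|Veams5" + TAIL + MESSAGE + "|9704",
]
print(warning_gaps(SAMPLE))
```

Run over the full excerpt, the same function shows whether the warning recurs at a steady interval or drifts over time.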

Comment 13 Martin Perina 2014-03-13 13:47:05 UTC
(In reply to Barak Dagan from comment #12)

Thanks. So it looks like the DWH HeartBeat works fine, but DWH cannot access the data. Yaniv?

Comment 14 Yaniv Lavi 2014-03-16 12:17:20 UTC
How is the heartbeat working? The last update to it was at 13:00 in your output, and there are errors at 15:00. This seems like a heartbeat issue.



Yaniv

Comment 15 Martin Perina 2014-03-16 13:12:50 UTC
(In reply to Yaniv Dary from comment #14)
> How is the heartbeat working? The last update to it was at 13:00 in your
> output, and there are errors at 15:00. This seems like a heartbeat issue.

That's not my understanding. As I understand what Barak sent us:

1) The heartBeat variable is being updated regularly (Barak sent us the content of dwh_history_timekeeping around 13:13)

2) The error message "Can not sample data ..." keeps appearing

Barak, is that right? If not, could you please send the content of the table and the log from the same time?

Thanks

Comment 16 Barak Dagan 2014-03-18 09:12:38 UTC
Martin, 

I believe you are right: the heartbeat is being updated, but the errors keep coming (the DB is installed on a different server than the application):

engine=# select * from dwh_history_timekeeping;
      var_name       | var_value |         var_datetime          
---------------------+-----------+-------------------------------
 lastSampling        |           | 2014-03-18 12:07:53.25+02
 lastSync            |           | 2014-03-18 12:06:53+02
 lastFullHostCheck   |           | 2014-03-18 12:06:53+02
 lastErrorSent       |           | 2014-03-18 11:08:48.266+02
 timesFailed         | 33        | 
 heartBeat           |           | 2014-03-18 11:11:45.247+02
 DwhCurrentlyRunning | 1         | 
 lastOsinfoSync      |           | 2014-03-12 20:25:02.797+02
 lastOsinfoUpdate    |           | 2014-03-12 20:25:02.797321+02
(9 rows)


# tail /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log 
2014-03-18 10:36:08|7cK5aJ|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:38:03|11tjGk|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:40:03|ORmdKN|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:42:08|5nmqx9|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:44:18|R1skc7|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:46:33|JC5FMD|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:48:53|yo2Beb|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:51:18|i3LcIj|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:53:48|V07Spd|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704

Comment 17 Martin Perina 2014-03-18 09:24:14 UTC
Yaniv, just an idea: since DWH is installed with a remote database, do you check heartBeat in the correct database (I mean the engine database)?

Comment 18 Barak Dagan 2014-03-18 09:43:51 UTC
(In reply to Martin Perina from comment #17)
> Yaniv, just an idea, since DWH is installed in remote database, do you check
> heartBeat in correct database (I mean the database for engine)?

In case I'm asked: all 3 applications (engine, dwh & reports) are installed on the same machine, while all DBs are installed on the same remote machine.

Comment 19 Martin Perina 2014-03-18 09:50:17 UTC
(In reply to Barak Dagan from comment #18)
> (In reply to Martin Perina from comment #17)
> > Yaniv, just an idea, since DWH is installed in remote database, do you check
> > heartBeat in correct database (I mean the database for engine)?
> 
> In case i'm going to be asked, all 3 applications (engine, dwh & reports)
> are installed on the same machine, while all DBs are installed on the same
> remote machine.

In that case please ignore Comment 17

Comment 20 Yaniv Lavi 2014-03-19 19:48:36 UTC

*** This bug has been marked as a duplicate of bug 1076902 ***

Comment 21 Shai Revivo 2014-03-20 12:00:45 UTC
Yaniv, why would you close this bug as a duplicate of a newly created one by Lev?
You need to do the opposite!

Comment 22 Barak 2014-03-20 12:21:17 UTC
Shai, we think it's a duplicate.
That bug has more details. Why did you reopen?
There is no rule of precedence between duplicate bugs.

Comment 24 Yaniv Lavi 2014-03-20 17:18:52 UTC
(In reply to Shai Revivo from comment #21)
> Yaniv, why would you close this bug as duplicate on a newly created one by
> Lev ?
> you need to do the opposite!

See comment #22.


Yaniv

*** This bug has been marked as a duplicate of bug 1076902 ***

Comment 25 Red Hat Bugzilla 2023-09-14 02:04:34 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

