Created attachment 871532 [details]
engine + dwh logs

Description of problem:
That message appears in the events monitor. Looking in the dwh log:

2014-03-06 16:55:13|n582Uz|pv5kxh|KaE7QY|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704

Version-Release number of selected component (if applicable):
av2.1

How reproducible:
100%

Steps to Reproduce:
1. Install engine (remote DB)
2. Create DC + 5 running VMs
3. Install dwh + reports (remote DB)

Actual results:
Error in the log

Expected results:

Additional info:
This happens when the engine is down for any reason. Is this the case?

Yaniv
"That message appears in the events monitor." - which is part of the engine UI, so my answer is no - this is not the case.
(In reply to Barak Dagan from comment #2)
> "That message appears in the events monitor." - which is part of engine UI,
> so my answer is no - this is not the case.

How many times does this appear?

Yaniv
(In reply to Yaniv Dary from comment #3)
> (In reply to Barak Dagan from comment #2)
> > "That message appears in the events monitor." - which is part of engine UI,
> > so my answer is no - this is not the case.
>
> How many times does this appear?
>
> Yaniv

Once, that I was aware of.
(In reply to Barak Dagan from comment #4)
> (In reply to Yaniv Dary from comment #3)
> > (In reply to Barak Dagan from comment #2)
> > > "That message appears in the events monitor." - which is part of engine UI,
> > > so my answer is no - this is not the case.
> >
> > How many times does this appear?
> >
> > Yaniv
>
> once I was aware of

Then this is by design. While the engine is shut down or starting up, stats are not updated. That triggers this non-blocking warning, which stops once the engine is back up.
(In reply to Yaniv Dary from comment #5)
> (In reply to Barak Dagan from comment #4)
> > (In reply to Yaniv Dary from comment #3)
> > > (In reply to Barak Dagan from comment #2)
> > > > "That message appears in the events monitor." - which is part of engine UI,
> > > > so my answer is no - this is not the case.
> > >
> > > How many times does this appear?
> > >
> > > Yaniv
> >
> > once I was aware of
>
> Then this is by design.
> Any shutdown of engine or during startup of it, will cause stats to not be
> updated. This will cause this non blocking warning and will stop once engine
> is back up.

Got that issue again. Shutting down the engine results in this warning message, which continues (for at least 2 hours) after the engine is up again.
Can you please look into this? Yaniv
Barak, I didn't find any error in the logs. I'm pretty sure that if the DWH HeartBeat service weren't able to write to the database, there would be an error in the log. So most probably there is some error in the DWH process that tries to sync data with the remote DB. Could you please also post server.log, in case there is an error there?

Anyway, if this problem happens again, could you please connect to the ovirt-engine database and attach the results of the following SQL:

select * from dwh_history_timekeeping

so we can be sure whether the engine side is working fine.
Created attachment 873922 [details] engine + server + dwh logs
engine=# select * from dwh_history_timekeeping;
      var_name       | var_value |         var_datetime
---------------------+-----------+-------------------------------
 lastSampling        |           | 2014-03-13 14:54:51.33+02
 lastSync            |           | 2014-03-13 14:53:51+02
 lastFullHostCheck   |           | 2014-03-13 14:53:51+02
 lastErrorSent       |           | 2014-03-13 13:10:46.015+02
 timesFailed         | 18        |
 heartBeat           |           | 2014-03-13 13:13:30.279+02
 DwhCurrentlyRunning | 1         |
 lastOsinfoSync      |           | 2014-03-12 20:25:02.797+02
 lastOsinfoUpdate    |           | 2014-03-12 20:25:02.797321+02
(9 rows)
(In reply to Barak Dagan from comment #10)
> engine=# select * from dwh_history_timekeeping;
>       var_name       | var_value |         var_datetime
> ---------------------+-----------+-------------------------------
>  lastSampling        |           | 2014-03-13 14:54:51.33+02
>  lastSync            |           | 2014-03-13 14:53:51+02
>  lastFullHostCheck   |           | 2014-03-13 14:53:51+02
>  lastErrorSent       |           | 2014-03-13 13:10:46.015+02
>  timesFailed         | 18        |
>  heartBeat           |           | 2014-03-13 13:13:30.279+02
>  DwhCurrentlyRunning | 1         |
>  lastOsinfoSync      |           | 2014-03-12 20:25:02.797+02
>  lastOsinfoUpdate    |           | 2014-03-12 20:25:02.797321+02
> (9 rows)

Well, the "heartBeat" variable is updated every 30 sec to notify DWH that it's alive. Unfortunately, ovirt-engine-dwh.log ends at 2014-03-13 13:07:25. Barak, did this error message also appear around 2014-03-13 13:13?
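For anyone reading the table by hand, here is a minimal sketch of a staleness check on the pasted values. It only assumes what is stated above, that heartBeat should advance every 30 seconds. The helper names, the `allowed_missed` tolerance, and the choice of lastSampling as the reference time are hypothetical and are not taken from the DWH code.

```python
import re
from datetime import datetime, timedelta

# Interval quoted in the comment above; everything else here is illustrative.
HEARTBEAT_INTERVAL = timedelta(seconds=30)


def parse_pg_timestamp(value):
    """Parse a psql timestamp such as '2014-03-13 13:13:30.279+02'."""
    value = value.strip()
    # psql prints the UTC offset as '+02'; strptime's %z wants '+0200'.
    if re.search(r"[+-]\d{2}$", value):
        value += "00"
    # The fractional part is optional in the psql output.
    fmt = "%Y-%m-%d %H:%M:%S.%f%z" if "." in value else "%Y-%m-%d %H:%M:%S%z"
    return datetime.strptime(value, fmt)


def heartbeat_is_stale(heartbeat, reference, allowed_missed=3):
    """True if heartBeat lags the reference time by more than
    allowed_missed intervals (hypothetical tolerance)."""
    return reference - heartbeat > HEARTBEAT_INTERVAL * allowed_missed


# Values copied from the table in comment #10.
heartbeat = parse_pg_timestamp("2014-03-13 13:13:30.279+02")
last_sampling = parse_pg_timestamp("2014-03-13 14:54:51.33+02")

gap = last_sampling - heartbeat
print("heartBeat lags lastSampling by", gap)
print("stale:", heartbeat_is_stale(heartbeat, last_sampling))
```

With the values above, heartBeat is well over an hour behind lastSampling, which is worth keeping in mind when comparing the table against the timestamps in the dwh log.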
# tail -f /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log
2014-03-13 15:17:31|C1QqeO|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:19:06|Z5hKSq|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:20:46|Veams5|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:22:31|mc1gzo|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:24:21|oZ7MGf|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:26:16|LNM5Tb|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:28:16|qQwHBs|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:30:21|sV1o13|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:32:31|GbZahj|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-13 15:34:46|2StNDs|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
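To see how often the warning repeats, the tail output can be split on the pipe character. This is a rough sketch only: the field names below (`job`, `level`, `message`, `code`) are guesses based on position in the lines pasted here, not a documented log format, and `parse_dwh_log_line` is a hypothetical helper.

```python
from datetime import datetime


def parse_dwh_log_line(line):
    """Split one pipe-delimited dwhd log line into a dict.

    Field names are assumptions inferred from the pasted lines.
    """
    # The first 10 fields are fixed; the remainder is "<message>|<code>".
    head = line.rstrip("\n").split("|", 10)
    message, _, code = head[10].rpartition("|")
    return {
        "timestamp": datetime.strptime(head[0], "%Y-%m-%d %H:%M:%S"),
        "job": head[5],
        "level": head[8],
        "message": message,
        "code": code,
    }


MESSAGE = ("Can not sample data, oVirt Engine is not updating the statistics. "
           "Please check your oVirt Engine status.")
PREFIX = "|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|"

# First three lines of the tail output above.
lines = [
    "2014-03-13 15:17:31|C1QqeO" + PREFIX + MESSAGE + "|9704",
    "2014-03-13 15:19:06|Z5hKSq" + PREFIX + MESSAGE + "|9704",
    "2014-03-13 15:20:46|Veams5" + PREFIX + MESSAGE + "|9704",
]

records = [parse_dwh_log_line(line) for line in lines]
gaps = [
    (later["timestamp"] - earlier["timestamp"]).total_seconds()
    for earlier, later in zip(records, records[1:])
]
print(gaps)  # seconds between consecutive warnings
```

Run over the whole tail, this gives a quick view of whether the warning is a one-off or keeps recurring every couple of minutes.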
(In reply to Barak Dagan from comment #12)

Thanks. So it looks like the DWH HeartBeat works fine, but DWH cannot access the data. Yaniv?
How is the heartbeat working? The last update to it was at 13:00 in your output, and there are errors at 15:00. Seems like a heartbeat issue.

Yaniv
(In reply to Yaniv Dary from comment #14)
> How is the heartbeat working? the last update to that was 1300 in you cat
> and there are error in 1500. Seems like a heartbeat issue.
>
> Yaniv

That's not my understanding. As I understand what Barak sent us:

1) The heartBeat variable is being updated regularly (Barak sent us the content of dwh_history_timekeeping from around 13:13)
2) The error message "Can not sample data ..." is appearing all the time

Barak, is that right? If not, could you please send the content of the table and the log from the same time? Thanks
Martin, I believe you are right: the heartbeat is being updated but the errors keep coming (the DB is installed on a different server than the application):

engine=# select * from dwh_history_timekeeping;
      var_name       | var_value |         var_datetime
---------------------+-----------+-------------------------------
 lastSampling        |           | 2014-03-18 12:07:53.25+02
 lastSync            |           | 2014-03-18 12:06:53+02
 lastFullHostCheck   |           | 2014-03-18 12:06:53+02
 lastErrorSent       |           | 2014-03-18 11:08:48.266+02
 timesFailed         | 33        |
 heartBeat           |           | 2014-03-18 11:11:45.247+02
 DwhCurrentlyRunning | 1         |
 lastOsinfoSync      |           | 2014-03-12 20:25:02.797+02
 lastOsinfoUpdate    |           | 2014-03-12 20:25:02.797321+02
(9 rows)

# tail /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log
2014-03-18 10:36:08|7cK5aJ|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:38:03|11tjGk|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:40:03|ORmdKN|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:42:08|5nmqx9|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:44:18|R1skc7|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:46:33|JC5FMD|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:48:53|yo2Beb|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:51:18|i3LcIj|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-18 10:53:48|V07Spd|11QUuH|P7ptLS|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
Yaniv, just an idea: since DWH is installed with a remote database, do you check heartBeat in the correct database (I mean the database for engine)?
(In reply to Martin Perina from comment #17)
> Yaniv, just an idea, since DWH is installed in remote database, do you check
> heartBeat in correct database (I mean the database for engine)?

In case I'm asked: all 3 applications (engine, dwh & reports) are installed on the same machine, while all DBs are installed on the same remote machine.
(In reply to Barak Dagan from comment #18)
> (In reply to Martin Perina from comment #17)
> > Yaniv, just an idea, since DWH is installed in remote database, do you check
> > heartBeat in correct database (I mean the database for engine)?
>
> In case i'm going to be asked, all 3 applications (engine, dwh & reports)
> are installed on the same machine, while all DBs are installed on the same
> remote machine.

In that case please ignore Comment 17
*** This bug has been marked as a duplicate of bug 1076902 ***
Yaniv, why would you close this bug as a duplicate of a newly created one by Lev? You need to do the opposite!
Shai, we think it's a duplicate. That bug has more details. Why did you reopen? There is no rule of precedence between duplicate bugs.
(In reply to Shai Revivo from comment #21)
> Yaniv, why would you close this bug as duplicate on a newly created one by
> Lev ?
> you need to do the opposite!

See comment #22.

Yaniv

*** This bug has been marked as a duplicate of bug 1076902 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days