Hide Forgot
Description of problem: For QA process automation it is necessary to check consistency of dwh database, and if the ETL process is working correctly within dwh. For this purpse it is required to check consistency of database snapshot before and after dwh ETL process. At the moment we are unable to track when the ETL process of dwh starts as it is running aggregation every minute/hour/day and only way to check this is by higher cpu consumption of dwh. This should be included within logs. It was noted that adding this message every minute to logs would spam them, so I propose putting either all of them as DEBUG message which can be enabled by admin or just the minute/hour ETL start/stop to DEBUG and daily aggregation messages to INFO. What is required: Log messages like as DEBUG/INFO messages of dwh logs: Minutely dwh ETL process aggregation started. Minutely dwh ETL process aggregation stopped. Hourly dwh ETL process aggregation started. Hourly dwh ETL process aggregation stopped. Daily dwh ETL process aggregation started. Daily dwh ETL process aggregation stopped.
shirly - makes sense to me to add a DEBUG level log for this actions, so QE can (in their functionality tests) look for it.
(In reply to Yaniv Kaul from comment #1) > shirly - makes sense to me to add a DEBUG level log for this actions, so QE > can (in their functionality tests) look for it. Yes. Already checking how to implement this requirement.
How whould you like the log files? Each log file for each aggregation (samples/hourly/daily) or All in same file separated from the ovirt-engine-dwh.log file or All in ovirt-engine-dwh.log?
This seems to be like a suitable approach also for customer: I would say added to the ovirt-engine-dwh.log as DEBUG message disabled by default. With config in /etc/ovirt-engine-dwhd/ containing information and configuration about logging. Easier solution would be just to add these messages to ovirt-engine-dwhd.log when DEBUG is enabled on dwhd. Both should be ok to handle within qa automation.
I believe that different aggregations that happen on different threads simultaneously will not be clear if sent all to the same log file. Not sure I understand what you mean in the first option you suggested.
We need to be able to enable DEBUG messages -> thus some config file for DWH should have this. I would put it to some config file for example /etc/ovirt-engine-dwhd/logging.conf with values: logging = DEBUG/WARNING/INFO/ERROR Putting sample/hour/day logs to different files will create confusion. No need to create that many log files, putting this to ovirt-engine-dwhd.log is sufficient. These messages won't be visible unless the admin will specify that he wants to see DEBUG messages (what we will do in QA automation). Regarding multithreaded aggregation and logs. Aggregation should not run anyway while other aggregation on the same database is running, so because of this we will avoid the timing problem. I guess enough fflush will do the job. To verify the order pls add time of the log messages. Example flow: 1. change logging variable in /etc/ovirt-engine-dwhd/logging.conf from INFO to DEBUG 2. see logged messages in format, where INFO messages are already used, and debug will be implemented by this fix 2015-12-14 12:01:07 INFO ETL Service Stopped 2015-12-14 13:01:02 INFO ETL Service Started 2015-12-14 13:01:03 INFO ETL Service will now stop because a flag was raised to disconnect DWH on Engine DB 2015-12-14 00:00:00 DEBUG Aggregation on minutely tables started 2015-12-14 00:00:10 DEBUG Aggregation on minutely tables started 2015-12-14 00:00:12 DEBUG Aggregation on hourly tables started 2015-12-14 00:01:10 DEBUG Aggregation on hourly tables started 2015-12-14 00:00:12 DEBUG Aggregation on daily tables started 2015-12-14 00:01:10 DEBUG Aggregation on daily tables started
Any update on this?
(In reply to Shirly Radco from comment #7) > Any update on this? We are working on it. But as we have to use an unofficial build to do a pre-testing again, it takes longer time than usual verification.
Testing environment: Fedora 22 with fresh new engine and dwh (no reports installed). Tested on clean environment and on setup with 2hosts, 1nfs, 2 VMs up, 2VMs down. 1. Performance Performence-wise this patch does not generate any cpu/memory consumption. However some tests with bigger environment might be necessary when this feature is disaled. 2. Log message Format contains all the necessary information which is expected and is readable/parsable pretty easily. 3. Log frequency. Start of aggregation log message is logged every 1 minute/hour/day. What I see a problem in is the length of aggregation. Does really the 1 hour aggregation lasts whole hour or is it done only at end of each hour? See cut out log messages of hourly aggregation: 2016-01-12|18:00:00|hsoDyY|VlFHdx|KjUDQ7|16354|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|_Q9_uQNI6EeCOk4qSIFFDtw|3.6|Default||begin|| ../omitted output... 2016-01-12|19:00:00|hsoDyY|VlFHdx|KjUDQ7|16354|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|_Q9_uQNI6EeCOk4qSIFFDtw|3.6|Default||end|success|3599964 By the IDs the log is suggesting that this aggregation started @6pm and ended @7pm. I believe it might have aggregated data from clean engine a bit quicker than exact 1 hour. How can I track how much time did only the aggregation to tables take? Does that means we are aggregating (running aggregation process) whole day to get the daily tables (which seems to me like resource wasting)?
Example for the log in DEBUG mode: 2016-03-09 22:00:00|3VRs7n|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|_Q9_uQNI6EeCOk4qSIFFDtw|3.6|Default||begin|| 2016-03-09 22:01:28 Sampeling jobs ended. Duration: 115345 milliseconds 2016-03-09 22:01:28|g4WTw7|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|3.6|Default||end|success|115346 2016-03-09 22:01:28|Xre7kb|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|3.6|Default||begin|| 2016-03-09 22:01:45 Aggregation to Hourly ended. 2016-03-09 22:03:17 Sampeling jobs ended. Duration: 108699 milliseconds 2016-03-09 22:03:17|Xre7kb|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|3.6|Default||end|success|108700 2016-03-09 22:03:17|5mO6o0|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|3.6|Default||begin|| 2016-03-09 22:05:04 Sampeling jobs ended. Duration: 107059 milliseconds 2016-03-09 22:05:04|5mO6o0|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|3.6|Default||end|success|107060 2016-03-09 ..... 2016-03-09 23:00:00|3VRs7n|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|_Q9_uQNI6EeCOk4qSIFFDtw|3.6|Default||end|success|3599996 2016-03-09 23:00:00|fQ4i9B|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|_Q9_uQNI6EeCOk4qSIFFDtw|3.6|Default||begin|| ...
2016-03-09 22:01:45 Aggregation to Hourly ended. Thank you I'll be able to work with that. However few notes: Can we have the same format as the rest of the log? Also do we have this for daily and minutely aggregation? ID of which hourly aggregation ended? I suggest something like this: 2016-03-09 22:00:00|3VRs7n|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|_Q9_uQNI6EeCOk4qSIFFDtw|3.6|Default||end_of_aggregation|| Next new log message is 2016-03-09 22:03:17 Sampeling jobs ended. Duration: 108699 milliseconds Which sampling job? Hourly, Daily, Minutely? Again format consistency. Id of aggregation since we have them. Sampeling typo => Sampling. What is this message trying to tell me, it seems like a number that I can see in next log message: 22:05:04|5mO6o0|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|3.6|Default||end|success|107060 So I believe it's not necessary.
(In reply to Lukas Svaty from comment #11) > 2016-03-09 22:01:45 Aggregation to Hourly ended. > Thank you I'll be able to work with that. However few notes: > Can we have the same format as the rest of the log? > Also do we have this for daily and minutely aggregation? > ID of which hourly aggregation ended? > > I suggest something like this: > 2016-03-09 > 22:00: > 00|3VRs7n|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|_Q9_uQNI6E > eCOk4qSIFFDtw|3.6|Default||end_of_aggregation|| > I don't have all they details for this point. > Next new log message is > > 2016-03-09 22:03:17 Sampeling jobs ended. Duration: 108699 milliseconds > > Which sampling job? Hourly, Daily, Minutely? Again format consistency. Id of > aggregation since we have them. > Sampeling typo => Sampling. Sampling is for minutes, but can be change according to user settings to be less. This is why the job is called SampleTimeKeepingJob. The duration is not the same. I'm currently investigating why it is taking much longer than it should. > What is this message trying to tell me, it seems like a number that I can > see in next log message: > 22:05: > 04|5mO6o0|BdYeaY|AaeemT|7393|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqE > eCaj-T1n0SCFw|3.6|Default||end|success|107060 > So I believe it's not necessary.
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
verified in ovirt-engine-dwh-4.0.0-1.el7ev.noarch
oVirt 4.0.0 has been released, closing current release.