Bug 1277591
Summary: | ETL service sampling has encountered an error. Please consult the service log for more details | | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Lynn Dixon <ldixon> |
Component: | ovirt-engine-dwh | Assignee: | Shirly Radco <sradco> |
Status: | CLOSED NOTABUG | QA Contact: | Pavel Stehlik <pstehlik> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | 3.4.2 | CC: | ecohen, gklein, ldixon, lsurette, rbalakri, Rhev-m-bugs, yeylon, ylavi |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-11-05 15:51:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Lynn Dixon
2015-11-03 15:33:43 UTC
I have collected the logs using engine-log-collector. I will be happy to share them with anyone who has a @redhat.com email address. I do not want to attach them directly to this public-facing BZ, since they may contain customer data. I will be happy to post sanitized excerpts of the collected logs as needed.

There seem to be two different problems here:
1. RHEV webadmin is slow.
2. The DWH log reports an error about not being able to collect from the engine.
I would guess that the first causes the second and not the other way around. This error means that the engine heartbeat for DWH did not run for over a minute, so the stats were not collected. The engine service might also be down. Are we sure the DWH is the one causing the slowness? How much CPU/RAM is the DWH taking up?

Yaniv, I am not sure which causes which. The RHEV webadmin will work great for hours or days at a time, but will eventually slow down and become unresponsive. Once the ovirt-engine-dwhd service is stopped, the webadmin console begins responding normally again. The customer can leave the dwhd service stopped and the webadmin console will not slow down. This machine has 16 GB of RAM and 4 processors. It is a virtual KVM guest on a RHEL 6 host that the customer is using specifically to run RHEV-M.

Many, many thanks to Yaniv Dary for helping find a solution to this issue. Three entries in the dwh_history_timekeeping table of the engine database had dates very far in the future:
lastSampling \N 2059-02-24 14:13:08.974-06
lastSync \N 2059-02-24 14:12:08-06
lastFullHostCheck \N 2059-02-24 14:12:08-06
Per Yaniv's suggestion, I moved the dates of those three entries back to a time in the past (January 1st, 2000) so that DWHD would no longer raise the error every 30 seconds.
Here are the PostgreSQL statements I used:
UPDATE dwh_history_timekeeping SET var_datetime = '2000-01-01' WHERE var_name = 'lastSampling';
UPDATE dwh_history_timekeeping SET var_datetime = '2000-01-01' WHERE var_name = 'lastFullHostCheck';
UPDATE dwh_history_timekeeping SET var_datetime = '2000-01-01' WHERE var_name = 'lastSync';

I then restarted ovirt-engine-dwhd and let the data warehouse collect overnight. Reporting began working correctly. Thank you very much to Yaniv for the help!
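
For anyone hitting the same symptom, it may help to confirm which timekeeping entries are actually in the future before resetting anything. On the database side, a query along the lines of SELECT var_name, var_datetime FROM dwh_history_timekeeping WHERE var_datetime > now(); should list them. The sketch below shows the same check in Python on rows that have already been fetched. The helper name future_entries and the pinned "now" value are my own assumptions for illustration; they are not part of ovirt-engine-dwh.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical helper, not part of ovirt-engine-dwh: given rows read from
# dwh_history_timekeeping as (var_name, var_datetime) pairs, return the ones
# whose timestamp lies in the future relative to `now`.
def future_entries(rows, now=None):
    if now is None:
        now = datetime.now(timezone.utc)
    # Rows with a NULL (None) timestamp are skipped rather than compared.
    return [(name, ts) for name, ts in rows if ts is not None and ts > now]


# Sample rows copied from this bug (the -06 offset is taken from the dump).
UTC_MINUS_6 = timezone(timedelta(hours=-6))
rows = [
    ("lastSampling", datetime(2059, 2, 24, 14, 13, 8, 974000, tzinfo=UTC_MINUS_6)),
    ("lastSync", datetime(2059, 2, 24, 14, 12, 8, tzinfo=UTC_MINUS_6)),
    ("lastFullHostCheck", datetime(2059, 2, 24, 14, 12, 8, tzinfo=UTC_MINUS_6)),
]

# "now" is pinned to the date this bug was closed so the run is reproducible.
now = datetime(2015, 11, 5, 15, 51, 15, tzinfo=timezone.utc)

for name, ts in future_entries(rows, now):
    print(f"{name} is {(ts - now).days} days in the future")
```

With the values from this bug, all three entries are flagged. After the UPDATE statements above, none of them should be.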