Bug 1718165

Summary: ETL service sampling has encountered an error. Please consult the service log for more details.
Product: [oVirt] ovirt-engine-dwh Reporter: nicolas
Component: ETLAssignee: Shirly Radco <sradco>
Status: CLOSED EOL QA Contact: Lukas Svaty <lsvaty>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.2.0CC: bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-13 06:28:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Metrics RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine log
none
dwhd log none

Description nicolas 2019-06-07 07:01:42 UTC
Created attachment 1578236 [details]
engine log

Description of problem:

We're running oVirt 4.1.9, not 4.2.0 as specified in this bug (cannot upgrade yet until [1] is released). Since a few days ago our event list if full of lines like this:

  ETL service sampling has encountered an error. Please consult the service log for more details.

Having a look at the log I see events like:

2019-06-06 13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_7|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
2019-06-06 13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_4|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
Exception in component tJDBCOutput_5
org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)
        at ovirt_engine_dwh.statisticssync_4_1.StatisticsSync.tJDBCInput_10Process(StatisticsSync.java:9030)
        at ovirt_engine_dwh.statisticssync_4_1.StatisticsSync$5.run(StatisticsSync.java:16071)
2019-06-06 13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_5|org.postgresql.util.PSQLException:ERROR: current transaction is aborted, commands ignored until end of transaction block|1
Exception in component tRunJob_5
java.lang.RuntimeException: Child job running failed
        at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_5Process(SampleRunJobs.java:1654)
        at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_6Process(SampleRunJobs.java:1456)
        at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1228)
        at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000)
        at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767)
        at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642)
        at ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs$2.run(SampleRunJobs.java:2683)
2019-06-06 13:37:11|FdlWtU|TOlL8U|KNLNa4|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java Exception|tRunJob_5|java.lang.RuntimeException:Child job running failed|1
Exception in component tRunJob_1
java.lang.RuntimeException: Child job running failed
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6067)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5809)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:4444)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4319)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4188)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3593)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:2977)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2295)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1649)
        at ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11363)

Version-Release number of selected component (if applicable):

4.1.9

How reproducible:

All the time

Additional info:

engine and dwhd logs attached

Comment 1 nicolas 2019-06-07 07:02:07 UTC
Created attachment 1578237 [details]
dwhd log

Comment 2 Shirly Radco 2019-06-10 07:07:42 UTC
This maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1541924 that was fixed in version 4.2.2.


I believe you can update manually the history_id field in tables  - vm_interface_daily_history, vm_interface_hourly_history, vm_interface_samples_history
from integer to bigint and it should fix the issue, but I did not test on 4.1.

like in:
https://gerrit.ovirt.org/#/c/87139/4/packaging/dbscripts/upgrade/04_02_0020__updated_vm_interface_history_id_to_bigint.sql

Comment 3 nicolas 2019-06-10 08:07:24 UTC
(In reply to Shirly Radco from comment #2)
> This maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1541924
> that was fixed in version 4.2.2.
> 
> 
> I believe you can update manually the history_id field in tables  -
> vm_interface_daily_history, vm_interface_hourly_history,
> vm_interface_samples_history
> from integer to bigint and it should fix the issue, but I did not test on
> 4.1.
> 
> like in:
> https://gerrit.ovirt.org/#/c/87139/4/packaging/dbscripts/upgrade/
> 04_02_0020__updated_vm_interface_history_id_to_bigint.sql

I created a secondary database to test the change.

I tried doing it that way, however, the update fails because the history_id fields are referenced by the v4_1_statistics_vms_interfaces_resources_usage_daily view.

bla=# SELECT fn_db_change_column_type('vm_interface_daily_history', 'history_id', 'integer', 'bigint');
ERROR:  cannot alter type of a column used by a view or rule
DETALLE:  rule _RETURN on view v4_1_statistics_vms_interfaces_resources_usage_daily depends on column "history_id"
CONTEXTO:  SQL statement "ALTER TABLE vm_interface_daily_history ALTER COLUMN history_id TYPE bigint"
PL/pgSQL function fn_db_change_column_type(character varying,character varying,character varying,character varying) line 9 at EXECUTE statement
bla=# SELECT fn_db_change_column_type('vm_interface_hourly_history', 'history_id', 'integer', 'bigint');
ERROR:  cannot alter type of a column used by a view or rule
DETALLE:  rule _RETURN on view v4_1_statistics_vms_interfaces_resources_usage_hourly depends on column "history_id"
CONTEXTO:  SQL statement "ALTER TABLE vm_interface_hourly_history ALTER COLUMN history_id TYPE bigint"
PL/pgSQL function fn_db_change_column_type(character varying,character varying,character varying,character varying) line 9 at EXECUTE statement
bla=# SELECT fn_db_change_column_type('vm_interface_samples_history', 'history_id', 'integer', 'bigint');
ERROR:  cannot alter type of a column used by a view or rule
DETALLE:  rule _RETURN on view v4_1_statistics_vms_interfaces_resources_usage_samples depends on column "history_id"
CONTEXTO:  SQL statement "ALTER TABLE vm_interface_samples_history ALTER COLUMN history_id TYPE bigint"
PL/pgSQL function fn_db_change_column_type(character varying,character varying,character varying,character varying) line 9 at EXECUTE statement

Comment 4 Shirly Radco 2019-06-10 09:39:20 UTC
Please try adding a file to your 
packaging/dbscripts/upgrade/

like the one in the patch:
packaging/dbscripts/upgrade/04_02_0020__updated_vm_interface_history_id_to_bigint.sql

(make sure that the file number is higher then the last upgrade script)
After that run 
schema.sh -d ovirt_engine_history -u <user> -c apply
that should drop the views make the alterations and recreate the views.

Comment 5 nicolas 2019-06-10 10:47:32 UTC
/usr/share/ovirt-engine-dwh/dbscripts/upgrade# cat 04_01_0021_bigint_instead_of_int.sql
SELECT fn_db_change_column_type('vm_interface_daily_history', 'history_id', 'integer', 'bigint');
SELECT fn_db_change_column_type('vm_interface_hourly_history', 'history_id', 'integer', 'bigint');
SELECT fn_db_change_column_type('vm_interface_samples_history', 'history_id', 'integer', 'bigint');

/usr/share/ovirt-engine-dwh/dbscripts/upgrade# PGPASSWORD=... /usr/share/ovirt-engine-dwh/dbscripts/schema.sh -d ovirt_engine_history -u ovirt_engine_history -c apply
Creating schema ovirt_engine_history@localhost:5432/ovirt_engine_history
Saving custom users permissions on database objects...
upgrade script detected a change in Config, View or Stored Procedure...
Running upgrade sql script '/usr/share/ovirt-engine-dwh/dbscripts/upgrade/pre_upgrade/add_comment_col_to_schema_version.sql'...
Running upgrade sql script '/usr/share/ovirt-engine-dwh/dbscripts/upgrade/pre_upgrade/add_function_to_update_enum_table.sql'...
Running upgrade sql script '/usr/share/ovirt-engine-dwh/dbscripts/upgrade/pre_upgrade/changed_len_of_installed_by.sql'...
Running upgrade sql script '/usr/share/ovirt-engine-dwh/dbscripts/upgrade/pre_upgrade/set_default_lang_code.sql'...
Running upgrade sql script '/usr/share/ovirt-engine-dwh/dbscripts/upgrade/pre_upgrade/set_etl_minimal_version.sql'...
Running upgrade sql script '/usr/share/ovirt-engine-dwh/dbscripts/upgrade/04_01_0021_bigint_instead_of_int.sql'...
Creating views API 3.6...
Creating views API 4.0...
Creating views API 4.1...
Creating ovirt engine reports views...
Creating stored procedures...
Creating stored procedures from /usr/share/ovirt-engine-dwh/dbscripts/common_sp.sql...
Creating stored procedures from /usr/share/ovirt-engine-dwh/dbscripts/create_sp.sql...
Running upgrade sql script '/usr/share/ovirt-engine-dwh/dbscripts/upgrade/post_upgrade/0010_change_enum_lang_code.sql'...
Applying custom users permissions on database objects...

Having a look at the Dashboard, this seems to have fixed the problem.

Just one more question: Once I upgrade to 4.2, will I have any issue because version 4.2 expects to be those fields integers and not bigints? In other words, should I undo this change before upgrading to 4.2?

Thanks for the help.

Comment 6 Shirly Radco 2019-06-10 11:33:03 UTC
That is great! I'm very happy to hear that.
No. There shouldn't be any issue on upgrade.
Can we close this bug?

Comment 7 nicolas 2019-06-10 11:33:58 UTC
Sure. Thanks for the help.

Comment 8 Sandro Bonazzola 2019-06-13 06:28:44 UTC
Closing with resolution EOL since oVirt 4.1 reached end of life and no fix will be pushed to that version.