Bug 1121994

Summary: Ovirt-engine-dwh fails to collect statistics due to high number in database sequences.
Product: Red Hat Enterprise Virtualization Manager Reporter: Roman Hodain <rhodain>
Component: ovirt-engine-dwhAssignee: Shirly Radco <sradco>
Status: CLOSED ERRATA QA Contact: Petr Matyáš <pmatyas>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.4.0CC: aberezin, abisogia, bazulay, dsulliva, ecohen, iheim, juwu, lnovich, meverett, pmatyas, rbalakri, Rhev-m-bugs, sradco, tdosek, yeylon, ylavi
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: All   
OS: Linux   
Whiteboard: infra
Fixed In Version: vt5 - rhevm-dwh-3.5.0-4.el6ev Doc Type: Release Note
Doc Text:
Previously, in specific cases the DWH was unable to insert new records to the ovirt_engine_history database as some of the sequences next value contained a higher number than what the primary key could store. With this update, the column of "history_id" for statistics tables has been changed from type integer to type bigint. Note that if users use the previous version through the REST API, they will need to update the history_id type to bigint in their environment.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-11 18:15:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1156162    

Description Roman Hodain 2014-07-22 10:13:47 UTC
Description of problem:
    The DWH is unable to insert new records to the ovirt-history-db as some of the sequences next value contain higher number than the primary key can store.

ovirt-engine-dwhd.log:
	...
	   4944 2014-07-16 08:37:06|MO7N73|QHsJuZ|QtaPvp|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|6|Java Exception|tRunJob_1|java.lang.RuntimeException:Child job running failed|1
	   4945 Exception in component tJDBCOutput_4
	   4946 org.postgresql.util.PSQLException: ERROR: integer out of range
	   4947         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
	   4948         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795)
	   4949         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
	   4950         at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
	   4951         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367)
	   4952         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321)
	   4953         at ovirt_engine_dwh.statisticssync_3_4.StatisticsSync.tJDBCInput_8Process(StatisticsSync.java:7788)
	   4954         at ovirt_engine_dwh.statisticssync_3_4.StatisticsSync$5.run(StatisticsSync.java:17391)
	   4955 2014-07-16 08:37:09|FB0naJ|QHsJuZ|fkCOaB|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java Exception|tJDBCOutput_4|org.postgresql.util.PSQLException:ERROR: integer out of range|1
	   4956 Exception in component tJDBCOutput_5
	   4957 org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
	   4958         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
...

postgresql-Fri.log:
...
	ERROR:  integer out of range
	STATEMENT:  INSERT INTO host_interface_samples_history (history_datetime,host_interface_id,receive_rate_percent,transmit_rate_percent,host_interface_configuration_version) VALUES ($1,$2,$3,$4,$5)
	ERROR:  current transaction is aborted, commands ignored until end of transaction block
	STATEMENT:  INSERT INTO vm_interface_samples_history (history_datetime,vm_interface_id,receive_rate_percent,transmit_rate_percent,vm_interface_configuration_version) VALUES ($1,$2,$3,$4,$5)
...

Version-Release number of selected component (if applicable):
	rhevm-dwh-3.4.0-10.el6ev.noarch

How reproducible:
	100% if the service runs for logn time.

Steps to Reproduce:
	This is not easy to reproduce as it need really long time to insert enough
	records to the DB.

	It can be triggered by the following command:

		SELECT pg_catalog.setval('host_interface_history_seq1', 2147491275, true);

Actual results:

	The data are not collected any more

Expected results:

	The collection continues

Additional info:

The sequences are created in the following way

	CREATE SEQUENCE host_interface_history_seq1                                     
	     START WITH 1                                                                                                        
	     INCREMENT BY 1                                                                                                      
	     NO MAXVALUE                                                                                                         
	     NO MINVALUE                                                                                                         
	     CACHE 1;

There is not max value set. The primary key of the related table is an integer
(-2147483648 to +2147483647)

It would be more suitable to use bigserial for autoincrementing values.
We may also set the max value and set CYCLE option for tables where we
periodically remove the old content.

Comment 1 Yaniv Lavi 2014-07-23 14:41:34 UTC
We should change all columns to bigint.



Yaniv

Comment 2 Shirly Radco 2014-08-07 13:57:53 UTC
Barak, what do you think we should do regarding the legacy views?
If we change the column type to bigint it can brake their api and if we dont they will not they may hit the int limit.

Shirly

Comment 3 Barak 2014-08-10 10:42:21 UTC
This needs to be addressed by moving to big int.
The might only influence customers who upgrade to 3.5 and still use older viewes.

I don't thing there are any such customers 

Tomas ?

This fix should go in with appropriate release note message (for 3.5)

Comment 4 Tomas Dosek 2014-08-10 12:07:35 UTC
I totally agree with Barak. Releasing this fix with a proper release not seems like the solution we need here.

With regards to customers using legacy views, I don't really think we have any customer using these on never versions.

Comment 6 Shirly Radco 2014-09-09 11:27:16 UTC
Why did this fail qe?

Comment 7 Petr Matyáš 2014-09-09 11:40:17 UTC
Behaviour is the same as in description. using vt2.2

Comment 8 Shirly Radco 2014-09-10 07:11:47 UTC
Can you please attach the logs?

Thanks,
Shirly

Comment 9 Shirly Radco 2014-09-15 14:29:29 UTC
This bug need to be checked on upgrade from 3.4 to 3.5 and not between 3.5.0 versions.

Comment 11 errata-xmlrpc 2015-02-11 18:15:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0177.html