Bug 1057412

Summary: After upgrading to rhev-m 3.3, ETL service aggregation to hourly tables is failing due to tMap_12 NullPointerException
Product: Red Hat Enterprise Virtualization Manager
Reporter: Aval <avyadav>
Component: ovirt-engine-dwh
Assignee: Yaniv Lavi <ylavi>
Status: CLOSED ERRATA
QA Contact: Barak Dagan <bdagan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: acathrow, adevolder, audgiri, avyadav, bazulay, bdagan, iheim, jbiddle, jhunsaker, pstehlik, Rhev-m-bugs, scohen, tdosek, vgaikwad, yeylon
Target Milestone: ---   
Target Release: 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, during the upgrade of the database, a column was added with all null values. This meant the ETL was not able to aggregate this data, since there was no null handling. Now, the ETL handles null values in the column added, and aggregation works as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-30 18:08:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Aval 2014-01-24 03:16:46 UTC
Description of problem:
After upgrading to RHEV-M 3.3, a couple of customers have reported the following error in the event log:
"ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details."


Version-Release number of selected component (if applicable):
RHEV-M 3.3.0

How reproducible:
Reproduced by upgrading to RHEV-M 3.3.

Steps to Reproduce:
1.
2.
3.

Actual results:
The ETL service's aggregation to hourly tables fails with a NullPointerException (NPE).

Expected results:
The ETL service should aggregate data to the hourly tables without errors.

Additional info:

- The errors in `/var/log/ovirt-engine/ovirt-engine-dwhd.log` point to a NullPointerException in component tMap_12:

~~~
Exception in component tMap_12
java.lang.NullPointerException
	at ovirt_engine_dwh.aggregationtohourly_3_3.AggregationToHourly.tJDBCInput_10Process(AggregationToHourly.java:23218)
	at ovirt_engine_dwh.aggregationtohourly_3_3.AggregationToHourly$10.run(AggregationToHourly.java:24718)
2014-01-22 17:00:02|twIhMK|mg7vpV|AwUaK7|OVIRT_ENGINE_DWH|AggregationToHourly|Default|6|Java Exception|tMap_12|java.lang.NullPointerException:null|1
Exception in component tRunJob_1
java.lang.RuntimeException: Child job running failed
	at ovirt_engine_dwh.hourlytimekeepingjob_3_3.HourlyTimeKeepingJob.tRunJob_1Process(HourlyTimeKeepingJob.java:1753)
	at ovirt_engine_dwh.hourlytimekeepingjob_3_3.HourlyTimeKeepingJob.tJDBCInput_1Process(HourlyTimeKeepingJob.java:1596)
	at ovirt_engine_dwh.hourlytimekeepingjob_3_3.HourlyTimeKeepingJob.tJDBCConnection_1Process(HourlyTimeKeepingJob.java:1071)
	at ovirt_engine_dwh.hourlytimekeepingjob_3_3.HourlyTimeKeepingJob.tJDBCConnection_2Process(HourlyTimeKeepingJob.java:971)
	at ovirt_engine_dwh.hourlytimekeepingjob_3_3.HourlyTimeKeepingJob$2.run(HourlyTimeKeepingJob.java:3661)
~~~

Comment 1 Yaniv Lavi 2014-01-26 08:31:04 UTC
Ask the customer (CU) to run this query and send the output:

"SELECT 
  history_id,
  history_datetime,
  current_user_name,
  cast(user_logged_in_to_guest as int),
  vm_id,
  minutes_in_status,
  cpu_usage_percent,
  memory_usage_percent,
  user_cpu_usage_percent,
  system_cpu_usage_percent,
  vm_ip,
  vm_client_ip,
  currently_running_on_host,
  vm_configuration_version,
  current_host_configuration_version
FROM vm_samples_history
WHERE (vm_status = 1
      AND history_datetime >= (SELECT var_datetime
				FROM history_configuration
				WHERE var_name = 'lastHourAggr'))
      AND (history_id IS NULL OR history_datetime IS NULL OR vm_id IS NULL OR minutes_in_status IS NULL)
ORDER BY history_datetime,
         current_user_name,
      	 vm_id"

Comment 2 Yaniv Lavi 2014-01-26 16:39:55 UTC
(In reply to Yaniv Dary from comment #1)

Found the issue. We are issuing a repair async.



Yaniv

Comment 5 Barak Dagan 2014-01-29 17:18:24 UTC
Verification passed: rhevm-dwh-3.3.0-29.el6ev.noarch.

A null value in the user_logged_in_to_guest column used to result in an NPE:
Exception in component tMap_12
java.lang.NullPointerException
	at ovirt_engine_dwh.aggregationtohourly_3_3.AggregationToHourly.tJDBCInput_10Process(AggregationToHourly.java:23218)
	at ovirt_engine_dwh.aggregationtohourly_3_3.AggregationToHourly$10.run(AggregationToHourly.java:24718)
2014-01-29 18:00:34|vMsUzs|tnh7Ev|jF5YLs|OVIRT_ENGINE_DWH|AggregationToHourly|Default|6|Java Exception|tMap_12|java.lang.NullPointerException:null|1



As a result, statistics_vms_users_usage_hourly wasn't updated.

After the fix was deployed, the error no longer occurs and all previous data was aggregated.
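
For illustration only, a minimal Java sketch of the failure mode described above, assuming the NPE came from auto-unboxing a null Integer read from user_logged_in_to_guest. The generated AggregationToHourly code is not shown in this report, so the class name, the method names, and the choice of 0 as the default value are all hypothetical:

```java
// Hypothetical sketch; not the actual generated ETL code.
class NullSafeAggregation {

    // Mimics a mapping step with no null handling:
    // auto-unboxing a null Integer throws NullPointerException.
    static int unsafeFlag(Integer userLoggedInToGuest) {
        return userLoggedInToGuest;
    }

    // Null-safe variant: a NULL column value falls back to a default.
    // Using 0 as the default is an assumption made for this example.
    static int flagOrDefault(Integer userLoggedInToGuest) {
        return userLoggedInToGuest == null ? 0 : userLoggedInToGuest;
    }

    public static void main(String[] args) {
        // A row written before the upgrade: the new column is NULL.
        Integer fromUpgradedRow = null;

        try {
            unsafeFlag(fromUpgradedRow);
        } catch (NullPointerException e) {
            System.out.println("unsafe mapping failed: NullPointerException");
        }

        System.out.println("null-safe mapping: " + flagOrDefault(fromUpgradedRow));
    }
}
```

The actual fix shipped in rhevm-dwh-3.3.0-29.el6ev may handle the null differently; the sketch only shows why a column populated with nulls during the upgrade can break a mapping step that expects a value.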

Comment 6 Yaniv Lavi 2014-01-30 11:45:10 UTC
*** Bug 1059678 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2014-01-30 18:08:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0119.html

Comment 10 Yaniv Lavi 2014-02-02 07:53:26 UTC
*** Bug 1059764 has been marked as a duplicate of this bug. ***