Created attachment 875104: DWH log

Description of problem:
After installing rhevm-reports/dwh, RHEVM periodically shows the following event message:
"ETL service sampling has encountered an error. Please consult the service log for more details."
ovirt-engine-dwhd.log contains NullPointerException messages.

Version-Release number of selected component (if applicable):
rhevm-dwh-3.4.0-3.el6ev.noarch
rhevm-reports-3.4.0-2.el6ev.noarch
rhevm-3.4.0-0.5.master.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHEV 3.4.
2. Set up RHEV.
3. Install Reports for RHEV 3.4.
4. Set up Reports.
5. Log in to RHEVM.

Actual results:
Error messages appear.

Expected results:
The system functions without errors.

Additional info:
The mem_shared column of the vds_statistics table in the engine DB holds a value in MB and should be an integer type. However, the DWH view casts it to smallint and names it "ksm_shared_memory_percent". It should be named "ksm_shared_memory_mb".
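To make the type problem concrete: PostgreSQL's smallint is a 2-byte signed integer with a maximum of 32767, so cast(mem_shared as smallint) cannot hold a shared-memory value above roughly 32 GB expressed in MB. The sketch below, assuming only the documented PostgreSQL integer ranges, lists which integer types can store a given MB value. The function name types_that_fit and the sample readings are hypothetical, and this report does not establish that such an overflow is what produced the NullPointerException.

```python
# Documented ranges of the PostgreSQL integer types (signed 2, 4 and 8 bytes).
PG_INT_RANGES = {
    "smallint": (-32768, 32767),
    "integer": (-2147483648, 2147483647),
    "bigint": (-2**63, 2**63 - 1),
}


def types_that_fit(value_mb: int) -> list[str]:
    """Return the PostgreSQL integer types that can store value_mb without overflow."""
    return [
        name
        for name, (low, high) in PG_INT_RANGES.items()
        if low <= value_mb <= high
    ]


if __name__ == "__main__":
    # Hypothetical mem_shared readings in MB; not taken from this bug.
    for sample in (512, 32767, 40960):
        print(sample, types_that_fit(sample))
```

With these samples, 40960 MB (40 GB) fits integer and bigint but not smallint, which is why keeping the column as bigint, as the engine table defines it, avoids the problem.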
*** Bug 1073529 has been marked as a duplicate of this bug. ***
*** Bug 1077714 has been marked as a duplicate of this bug. ***
I see a similar issue in 3.3.2. Should this bug be cloned, or should a new one be opened?

select * from information_schema.columns where table_name='vds_statistics' and column_name = 'mem_shared';
 table_catalog | table_schema |   table_name   | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | numeric_pr
---------------+--------------+----------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-----------
 engine        | public       | vds_statistics | mem_shared  |               10 |                | YES         | bigint    |                          |                        |
(1 row)

select * from information_schema.columns where table_name='vds' and column_name = 'mem_shared';
 table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | numeric_precis
---------------+--------------+------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+---------------
 engine        | public       | vds        | mem_shared  |               67 |                | YES         | bigint    |                          |                        |
(1 row)

select * from information_schema.columns where table_name='vds' and column_name like '%ksm%';
 table_catalog | table_schema | table_name |   column_name   | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | numeric_pr
---------------+--------------+------------+-----------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-----------
 engine        | public       | vds        | ksm_cpu_percent |               70 |                | YES         | integer   |                          |                        |
 engine        | public       | vds        | ksm_pages       |               71 |                | YES         | bigint    |                          |                        |
 engine        | public       | vds        | ksm_state       |               72 |                | YES         | boolean   |                          |                        |
(3 rows)
Barak, the mem_shared field was added in 3.4, so I don't understand how you encountered it on 3.3.2. Please recheck.
# rpm -qa | egrep 'rhevm-3|dwh|reports'
rhevm-3.3.2-0.49.el6ev.noarch
rhevm-dwh-3.3.2-1.el6ev.noarch
jasperreports-server-pro-5.5.0-6.el6ev.noarch
rhevm-reports-3.3.2-3.el6ev.noarch

engine=# select column_name from information_schema.columns where table_name='vds' and (column_name like '%ksm%' or column_name like '%mem%');
        column_name
----------------------------
 physical_mem_mb
 pending_vmem_size
 mem_commited
 max_vds_memory_over_commit
 reserved_mem
 usage_mem_percent
 mem_available
 mem_free
 mem_shared
 ksm_cpu_percent
 ksm_pages
 ksm_state
(12 rows)
This is the engine DB, not the history DB.
Barak D, is this field also in the DWH views in the engine DB? The view is called "vds".
Does that answer your question, Shirly?

engine=# select column_name, data_type from information_schema.columns where table_name='vds' and (column_name like '%ksm%' or column_name like '%mem%');
        column_name         | data_type
----------------------------+-----------
 physical_mem_mb            | integer
 pending_vmem_size          | integer
 mem_commited               | integer
 max_vds_memory_over_commit | integer
 reserved_mem               | integer
 usage_mem_percent          | integer
 mem_available              | bigint
 mem_free                   | bigint
 mem_shared                 | bigint
 ksm_cpu_percent            | integer
 ksm_pages                  | bigint
 ksm_state                  | boolean
(12 rows)
You are referring to the engine DB. On 3.4 the engine had the problem because we added cast(mem_shared as smallint) to the DWH view, while the column is supposed to stay bigint. We are not supposed to have it on 3.3, so I don't see a problem here.
Created attachment 878527: dwh log

engine=> select * from dwh_history_timekeeping;
     var_name      | var_value |         var_datetime
-------------------+-----------+-------------------------------
 heartBeat         |           | 2014-03-25 17:33:10.072+02
 lastOsinfoSync    |           | 2014-03-24 17:13:23.46+02
 lastErrorSent     |           | 2014-03-25 17:33:12.288+02
 timesFailed       | 23        |
 lastSampling      |           | 2014-03-25 19:06:24.635+02
 lastSync          |           | 2014-03-25 19:05:24+02
 lastFullHostCheck |           | 2014-03-25 19:05:24+02
 lastOsinfoUpdate  |           | 2014-03-24 17:13:23.460384+02
(8 rows)
Barak, we checked with Barak Dagan. This is a known issue in 3.3 and unrelated. Please remove z-stream flag.
Verification failed on av4:
# rpm -q rhevm-dwh
rhevm-dwh-3.4.0-3.el6ev.noarch

could not change directory to "/root"
 table_catalog | table_schema |      table_name       |     column_name      | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character
---------------+--------------+-----------------------+----------------------+------------------+----------------+-------------+-----------+--------------------------+----------
 engine        | public       | dwh_host_history_view | ksm_cpu_percent      |                5 |                | YES         | smallint  |                          |
 engine        | public       | dwh_host_history_view | ksm_shared_memory_mb |               13 |                | YES         | bigint    |                          |
(2 rows)

2014-03-26 12:41:34|JodAHz|p20RrR|LqrhGM|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-26 12:56:34|1Q9SjX|p20RrR|LqrhGM|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-26 13:11:34|rFdC1Z|p20RrR|LqrhGM|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-26 13:26:34|2W5cZn|p20RrR|LqrhGM|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
This bug refers to the error: "ETL service sampling has encountered an error".
You are failing the bug on: "oVirt Engine is not updating the statistics".
Please read the bug and understand the issue. What you are failing it on is a warning, not an error; it is raised when ovirt-engine is not updating statistics for some reason. If you stop the engine for any reason, such as an upgrade or a service stop, this warning will appear in the log.
Moving back to ON_QA.

Yaniv
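The warning/error distinction can be read directly from the pipe-separated DWH log records quoted in this thread. A minimal sketch, assuming each record has the twelve fields seen above and that the ninth field holds the record type ("tWarn" or "Java Exception"); the field names below are hypothetical, modelled on a Talend tLogCatcher-style row, and are not taken from the DWH sources.

```python
# Assumed field order for the pipe-separated DWH log records quoted in this bug
# (modelled on a Talend tLogCatcher-style row; the names are hypothetical).
FIELDS = [
    "moment", "pid", "root_pid", "father_pid", "project", "job",
    "context", "priority", "type", "origin", "message", "code",
]


def parse_record(line: str) -> dict:
    """Split one log record into named fields; return {} if the shape does not match."""
    parts = line.strip().split("|")
    if len(parts) != len(FIELDS):
        return {}
    return dict(zip(FIELDS, parts))


def classify(line: str) -> str:
    """Label a record 'warning' (tWarn), 'error' (Java Exception) or 'other'."""
    record = parse_record(line)
    if record.get("type") == "tWarn":
        return "warning"
    if record.get("type") == "Java Exception":
        return "error"
    return "other"


if __name__ == "__main__":
    sample = (
        "2014-03-26 12:41:34|JodAHz|p20RrR|LqrhGM|OVIRT_ENGINE_DWH|"
        "SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, "
        "oVirt Engine is not updating the statistics. Please check your "
        "oVirt Engine status.|9704"
    )
    print(classify(sample))  # warning
```

The sketch classifies on the type field rather than the numeric priority column, because this thread does not document what the priority values (5 and 6 here) mean.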
Created attachment 879445: dwh + server log

Verified on AV4:
rhevm-reports-3.4.0-2.el6ev.noarch
rhevm-dwh-3.4.0-3.el6ev.noarch
jasperreports-server-pro-5.5.0-8.el6ev.noarch
rhevm-3.4.0-0.10.beta2.el6ev.noarch

psql -d engine -c "select column_name, data_type from information_schema.columns where table_name='dwh_host_history_view' and column_name like '%ksm%' ;" | less -S
     column_name      | data_type
----------------------+-----------
 ksm_cpu_percent      | smallint
 ksm_shared_memory_mb | bigint
(2 rows)

psql -d engine_history -c "select column_name, data_type from information_schema.columns where table_name='v3_4_statistics_hosts_resources_usage_samples' and column_name like '%ksm%' ;" | less -S
could not change directory to "/root"
        column_name        | data_type
---------------------------+-----------
 ksm_shared_memory_percent | smallint
 ksm_cpu_percent           | smallint
(2 rows)

psql -d engine_history -c "select column_name, data_type from information_schema.columns where table_name='v3_4_statistics_hosts_resources_usage_hourly' and column_name like '%ksm%' ;" | less -S
          column_name          | data_type
-------------------------------+-----------
 ksm_shared_memory_percent     | smallint
 max_ksm_shared_memory_percent | smallint
 ksm_cpu_percent               | smallint
 max_ksm_cpu_percent           | smallint
(4 rows)

It seems that http://gerrit.ovirt.org/25789 is not in. Is it fixed? Could it explain c#16?
2014-03-27 12:54:34|hAhLyS|iQRp3f|BzPXUZ|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java Exception|tRunJob_5|java.lang.RuntimeException:Child job running failed|1
Exception in component tRunJob_1
java.lang.RuntimeException: Child job running failed
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tRunJob_1Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tJDBCInput_2Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tJDBCConnection_1Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tJDBCConnection_2Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tRowGenerator_2Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tJDBCInput_3Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tJDBCInput_5Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tJDBCInput_4Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob.tJDBCConnection_3Process(Unknown Source)
	at ovirt_engine_dwh.sampletimekeepingjob_3_4.SampleTimeKeepingJob$2.run(Unknown Source)
2014-03-27 13:09:34|BzPXUZ|iQRp3f|C1EYd9|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|6|Java Exception|tRunJob_1|java.lang.RuntimeException:Child job running failed|1
2014-03-27 13:24:34|Qttsg9|iQRp3f|C1EYd9|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704

Log attached.
On av5:
2014-03-19 14:25:00|w35Z39|NTMNnU|U42Rbh|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
2014-03-27 14:59:20|ETL Service Stopped
2014-03-27 15:22:22|ETL Service Started
Warning:the operation 'max' for the output column 'max_ksm_shared_memory_mb' can't be processed because of incompatible input and/or output types

Is this a new bug blocking the current one, or should this one be reassigned?
(In reply to Barak Dagan from comment #19)
> ON av5:
>
> 2014-03-19 14:25:00|w35Z39|NTMNnU|U42Rbh|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
> 2014-03-27 14:59:20|ETL Service Stopped
> 2014-03-27 15:22:22|ETL Service Started
> Warning:the operation 'max' for the output column 'max_ksm_shared_memory_mb'
> can't be processed because of incompatible input and/or output types
>
> Is it a new Bug blocking the current one, or should this one be re-assign ?

Please attach the full log.
Created attachment 880266: dwh av5 log
(In reply to Shirly Radco from comment #15)
> Barak, we checked with Barak Dagan. This is a known issue in 3.3 and
> unrelated.
> Please remove z-stream flag.
(In reply to Barak Dagan from comment #19)
> ON av5:
>
> 2014-03-19 14:25:00|w35Z39|NTMNnU|U42Rbh|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704
> 2014-03-27 14:59:20|ETL Service Stopped
> 2014-03-27 15:22:22|ETL Service Started
> Warning:the operation 'max' for the output column 'max_ksm_shared_memory_mb'
> can't be processed because of incompatible input and/or output types
>
> Is it a new Bug blocking the current one, or should this one be re-assign ?

This bug is puzzling me. Can you try a fresh install of the engine and let me know if this still happens? The definition in the project looks good.

Yaniv
(In reply to Yaniv Dary from comment #24)
> This bug is puzzling me. Can you try a fresh install of engine and let me
> know if this still happens? the definition in the project looks good.
>
> Yaniv

Never mind, I was able to recreate and fix it. Moving to MODIFIED.
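The av5 warning says the 'max' aggregation for max_ksm_shared_memory_mb has incompatible input and/or output types. One kind of mismatch that produces such a warning is an output column narrower than its input, as with the smallint *_percent history columns that were still present on AV4 next to the bigint ksm_shared_memory_mb view column. A minimal sketch, assuming only integer column types; the function name and the column pairs are hypothetical, and this thread does not state which mismatch the actual fix addressed.

```python
# Byte widths of the PostgreSQL integer types.
INT_WIDTHS = {"smallint": 2, "integer": 4, "bigint": 8}


def max_output_compatible(input_type: str, output_type: str) -> bool:
    """True if max() over input_type can be stored in output_type without narrowing."""
    return INT_WIDTHS[output_type] >= INT_WIDTHS[input_type]


# (source column, input type, target column, output type); hypothetical pairs.
PAIRS = [
    ("ksm_shared_memory_mb", "bigint", "max_ksm_shared_memory_mb", "bigint"),
    ("ksm_cpu_percent", "smallint", "max_ksm_cpu_percent", "smallint"),
    ("ksm_shared_memory_mb", "bigint", "max_ksm_shared_memory_percent", "smallint"),
]

if __name__ == "__main__":
    for source, in_type, target, out_type in PAIRS:
        status = "ok" if max_output_compatible(in_type, out_type) else "narrowing"
        print(f"{source} ({in_type}) -> {target} ({out_type}): {status}")
```

Comparing byte widths is a deliberately simple rule: it flags narrowing, but it does not model other incompatibilities (for example numeric versus integer types) that an ETL tool may also reject.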
Verified on av4.1:
rhevm-3.4.0-0.10.beta2.el6ev.noarch
rhevm-dwh-3.4.0-6.el6ev.noarch
rhevm-dwh-setup-3.4.0-6.el6ev.noarch
rhevm-reports-3.4.0-2.el6ev.noarch
rhevm-reports-setup-3.4.0-2.el6ev.noarch
jasperreports-server-pro-5.5.0-9.el6ev.noarch

bash-4.1$ psql -d engine -c "select column_name, data_type from information_schema.columns where table_name ='dwh_host_history_view' and (column_name like '%ksm%' or column_name like '%mem%');
> "
could not change directory to "/root"
     column_name      | data_type
----------------------+-----------
 memory_usage_percent | smallint
 ksm_cpu_percent      | smallint
 ksm_shared_memory_mb | bigint

bash-4.1$ psql -d ovirt_engine_history -c "select table_name, column_name, data_type from information_schema.columns where table_name like 'host%samples%' and (column_name like '%ksm%' or column_name like '%mem%');"
could not change directory to "/root"
      table_name      |     column_name      | data_type
----------------------+----------------------+-----------
 host_samples_history | memory_usage_percent | smallint
 host_samples_history | ksm_cpu_percent      | smallint
 host_samples_history | ksm_shared_memory_mb | bigint

bash-4.1$ psql -d ovirt_engine_history -c "select table_name, column_name, data_type from information_schema.columns where table_name like 'host%hour%' and column_name like '%ksm%';"
could not change directory to "/root"
     table_name      |       column_name        | data_type
---------------------+--------------------------+-----------
 host_hourly_history | ksm_cpu_percent          | smallint
 host_hourly_history | max_ksm_cpu_percent      | smallint
 host_hourly_history | ksm_shared_memory_mb     | bigint
 host_hourly_history | max_ksm_shared_memory_mb | bigint
(4 rows)

bash-4.1$ psql -d ovirt_engine_history -c "select table_name, column_name, data_type from information_schema.columns where table_name like 'host%daily%' and column_name like '%ksm%';"
could not change directory to "/root"
     table_name     |       column_name        | data_type
--------------------+--------------------------+-----------
 host_daily_history | ksm_cpu_percent          | smallint
 host_daily_history | max_ksm_cpu_percent      | smallint
 host_daily_history | ksm_shared_memory_mb     | bigint
 host_daily_history | max_ksm_shared_memory_mb | bigint
(4 rows)

bash-4.1$ psql -d ovirt_engine_history -c "select date_trunc('hour', history_datetime), count(*) from host_samples_history group by 1;" | less -S
       date_trunc       | count
------------------------+-------
 2014-04-01 14:00:00+03 |    55
 2014-04-01 15:00:00+03 |   120
 2014-04-01 16:00:00+03 |   120
 2014-04-01 17:00:00+03 |   120
...
 2014-04-02 07:00:00+03 |   120
 2014-04-02 08:00:00+03 |   120
 2014-04-02 09:00:00+03 |   120
 2014-04-02 10:00:00+03 |   100
(21 rows)

bash-4.1$ psql -d ovirt_engine_history -c "select date_trunc('hour', history_datetime), count(*) from host_hourly_history group by 1 order by 1;" | less -S
       date_trunc       | count
------------------------+-------
 2014-04-01 14:00:00+03 |     2
 2014-04-01 15:00:00+03 |     2
 2014-04-01 16:00:00+03 |     2
 2014-04-01 17:00:00+03 |     2
...
 2014-04-02 05:00:00+03 |     2
 2014-04-02 06:00:00+03 |     2
 2014-04-02 07:00:00+03 |     2
 2014-04-02 08:00:00+03 |     2
(19 rows)

bash-4.1$ psql -d ovirt_engine_history -c "select date_trunc('hour', history_datetime), count(*) from host_daily_history group by 1 order by 1;" | less -S
 date_trunc | count
------------+-------
(0 rows)

----> Will continue monitoring.
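The column checks in this verification can be scripted so that each build is compared against the same expectations. A minimal sketch, assuming the information_schema query is run with psql's -A (unaligned) and -t (rows only) options, so that each result row prints as column_name|data_type with psql's default "|" separator. The EXPECTED and OBSOLETE sets and the function names are hypothetical; they only restate the types verified above and the column name that the fix removed.

```python
# Expected types after the fix, restated from the verification output above;
# the EXPECTED/OBSOLETE sets and function names are hypothetical.
EXPECTED = {"ksm_cpu_percent": "smallint", "ksm_shared_memory_mb": "bigint"}
OBSOLETE = {"ksm_shared_memory_percent"}


def parse_unaligned(output: str) -> dict:
    """Parse 'column_name|data_type' rows, as printed by psql -A -t, into a dict."""
    columns = {}
    for line in output.splitlines():
        line = line.strip()
        if "|" not in line:
            continue  # skip blank lines and anything that is not a result row
        name, data_type = line.split("|", 1)
        columns[name] = data_type
    return columns


def check_ksm_columns(columns: dict) -> list:
    """Return a list of problems; an empty list means the KSM columns look fixed."""
    problems = []
    for name, wanted in EXPECTED.items():
        found = columns.get(name)
        if found != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    for name in sorted(OBSOLETE.intersection(columns)):
        problems.append(f"{name}: obsolete column still present")
    return problems


if __name__ == "__main__":
    # Rows as reported for v3_4_statistics_hosts_resources_usage_samples on AV4.
    before = parse_unaligned("ksm_shared_memory_percent|smallint\nksm_cpu_percent|smallint\n")
    for problem in check_ksm_columns(before):
        print(problem)
```

Run against the AV4 rows, the check reports the missing ksm_shared_memory_mb column and the leftover ksm_shared_memory_percent column; run against the av4.1 host_samples_history rows, it reports nothing.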
psql -d ovirt_engine_history -c "select date_trunc('hour', history_datetime), count(*) from host_daily_history group by 1 order by 1;" | less -S
       date_trunc       | count
------------------------+-------
 2014-04-01 00:00:00+03 |     2
(1 row)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0601.html