1222447 – ETL service aggregation to hourly tables is failing for specific timezones due to the way the ETL interprets the timezone

Bug 1222447 - ETL service aggregation to hourly tables is failing for specific timezones due to the way the ETL interprets the timezone

Summary: ETL service aggregation to hourly tables is failing for specific timezones du...

Keywords:
Status:	CLOSED DUPLICATE of bug 1195395
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine-dwh
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.6.0
Assignee:	Shirly Radco
QA Contact:	Lukas Svaty
Docs Contact:
URL:
Whiteboard:	infra
Depends On:	1195395
Blocks:
TreeView+	depends on / blocked

Reported:	2015-05-18 08:57 UTC by wwu@redhat.com
Modified:	2019-08-15 04:36 UTC (History)
CC List:	12 users (show)
Fixed In Version:	ovirt-3.6.0-beta2
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-08-20 07:14:24 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
sosreport (15.57 MB, application/x-gzip) 2015-05-18 08:57 UTC, wwu@redhat.com	no flags	Details
Some items are showing "null" (82.56 KB, image/png) 2015-06-05 13:53 UTC, Chen	no flags	Details
View All

Description wwu@redhat.com 2015-05-18 08:57:02 UTC

Created attachment 1026627 [details]
sosreport

Description of problem:
Here is part of log information:
~~~
# cat /var/log/ovirt-engine-reports/jasperserver.log
Caused by: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
(<------Here is the error information about network)

        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:134)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:64)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:123)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:28)
        at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:19)
        at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:29)
        at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:22)
        at org.postgresql.Driver.makeConnection(Driver.java:391)
        at org.postgresql.Driver.connect(Driver.java:265)
        at org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.getLocalManagedConnection(LocalManagedConnectionFactory.java:303)
        ... 15 more
Caused by: java.net.ConnectException: Connection refused
~~~

Comment 1 Yaniv Lavi 2015-05-18 10:18:50 UTC

This means that postgresql is either down\not install or blocked by firewall.
Is that the case?

Comment 2 wwu@redhat.com 2015-05-19 01:20:08 UTC

Hi Yaniv,

Thanks for your reminding,I will check that for customer.

Comment 3 wwu@redhat.com 2015-05-26 02:37:57 UTC

Hi Yaniv,

1.Here is the output of "ps":
~~~
# less ps|grep post
postfix    461  0.0  0.0  80936  3372 ?        S    12:49   0:00 pickup -l -t fifo -u
root      1713  0.0  0.0  80856  2548 ?        Ss   Mar26   0:22 /usr/libexec/postfix/master
postfix   1721  0.0  0.0  81108  2672 ?        S    Mar26   0:04 qmgr -l -t fifo -u
postgres  7311  0.1  0.5 269852 84976 ?        Ss   May21  10:08 postgres: engine engine 127.0.0.1(59064) idle     
postgres 12177  0.0  0.0 217340  5780 ?        S    May07   1:10 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 12179  0.0  0.0 178984  1132 ?        Ss   May07   1:54 postgres: logger process                          
postgres 12181  0.0  0.2 217600 35272 ?        Ss   May07  19:59 postgres: writer process                          
postgres 12182  0.0  0.0 217340  1400 ?        Ss   May07   2:54 postgres: wal writer process                      
postgres 12183  0.0  0.0 217980  1812 ?        Ss   May07   1:52 postgres: autovacuum launcher process             
postgres 12184  0.0  0.0 179588  1512 ?        Ss   May07   8:08 postgres: stats collector process                 
postgres 12188  0.0  0.0 221020  8212 ?        Ss   May07   1:54 postgres: engine engine ::1(37920) idle           
ovirt    16573  0.4  1.7 6781972 287884 ?      Sl   May07 108:20 ovirt-engine-dwhd -Dorg.ovirt.engine.dwh.settings=/tmp/tmpfYoZgi/settings.properties -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -classpath /usr/share/ovirt-engine-dwh/lib/*::/usr/share/java/dom4j-eap6/dom4j-1.6.1.redhat-6-sources.jar:/usr/share/java/dom4j-eap6/dom4j-1.6.1.redhat-6.jar:/usr/share/java/commons-collections.jar:/usr/share/java/postgresql-jdbc.jar ovirt_engine_dwh.historyetl_3_5.HistoryETL --context=Default
postgres 16587  0.0  0.0 218828 11180 ?        Ss   May07   3:59 postgres: engine engine 127.0.0.1(43981) idle     
postgres 16588  0.0  0.3 233412 49328 ?        Ss   May07   5:00 postgres: engine engine 127.0.0.1(43982) idle     
postgres 16589  0.0  0.2 222816 41736 ?        Ss   May07  10:23 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(43983) idle
postgres 16590  0.0  0.2 219652 33144 ?        Ss   May07   0:03 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(43984) idle in transaction
postgres 16591  0.0  0.0 219632 10916 ?        Ss   May07   0:00 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(43985) idle in transaction
postgres 16592  0.0  0.2 220028 39316 ?        Ss   May07   2:17 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(43986) idle in transaction
postgres 16923  0.0  0.2 235828 33192 ?        Ss   May07   0:12 postgres: ovirt_engine_reports ovirt_engine_reports 127.0.0.1(44014) idle
postgres 16929  0.0  0.1 226712 24744 ?        Ss   May07   0:12 postgres: ovirt_engine_reports ovirt_engine_reports 127.0.0.1(44016) idle
postgres 16946  0.0  0.1 227328 24280 ?        Ss   May07   0:12 postgres: ovirt_engine_reports ovirt_engine_reports 127.0.0.1(44025) idle
postgres 16947  0.0  0.1 226732 23068 ?        Ss   May07   0:12 postgres: ovirt_engine_reports ovirt_engine_reports 127.0.0.1(44026) idle
postgres 16949  0.0  0.1 234120 31284 ?        Ss   May07   0:12 postgres: ovirt_engine_reports ovirt_engine_reports 127.0.0.1(44027) idle
postgres 16950  0.0  0.1 227324 23424 ?        Ss   May07   0:12 postgres: ovirt_engine_reports ovirt_engine_reports 127.0.0.1(44028) idle
postgres 21357  0.1  0.6 293920 108492 ?       Ss   May20  12:54 postgres: engine engine 127.0.0.1(58545) idle     
postgres 21477  0.1  0.6 292228 109840 ?       Ss   May20  12:49 postgres: engine engine 127.0.0.1(58547) idle     
postgres 21484  0.1  0.5 276516 93424 ?        Ss   May20  12:50 postgres: engine engine 127.0.0.1(58553) idle     
postgres 22608  0.1  0.5 276456 93596 ?        Ss   May20  12:37 postgres: engine engine 127.0.0.1(58596) idle     
postgres 29055  0.0  0.0 221844 11052 ?        Ss   May11   0:00 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(46346) idle
postgres 29801  0.1  0.5 269344 83576 ?        Ss   May20  11:24 postgres: engine engine 127.0.0.1(58867) idle 
~~~

2.Here is the installed packages:
~~~
# less installed-rpms |grep sql
mysql-libs-5.1.73-3.el6_5.x86_64                            Fri Mar 20 08:24:42 2015
postgresql-8.4.20-1.el6_5.x86_64                            Wed Mar 25 23:45:01 2015
postgresql-jdbc-8.4.704-2.el6.noarch                        Thu Apr  9 15:04:56 2015
postgresql-libs-8.4.20-1.el6_5.x86_64                       Wed Mar 25 23:44:44 2015
postgresql-server-8.4.20-1.el6_5.x86_64                     Wed Mar 25 23:45:02 2015
qt-sqlite-4.6.2-28.el6_5.x86_64                             Fri Mar 20 08:24:40 2015
sqlite-3.6.20-1.el6.x86_64                                  Fri Mar 20 08:23:41 2015
~~~

3.No firewall in this machine.

4.Here is part of log:
~~~
2015-05-07 16:10:18,255 WARN  [org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory] (quartzScheduler_QuartzSchedulerThread) Destroying connection that is n
ot valid, due to the following exception: org.postgresql.jdbc4.Jdbc4Connection@67fa4139: org.postgresql.util.PSQLException: FATAL: terminating connection due to admini
strator command
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094) [postgresql.jar:]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1827) [postgresql.jar:]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) [postgresql.jar:]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:508) [postgresql.jar:]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:370) [postgresql.jar:]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:362) [postgresql.jar:]
        at org.jboss.jca.adapters.jdbc.CheckValidConnectionSQL.isValidConnection(CheckValidConnectionSQL.java:74)
        at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnectionFactory.isValidConnection(BaseWrapperManagedConnectionFactory.java:1100)
        at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.checkValid(BaseWrapperManagedConnection.java:513)
        at org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.matchManagedConnections(LocalManagedConnectionFactory.java:513)
        at org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreArrayListManagedConnectionPool.getConnection(SemaphoreArrayListManagedConnectionPool.java:314)
        at org.jboss.jca.core.connectionmanager.pool.AbstractPool.getSimpleConnection(AbstractPool.java:406)
        at org.jboss.jca.core.connectionmanager.pool.AbstractPool.getConnection(AbstractPool.java:378)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.getManagedConnection(AbstractConnectionManager.java:354)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.getManagedConnection(AbstractConnectionManager.java:327)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:491)
        at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:143)
        at org.springframework.scheduling.quartz.LocalDataSourceJobStore$2.getConnection(LocalDataSourceJobStore.java:129) [spring-context-support-3.1.0.RELEASE.jar:3.1.0.RELEASE]
        at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz-2.1.2.jar:]
        at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz-2.1.2.jar:]
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz-2.1.2.jar:]
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz-2.1.2.jar:]
        at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz-2.1.2.jar:]
~~~

5.pls check the attachment of sosreport.

6.Cu has destroyed this environment,it will be impossible to collect any data.

Comment 4 Shirly Radco 2015-05-28 09:30:28 UTC

Hi,

1. Did you try restarting the posrtgresql service? and ovirt-engine-reportsd service ?
2. Please attach the logs under /var/log/ovirt-engine/ and /var/log/ovirt-engine-reports/.

Comment 5 wwu@redhat.com 2015-06-01 03:44:44 UTC

Hi,

The attachment is the tarall of rhevm's log.

Thanks
Victor

Comment 6 Shirly Radco 2015-06-01 08:05:23 UTC

Hi, 

The attached sos reports is not the logs I require.
Please provide the logs I referred to in comment #4.

Also, please reply if you tried restarting the posrtgresql service? and ovirt-engine-reportsd service ?

Comment 7 wwu@redhat.com 2015-06-02 02:48:41 UTC

Hi

1.I'm sorry but the log you required is contained in the attachment.
~~~
[root@victor newlog]# tar xf rhevm.log.gz    <------It's the attachment
[root@victor newlog]# ls
rhevm-2015050514501430808611  rhevm.log.gz
[root@victor newlog]# cd rhevm-2015050514501430808611/var/log/ovirt-engine
ovirt-engine/         ovirt-engine-dwh/     ovirt-engine-reports/ 
[root@victor newlog]# cd rhevm-2015050514501430808611/var/log/ovirt-engine
[root@victor ovirt-engine]# ls
boot.log     engine.log           host-deploy  ovirt-iso-uploader   redhat-support-plugin-rhev.log  setup
console.log  engine.log-20150505  notifier     ovirt-log-collector  server.log
[root@victor ovirt-engine]# cd ../ovirt-engine-reports/
[root@victor ovirt-engine-reports]# ls
boot.log  console.log  jasperserver.log  reports.log  server.log
~~~

>>>please reply if you tried restarting the posrtgresql service? and ovirt-engine-reportsd service ?

cu tried that.

Regards
Victor

Comment 9 wwu@redhat.com 2015-06-03 03:38:48 UTC

>>>1. Is the jasper server on the same machine as the engine?

The Jasper report is combined with RHEVM in the same virtual machine 

>>>2. Is the fqdn of the machine fully qualified?
there is local DNS resolve in /etc/hosts.

~~~
# cat etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.2.22.165	rhevm.lab.com
10.2.22.119     rhevh.lab.com
# cat etc/sysconfig/network
NETWORKING=yes
HOSTNAME=rhevm.lab.com
NTPSERVERARGS=iburst
~~~


Thanks 
Victor

Comment 10 wwu@redhat.com 2015-06-04 01:53:59 UTC

I've reproduced this issue successfully.

Jasper-report is newly installed on this machine(last night).

How to reproduce:
~~~
# yum install rhevm-dwh rhevm-reports
# engine-setup
Configure Data Warehouse on this host (Yes, No) [Yes]:Yes
Configure Reports on this host (Yes, No) [Yes]:Yes
Setup can automatically configure the firewall on this system.
Note: automatic configuration of the firewall may overwrite current
settings.
Do you want Setup to configure the firewall? (Yes, No) [Yes]:No
...
(Just as the steps of the RHEV3.5 Installation Guide)
~~~

1.Firewall status
~~~
# /etc/init.d/iptables status
Table: filter
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination
~~~

2.Selinux status
~~~
# getenforce 
Disabled
~~~

3.FQDN status
~~~
[root@rhevm ovirt-engine]# hostname 
rhevm.cchen.redhat.com
[root@rhevm ovirt-engine]# hostname -s
rhevm
[root@rhevm ovirt-engine]# hostname -f
rhevm.cchen.redhat.com
~~~

Thanks
Victor

Comment 11 wwu@redhat.com 2015-06-04 07:37:51 UTC

Hi ,

BTW,
I checked my database:

Here is detailed steps:
~~~
-bash-4.1$ psql 
psql (8.4.20)
Type "help" for help.

postgres=# \l
                                                            List of databases
                Name                 |                Owner                | Encoding |  Collation  |    Ctype    |   Access privileges   
-------------------------------------+-------------------------------------+----------+-------------+-------------+-----------------------
 engine                              | engine                              | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 ovirt_engine_history                | ovirt_engine_history                | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 ovirt_engine_history_20150603102707 | ovirt_engine_history_20150603102707 | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 ovirt_engine_reports                | ovirt_engine_reports                | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 ovirt_engine_reports_20150603102714 | ovirt_engine_reports_20150603102714 | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres                            | postgres                            | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 rhevm_history                       | postgres                            | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0                           | postgres                            | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
                                                                                                                  : postgres=CTc/postgres
 template1                           | postgres                            | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
                                                                                                                  : postgres=CTc/postgres
(9 rows)
postgres=# \c ovirt_engine_history
psql (8.4.20)
You are now connected to database "ovirt_engine_history".
ovirt_engine_history=# \d
                                          List of relations
 Schema |                           Name                           |   Type   |        Owner         
--------+----------------------------------------------------------+----------+----------------------
 public | calendar                                                 | table    | ovirt_engine_history
 public | cluster_configuration                                    | table    | ovirt_engine_history
 public | configuration_seq                                        | sequence | ovirt_engine_history
 public | datacenter_configuration                                 | table    | ovirt_engine_history
 public | datacenter_daily_history                                 | table    | ovirt_engine_history
 public | datacenter_history_seq1                                  | sequence | ovirt_engine_history
 public | datacenter_history_seq2                                  | sequence | ovirt_engine_history
ovirt_engine_history=# \d+ datacenter_daily_history
                                                 Table "public.datacenter_daily_history"
              Column              |     Type     |                           Modifiers                           | Storage | Description 
----------------------------------+--------------+---------------------------------------------------------------+---------+-------------
 history_id                       | bigint       | not null default nextval('datacenter_history_seq3'::regclass) | plain   | 
 history_datetime                 | date         | not null                                                      | plain   | 
 datacenter_id                    | uuid         | not null                                                      | plain   | 
 datacenter_status                | smallint     | not null                                                      | plain   | 
 minutes_in_status                | numeric(7,2) | not null default 1                                            | main    | 
 datacenter_configuration_version | integer      | not null                                                      | plain   | 
Indexes:
    "datacenter_daily_history_pkey" PRIMARY KEY, btree (history_id)
    "datacenter_daily_history_datacenter_id_idx" btree (datacenter_id)
    "idx_datacenter_configuration_version_daily" btree (datacenter_configuration_version)
    "idx_datacenter_history_datetime_daily" btree (history_datetime)
Foreign-key constraints:
    "datacenter_daily_history_datacenter_configuration_version_fkey" FOREIGN KEY (datacenter_configuration_version) REFERENCES datacenter_configuration(history_id)
Has OIDs: yes

ovirt_engine_history=# SELECT * from datacenter_daily_history
ovirt_engine_history-# ;
 history_id | history_datetime | datacenter_id | datacenter_status | minutes_in_status | datacenter_configuration_version 
------------+------------------+---------------+-------------------+-------------------+----------------------------------
(0 rows)
~~~

Comment 12 Shirly Radco 2015-06-04 14:11:57 UTC

1. Are the databases on the same machine as the engine,dwh and reports?
2. You should log to  ovirt_engine_history_20150603102707 which is the db used currently on your setup and run the sql query again.
3. All dns addresses must be fully resolvable. We do not support hacks on /etc/hosts.
4. Please provide dwh log after the service restart.

Comment 13 Shirly Radco 2015-06-04 14:13:41 UTC

sorry, please provide again all the logs. not only dwh.

Comment 14 wwu@redhat.com 2015-06-05 12:35:45 UTC

Hi 

I can reproduce that every time when I reinstall it.

So did my colleagues and my customers.

So I'm wondering if it is a known bug for :
~~~
couldn't get the status of "Uptime"/"Planned Downtime"/"Unplanned Downtime" of "Single Host Uptime(BR8)" in "Jasper-report".
~~~
It might happens also in "Single Host Resource Usage" of "jasper-report" 

It'd be better if you could try to install and run jasper-report to check that.

Thanks 
Victor

Comment 15 Chen 2015-06-05 13:42:54 UTC

Hi Shirly,

Thank you for your work on this bug.

I work with victor and currently we are co-working on this hot issue together.
This is a hot issue because our partner and SA are delivering PoC to the customer but we failed to get some items in BRXXX Library. This issue is causing quite business impact and I hope we can co-operate with each other to figure it out.

Let's take BR8 as an example:

If my understanding is correct, the UI page fills in the Uptime/Planned Downtime/Unplanned Downtime data using the SQL query as following (if I'm wrong please kindly correct me.):

# vim /usr/share/ovirt-engine-reports/ovirt-reports/resources/Reports/Service_level/Hosts/single_host_uptime_br8_files/single_host_uptime_br8_jrxml.data 

SELECT
    SUM (
        CASE
            WHEN v3_5_statistics_hosts_resources_usage_$P!{table_name}.host_status = 2
                /* If "Period" equals to "Daily" then "table_name" parameter equals to "hourly" else "daily" */
                THEN
                    COALESCE (
                        v3_5_statistics_hosts_resources_usage_$P!{table_name}.minutes_in_status, 0
                    )
            ELSE 0
        END
    )
    AS planned_downtime_mins,
    SUM (
        CASE
            WHEN v3_5_statistics_hosts_resources_usage_$P!{table_name}.host_status = 3
                THEN
                    COALESCE (
                        v3_5_statistics_hosts_resources_usage_$P!{table_name}.minutes_in_status, 0
                    )
            ELSE 0
        END
    )
    AS unplanned_downtime_mins,
    SUM (
        CASE
            WHEN v3_5_statistics_hosts_resources_usage_$P!{table_name}.host_status = 1
                THEN
                    COALESCE (
                        v3_5_statistics_hosts_resources_usage_$P!{table_name}.minutes_in_status, 0
                    )
            ELSE 0
        END
    )
    AS uptime_mins,
    SUM (
        v3_5_statistics_hosts_resources_usage_$P!{table_name}.minutes_in_status
    ) AS total
FROM v3_5_statistics_hosts_resources_usage_$P!{table_name}

Two hours ago I fresh installed a RHEV 3.5 and added a host into it. 

engine=# SELECT host_name,vds_name from vds
;
   host_name   | vds_name 
---------------+----------
 10.66.192.136 | h1
(1 row)


I had a look at v3_5_statistics_hosts_resources_usage_XXXXXX tables but there seems to be no data in those tables.

ovirt_engine_history=# SELECT count(*) from v3_5_statistics_hosts_resources_usage_hourly ;
 count 
-------
     0
(1 row)

ovirt_engine_history=# SELECT count(*) from v3_5_statistics_hosts_resources_usage_daily ;
 count 
-------
     0
(1 row)

I'm not sure whether this is the cause that all the three items are showing "Null".

As victor mentioned, all of us (on-site SA, partner, customers, victor and I) are encountering the same issue. Please help us to confirm whether that is the cause (or whether we are misunderstanding the usage of BRXXX). We are awaiting for your feedback.

My test environment: ( if you need it )

10.66.219.175 root/redhat  admin/redhat

All the databases are deployed in the same box and hostnames can be resolved correctly using a DNS.

Best Regards,
Chen

Comment 16 Chen 2015-06-05 13:53:24 UTC

Created attachment 1035274 [details]
Some items are showing "null"

Comment 18 Chen 2015-06-05 13:55:13 UTC

Sorry the correct screenshot is attachment 1035274 [details].

Comment 19 Shirly Radco 2015-06-07 07:27:07 UTC

Hi,

I now understand the issue. The aggregation is not working due to an issue with the date formatting.
There is a bug already open for this issue. #1195395
It is fixed for master and I'm currently working on fixing it for 3.5.
If you take a look at history_configuration table in the history database you will see that the lastHourAggr is incorrect.

Comment 21 Chen 2015-06-08 03:00:19 UTC

Hi Shirly,

Thank you for your reply.

I tried the following steps in my test environment to workaround this issue and indeed the NPE has gone but I still can not get those data ( still shows as NULL ). ( If anything wrong please kindly let me know )

Here is what I have done:

1. Backed up /usr/share/pgsql/timezonesets/Default and changed CST as the following:

CST    28800

( China is in GMT+8:00 so 3600*8=28800 
# date
Mon Jun  8 10:51:26 CST 2015
)

2. Restarted a set of services in the following sequence:

# service ovirt-engine stop
# service ovirt-engine-dwhd stop
# service postgresql restart
# service ovirt-engine start
# service ovirt-engine-dwhd start

3. The call trace of java NPE has gone but I still can not get the data.

Are these two bugs the same cause ? The v3_5_statistics_hosts_resources_usage_* tables are still empty. 

>If you take a look at history_configuration table in the history database you will see that the lastHourAggr is incorrect.

Here is what lastHourAggr:

# SELECT * from history_configuration where var_name='lastHourAggr' ;
   var_name   | var_value |      var_datetime      
--------------+-----------+------------------------
 lastHourAggr |           | 2015-06-08 23:00:00+08
(1 row)

Would you please teach me what a correct lasHourAggr would look like ? 

>It is fixed for master and I'm currently working on fixing it for 3.5.

If any proposal patch is welcome as we should come over PoC first.

Thank you !

Best Regards,
Chen

Comment 23 Shirly Radco 2015-06-08 07:51:38 UTC

(In reply to Chen from comment #21)
> Hi Shirly,
> 
> Thank you for your reply.
> 
> I tried the following steps in my test environment to workaround this issue
> and indeed the NPE has gone but I still can not get those data ( still shows
> as NULL ). ( If anything wrong please kindly let me know )
> 
> Here is what I have done:
> 
> 1. Backed up /usr/share/pgsql/timezonesets/Default and changed CST as the
> following:
> 
> CST    28800
> 
> ( China is in GMT+8:00 so 3600*8=28800 
> # date
> Mon Jun  8 10:51:26 CST 2015
> )
> 

In order to deal with this issue at the moment you will need to switch the server
where the database is installed to UTC timezone. Not CST.
"Mon Jun  8 10:51:26 CST 2015" is equal to "Mon Jun  8 3:51:26 UTC 2015"

You can also set once the lastHourAggr to beginning of the month (01-06-2015).
so data that is already in the samples table will be aggregated after an hour.
But this is not mandatory and if you wait until the time that is already set there the data will be aggregated as well.

> 2. Restarted a set of services in the following sequence:
> 
> # service ovirt-engine stop
> # service ovirt-engine-dwhd stop
> # service postgresql restart
> # service ovirt-engine start
> # service ovirt-engine-dwhd start
> 
> 3. The call trace of java NPE has gone but I still can not get the data.
> 
> Are these two bugs the same cause ? The
> v3_5_statistics_hosts_resources_usage_* tables are still empty. 
> 
> >If you take a look at history_configuration table in the history database you will see that the lastHourAggr is incorrect.
> 
> Here is what lastHourAggr:
> 
> # SELECT * from history_configuration where var_name='lastHourAggr' ;
>    var_name   | var_value |      var_datetime      
> --------------+-----------+------------------------
>  lastHourAggr |           | 2015-06-08 23:00:00+08
> (1 row)
> 
> Would you please teach me what a correct lasHourAggr would look like ? 
> 

After hourly aggregation the lastHourAggr should be set to the hour before last.
It means samples data was aggregated until that hour.


> >It is fixed for master and I'm currently working on fixing it for 3.5.
> 
> If any proposal patch is welcome as we should come over PoC first.
> 
> Thank you !
> 
> Best Regards,
> Chen

Comment 24 Chen 2015-06-08 09:19:59 UTC

Hi Shirly,

I followed your advice.

1. Switch the database box timezone to UTC.

# ln -sf /usr/share/zoneinfo/UTC /etc/localtime

2. I rebooted the database box.

3. Change the lastHourAggr to the beginning of this month.

# UPDATE history_configuration SET var_datetime = '2015-06-01 20:00:00+00' where var_name='lastHourAggr';

After that, I found that lastHourAggr is changing back quickly by 7 hours.

# SELECT * from history_configuration where var_name='lastHourAggr';
   var_name   | var_value |      var_datetime      
--------------+-----------+------------------------
 lastHourAggr |           | 2015-04-09 18:00:00+00
(1 row)

# SELECT * from history_configuration where var_name='lastHourAggr';
   var_name   | var_value |      var_datetime      
--------------+-----------+------------------------
 lastHourAggr |           | 2015-04-09 11:00:00+00
(1 row)

# SELECT * from history_configuration where var_name='lastHourAggr';
   var_name   | var_value |      var_datetime      
--------------+-----------+------------------------
 lastHourAggr |           | 2015-04-09 04:00:00+00
(1 row)

Except this, nothing has changed. The aggregation still didn't start and daily and hourly tables are still empty.

In order to save time and make things more clear, you are welcome to login to my test box to have a look.

10.66.219.175 all the credentials are redhat.

>But this is not mandatory and if you wait until the time that is already set there the data will be aggregated as well.

BTW, what is the "time that is already set there" ? Can I check that time in the database ?

Best Regards,
Chen

Comment 25 Shirly Radco 2015-06-08 11:09:44 UTC

(In reply to Chen from comment #24)
> Hi Shirly,
> 
> I followed your advice.
> 
> 1. Switch the database box timezone to UTC.
> 
> # ln -sf /usr/share/zoneinfo/UTC /etc/localtime
> 
> 2. I rebooted the database box.
> 
> 3. Change the lastHourAggr to the beginning of this month.
> 
> # UPDATE history_configuration SET var_datetime = '2015-06-01 20:00:00+00'
> where var_name='lastHourAggr';
> 
> After that, I found that lastHourAggr is changing back quickly by 7 hours.
> 
> # SELECT * from history_configuration where var_name='lastHourAggr';
>    var_name   | var_value |      var_datetime      
> --------------+-----------+------------------------
>  lastHourAggr |           | 2015-04-09 18:00:00+00
> (1 row)
> 
> # SELECT * from history_configuration where var_name='lastHourAggr';
>    var_name   | var_value |      var_datetime      
> --------------+-----------+------------------------
>  lastHourAggr |           | 2015-04-09 11:00:00+00
> (1 row)
> 
> # SELECT * from history_configuration where var_name='lastHourAggr';
>    var_name   | var_value |      var_datetime      
> --------------+-----------+------------------------
>  lastHourAggr |           | 2015-04-09 04:00:00+00
> (1 row)
> 
> Except this, nothing has changed. The aggregation still didn't start and
> daily and hourly tables are still empty.
> 
> In order to save time and make things more clear, you are welcome to login
> to my test box to have a look.
> 
> 10.66.219.175 all the credentials are redhat.
> 
> >But this is not mandatory and if you wait until the time that is already set there the data will be aggregated as well.
> 
> BTW, what is the "time that is already set there" ? Can I check that time in
> the database ?
> 
> Best Regards,
> Chen

What is this environment? I see that when hourly aggregation runs then lasHourAgg is update backwards instead of forwords.
That is very strange. Did you try and to change anything else in this environment?

Comment 26 Chen 2015-06-09 02:19:31 UTC

All I did was in comment#24.

This morning, while trial and error (changed many times lastHourAggr and rebooted), finally lastHourAggr stopped changing and there comes one line in hourly table.. ( But I'm not sure which one exactly solves this )

# SELECT count(*) from v3_5_statistics_hosts_resources_usage_hourly ;
 count 
-------
     1
(1 row)

And the reports can show data.

So we are sure about two things here:

1. Reports did read from hourly table to fill in the web page.

2. Wrong lastHourAggr is stopping the aggregation, which leaves hourly and daily table empty.

Currently I'm reinstalling an environment which is using UTC timezone, hopefully this solves it.

Any other good advice Shirly ?

Best Regards,
Chen

Comment 27 Chen 2015-06-09 07:11:29 UTC

I created a new RHEV 3.5 ( based on an existent template ).

1. Change TZ to UTC

# ln -sf /usr/share/zoneinfo/UTC /etc/localtime

2. Reboot

3. Check lastHourAggr.

lastHourAggr started to move backwards by 7 hours....

Wondering where could change lastHourAggr backwards...

Comment 28 Chen 2015-06-09 10:21:26 UTC

Hi Shirly,

I've tried many times (switched the TZ to UTC) and the results are almost the same:

lastHourAggr begines to move back by 7 hours quickly several minutes later.

1. When is the aggregation triggered? On every o'clock ?

( I think the logic should be: 

a. On every o'clock, aggregation is triggered and dwh aggregates the last hour's samples to hourly table by comparing history_datetime and lastHourAggr.
b. After step a, Update lastHourAggr + 1

Would you please confirm that ?

)

2. Is there any method to trace the aggregation such as enable debug log ? I'm pleased to enable debug log to see when aggregation is executed and some details.

3. If lastHourAggr keeps moving backwards, newer data would not be aggregated...

Best Regards,
Chen

Comment 29 Chen 2015-06-10 11:03:41 UTC

Hi Shirly,

The newest update is that, I re-installed the whole box using Etc/UTC TZ and by now things all look good.

Best Regards,
Chen

Comment 30 Shirly Radco 2015-06-10 11:22:02 UTC

That is good news indeed.
Please close this bug if it is fixed.
We will release a fix for z-stream for this issue.

Comment 31 Lukas Svaty 2015-08-20 06:51:17 UTC

This issue was fixed within another bug BZ#1195395, I suggest closing this as duplicate as no patches were required in this bug. What do you think?

Comment 32 Yaniv Lavi 2015-08-20 07:14:24 UTC

Done.

*** This bug has been marked as a duplicate of bug 1195395 ***

Note You need to log in before you can comment on or make changes to this bug.