Bug 1673808

Summary: ovirt-dwh prevents autovacuum from data garbage collection.
Product: Red Hat Enterprise Virtualization Manager Reporter: Roman Hodain <rhodain>
Component: ovirt-engine-dwhAssignee: Shirly Radco <sradco>
Status: CLOSED ERRATA QA Contact: David Vaanunu <dvaanunu>
Severity: high Docs Contact:
Priority: high    
Version: 4.2.8-2CC: dagur, dfediuck, kkawana, lleistne, lsurette, mlehrer, mtessun, obockows, rdlugyhe, rgolan, Rhev-m-bugs, sradco, srevivo, tburke
Target Milestone: ovirt-4.3.5Keywords: Performance, ZStream
Target Release: 4.3.5Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-dwh-4.3.5-1.el7ev Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-12 11:55:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Metrics RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1723168, 1723791    

Description Roman Hodain 2019-02-08 08:11:26 UTC
Description of problem:
The vacuum process is not able to remove the dead rows as they are held by an unfinished transaction from ovirt-dwh.

    vacuumdb -d ovirt_engine_history -t vm_disk_samples_history -v
    vacuumdb: vacuuming database "ovirt_engine_history"
    INFO:  vacuuming "public.vm_disk_samples_history"
    INFO:  index "idx_vm_disk_history_datetime_samples" now contains 63428 row versions in 176 pages
    DETAIL:  0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  index "idx_vm_disk_configuration_version_samples" now contains 63428 row versions in 181 pages
    DETAIL:  0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  index "vm_disk_samples_history_vm_disk_id_idx" now contains 63428 row versions in 253 pages
    DETAIL:  0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  index "vm_disk_samples_history_pkey" now contains 63428 row versions in 176 pages
    DETAIL:  0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  "vm_disk_samples_history": found 0 removable, 47958 nonremovable row versions in 738 out of 976 pages
==> DETAIL:  45888 dead row versions cannot be removed yet.
    There were 0 unused item pointers.
    Skipped 0 pages due to buffer pins.
    0 pages are entirely empty.
    CPU 0.00s/0.01u sec elapsed 0.01 sec.

Here is the idle in transaction connection:
    select pid, query from pg_stat_activity where state='idle in transaction' and datname='ovirt_engine_history'

      pid  |                                       query                                        
    -------+------------------------------------------------------------------------------------
     17061 | SELECT 'continueAgg', '1'                                                         +
           | FROM history_configuration                                                        +
           | WHERE var_name = 'lastHourAggr'                                                   +
           | AND var_datetime < '2019-02-08 06:00:00.000000+0000'                              +
           | 
     17062 | SELECT DISTINCT 'continueAgg', '1'                                                +
           | FROM history_configuration                                                        +
           | WHERE var_name = 'lastDayAggr'                                                    +
           |         AND var_datetime < '2019-02-07 00:00:00.000000+0000'                      +
           |         AND                                                                       +
           |                 CASE                                                              +
           |                         WHEN                                                      +
           |                                 var_datetime = '2019-02-07 00:00:00.000000+0000'  +
           |                                 AND 'true' =                                      +
           |                                         (                                         +
           |                                                 SELECT var_value                  +
           |                                                 FROM history_configuration        +
           |                                                 WHERE var_name = 'HourlyAggFailed'+
           |                                         )                                         +
           |                                 THEN FALSE                                        +
           |                         ELSE TRUE                                                 +
           |                 END                                                               +
           | 
    (2 rows)


Version-Release number of selected component (if applicable):
    ovirt-engine-dwh-4.2.4.3-1.el7ev.noarch

How reproducible:
    Check for the idle in transaction connections

Steps to Reproduce:
    select pid, query from pg_stat_activity where state='idle in transaction' and datname='ovirt_engine_history'


Actual results:
    The connection prevents the garbage collection

Expected results:
    The trnasaction is closed as soon as the aggregation is finished.

Additional info:
    After killing the process 17062, the dead rows are removed. The transaction can cause significant growth of the DB in some busy environments.

Comment 11 Doron Fediuck 2019-03-28 08:18:12 UTC
Not ready for GA, so will be handled during 4.3.z.

Comment 15 Roy Golan 2019-06-25 09:06:24 UTC
Thanks Shirly that's a very welcomed fix!

Comment 20 David Vaanunu 2019-07-16 14:58:19 UTC
Verified on:

redhat-release-server-7.7-10.el7.x86_64
rhv-release-4.3.5-6-001.noarch

ovirt-engine-dwhd.log (DEBUG)


Actions:

1) No idle transactions

ovirt_engine_history=# select pid, query from pg_stat_activity where state='idle in transaction' and datname='ovirt_engine_history';
 pid | query 
-----+-------
(0 rows)


2) Deletion

 *_samples_history - have data from 03:00AM (~38Hrs)
 (host / vm)

3) running 'vacuum' - no dead rows left


-bash-4.2$ vacuumdb -d ovirt_engine_history -t vm_disk_samples_history -v
vacuumdb: vacuuming database "ovirt_engine_history"
INFO:  vacuuming "public.vm_disk_samples_history"
INFO:  index "idx_vm_disk_history_datetime_samples" now contains 750672 row versions in 2219 pages
DETAIL:  0 index row versions were removed.
337 index pages have been deleted, 337 are currently reusable.
CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.02 s.
INFO:  index "idx_vm_disk_configuration_version_samples" now contains 750672 row versions in 3481 pages
DETAIL:  0 index row versions were removed.
198 index pages have been deleted, 198 are currently reusable.
CPU: user: 0.01 s, system: 0.02 s, elapsed: 0.03 s.
INFO:  index "vm_disk_samples_history_vm_disk_id_idx" now contains 750672 row versions in 4554 pages
DETAIL:  0 index row versions were removed.
281 index pages have been deleted, 281 are currently reusable.
CPU: user: 0.02 s, system: 0.01 s, elapsed: 0.03 s.
INFO:  index "vm_disk_samples_history_pkey" now contains 750672 row versions in 2426 pages
DETAIL:  0 index row versions were removed.
364 index pages have been deleted, 364 are currently reusable.
CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.02 s.
INFO:  "vm_disk_samples_history": found 0 removable, 176443 nonremovable row versions in 3077 out of 13669 pages
==> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 320072
There were 22916 unused item pointers.
Skipped 0 pages due to buffer pins, 1707 frozen pages.
0 pages are entirely empty.
CPU: user: 0.09 s, system: 0.11 s, elapsed: 0.22 s.



Open new bug - Closed connection issue

https://bugzilla.redhat.com/show_bug.cgi?id=1730384

Comment 22 errata-xmlrpc 2019-08-12 11:55:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2444

Comment 23 Daniel Gur 2019-08-28 13:12:48 UTC
sync2jira

Comment 24 Daniel Gur 2019-08-28 13:17:01 UTC
sync2jira