Bug 1393270

Summary: Database table, vm_dynamic, grows without limit until the disk fills up.
Product: Red Hat Enterprise Virtualization Manager    Reporter: Greg Scott <gscott>
Component: ovirt-engine    Assignee: Martin Perina <mperina>
Status: CLOSED ERRATA    QA Contact: Lucie Leistnerova <lleistne>
Severity: high    Docs Contact:
Priority: high
Version: 3.5.7    CC: bcholler, bgraveno, emesika, gscott, gveitmic, jvmarinelli, lleistne, lsurette, mgoldboi, michal.skrivanek, mkalinin, mperina, mtessun, oourfali, pstehlik, rbalakri, rgolan, Rhev-m-bugs, rmcswain, sradco, srevivo, ykaul
Target Milestone: ovirt-4.1.1   
Target Release: ---   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
After updating the PostgreSQL autovacuum options, tables should be vacuumed automatically and properly, so no table should grow without limit.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-25 00:43:29 UTC    Type: Bug
Regression: ---    Mount Type: ---
Documentation: ---    CRM:
Verified Versions:    Category: ---
oVirt Team: Infra    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---    Target Upstream Version:
Embargoed:
Bug Depends On: 1388433    
Bug Blocks:    

Description Greg Scott 2016-11-09 08:53:49 UTC
Description of problem:

In a large VDI deployment, the vm_dynamic database table grows without limit and eventually fills up the disk.  The contents of the table and the number of rows always seem reasonable, but the size of the table continues to grow without limit.

Version-Release number of selected component (if applicable):
3.5.8. Unknown whether the problem also occurs with newer RHV versions.


How reproducible:
Only in sites with around 1000 or more stateless VDI VMs.

Steps to Reproduce:
1. Run normal day to day production operations.
2. Keep an eye on the size of the vm_dynamic database table (one way to check the size is sketched below this list).
3.
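
(A hedged sketch for step 2, assuming psql access on the engine host and that the database is named "engine"; pg_relation_size and pg_total_relation_size are standard PostgreSQL size functions.)

su - postgres
# table size alone, and total size including indexes and TOAST
psql engine -c "SELECT pg_size_pretty(pg_relation_size('vm_dynamic')) AS table_size,
                       pg_size_pretty(pg_total_relation_size('vm_dynamic')) AS total_size;"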

Actual results:
The vm_dynamic database table grows until the disk fills up, or until we run a manual vacuum, or back up the database, destroy it, recreate it, and restore from backup.  When we go back into production, the table grows again without limit.  But the growth is all garbage; the usable data always seems reasonable.

Expected results:

The table should keep itself clean.

Additional info:

See support case number 01654361.

We first noticed back in June that the database table, vm_dynamic, seemed to be growing without limit. The number and size of each record in the table seemed reasonable, so it was somehow accumulating garbage.

By now, we've worked around the problem several times by backing up the engine database, dropping it from Postgres, and then restoring it.  And we've tried a few Postgres vacuum operations.
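
For reference, a minimal sketch of that backup/drop/restore workaround using plain PostgreSQL tools (assumptions: the database is named "engine", it is owned by an "engine" role, and the ovirt-engine/dwh services are stopped first; the exact procedure used in the support case may differ):

su - postgres
pg_dump -Fc engine -f /tmp/engine.dump    # custom-format dump of the engine db
dropdb engine                             # drop the bloated database
createdb -O engine engine                 # recreate it, owned by the engine role (assumed owner)
pg_restore -d engine /tmp/engine.dump     # restore the dump into the fresh database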

Last week, we also tried this in an effort to fix the problem once and for all:

cd /var/lib/pgsql/data
cp postgresql.conf postgresql.conf-20161106 (today's date)
vi (or your favorite editor) postgresql.conf
Change:
#checkpoint_segments = 3
# in logfile segments, min 1, 16MB each

to

checkpoint_segments = 90
# in logfile segments, min 1, 16MB each

After deleting and recovering the database again, the vm_dynamic table continues to grow without limit. 

Before the change, we found gazillions of log entries like this:

LOG: checkpoints are occurring too frequently (1 second apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".

After the config change, the above messages are gone, but now we see:

INFO: "vm_dynamic": found 14282 removable, 2054 nonremovable row versions in 1400 out of 10643 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 46605 unused item pointers. 0 pages are entirely empty.
CPU 0.05s/0.02u sec elapsed 0.15 sec.

What are we missing?  And what can we do with RHV to automate whatever fixes we need at this scale?  We have lots of logs and data in case number 01654361.
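
(One hedged way to check whether autovacuum is keeping up with dead rows in this table - pg_stat_user_tables is a standard PostgreSQL statistics view; "engine" is the assumed database name:)

su - postgres
psql engine -c "SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
                FROM pg_stat_user_tables WHERE relname = 'vm_dynamic';"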

Comment 1 Oved Ourfali 2016-11-09 09:24:04 UTC
Michal - isn't this table limited to the number of VMs?
What makes it grow and grow?

Comment 4 Martin Perina 2016-11-12 20:13:10 UTC
Eli, could you please investigate?

Comment 5 Eli Mesika 2016-11-13 08:37:48 UTC
Can you please first try a VACUUM FULL and then CLUSTER the table (you need free space available for that)?
See
https://www.postgresql.org/docs/9.1/static/sql-cluster.html
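
(A minimal psql sketch of that sequence, assuming the engine database and that the primary key index is named pk_vm_dynamic, as shown in the later \d output; both commands take an exclusive lock on the table while they run:)

su - postgres
psql engine -c "VACUUM FULL vm_dynamic;"
psql engine -c "CLUSTER vm_dynamic USING pk_vm_dynamic;"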

Comment 6 Greg Scott 2016-11-13 09:12:07 UTC
The customer has not done a VACUUM FULL to my knowledge, but has backed up the entire database, dropped the old database, created a new database, and restored it, which produces the same result.  He's done this numerous times for both sites, most recently, yesterday at site 1.

I looked at the link above - what index should he cluster it around?  Can he do this live? I may still be able to squeeze that in today during the scheduled weekend change control if I can get concise how-to instructions to paste into the support case.

- Greg

Comment 7 Greg Scott 2016-11-13 09:29:01 UTC
I just tried CLUSTER vm_dynamic on my own system - not the customer one.  There's no existing index on vm_dynamic to cluster around.  I am running 4.0.

[root@rhevm2017 ~]# su - postgres
Last login: Fri Oct 21 21:35:58 CDT 2016 on pts/0
-bash-4.2$
-bash-4.2$ psql engine
psql (9.2.15)
Type "help" for help.

engine=# CLUSTER vm_dynamic;
ERROR:  there is no previously clustered index for table "vm_dynamic"
engine=# \q
-bash-4.2$

I still have my 3.6 RHEVM system - let's see if 3.6 had an existing index. Nope - same result.  So 3.5 is probably the same.

[root@rhevm2014 ~]#
[root@rhevm2014 ~]# su - postgres
-bash-4.1$ psql engine
psql (8.4.20)
Type "help" for help.

engine=# CLUSTER vm_dynamic;
ERROR:  there is no previously clustered index for table "vm_dynamic"
engine=# \q
-bash-4.1$ logout
[root@rhevm2014 ~]#

Comment 8 Eli Mesika 2016-11-13 10:16:25 UTC
Try:

CLUSTER vm_dynamic USING pk_vm_dynamic;

Comment 9 Eli Mesika 2016-11-13 10:24:26 UTC
I also ran

CLUSTER vm_dynamic;

and it runs OK as well.
Is it possible that you lost your indexes?

Please run "\d vm_dynamic" from the psql prompt and attach the complete output.

Comment 10 Greg Scott 2016-11-14 20:05:02 UTC
Question - when John does VACUUM FULL in the lab, it finishes in under a second.  But the last time we tried a VACUUM ANALYZE in production, it took a long time to finish.  I imagine VACUUM FULL also will not set any speed records. What happens to other VM users while VACUUM FULL vm_dynamic is running?

Does the database lock the vm_dynamic table while that whole vacuum operation is going on, or can people start and stop VMs as normal?

Same question for CLUSTER - while it's building its index, can users still start/stop VMs as normal?


And here is the \d output from my system here at my place.  John is working on the customer production ones.

[root@rhevm2017 ~]# su - postgres
Last login: Sun Nov 13 03:17:29 CST 2016 on pts/0
-bash-4.2$ psql engine
psql (9.2.15)
Type "help" for help.

engine=# \d vm_dynamic
                                    Table "public.vm_dynamic"
         Column         |           Type           |                  Modifiers
------------------------+--------------------------+---------------------------------------------
 vm_guid                | uuid                     | not null
 status                 | integer                  | not null
 vm_ip                  | text                     | 
 vm_host                | character varying(255)   | 
 vm_pid                 | integer                  | 
 last_start_time        | timestamp with time zone | 
 guest_cur_user_name    | character varying(255)   | 
 guest_os               | character varying(255)   | 
 run_on_vds             | uuid                     | 
 migrating_to_vds       | uuid                     | 
 app_list               | text                     | 
 acpi_enable            | boolean                  | 
 session                | integer                  | 
 kvm_enable             | boolean                  | 
 utc_diff               | integer                  | 
 last_vds_run_on        | uuid                     | 
 client_ip              | character varying(255)   | 
 guest_requested_memory | integer                  | 
 boot_sequence          | integer                  | 
 exit_status            | integer                  | not null default 0
 pause_status           | integer                  | not null default 0
 exit_message           | character varying(4000)  | 
 hash                   | character varying(30)    | 
 console_user_id        | uuid                     | 
 guest_agent_nics_hash  | integer                  | 
 console_cur_user_name  | character varying(255)   | 
 last_watchdog_event    | bigint                   | 
 last_watchdog_action   | character varying(8)     | 
 is_run_once            | boolean                  | not null default false
 vm_fqdn                | text                     | default ''::text
 cpu_name               | character varying(255)   | 
 last_stop_time         | timestamp with time zone | 
 current_cd             | character varying(4000)  | default NULL::character varying
 guest_cpu_count        | integer                  | 
 reason                 | text                     | 
 exit_reason            | integer                  | default (-1)
 emulated_machine       | character varying(255)   | 
 spice_port             | integer                  | 
 spice_tls_port         | integer                  | 
 spice_ip               | character varying(255)   | default NULL::character varying
 vnc_port               | integer                  | 
 vnc_ip                 | character varying(255)   | default NULL::character varying
 guest_agent_status     | integer                  | default 0
 guest_mem_free         | bigint                   | 
 guest_mem_buffered     | bigint                   | 
 guest_mem_cached       | bigint                   | 
 guest_timezone_offset  | integer                  | 
 guest_timezone_name    | character varying(255)   | 
 guestos_arch           | integer                  | not null default 0
 guestos_codename       | character varying(255)   | 
 guestos_distribution   | character varying(255)   | 
 guestos_kernel_version | character varying(255)   | 
 guestos_type           | character varying(255)   | not null default 'Other'::character varying
 guestos_version        | character varying(255)   | 
 guest_containers       | text                     | default '[]'::text
Indexes:
    "pk_vm_dynamic" PRIMARY KEY, btree (vm_guid)
    "idx_vm_dynamic_migrating_to_vds" btree (migrating_to_vds)
    "idx_vm_dynamic_run_on_vds" btree (run_on_vds)
Foreign-key constraints:
    "vds_static_vm_dynamic_m" FOREIGN KEY (migrating_to_vds) REFERENCES vds_stat
ic(vds_id)
    "vds_static_vm_dynamic_r" FOREIGN KEY (run_on_vds) REFERENCES vds_static(vds
_id)
    "vm_static_vm_dynamic" FOREIGN KEY (vm_guid) REFERENCES vm_static(vm_guid) O
N DELETE CASCADE

engine=#

Comment 11 jvmarinelli 2016-11-14 20:06:47 UTC
[root@gdcpl3911 ~]# su - postgres
su: warning: cannot change directory to /usr/local/postgres: No such file or directory
-sh-4.1$ psql engine
could not change directory to "/root"
Password:
psql (8.4.20)
Type "help" for help.

engine=# \d vm_dynamic
                              Table "public.vm_dynamic"
         Column         |           Type           |            Modifiers
------------------------+--------------------------+---------------------------------
 vm_guid                | uuid                     | not null
 status                 | integer                  | not null
 vm_ip                  | text                     |
 vm_host                | character varying(255)   |
 vm_pid                 | integer                  |
 last_start_time        | timestamp with time zone |
 guest_cur_user_name    | character varying(255)   |
 guest_last_login_time  | timestamp with time zone |
 guest_last_logout_time | timestamp with time zone |
 guest_os               | character varying(255)   |
 run_on_vds             | uuid                     |
 migrating_to_vds       | uuid                     |
 app_list               | text                     |
 display                | integer                  |
 acpi_enable            | boolean                  |
 session                | integer                  |
 display_ip             | character varying(255)   |
 display_type           | integer                  |
 kvm_enable             | boolean                  |
 display_secure_port    | integer                  |
 utc_diff               | integer                  |
 last_vds_run_on        | uuid                     |
 client_ip              | character varying(255)   |
 guest_requested_memory | integer                  |
 hibernation_vol_handle | character varying(255)   |
 boot_sequence          | integer                  |
 exit_status            | integer                  | not null default 0
 pause_status           | integer                  | not null default 0
 exit_message           | character varying(4000)  |
 hash                   | character varying(30)    |
 console_user_id        | uuid                     |
 guest_agent_nics_hash  | integer                  |
 console_cur_user_name  | character varying(255)   |
 last_watchdog_event    | bigint                   |
 last_watchdog_action   | character varying(8)     |
 is_run_once            | boolean                  | not null default false
 vm_fqdn                | text                     | default ''::text
 cpu_name               | character varying(255)   |
 last_stop_time         | timestamp with time zone |
 current_cd             | character varying(4000)  | default NULL::character varying
 guest_cpu_count        | integer                  |
 reason                 | text                     |
 exit_reason            | integer                  | default (-1)
Indexes:
    "pk_vm_dynamic" PRIMARY KEY, btree (vm_guid)
    "idx_vm_dynamic_run_on_vds" btree (run_on_vds)
Foreign-key constraints:
    "vds_static_vm_dynamic_m" FOREIGN KEY (migrating_to_vds) REFERENCES vds_stat
ic(vds_id)
    "vds_static_vm_dynamic_r" FOREIGN KEY (run_on_vds) REFERENCES vds_static(vds
_id)
    "vm_static_vm_dynamic" FOREIGN KEY (vm_guid) REFERENCES vm_static(vm_guid) O
N DELETE CASCADE

engine=#

Comment 12 Eli Mesika 2016-11-15 12:19:36 UTC
VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an exclusive lock on each table while it is being processed.

When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.
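
(As a hedged illustration, the lock can be watched from a second psql session while the maintenance command runs; pg_locks and pg_class are standard catalogs, and "engine" is the assumed database name:)

su - postgres
psql engine -c "SELECT l.locktype, l.mode, l.granted
                FROM pg_locks l JOIN pg_class c ON c.oid = l.relation
                WHERE c.relname = 'vm_dynamic';"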

Comment 13 Greg Scott 2016-11-15 15:42:29 UTC
Thanks Eli - not to beat a dead horse to death, but confirm this reasoning for me.  Since both VACUUM FULL and CLUSTER want an exclusive lock on the table in question, in this case, vm_dynamic, then any other operation on that table from anybody else will either stall or error.  Since these operations will take a long time on the large production systems, the customer should schedule them during a quiet period.  And it might be best to stop the RHEVM processes before running them.

Am I on track?

thanks

- Greg

Comment 14 Eli Mesika 2016-11-16 14:54:05 UTC
(In reply to Greg Scott from comment #13)
> Thanks Eli - not to beat a dead horse to death, but confirm this reasoning
> for me.  Since both VACUUM FULL and CLUSTER want an exclusive lock on the table
> in question, in this case, vm_dynamic, then any other operation on that
> table from anybody else will either stall or error.  Since these operations
> will take a long time on the large production systems, the customer should
> schedule them during a quiet period.  And it might be best to stop the RHEVM
> processes before running them.
> 
> Am I on track?

Exactly 

> 
> thanks
> 
> - Greg

Comment 15 Lucie Leistnerova 2016-11-18 09:29:13 UTC
I tried to work on this issue.
I created 1050 VMs in two pools. I ran over 10 VMs from the pool and restarted them at random. The size of the vm_dynamic table grew with every newly running VM. Then I deleted the pools. The size of the table shrank, but an overhead of about 120 kB remained (not counting the indexes, which had their own overhead). So the table size grew with every pool create/delete cycle.
After running 'vacuum full', the size of vm_dynamic was back to a few kB.

Comment 16 Lucie Leistnerova 2016-11-18 09:39:57 UTC
I'm sorry, I forgot to mention that I was testing it on engine 4.0.5 with postgres 9.2.18

Comment 17 Martin Perina 2016-11-18 09:43:22 UTC
Well, now it's obvious that we need to execute vacuum full regularly for the engine and DWH databases, as suggested in BZ1388430.

Comment 19 Greg Scott 2016-11-21 16:24:57 UTC
Lucie, how long did the vacuum full take you?  When my large customer tried a vacuum (not vacuum full, just a standard vacuum) by hand earlier, it took a couple of hours. But vm_dynamic was hundreds of GB in size by then.

- Greg

Comment 20 Martin Perina 2016-11-21 16:37:55 UTC
(In reply to Greg Scott from comment #19)
> Lucie, how long did the vacuum full take you?  When my large customer tried
> a vacuum (not vacuum full, just a standard vacuum) by hand earlier, it took
> a couple of hours. But vm_dynamic was hundreds of GB in size by then.
> 
> - Greg

Greg, we tried to execute vacuum full on our internal medium-sized setup, and if I remember correctly the first run took around 45 minutes for the DWH db and around 15 minutes for the engine db. Subsequent executions were much faster, which is why we are going to suggest executing vacuum full each day as described in BZ1388430 (of course the interval may be customized to the specific needs of the customer).

Anyway, in your case I'd suggest performing a backup and restore before starting regular executions of vacuum full, to make the db files as clean as possible (I don't know of any other customer who has complained about such huge PostgreSQL file allocations).
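
(Purely as an illustration of such a daily schedule - hypothetical, the exact mechanism recommended in BZ1388430 may differ - a crontab entry for the postgres user could look like this:)

# run a full vacuum of the engine database every day at 03:00
0 3 * * * /usr/bin/vacuumdb --full engine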

Comment 21 Lucie Leistnerova 2016-11-22 08:19:13 UTC
(In reply to Greg Scott from comment #19)
> Lucie, how long did the vacuum full take you?  When my large customer tried
> a vacuum (not vacuum full, just a standard vacuum) by hand earlier, it took
> a couple of hours. But vm_dynamic was hundreds of GB in size by then.
> 
> - Greg

It took a few minutes; I can't run all the VMs, so the table was not that big. I was only observing the growth of the table size.

Comment 22 Greg Scott 2016-11-24 01:58:12 UTC
My customer is running a VACUUM FULL vm_dynamic; right now.  It started about 90 minutes ago and is still running. And it just sits there like a bump on a log, with no progress indicator.

- Greg

Comment 23 Greg Scott 2016-11-24 02:28:14 UTC
Statistics:

The file system with the database at this site is 251 GB.  On Nov. 5, John did a database backup, and then destroyed, recreated, and restored the database. The file system went from 82 percent used, or 194 GB before, to 9 percent after, or 21 GB used, for a savings of 173 GB.

Today is Nov. 23. The file system is 86 percent (204 GB) used, with 35 GB free. So vm_dynamic piled up 183 GB of garbage in 18 days.  We just crossed 2 hours and the vacuum full is still running.

The statistics above won't tell us how much longer *this* vacuum full will take, but they may be helpful next time.  Also, this gives us an idea of how much and how fast the vm_dynamic table grows.

- Greg

Comment 24 Greg Scott 2016-11-24 03:41:15 UTC
The vacuum full of vm_dynamic finished in 2 hours, 54 minutes. Just call it 3 hours. The file system went to 13 percent used, or 31 GB, not quite as nice as dropping and recreating the whole database.

Also, I can't attribute all of that 183 GB of garbage to vm_dynamic. Comparing Nov. 5 to today, it looks like around 10 GB of garbage is spread among other tables, which means vm_dynamic grew by around 173 GB over 18 days.

All in all, it looks like the workaround to drop and recreate the database takes much less calendar time - John thinks he can do that in about half an hour by now - and ends up with a cleaner database, since we only vacuumed one table tonight.

- Greg

Comment 28 Roy Golan 2016-11-30 09:13:43 UTC
Greg, my assumption is that autovacuum simply falls behind and is incapable of handling the amount of garbage created.

1. logs
 - To start investigating, let's open up the vacuum logs (do it on your lab or the production if you can).
 Put an include directive for a tweak.conf into postgresql.conf, and let's put all our changes there.

 postgresql.conf
   include = 'tweak.conf'

 tweak.conf
   log_min_messages = debug1

2. Tweak the pg config to be more aggressive on vacuum
 - add this to tweak.conf:

   # set vacuum workers to the number of busy tables
   autovacuum_max_workers = 8
   # more memory for the workers; since we have 8, let's raise it from the default
   maintenance_work_mem = 512MB
   # relax the threshold above the number of VMs
   autovacuum_vacuum_threshold = 1200

   ```
   # autovacuum_max_workers requires a restart, so a reload is not enough
   service postgresql restart
   ```

3. tweak engine
   to perform fewer refreshes of statistics and dynamic status writes
   ```
   engine-config -s VdsRefreshRate=6
   service ovirt-engine restart
   ```
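
After the restart, a quick hedged sketch to confirm the new values are live (pg_settings is a standard view; "engine" is the assumed database name):
   ```
   su - postgres
   psql engine -c "SELECT name, setting FROM pg_settings
                   WHERE name IN ('autovacuum_max_workers', 'maintenance_work_mem',
                                  'autovacuum_vacuum_threshold', 'log_min_messages');"
   ```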

Comment 29 Greg Scott 2016-12-12 18:21:06 UTC
Over the weekend, the customer applied the database config tuning changes detailed in the KCS solution at https://access.redhat.com/solutions/2798571 to one site.  So far, the database has not grown in size.  I'll find out more later this afternoon.

However, customer now reports these middle-of-the-night errors:

Time: 2016-12-11 00:00:00.038
Message: ETL service aggregation to daily tables has encountered an error. Please consult the service log for more details.
Severity: ERROR

Time: 2016-12-11 03:00:00.086
Message: ETL service has encountered an error while deleting old data. Please consult the service log for more details.
Severity: ERROR

We don't know yet if those are related to the database config changes.

- Greg

Comment 30 Marina Kalinin 2016-12-12 21:42:33 UTC
Greg, can you please share/check the service log as well? (it should be dwh log)

Comment 31 Shirly Radco 2016-12-13 12:08:59 UTC
Please open a separate bug for the dwh error and attach the dwh log and a db dump.

Comment 32 Greg Scott 2016-12-13 12:43:07 UTC
Huh? What dwh error?

Comment 33 Shirly Radco 2016-12-14 13:12:17 UTC
The error in comment 29,
"Message: ETL service has encountered an error while deleting old data. Please consult the service log for more details."
is a message coming from the ovirt-engine-dwhd service.
It complains that the delete job did not finish for some reason.

If this is not a one-time error, then please open a bug and attach the required info.

Comment 34 Greg Scott 2016-12-14 15:00:36 UTC
Ah - thanks Shirly.  I didn't know those ETL errors were related to dwhd. So then you're right, they can't be related to the problem in this bz. And looking a little deeper, John reports these ETL errors have happened before. So that's a whole independent issue. Sorry for the distraction on this bug.

- Greg

Comment 35 Oved Ourfali 2016-12-25 09:13:49 UTC
Greg, any updates for the DB size after a few weeks?

Comment 36 Greg Scott 2016-12-26 16:21:03 UTC
As of the last time we talked last week, the DB size is holding steady after taking the steps in https://access.redhat.com/solutions/2798571. No scheduled customer call this week because of Christmas, but I'm sure we'll talk over the next few days.

- Greg

Comment 37 Martin Perina 2017-01-13 14:03:24 UTC
Changing dependency to BZ1388433 because, according to results on our internal medium-sized setup, the autovacuum configuration changes provided by BZ1388433 removed the need to execute full vacuum periodically. Moving the bug to POST to sync with BZ1388433 status.

Comment 38 Martin Perina 2017-01-14 20:43:43 UTC
Moving to 4.1.1 along with the dependent BZ1388433.

Comment 39 Martin Perina 2017-02-13 14:42:41 UTC
Moving to MODIFIED to align status with BZ1388433

Comment 41 Gil Klein 2017-03-23 10:47:12 UTC
(In reply to Martin Perina from comment #39)
> Moving to MODIFIED to align status with BZ1388433
Martin, while BZ #1388433 is now targeted to 4.1.2, should we target this one accordingly?

Comment 42 Martin Perina 2017-03-23 11:03:49 UTC
(In reply to Gil Klein from comment #41)
> (In reply to Martin Perina from comment #39)
> > Moving to MODIFIED to align status with BZ1388433
> Martin, while BZ #1388433 is now targeted to 4.1.2, should we target this
> one accordingly?

Well, the additional fixes to BZ1388433 are only "cosmetic", so even with the already merged patch in 4.1.1, which changed the automatic vacuum settings, you should be able to verify that no table grows without limit on disk. But I don't have an issue with retargeting if you need to do that.

Comment 43 Lucie Leistnerova 2017-04-11 07:33:56 UTC
Tested on 2 pools with 1050 VMs - when starting/restarting VMs, the size of the vm_dynamic table doesn't grow (or grows only minimally). Sometimes a little overhead (60-80 kB) remains after deleting the pools, but I think that is not so common an operation that it should block verification.

Feel free to reopen this bug if the table on the customer's production engine still grows too much.

Verified in ovirt-engine-4.1.1.8-0.1.el7.noarch