1269045 – Gutterball database using up space and filling up pgsql filesystem.

Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1269045 - Gutterball database using up space and filling up pgsql filesystem.

Summary: Gutterball database using up space and filling up pgsql filesystem.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Gutterball
Sub Component:
Version:	6.1.1
Hardware:	x86_64
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	Unspecified
Assignee:	Michael Stead
QA Contact:	Tazim Kolhar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2015-10-06 06:54 UTC by Ribu Tho
Modified:	2019-09-12 09:02 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-11-19 15:57:09 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:2474	0	normal	SHIPPED_LIVE	Satellite 6.1.4 bug fix update	2015-11-19 20:45:48 UTC

Description Ribu Tho 2015-10-06 06:54:46 UTC

Description of problem:

The pgsql partition fills up disk space . Currently allocated 15G and disk has run out of  space. 

/dev/mapper/vg_volgrp2-lv_mongodb
                     100656560  22272812  73263956  24% /var/lib/mongodb
/dev/mapper/vg_volgrp3-lv_pgsql
                      14983560  14199244     16524 100% /var/lib/pgsql
/dev/mapper/vg_volgrp4-lv_pulp
                     503829736 133221412 345008532  28% /var/lib/pulp

The gutterball database seems to be using close to 7G of space . 

-bash-4.1$ psql
psql (8.4.20)
Type "help" for help.

postgres=# \l+
                                                             List of databases
    Name    |  Owner   | Encoding |  Collation  |    Ctype    |    Access privileges    |  Size   | Tablespace |        Description
------------+----------+----------+-------------+-------------+-------------------------+---------+------------+---------------------------
  gutterball | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =T/postgres             | 7283 MB | pg_default |

 foreman    | foreman  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =T/foreman              | 4055 MB | pg_default |
                                                              : foreman=CTc/foreman

================================================================================

Currently the satellite server is down and not able access the web GUI . 
 

Expected results:  

 * Resolution for the issue being faced with growing database sizes.

Comment 1 Michael Stead 2015-10-06 12:40:52 UTC

Could you please check to see if it is the gb_event table that is taking up the majority of the disk space? A dump of the table sizes would be very useful.

Thanks.

Comment 2 Michael Stead 2015-10-06 16:57:03 UTC

After some investigation locally, it appears that the gb_event table is the culprit and gutterball will require a bug fix to properly address this issue.

As a *temporary* workaround, it is safe to delete all rows in the gb_event table.

    delete from gb_event;

This will not affect any of the reporting data that gutterball has collected, and will just purge the processed event history.

While this is not ideal, it should free up most of the disk space being consumed by Gutterball.

Work will be planned to address this issue as soon as possible.


@Ribu Tho, please let us know if the workaround addresses your issue.

Comment 4 Ribu Tho 2015-10-07 00:41:24 UTC

Michael, 

The current db stats is as follows and the customer has just updated the case indicating that gb_event is using up only 91MB and does removing this help in purging the other tables especially pg_largeobject 


***********************************************************************
gutterball=# SELECT
gutterball-#    relname AS objectname,
gutterball-#    relkind AS objecttype,
gutterball-#    reltuples AS "#entries", pg_size_pretty(relpages::bigint*8*1024) AS size
gutterball-#    FROM pg_class
gutterball-#    WHERE relpages >= 8
gutterball-#    ORDER BY relpages DESC;
            objectname            | objecttype |  #entries   |  size
----------------------------------+------------+-------------+---------
 pg_largeobject                   | r          | 6.24575e+06 | 6305 MB
 pg_largeobject_loid_pn_index     | i          | 6.24575e+06 | 137 MB
 gb_event                         | r          |      209801 | 91 MB
***********************************************************************

Thanks

Comment 11 Michael Stead 2015-10-07 14:03:15 UTC

In the original workaround, I missed the fact that large objects are not automatically removed in postgres when the parent reference is deleted.

Only gutterball's event data is being stored in pg_largeobject, so deleting all rows in this table as well as the gb_event table will be Ok.

Updated workaround:

    delete from gb_event;
    delete from pg_largeobject;

Comment 14 Mike McCune 2015-11-03 17:04:09 UTC

This is resolved in the latest build of candlepin we are including in 6.1.4

Comment 15 Corey Welton 2015-11-03 21:16:40 UTC

Need some test steps to reliably populate the gutterball data.

Comment 17 Michael Stead 2015-11-06 19:46:43 UTC

Corey,

If you need further clarification on the info below, let me know. Hopefully this is what you are looking for.

Key changes included in this fix:
1) Events that are of no interest to gutterball will now be ignored, and not stored.
2) An event cleaner was added that will remove events that are 24h old, every 24h (configurable)
3) Events that are processed by gutterball no longer store the event JSON in the table unless they fail to be processed.
4) No longer store event JSON in LOBs (pg_largeobject).


After applying the update, it is a good idea to delete all rows from pg_largeobject and run a VACUUM FULL.

delete from pg_largeobject;

VACUUM FULL;


Here are some ways you can test these changes:

1) Register a system and repeatedly attach a bunch of subscriptions and remove them (changes the consumer status which triggers events)

After doing this a few times, check the gutterball's gb_event table to make sure that the events are in the PROCESSED state and that the newentity/oldentity column values are empty. These are the event JSON columns.

select * from gb_event;

Try and get the event row count > 30

select count(*) from gb_event;

Also check that pg_largeobject size remains empty (or constant if the table was not cleared initially).

select count(*) from pg_largeobject;



2) Creating a new org will produce an event that gutterball will now ignore.

This is the easiest way to produce one of many events that gutterball will now ignore. Before creating the new org, check the total count of events:

select count(*) from gb_event;

Create the new org, and event count should not change as the event emitted by candlepin should be ignored by gutterball.

If you want to test this further, you can enable debug logging in gutterball and you'll see a message saying that the event was skipped (/var/log/gutterball/gutterball.log).

# Add to /etc/gutterball/gutterball.conf and restart tomcat to enable debug logging
log4j.logger.org.candlepin.gutterball=DEBUG



3) Check that the new event cleaner is functioning properly.

First check the disk size of gutterball's gb_event table (size will vary based on the number of cycles of (1) done above):

SELECT relname AS "relation", pg_size_pretty(pg_relation_size(C.oid)) AS "size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') and relname LIKE 'gb_%' ORDER BY pg_relation_size(C.oid) DESC;

The event cleaner is by default configured to run every 24h and to clean up events that are 24h old. Make some quick changes to the /etc/gutterball/gutterball.conf to set the cleaner intervals to be shorter. This requires a tomcat restart.

# Run the cleaner every minute. Only events older than 2 minutes will be deleted.
# MAKE SURE TO RESTART TOMCAT AFTER MAKING THESE CHANGES
gutterball.tasks.event_cleanup.interval=1
gutterball.tasks.event_cleanup.unit=minutes
gutterball.tasks.event_cleanup.max_age_in_minutes=2

You can check /var/log/gutterball/gutterball.log to see some output from the cleaner and the number of events deleted after each run.

NOTE:
=====
Due to the nature of postgres, disk space will not be immediately reclaimed. Its autovacuum process will do a periodic disk cleanup, but you can manually initiate a cleanup by running a VACUUM FULL.

Wait until a lot of the events are cleaned up by the event cleaner, run VACUUM FULL; and check the gb_event table's disk usage as done above. The size should shrink some since the events were deleted.

Comment 19 Tazim Kolhar 2015-11-09 08:21:12 UTC

VERIFIED:
# rpm -qa | grep foreman
ibm-x3650-05.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3650-05.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-gce-1.7.2.45-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-debug-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-libvirt-1.7.2.45-1.el7sat.noarch
foreman-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.3-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.24-1.el7sat.noarch
puppet-foreman_scap_client-0.3.3-9.el7sat.noarch
foreman-selinux-1.7.2.14-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
foreman-vmware-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.5-1.el7sat.noarch
foreman-postgresql-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman_openscap-0.3.2.10-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.20-1.el7sat.noarch
foreman-proxy-1.7.2.6-1.el7sat.noarch
ibm-x3650-05.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-discovery-image-2.1.0-36.el7sat.noarch
foreman-ovirt-1.7.2.45-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.9-1.el7sat.noarch
foreman-compute-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.15.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch

steps:
1) Creating a new org will produce an event that gutterball will now ignore.

# su - potsgres
-bash-4.2$ psql
psql (9.2.13)
Type "help" for help.

check the total count of events:
postgres-# select * from gb_event


Create the new org, and event count does not change as the event emitted by candlepin should be ignored by gutterball.

If you want to test this further, you can enable debug logging in gutterball and you'll see a message saying that the event was skipped (/var/log/gutterball/gutterball.log).

# Add to /etc/gutterball/gutterball.conf and restart tomcat to enable debug logging
log4j.logger.org.candlepin.gutterball=DEBUG

# service tomcat restart
Redirecting to /bin/systemctl restart  tomcat.service

# tail -f /var/log/gutterball/gutterball.log
JMS Redelivered: false
JMS Destination: 'event'/'owner.created'; None
JMS Type: null
JMS MessageID: ID:c4145a85-9673-3caa-a23e-d859688cfb22
JMS Content-Type: text/plain
AMQ message number: 4
Properties:
	qpid.subject = owner.created

2015-11-09 03:11:30,032 [Dispatcher-0-Conn-1] DEBUG org.candlepin.gutterball.receiver.EventMessageListener - Unsupported event received by gutterball. Skipping message: ID:c4145a85-9673-3caa-a23e-d859688cfb22


2) Check that the new event cleaner is functioning properly.

First check the disk size of gutterball's gb_event table (size will vary based on the number of cycles of (1) done above):

SELECT relname AS "relation", pg_size_pretty(pg_relation_size(C.oid)) AS "size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') and relname LIKE 'gb_%' ORDER BY pg_relation_size(C.oid) DESC

The event cleaner is by default configured to run every 24h and to clean up events that are 24h old. Make some quick changes to the /etc/gutterball/gutterball.conf to set the cleaner intervals to be shorter. This requires a tomcat restart.

# Run the cleaner every minute. Only events older than 2 minutes will be deleted.
# MAKE SURE TO RESTART TOMCAT AFTER MAKING THESE CHANGES
gutterball.tasks.event_cleanup.interval=1
gutterball.tasks.event_cleanup.unit=minutes
gutterball.tasks.event_cleanup.max_age_in_minutes=2

You can check /var/log/gutterball/gutterball.log to see some output from the cleaner and the number of events deleted after each run.

Comment 22 errata-xmlrpc 2015-11-19 15:57:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2474

Note You need to log in before you can comment on or make changes to this bug.