Bug 1269045
| Summary: | Gutterball database using up space and filling up pgsql filesystem. | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Ribu Tho <rabraham> |
| Component: | Gutterball | Assignee: | Michael Stead <mstead> |
| Status: | CLOSED ERRATA | QA Contact: | Tazim Kolhar <tkolhar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.1.1 | CC: | bbuckingham, cwelton, mmccune, mnapolis, mpicoto, mstead, nstiasni, sthirugn |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-11-19 15:57:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Could you please check to see if it is the gb_event table that is taking up the majority of the disk space? A dump of the table sizes would be very useful. Thanks.
After some investigation locally, it appears that the gb_event table is the culprit and gutterball will require a bug fix to properly address this issue.
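For reference, a query along these lines will show which relations are using the most space (a minimal sketch using the standard pg_total_relation_size and pg_size_pretty functions):
select relname, pg_size_pretty(pg_total_relation_size(oid)) as total_size
from pg_class
where relkind in ('r', 'i')
order by pg_total_relation_size(oid) desc
limit 20;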
As a *temporary* workaround, it is safe to delete all rows in the gb_event table.
delete from gb_event;
This will not affect any of the reporting data that gutterball has collected, and will just purge the processed event history.
While this is not ideal, it should free up most of the disk space being consumed by Gutterball.
Work will be planned to address this issue as soon as possible.
@Ribu Tho, please let us know if the workaround addresses your issue.
Michael,
The current DB stats are as follows. The customer has just updated the case indicating that gb_event is using only 91 MB; does removing it also help purge the other tables, especially pg_largeobject?
***********************************************************************
gutterball=# SELECT
gutterball-# relname AS objectname,
gutterball-# relkind AS objecttype,
gutterball-# reltuples AS "#entries", pg_size_pretty(relpages::bigint*8*1024) AS size
gutterball-# FROM pg_class
gutterball-# WHERE relpages >= 8
gutterball-# ORDER BY relpages DESC;
objectname | objecttype | #entries | size
----------------------------------+------------+-------------+---------
pg_largeobject | r | 6.24575e+06 | 6305 MB
pg_largeobject_loid_pn_index | i | 6.24575e+06 | 137 MB
gb_event | r | 209801 | 91 MB
***********************************************************************
Thanks
In the original workaround, I missed the fact that large objects are not automatically removed in postgres when the parent reference is deleted.
Only gutterball's event data is stored in pg_largeobject, so deleting all rows in this table as well as the gb_event table is safe.
Updated workaround:
delete from gb_event;
delete from pg_largeobject;
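To see how much space this actually frees, the sizes of the two tables can be checked before and after the deletes (a quick check with the standard size functions; as noted further down, postgres does not return the space to the filesystem until a vacuum runs):
select pg_size_pretty(pg_total_relation_size('pg_largeobject'));
select pg_size_pretty(pg_total_relation_size('gb_event'));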
This is resolved in the latest build of candlepin we are including in 6.1.4.
Need some test steps to reliably populate the gutterball data.
Corey,
If you need further clarification on the info below, let me know. Hopefully this is what you are looking for.
Key changes included in this fix:
1) Events that are of no interest to gutterball will now be ignored, and not stored.
2) An event cleaner was added that runs every 24h (configurable) and removes events older than 24h.
3) Events that are processed by gutterball no longer store the event JSON in the table unless they fail to be processed.
4) No longer store event JSON in LOBs (pg_largeobject).
After applying the update, it is a good idea to delete all rows from pg_largeobject and run a VACUUM FULL.
delete from pg_largeobject;
VACUUM FULL;
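To confirm how much space was recovered overall, the database size can be compared before and after (a quick check with the standard pg_database_size function):
select pg_size_pretty(pg_database_size('gutterball'));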
Here are some ways you can test these changes:
1) Register a system and repeatedly attach a bunch of subscriptions and remove them (changes the consumer status which triggers events)
After doing this a few times, check gutterball's gb_event table to make sure that the events are in the PROCESSED state and that the newentity/oldentity column values are empty. These are the event JSON columns.
select * from gb_event;
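A more targeted check is also possible (a sketch; the name of the state column is assumed to be status here, while newentity/oldentity are the JSON columns mentioned above):
-- 'status' is an assumed column name; adjust if the schema differs
select status, count(*) from gb_event group by status;
-- the JSON columns should be empty for processed events
select count(*) from gb_event where newentity is not null or oldentity is not null;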
Try to get the event row count > 30
select count(*) from gb_event;
Also check that pg_largeobject remains empty (or that its size stays constant if the table was not cleared initially).
select count(*) from pg_largeobject;
2) Creating a new org will produce an event that gutterball will now ignore.
This is the easiest way to produce one of many events that gutterball will now ignore. Before creating the new org, check the total count of events:
select count(*) from gb_event;
Create the new org; the event count should not change, as the event emitted by candlepin should be ignored by gutterball.
If you want to test this further, you can enable debug logging in gutterball and you'll see a message saying that the event was skipped (/var/log/gutterball/gutterball.log).
# Add to /etc/gutterball/gutterball.conf and restart tomcat to enable debug logging
log4j.logger.org.candlepin.gutterball=DEBUG
3) Check that the new event cleaner is functioning properly.
First check the disk size of gutterball's gb_event table (size will vary based on the number of cycles of (1) done above):
SELECT relname AS "relation", pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND relname LIKE 'gb_%'
ORDER BY pg_relation_size(C.oid) DESC;
The event cleaner is by default configured to run every 24h and to clean up events that are 24h old. Make some quick changes to the /etc/gutterball/gutterball.conf to set the cleaner intervals to be shorter. This requires a tomcat restart.
# Run the cleaner every minute. Only events older than 2 minutes will be deleted.
# MAKE SURE TO RESTART TOMCAT AFTER MAKING THESE CHANGES
gutterball.tasks.event_cleanup.interval=1
gutterball.tasks.event_cleanup.unit=minutes
gutterball.tasks.event_cleanup.max_age_in_minutes=2
You can check /var/log/gutterball/gutterball.log to see some output from the cleaner and the number of events deleted after each run.
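Besides the log output, the purge can be confirmed directly in the database by polling the event count (re-using the count query from above); it should drop once max_age_in_minutes plus one cleaner interval has passed:
select count(*) from gb_event;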
NOTE:
=====
Due to the nature of postgres, disk space will not be immediately reclaimed when rows are deleted. The autovacuum process will periodically mark the space as reusable, but to actually return it to the filesystem you can manually run a VACUUM FULL.
Wait until a lot of the events have been cleaned up by the event cleaner, run VACUUM FULL, and then check the gb_event table's disk usage as done above. The size should shrink somewhat, since the events were deleted.
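A minimal sketch of that check, using a table-targeted vacuum and the same size function as above:
VACUUM FULL gb_event;
select pg_size_pretty(pg_relation_size('gb_event'));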
VERIFIED:
# rpm -qa | grep foreman
ibm-x3650-05.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3650-05.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-gce-1.7.2.45-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-debug-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-libvirt-1.7.2.45-1.el7sat.noarch
foreman-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.3-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.24-1.el7sat.noarch
puppet-foreman_scap_client-0.3.3-9.el7sat.noarch
foreman-selinux-1.7.2.14-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
foreman-vmware-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.5-1.el7sat.noarch
foreman-postgresql-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman_openscap-0.3.2.10-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.20-1.el7sat.noarch
foreman-proxy-1.7.2.6-1.el7sat.noarch
ibm-x3650-05.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-discovery-image-2.1.0-36.el7sat.noarch
foreman-ovirt-1.7.2.45-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.9-1.el7sat.noarch
foreman-compute-1.7.2.45-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.15.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
steps:
1) Creating a new org will produce an event that gutterball will now ignore.
# su - postgres
-bash-4.2$ psql
psql (9.2.13)
Type "help" for help.
check the total count of events:
postgres=# select count(*) from gb_event;
Create the new org; the event count does not change, as the event emitted by candlepin is ignored by gutterball.
If you want to test this further, you can enable debug logging in gutterball and you'll see a message saying that the event was skipped (/var/log/gutterball/gutterball.log).
# Add to /etc/gutterball/gutterball.conf and restart tomcat to enable debug logging
log4j.logger.org.candlepin.gutterball=DEBUG
# service tomcat restart
Redirecting to /bin/systemctl restart tomcat.service
# tail -f /var/log/gutterball/gutterball.log
JMS Redelivered: false
JMS Destination: 'event'/'owner.created'; None
JMS Type: null
JMS MessageID: ID:c4145a85-9673-3caa-a23e-d859688cfb22
JMS Content-Type: text/plain
AMQ message number: 4
Properties:
qpid.subject = owner.created
2015-11-09 03:11:30,032 [Dispatcher-0-Conn-1] DEBUG org.candlepin.gutterball.receiver.EventMessageListener - Unsupported event received by gutterball. Skipping message: ID:c4145a85-9673-3caa-a23e-d859688cfb22
2) Check that the new event cleaner is functioning properly.
First check the disk size of gutterball's gb_event table (size will vary based on the number of cycles of (1) done above):
SELECT relname AS "relation", pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND relname LIKE 'gb_%'
ORDER BY pg_relation_size(C.oid) DESC;
The event cleaner is by default configured to run every 24h and to clean up events that are 24h old. Make some quick changes to the /etc/gutterball/gutterball.conf to set the cleaner intervals to be shorter. This requires a tomcat restart.
# Run the cleaner every minute. Only events older than 2 minutes will be deleted.
# MAKE SURE TO RESTART TOMCAT AFTER MAKING THESE CHANGES
gutterball.tasks.event_cleanup.interval=1
gutterball.tasks.event_cleanup.unit=minutes
gutterball.tasks.event_cleanup.max_age_in_minutes=2
You can check /var/log/gutterball/gutterball.log to see some output from the cleaner and the number of events deleted after each run.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2015:2474
Description of problem:
The pgsql partition fills up its disk space. Currently 15G is allocated and the disk has run out of space.
/dev/mapper/vg_volgrp2-lv_mongodb 100656560  22272812  73263956  24% /var/lib/mongodb
/dev/mapper/vg_volgrp3-lv_pgsql    14983560  14199244     16524 100% /var/lib/pgsql
/dev/mapper/vg_volgrp4-lv_pulp    503829736 133221412 345008532  28% /var/lib/pulp
The gutterball database seems to be using close to 7G of space.
-bash-4.1$ psql
psql (8.4.20)
Type "help" for help.
postgres=# \l+
                                         List of databases
    Name    |  Owner   | Encoding |  Collation  |    Ctype    |   Access privileges    |  Size   | Tablespace | Description
------------+----------+----------+-------------+-------------+------------------------+---------+------------+-------------
 gutterball | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =T/postgres            | 7283 MB | pg_default |
 foreman    | foreman  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =T/foreman             | 4055 MB | pg_default |
            |          |          |             |             | : foreman=CTc/foreman  |         |            |
================================================================================
Currently the satellite server is down and the web GUI is not accessible.
Expected results:
* Resolution for the issue being faced with growing database sizes.