+++ This bug was initially created as a clone of Bug #1306602 +++

Description of problem:

Removal of a resource that has content associated with it (rhq_package_bits data) leaves the actual blob data in the pg_largeobject table orphaned once the rhq_package_bits row is removed. This results in unbounded file system growth for the PostgreSQL database, and even after removing old content the space is not eligible for garbage collection/compaction because the actual blob data never gets unlinked.

Version-Release number of selected component (if applicable):

JBoss ON 3.3.7

How reproducible:

Always

Steps to Reproduce:
1. Install, configure, and start a JBoss ON 3.3.7 system.
2. Install, configure, and start an EAP 6.4 standalone server.
3. Import the newly discovered EAP server into inventory.
4. Configure the newly imported EAP server's connection settings.
5. Create a new _Deployment_ child resource for the EAP server using jboss-helloworld.war.
6. After the newly deployed WAR is reflected in the JBoss ON inventory, verify that its content page shows the resource is backed by the jboss-helloworld.war content.
7. Uninventory the jboss-helloworld.war resource.

Actual results:

After a few minutes the jboss-helloworld.war content entry stored in rhq_package_bits is removed, yet the pg_largeobject table still contains the binary data that represents jboss-helloworld.war. There are no references in the JBoss ON database to this lob, making it an orphan.

Expected results:

The pg_largeobject table should be empty.

Additional info:

This issue is specific to how PostgreSQL stores blob data. Instead of storing the binary data in the rhq_package_bits table, Postgres writes the data to the pg_largeobject table and stores a reference to it in the rhq_package_bits table. However, the PostgreSQL JDBC driver cannot determine whether the only reference to this data is held by the rhq_package_bits table when a DELETE is performed. The expectation of the JDBC implementation is that the application calling DELETE knows whether there are other references to the pg_largeobject entry and, if not, uses the unlink operation to remove the old data.

I believe the fix here is that when the DELETE query is executed against the rhq_package_bits table, an unlink query should also be executed.

Do not confuse this issue with the one identified in Bug 1306602. Although the issues appear to be the same, the difference is that 1306602 raised a problem where we simply failed to remove the rhq_package_bits row. With the fix for 1306602 we now remove the row from rhq_package_bits, but the failure here is that we do not unlink the lob id when PostgreSQL is the back-end database.
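For illustration, the proposed fix (unlink the large object in the same transaction that deletes the rhq_package_bits row) could look roughly like the following. This is a minimal sketch, not the actual RHQ code: the BITS column name and the deleteAndUnlink helper are assumptions, and error handling/rollback is omitted.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class PackageBitsCleanup {

        // Hypothetical helper: delete one rhq_package_bits row and unlink its
        // large object in the same transaction so the two cannot drift apart.
        public static void deleteAndUnlink(Connection conn, int packageBitsId)
                throws SQLException {
            conn.setAutoCommit(false);

            // Read the lob oid before the row disappears (BITS is assumed to
            // be the oid-typed column on PostgreSQL).
            Long oid = null;
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT bits FROM rhq_package_bits WHERE id = ?")) {
                select.setInt(1, packageBitsId);
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next()) {
                        long value = rs.getLong(1);
                        if (!rs.wasNull()) {
                            oid = value;
                        }
                    }
                }
            }

            try (PreparedStatement delete = conn.prepareStatement(
                    "DELETE FROM rhq_package_bits WHERE id = ?")) {
                delete.setInt(1, packageBitsId);
                delete.executeUpdate();
            }

            // Without this step the row is gone but pg_largeobject keeps the data.
            if (oid != null) {
                try (PreparedStatement unlink = conn.prepareStatement(
                        "SELECT lo_unlink(?)")) {
                    unlink.setLong(1, oid);
                    unlink.execute();
                }
            }

            conn.commit();
        }
    }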
There can be data from three different tables in pg_largeobject:

rhq_raw_config
rhq_package_bits
rhq_drift_file

The knowledge base solution should do the trick, but we shouldn't trust that tool since it might not be on the execution path (and considering all the permission issues), so the functionality should be replicated in a lightweight way.

Find all the oids from those tables, then check pg_largeobject / pg_largeobject_metadata and remove those rows that are not referenced by the three tables. Most likely easiest with a single SQL clause and pure JDBC.

The automatic unlinking would require a new database trigger and hence a database change, so there's a small problem when it comes to a z-stream release.
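A minimal sketch of that single-statement cleanup over pure JDBC, assuming PostgreSQL >= 9.0 (where pg_largeobject_metadata exists) and assuming the lob-bearing columns are named BITS, CONTENTS and DATA; the real column names would need to be verified against the RHQ schema:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class OrphanLobCleanup {

        // Unlink every large object whose oid is not referenced by any of the
        // three known lob-bearing tables. Column names are assumptions.
        private static final String CLEANUP_SQL =
            "SELECT lo_unlink(m.oid) FROM pg_largeobject_metadata m"
            + " WHERE m.oid NOT IN (SELECT bits FROM rhq_package_bits WHERE bits IS NOT NULL)"
            + " AND m.oid NOT IN (SELECT contents FROM rhq_raw_config WHERE contents IS NOT NULL)"
            + " AND m.oid NOT IN (SELECT data FROM rhq_drift_file WHERE data IS NOT NULL)";

        public static int purgeOrphans(Connection conn) throws SQLException {
            int unlinked = 0;
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(CLEANUP_SQL)) {
                while (rs.next()) {
                    unlinked++; // one result row per unlinked large object
                }
            }
            return unlinked;
        }
    }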
(In reply to Michael Burman from comment #1)
> There can be data from three different tables in pg_largeobject:
>
> rhq_raw_config
> rhq_package_bits
> rhq_drift_file
>
> The knowledge base solution should do the trick, but we shouldn't trust
> that tool since it might not be on the execution path (and considering
> all the permission issues), so the functionality should be replicated in
> a lightweight way.
>
> Find all the oids from those tables, then check pg_largeobject /
> pg_largeobject_metadata and remove those rows that are not referenced by
> the three tables. Most likely easiest with a single SQL clause and pure
> JDBC.

I suppose a separate workaround for newer versions of PostgreSQL could be created. The issue is that pg_largeobject cannot be accessed by a non-superuser, and pg_largeobject_metadata was not added until the 9.x stream. This is why the vacuumlo option was chosen. However, the re-index will also require superuser, so perhaps my argument is moot.

> The automatic unlinking would require a new database trigger and hence a
> database change, so there's a small problem when it comes to a z-stream
> release.

Agreed. However, we could do the unlinking ourselves when a delete or update is called on these data types. It would seem that Hibernate has to have some callback method or other option for handling this?
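On the Hibernate question: there is a delete-event SPI that could carry the unlink. A rough sketch under assumptions: the PackageBits entity mapping and its getBitsOid() accessor are hypothetical, listener registration via the EventListenerRegistry is omitted, and requiresPostCommitHanding only exists on Hibernate 4.3+.

    import java.sql.PreparedStatement;

    import org.hibernate.event.spi.PostDeleteEvent;
    import org.hibernate.event.spi.PostDeleteEventListener;
    import org.hibernate.persister.entity.EntityPersister;

    public class LobUnlinkListener implements PostDeleteEventListener {

        @Override
        public void onPostDelete(PostDeleteEvent event) {
            if (!(event.getEntity() instanceof PackageBits)) { // hypothetical entity
                return;
            }
            final Long oid = ((PackageBits) event.getEntity()).getBitsOid(); // hypothetical accessor
            if (oid == null) {
                return;
            }
            // Runs on the same JDBC connection/transaction that issued the DELETE.
            event.getSession().doWork(connection -> {
                try (PreparedStatement ps =
                        connection.prepareStatement("SELECT lo_unlink(?)")) {
                    ps.setLong(1, oid);
                    ps.execute();
                }
            });
        }

        // Hibernate 4.3+ only; false keeps us inside the deleting transaction.
        public boolean requiresPostCommitHanding(EntityPersister persister) {
            return false;
        }
    }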
I'm going to back up a little bit and suggest that we don't do the "fix" part. I've now implemented the fixing strategy in many places (making Postgres use different delete code than Oracle), but I don't think I can cover all the bases.

The problem is that we have multiple "CASCADE" deletes as well, and things like "Resource delete" that run more than a dozen different delete queries in a certain order. Tracking down all the possible cascades is painful - I'm not sure I can reliably find all of them. And rewriting all the query blocks that might touch the given tables is also very error prone. For example, the whole logic of the following query:

https://github.com/rhq-project/rhq/blob/master/modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/resource/ResourceManagerBean.java#L726

must be rewritten in separate stages so that I can insert some Postgres-specific code in the middle. The same sort of rewrite must be made in 6-7 other places as well, with some of them needing new exception paths that would change interfaces too (to catch errors). After doing a few of these, I no longer recommend going forward with that strategy. It would create a huge amount of QE testing and potential problems, and I'm not sure it would even solve the issue 100%.

Instead, since we need to create the purge job in any case, could we just run it on, say, a daily basis? That would clean up any orphaned large objects from the database. It's simpler to test and it will catch all the places. Obviously, the first cleanup run will take a lot longer for customers that already have large databases. Should we trigger it from apply-updates also?
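If the daily purge route is taken, it could reuse the server's existing scheduling (RHQ runs server-side jobs with Quartz). A rough sketch only, assuming the purgeOrphans helper sketched earlier and leaving the datasource lookup abstract:

    import java.sql.Connection;
    import javax.sql.DataSource;

    import org.quartz.Job;
    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;

    public class OrphanLobPurgeJob implements Job {

        @Override
        public void execute(JobExecutionContext context) throws JobExecutionException {
            DataSource ds = lookupRhqDataSource(); // assumption: resolved from JNDI
            try (Connection conn = ds.getConnection()) {
                // The first run may take a long time on databases that have
                // been accumulating orphans, as noted above.
                OrphanLobCleanup.purgeOrphans(conn);
            } catch (Exception e) {
                throw new JobExecutionException(e);
            }
        }

        private DataSource lookupRhqDataSource() {
            throw new UnsupportedOperationException("omitted from this sketch");
        }
    }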
This PR adds a new operation to the postgres-plugin that cleans orphaned items from the database: https://github.com/rhq-project/rhq/pull/276
The new operation only works as a workaround and does not fix this issue. Can we not explicitly remove content on resource uninventory instead of relying on the CASCADE? The workaround is still valid in case we miss the pgsql delete/unlink for some lobs, but if we can handle the explicit unlink in the uninventory method it will at least address the issue identified in this BZ. It seems package bits and drift would be the biggest offenders here.
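For the CASCADE paths that never pass through application code, there is also the database-side option comment 1 warned would require a schema change: PostgreSQL's contrib "lo" module ships a lo_manage trigger function that unlinks the old large object whenever the referencing row is updated or deleted, including deletes arriving via ON DELETE CASCADE. A sketch of installing it over JDBC, again assuming the BITS column name (the module's documented examples use its lo domain type, so the column type would need checking):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class LoManageTriggerInstall {

        public static void install(Connection conn) throws SQLException {
            try (Statement stmt = conn.createStatement()) {
                // Contrib module providing the lo_manage() trigger function
                // (PostgreSQL 9.1+; older releases load contrib/lo via SQL script).
                stmt.execute("CREATE EXTENSION IF NOT EXISTS lo");
                // Unlink the old lob automatically on UPDATE or DELETE.
                stmt.execute("CREATE TRIGGER t_rhq_package_bits_lo"
                        + " BEFORE UPDATE OR DELETE ON rhq_package_bits"
                        + " FOR EACH ROW EXECUTE PROCEDURE lo_manage(bits)");
            }
        }
    }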
Fixed in master (did we have a ticket for the rhq_drift_file?):

commit 012b4f48f0072a4df3995cc3279cdd0cabde6361
Author: Michael Burman <miburman>
Date:   Wed Jan 11 16:03:13 2017 +0200

    Unlink each object before purging the rows from rhq_drift_file

commit b594e547b05eaca22ac7b2f7cd9e65f07dc5eccb
Author: Michael Burman <miburman>
Date:   Wed Jan 11 13:03:23 2017 +0200

    RHQ_PACKAGE_BITS is unlinked before removing the orphaned bits

commit 48e839c1b1756da39dc0a00a2b9c5ae42ad23e4c
Author: Michael Burman <miburman>
Date:   Wed Nov 30 12:07:22 2016 +0200

    [BZ 1381720] Add ability to clean orphaned large objects from the database

    Query pg_largeobject_metadata when used with Postgres >= 9.0
A note for this and BZ 1381725: there are multiple code paths that can cause additional issues with large objects (such as using the Java API). These fixes do not catch all the potential places.
Moving to ON_QA as available for testing with build:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=534002

http://download.eng.bos.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/110/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip

maps to the ER01 build of JON 3.3.8.
Verified on the ER02 build of JON 3.3.8. No orphaned content is left in the pg_largeobject table after uninventory of a resource.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0285.html