Bug 1381725

Summary: Create Child Deployment/DomainDeployment results in orphaned LOB data in the PostgreSQL pg_largeobject table
Product: [JBoss] JBoss Operations Network Reporter: Larry O'Leary <loleary>
Component: Content, DatabaseAssignee: Michael Burman <miburman>
Status: CLOSED ERRATA QA Contact: Hayk Hovsepyan <hhovsepy>
Severity: high Docs Contact:
Priority: medium    
Version: JON 3.3.7CC: fbrychta, hhovsepy, jmartine, spinder
Target Milestone: ER02Keywords: Triaged
Target Release: JON 3.3.8   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-16 18:45:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Larry O'Leary 2016-10-04 20:14:39 UTC
Description of problem:
During the process of uploading a new deployment (content backed), the content gets duplicated in PostgreSQL's pg_largeobject table.

This results in unbound file system growth for the PostgreSQL database. There is no method provided via JBoss ON to clean this up or prevent it. The process of simply uploading a file in preparation for a deployment will continue to leak large amount of disk space.


How reproducible:
Always

Steps to Reproduce:
1.  Install, configure, and start JBoss ON 3.3.7 system.
2.  Install, configure, and start EAP 6.4 standalone server.
3.  Import newly discovered EAP server into inventory.
4.  Configure newly imported EAP server's connection settings.
5.  Create new _Deployment_ child resource for EAP server using jboss-helloworld.war.

Actual results:
The following queries against the PostgreSQL database reveals a single entry that corresponds to the deployment of the jboss-helloworld.war.

.SQL: `SELECT * FROM rhq_package_version;`
----
id	display_name	short_description	long_description	version	display_version	file_name	file_size	file_md5	file_sha256	file_creation_time	license_name	license_version	metadata	package_id	architecture_id	config_id	package_bits_id
10011	jboss-as-helloworld.war	<null>	<null>	0	<null>	jboss-as-helloworld.war	7606	<null>	d02323a97678e377e4fdd7c4d9b6d5891f1a29d3ed0df0bc8c5a548f1fdfded3	1473374418000	<null>	<null>	<null>	10001	1	<null>	10011
----


.SQL: `SELECT * FROM rhq_package_bits;`
----
id	bits
10011	19054
----

However, the pg_largeobject table includes two entries with only one being referenced:

.SQL: `SELECT * FROM pg_largeobject;`
----
loid	pageno	data
19053	0	50  4b  03  04  0a  00  00  00  00  ...
19053	1	03  33  e4  1a  70  7a  00  7c  88  ...
19053	2	66  4d  f5  a5  5a  05  b1  13  d0  ...
19053	3	41  2d  49  4e  46  2f  4d  41  4e  ...
19054	0	50  4b  03  04  0a  00  00  00  00  ...
19054	1	03  33  e4  1a  70  7a  00  7c  88  ...
19054	2	66  4d  f5  a5  5a  05  b1  13  d0  ...
19054	3	41  2d  49  4e  46  2f  4d  41  4e  ...
----

The content binary data is identical.

Expected results:
The results from rhq_package_version and rhq_package_bits are correct but the results from pg_largeobject should have been:

.SQL: `SELECT * FROM pg_largeobject;`
----
loid	pageno	data
19054	0	50  4b  03  04  0a  00  00  00  00  ...
19054	1	03  33  e4  1a  70  7a  00  7c  88  ...
19054	2	66  4d  f5  a5  5a  05  b1  13  d0  ...
19054	3	41  2d  49  4e  46  2f  4d  41  4e  ...
----

Additional info:
This issue is specific to how PostgreSQL stores blob data. Instead of storing the binary data in the rhq_package_bits table, Postgres writes the data to the pg_largeobject table and stores the data reference in the rhq_package_bits table. 

In the case of uploading a new deployment, the initial rhq_package_version and rhq_package_bit entries are created. However, once we can properly analyze the uploaded content and complete the create child wizard, the rhq_package_version table is updated. This triggers an update to rhq_package_bits and results in the pg_largeobject orphan. 

The PostgreSQL JDBC driver is not able to determine if the only reference to this data is held by the rhq_package_bits table after the UPDATE is performed.

I believe the fix here is that when the UPDATE query is executed against the rhq_package_version table, an unlink query should also be executed.

Comment 1 Michael Burman 2017-01-12 11:25:35 UTC
Part of the https://github.com/rhq-project/rhq/pull/276 updates

Comment 2 Michael Burman 2017-01-13 12:44:14 UTC
Fixed in the master (cherry-pick BZ 1381720 first):

commit d1cb33593865d3b6d8a5706ab029f72c7a317587
Author: Michael Burman <miburman>
Date:   Thu Jan 12 13:25:07 2017 +0200

    [BZ 1381725] Avoid bloating pg_largeobject with content updates

Comment 5 Josejulio Martínez 2017-01-20 04:53:56 UTC
After this: 
commit d1cb33593865d3b6d8a5706ab029f72c7a317587
Author: Michael Burman <miburman>
Date:   Thu Jan 12 13:25:07 2017 +0200

    [BZ 1381725] Avoid bloating pg_largeobject with content updates

Need to cherry-pick this:

commit 2bf0676e3dfdcd2399fd13ce1ac72bb63f2f936d
Author: Ruben Vargas <ruben.vp8510>
Date:   Sat Jan 14 05:52:43 2017 -0600

    Bug 1381725 - Create Child Deployment/DomainDeployment results in orphaned LOB data in the PostgreSQL pg_largeobject table

Comment 8 Simeon Pinder 2017-01-27 06:16:21 UTC
Moving to ON_QA as available to test with the following brew build:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=535315

NOTE: jon-server-patch-3.3.0.GA.zip maps to jon-server-3.3.0.GA-update-08.zip whic is JON 3.3.8 ER02 build.

Comment 9 Hayk Hovsepyan 2017-01-27 17:06:28 UTC
Verified on JON 3.3.8 ER02 build.
Now rows in pg_largeobject table are not doubled.

Comment 10 errata-xmlrpc 2017-02-16 18:45:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0285.html