Bug 1386913

Summary: Newly uploaded content results in previous content version being left in database with no way to clean it up
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Content
Assignee: Ruben Vargas Palma <rvargasp>
Status: CLOSED ERRATA
QA Contact: Filip Brychta <fbrychta>
Severity: high
Docs Contact:
Priority: high
Version: JON 3.3.7
CC: fbrychta, loleary, miburman, rvargasp, spinder
Target Milestone: CR03
Keywords: Triaged
Target Release: JON 3.3.8
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-16 18:45:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Larry O'Leary 2016-10-19 19:47:36 UTC
Description of problem:
When a new version of content is uploaded, the BLOB for the previous version of the content remains in the database. There is no way to remove this old content. It becomes unreachable yet continues to consume a large amount of database space.

Version-Release number of selected component (if applicable):
3.3, 3.3.7

How reproducible:
Always

Steps to Reproduce:
1. Deploy a WAR to EAP 6 using the Create Child inventory function.
2. Verify new deployment is reported as available.
3. Replace the deployment with version 2.0 of the WAR from the resource's new content tab using Upload New Package.
4. Repeat the previous step with version 3.0 of the WAR.

Actual results:
Version 2 of the WAR will remain in the database (rhq_package_bits) yet is no longer deployed or available for deployment.

Expected results:
Either version 2 should no longer be in the database or there should be an option in the content page to remove version 2 from the database.

Additional info:
There may be more going wrong here than what is currently seen. It appears that the original intention of the content pages was to provide a revert capability along with the ability to manage/delete the previous versions. However, the UI does not appear to support this, and using the CLI requires you to dump everything related to the content history and resource.

The severity of this issue seems high because, over time, the system will reach a point where either a re-install or manual intervention with the schema is necessary. Manual schema changes would not be supported, so a re-install would be the recommendation.

Comment 1 Michael Burman 2016-11-30 14:03:50 UTC
Isn't this the same problem as BZ 1381720 / BZ 1306602? With 1306602 we remove the rows, but 1381720 should solve the issue of unlinking the blob also?

Comment 2 Larry O'Leary 2016-12-02 15:39:31 UTC
No, BZ 1381720 does not prevent or address this.

What is happening here is that when a new package version is uploaded, the previous version remains fully intact: not only its blob, but also its rhq_package_version row and its rhq_package_bits row. This means that the blob is not orphaned and is not eligible for removal by the new operation being introduced in BZ 1381720.

The fact that the package version remains is by design. The intention appears to be that a user can see all prior versions of the content and revert to any of them. However, the UI does not provide that capability. I think the capability remains in the legacy UI and was never moved into SmartGWT.

The result is that as a deployment evolves, the space required in the database grows without bound, and the user has no way of cleaning this up if they choose to remove prior versions. In other words:

- they cannot revert;
- they cannot remove prior versions.

Hope that clears it up.
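
To illustrate why the orphan purge from BZ 1381720 does not help here, the following is a minimal sketch using an in-memory SQLite database as a stand-in. Only the table names rhq_package_version and rhq_package_bits come from this report; the columns, the helper functions, and the use of SQLite are assumptions made for illustration and do not reflect the real JON schema.

```python
import sqlite3


def build_demo_db():
    """Create a stand-in for the two tables named above (columns are hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE rhq_package_bits (id INTEGER PRIMARY KEY, bits BLOB);
        CREATE TABLE rhq_package_version (
            id INTEGER PRIMARY KEY,
            version TEXT,
            package_bits_id INTEGER REFERENCES rhq_package_bits(id)
        );
    """)
    return db


def upload_version(db, version, payload):
    """Model an upload: a new bits row plus a new version row that references it."""
    cur = db.execute("INSERT INTO rhq_package_bits (bits) VALUES (?)", (payload,))
    db.execute(
        "INSERT INTO rhq_package_version (version, package_bits_id) VALUES (?, ?)",
        (version, cur.lastrowid),
    )


def orphaned_bits(db):
    """Return ids of bits rows that no version row references any more."""
    rows = db.execute("""
        SELECT b.id
        FROM rhq_package_bits b
        LEFT JOIN rhq_package_version v ON v.package_bits_id = b.id
        WHERE v.id IS NULL
        ORDER BY b.id
    """).fetchall()
    return [row[0] for row in rows]


db = build_demo_db()
for version in ("1.0", "2.0", "3.0"):
    upload_version(db, version, b"war-" + version.encode())

stored = db.execute("SELECT COUNT(*) FROM rhq_package_bits").fetchone()[0]
print("bits rows stored:", stored)
print("orphaned bits rows:", orphaned_bits(db))
```

In this model every upload adds a bits row that is still referenced by its version row, so a purge that only looks for unreferenced bits finds nothing to remove.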

Comment 4 Josejulio Martínez 2017-01-26 05:50:44 UTC
commit 702b1f634366c49c77ed6aeec27a2de73d371118
Merge: 5002add 6b2f422
Author: Josejulio Martínez <finwemartinez>
Date:   Wed Jan 25 23:48:55 2017 -0600

    Merge pull request #287 from rubenvp8510/purgebitsversions
    
    Bug 1386913 - Remove old version from package_bits when deploys a new version

commit 6b2f422f3acc3cfa0b8acb764f8206808ec44a53
Author: Ruben Vargas <ruben.vp8510>
Date:   Thu Jan 12 17:36:03 2017 -0600

    Remove old version from package_bits when deploys a new version

Comment 8 Ruben Vargas Palma 2017-01-31 03:31:48 UTC
1) It's not clear to me. Could you please provide steps to reproduce it? Thank you.

2) I'll review. It is supposed to clean the pg_largeobject table. This fix always leaves the first deployment intact (and orphaned), but it should remove other versions if they are present. Are the bits of all old versions still in pg_largeobject?

Comment 9 Filip Brychta 2017-01-31 07:37:16 UTC
1) This issue was found by CLI automation, which does the following:
- deploy the first version and call retrieveBackingContent
- deploy the second version (updateBackingContent) and call retrieveBackingContent
The second step fails with the exceptions in comment 7.

It might be just a timing issue where updateBackingContent finishes too soon and the following retrieveBackingContent fails (a later invocation of retrieveBackingContent works). Anyway, this automation failure has never occurred in previous JON builds. If it's expected that updateBackingContent is not a blocking operation and that it's necessary to wait before calling retrieveBackingContent, I will just update the automation to do that.


2) I noticed that it always leaves the two latest versions in package_bits. Older versions are removed from package_bits but not from pg_largeobject. Is it possible that pg_largeobject is not cleaned immediately but some time later? I have not checked this. I will test it more later today and let you know.
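
For point 1, if waiting on the automation side turns out to be acceptable, the retry could be sketched roughly as follows. This is a generic Python helper, not the JON CLI API: retry_until, fake_retrieve, and the exception used here are hypothetical names, and the real call and exception type would need to be substituted.

```python
import time


def retry_until(fetch, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or attempts run out; re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # the real exception type is not known here
            last_error = error
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise last_error


# Hypothetical stand-in for retrieveBackingContent: fails twice, then succeeds.
calls = {"count": 0}


def fake_retrieve():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("content bits not available yet")
    return b"war-2.0"


# sleep is replaced with a no-op so the demo does not actually wait.
content = retry_until(fake_retrieve, attempts=5, delay_seconds=0.5, sleep=lambda _: None)
print(content, "after", calls["count"], "calls")
```

This only works around the symptom in the automation; it does not change the behavior seen by existing scripts.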

Comment 10 Filip Brychta 2017-01-31 13:16:22 UTC
I can confirm that rhq_package_bits contains only the two latest versions. Older versions are removed from rhq_package_bits but not from pg_largeobject.
I can also confirm that orphaned rows from pg_largeobject are cleaned by the new purge job introduced in BZ 1381720.

Comment 11 Ruben Vargas Palma 2017-02-02 16:20:42 UTC
Regarding the problem with pg_largeobject: I verified that the pg_largeobject rows are still present. I have a PR that I'm still testing and that others can review.


Regarding the issue with retrieveBackingContent: it looks like a race condition. When you run retrieveBackingContent immediately after deploying a new version, it picks up the old installed version, but at some point the old version's bits are set to null (which is part of this BZ), so the content can't be retrieved.


The solution to the problem could be to put a delay between the calls. I don't know if that is a satisfactory solution. WDYT? @loleary

Comment 12 Larry O'Leary 2017-02-02 18:23:16 UTC
(In reply to Ruben Vargas Palma from comment #11)
> The solution to the problem could be to put a delay between the calls. I don't
> know if that is a satisfactory solution. WDYT? @loleary

This won't work, as existing users will not be able to update their scripts or client applications. We need to look into why retrieveBackingContent is getting a reference to the old entry. Or perhaps figure out why updateBackingContent is returning before the proper/final id is created.

Comment 13 Ruben Vargas Palma 2017-02-07 17:39:06 UTC
commit 9fb1b28d4d55b7bc6df79aea481e7f8060ce256e
Merge: 2b98508 f1eb57f
Author: Josejulio Martínez <finwemartinez>
Date:   Tue Feb 7 11:36:32 2017 -0600

    Merge pull request #293 from rubenvp8510/fix-bits-2.0
    
    Bug 1386913 - Newly uploaded content results in previous content vers…

commit f1eb57f27e1683079783dd7cb20c41427793c4a9
Author: Ruben Vargas <ruben.vp8510>
Date:   Sat Feb 4 20:19:44 2017 -0600

    Bug 1386913 - Newly uploaded content results in previous content version being left in database with no way to clean it up

Comment 18 Ruben Vargas Palma 2017-02-09 23:44:02 UTC
commit ccb3d2722279c0f98888c73f80d5a3a641f73662
Merge: 1e7e638 f2c7ad8
Author: Josejulio Martínez <finwemartinez>
Date:   Thu Feb 9 16:19:43 2017 -0600

    Merge pull request #294 from rubenvp8510/tx-purge-bits
    
    BZ-1386913 Separate transactions for remove bits.

commit f2c7ad823705cc637de80645b4c54b3d43e18bc4
Author: Ruben Vargas <ruben.vp8510>
Date:   Thu Feb 9 11:16:07 2017 -0600

    BZ-1386913
    - Added validations for unlink only on Postgres
    - Put unlink process as part of other transaction
    - One transaction per package.
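
The "one transaction per package" point in the commit message above can be illustrated with a small sketch. It is not the code from the commit: it uses Python with an in-memory SQLite database, and the package_version table, the purge_old_bits function, the keep_latest parameter, and the failure trigger are all hypothetical. It only shows that a failure while purging one package does not roll back the purge of the others.

```python
import sqlite3


def purge_old_bits(db, package_ids, keep_latest=2):
    """Null out the bits of older versions, using one transaction per package."""
    failed = []
    for package_id in package_ids:
        try:
            with db:  # commits on success, rolls back if an exception is raised
                db.execute(
                    """
                    UPDATE package_version SET bits = NULL
                    WHERE package_id = ?
                      AND id NOT IN (
                          SELECT id FROM package_version
                          WHERE package_id = ?
                          ORDER BY id DESC LIMIT ?
                      )
                    """,
                    (package_id, package_id, keep_latest),
                )
        except sqlite3.Error:
            failed.append(package_id)  # other packages are unaffected
    return failed


db = sqlite3.connect(":memory:")
# The trigger simulates a failure while purging package 2 only.
db.executescript("""
    CREATE TABLE package_version (
        id INTEGER PRIMARY KEY,
        package_id INTEGER NOT NULL,
        bits BLOB
    );
    CREATE TRIGGER fail_package_2 BEFORE UPDATE ON package_version
    WHEN OLD.package_id = 2
    BEGIN
        SELECT RAISE(ABORT, 'simulated failure');
    END;
""")
with db:
    db.executemany(
        "INSERT INTO package_version (package_id, bits) VALUES (?, ?)",
        [(p, b"bits") for p in (1, 2, 3) for _ in range(3)],
    )

failed = purge_old_bits(db, [1, 2, 3])
purged = dict(db.execute(
    "SELECT package_id, COUNT(*) FROM package_version "
    "WHERE bits IS NULL GROUP BY package_id"
).fetchall())
print("failed packages:", failed)
print("purged versions per package:", purged)
```

With a single shared transaction, the simulated failure on package 2 would have undone the work already done for package 1 as well.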

Comment 22 Michael Burman 2017-02-13 10:45:21 UTC
In the master:

commit 730168bd3f036a10b6e710af90b79a303971d372
Author: Michael Burman <miburman>
Date:   Mon Feb 13 12:44:39 2017 +0200

    [BZ 1386913] After unlinking, close the connection resources

Comment 26 errata-xmlrpc 2017-02-16 18:45:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0285.html