Bug 1271500

Summary: [RFE] Improve recovery in case candlepin and katello are out of sync
Product: Red Hat Satellite
Reporter: Peter Vreman <peter.vreman>
Component: Subscription Management
Assignee: Tom McKay <tomckay>
Status: CLOSED ERRATA
QA Contact: Chris Duryee <cduryee>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.1.2
CC: aperotti, bbuckingham, bkearney, cduryee, cwelton, daniele, ehelms, tomckay, xdmoon
Target Milestone: Unspecified
Keywords: FutureFeature, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tfm-rubygem-katello-3.0.0.0-1
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-27 11:36:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 260381, 1122832, 1353215

Description Peter Vreman 2015-10-14 07:43:19 UTC
Description of problem:
Once Katello and Candlepin are out of sync, it is impossible for a user to recover without manually deleting entries in the postgres database.

Candlepin and Katello can get out of sync in the following situations:
- Unexpected reboot
- Unexpected application restart during a subscription task
- Any other failing step in a task that works with subscriptions

After such a situation happens, the Content Hosts page cannot be loaded:

2015-10-13 13:50:26 [I] Processing by Katello::Api::V2::SystemsController#index as */*
2015-10-13 13:50:26 [I]   Parameters: {"per_page"=>9999, "organization_id"=>3, "api_version"=>"v2", "system"=>{}}
2015-10-13 13:50:27 [I] Authorized user hoici(hoici )
2015-10-13 13:50:45 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.67/app/views/katello/api/v2/systems/index.json.rabl within katello/api/v2/layouts/collection (17945.0ms)
2015-10-13 13:50:46 [E] exception when talking to a remote client: Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit ed149cbb-9dd9-4dd4-acb2-56ac6393b872 has been deleted","requestUuid":"17eb9c0d-7765-4cdf-b3d8-4423313d0904","deletedId":"ed149cbb-9dd9-4dd4-acb2-56ac6393b872"} (GET /candlepin/consumers/ed149cbb-9dd9-4dd4-acb2-56ac6393b872) RestClient::Gone: Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit ed149cbb-9dd9-4dd4-acb2-56ac6393b872 has been deleted","requestUuid":"17eb9c0d-7765-4cdf-b3d8-4423313d0904","deletedId":"ed149cbb-9dd9-4dd4-acb2-56ac6393b872"} (GET /candlepin/consumers/ed149cbb-9dd9-4dd4-acb2-56ac6393b872)
Body: {"displayMessage":"Unit ed149cbb-9dd9-4dd4-acb2-56ac6393b872 has been deleted","requestUuid":"17eb9c0d-7765-4cdf-b3d8-4423313d0904","deletedId":"ed149cbb-9dd9-4dd4-acb2-56ac6393b872"}
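
The Body payload in the log above is plain JSON, so the UUID of the consumer that Candlepin no longer knows about can be extracted mechanically. A minimal sketch, assuming the error body always carries the three fields shown in the log; the helper name deleted_consumer_id is hypothetical and is not part of Katello or Candlepin:

```python
import json

# Error body copied from the log above (410 Gone from Candlepin).
BODY = (
    '{"displayMessage":"Unit ed149cbb-9dd9-4dd4-acb2-56ac6393b872 has been deleted",'
    '"requestUuid":"17eb9c0d-7765-4cdf-b3d8-4423313d0904",'
    '"deletedId":"ed149cbb-9dd9-4dd4-acb2-56ac6393b872"}'
)


def deleted_consumer_id(body):
    """Return the 'deletedId' field of a Candlepin 410 body, or None.

    Hypothetical helper for illustration; not a Katello or Candlepin API.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        # Body was not valid JSON, so there is nothing to extract.
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("deletedId")


print(deleted_consumer_id(BODY))
```

The returned UUID is the same one that appears in the failing GET /candlepin/consumers/ request, which is the record that has to be located on the Katello side.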


Even running katello:reindex to try to correct this does not work:

# foreman-rake katello:reindex
API controllers newer than Apipie cache! Run apipie:cache rake task to regenerate cache.
Elasticsearch Indices cleared.
Re-indexing Katello::ContentViewErratumFilterRule
Re-indexing Katello::HostCollection
Re-indexing Katello::System
Re-indexing Katello::ContentViewHistory
Re-indexing Katello::ContentViewFilter
Re-indexing Katello::TaskStatus
Re-indexing Katello::Job
Re-indexing Katello::ContentViewPuppetModule
Re-indexing Katello::Repository
Re-indexing Katello::ContentView
Re-indexing Katello::Distributor
Re-indexing Katello::ActivationKey
Re-indexing Katello::Provider
Re-indexing Katello::ContentViewPackageGroupFilterRule
Re-indexing Katello::ContentViewPackageFilterRule
Re-indexing Katello::Product
Re-indexing Katello::ContentViewPuppetEnvironment
Re-indexing Katello::Notice
Re-indexing Katello::Hypervisor
The following Katello::Hypervisor items could not be indexed due to various reasons.
Please check /usr/share/foreman/log/reindex.log for more detailed information.
Object: #<Katello::Hypervisor id: 9, uuid: "ed149cbb-9dd9-4dd4-acb2-56ac6393b872", name: "li-hc-1005", description: "Initial Registration Params", location: "None", environment_id: 2, created_at: "2015-09-21 10:41:02", updated_at: "2015-09-29 14:37:05", type: "Katello::Hypervisor", content_view_id: 52, host_id: nil>
rake aborted!
Hypervisor does not support this action

Tasks: TOP => katello:reindex


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Simulate an unexpected reboot, e.g. delete a subscription in Candlepin without deleting it in Katello
2.
3.

Actual results:
Unable to recover without manually deleting a record in the Katello database

Expected results:
Katello recovers. Possible scenarios:
- It reports that the Content Host is deleted
- It deletes the Content Host implicitly
Katello reindex succeeds:
- It deletes the out-of-sync content host record from Katello
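
One way to picture the recovery asked for here: treat a 410 Gone from Candlepin as "this record is orphaned" rather than as a fatal error, collect the orphans, and carry on with the rest. A rough sketch under stated assumptions; find_orphans, ConsumerGone, and the lookup callable are hypothetical stand-ins and not Katello code, and this does not describe how any actual fix is implemented:

```python
class ConsumerGone(Exception):
    """Hypothetical stand-in for an HTTP 410 Gone answer from Candlepin."""


def find_orphans(katello_uuids, lookup):
    """Split UUIDs into (present, orphaned) instead of failing on the first 410.

    lookup(uuid) is an assumed callable that raises ConsumerGone when
    Candlepin reports the consumer as deleted.
    """
    present, orphaned = [], []
    for uuid in katello_uuids:
        try:
            lookup(uuid)
        except ConsumerGone:
            orphaned.append(uuid)  # candidate for cleanup on the Katello side
        else:
            present.append(uuid)
    return present, orphaned


# Fake Candlepin that only knows one consumer (illustrative data only).
known = {"aaaa"}


def fake_lookup(uuid):
    if uuid not in known:
        raise ConsumerGone(uuid)
    return {"uuid": uuid}


present, orphaned = find_orphans(
    ["aaaa", "ed149cbb-9dd9-4dd4-acb2-56ac6393b872"], fake_lookup
)
print(present, orphaned)
```

Whether the orphaned records are then flagged as deleted or removed outright corresponds to the two scenarios listed above.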


Additional info:

Comment 1 Peter Vreman 2016-03-16 13:21:51 UTC
I expect that https://github.com/Katello/katello/pull/5536 addresses this issue.

Comment 2 Tom McKay 2016-03-16 13:33:01 UTC
To test, I think, simply manually delete a consumer record in candlepin and see what blows up. If things recover nicely and/or katello:reindex resolves mismatch, then I would mark this BZ verified.

Comment 3 Andrea Perotti 2016-05-11 09:50:59 UTC
(In reply to Peter Vreman from comment #1)
> I expect that https://github.com/Katello/katello/pull/5536 addresses this
> issue.

That has been merged upstream.
Any news on the topic?

The customer is asking for an update.

Comment 5 Chris Duryee 2016-07-11 14:52:37 UTC
tested with snap 19.1

verified, steps were:


* set up a sat6, register system to itself
* edit /etc/candlepin/candlepin.conf:
 ** comment out "module.config.adapter_module=org.candlepin.katello.KatelloModule"
 ** change "candlepin.auth.basic.enable=false" to true
* restart tomcat
* run "curl -k https://localhost:8443/candlepin/admin/init" to create candlepin admin user
* run "curl -k -u admin:admin https://localhost:8443/candlepin/consumers/" to grab the UUID of the consumer to delete
* run "curl -k -u admin:admin -X DELETE 'https://localhost:8443/candlepin/consumers/<UUID>'"

* undo your changes to candlepin.conf
* katello-service restart
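
The two candlepin.conf edits in the steps above can be scripted so that they are easy to apply and undo. A minimal sketch, assuming the file contains the two settings exactly as quoted in the steps and that "#" starts a comment in that file; the function name enable_direct_access and the extra sample setting are made up for illustration. The code only transforms text; it does not touch /etc/candlepin/candlepin.conf or restart tomcat:

```python
MODULE_LINE = "module.config.adapter_module=org.candlepin.katello.KatelloModule"
AUTH_OFF = "candlepin.auth.basic.enable=false"
AUTH_ON = "candlepin.auth.basic.enable=true"


def enable_direct_access(conf_text):
    """Return conf_text with the Katello adapter commented out and basic auth on."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped == MODULE_LINE:
            out.append("#" + stripped)  # comment out the Katello adapter module
        elif stripped == AUTH_OFF:
            out.append(AUTH_ON)  # allow basic auth for the curl calls
        else:
            out.append(line)  # leave every other setting untouched
    return "\n".join(out) + "\n"


# Sample input; "some.other.setting" is a placeholder, not a real Candlepin key.
sample = MODULE_LINE + "\n" + AUTH_OFF + "\nsome.other.setting=1\n"
patched = enable_direct_access(sample)
print(patched)
```

Keeping a copy of the original text makes the "undo your changes" step a plain restore of that copy.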


After this, main content host page with list of content hosts was still operational.

NOTE: there is a smaller bug in https://bugzilla.redhat.com/show_bug.cgi?id=1354555, which is related to the host not disappearing after a reindex and backend object cleaning. However, the overall content host page is not broken, so I am marking this as VERIFIED.

Comment 6 Bryan Kearney 2016-07-27 11:36:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1501