Bug 1271500 - [RFE] Improve recovery in case candlepin and katello are out of sync
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Subscription Management
Unspecified Unspecified
medium Severity medium
: GA
: --
Assigned To: Tom McKay
Chris Duryee
Keywords: FutureFeature, Triaged
Depends On:
Blocks: 260381 1122832 1353215
Reported: 2015-10-14 03:43 EDT by Peter Vreman
Modified: 2016-07-27 07:36 EDT
9 users

See Also:
Fixed In Version: tfm-rubygem-katello-
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-07-27 07:36:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Foreman Issue Tracker 12030 None None None 2016-06-08 07:37 EDT

Description Peter Vreman 2015-10-14 03:43:19 EDT
Description of problem:
Once Katello and Candlepin are out of sync, a user cannot recover without manually deleting entries in the PostgreSQL database.

Candlepin and Katello can get out of sync in the following situations:
- Unexpected reboot
- Unexpected application restart during a subscription task
- Any other failing step in a task that works with subscriptions

After such a situation happens, the Content Hosts page cannot be loaded:

2015-10-13 13:50:26 [I] Processing by Katello::Api::V2::SystemsController#index as */*
2015-10-13 13:50:26 [I]   Parameters: {"per_page"=>9999, "organization_id"=>3, "api_version"=>"v2", "system"=>{}}
2015-10-13 13:50:27 [I] Authorized user hoici(hoici )
2015-10-13 13:50:45 [I]   Rendered /opt/rh/ruby193/root/usr/share/gems/gems/katello- within katello/api/v2/layouts/collection (17945.0ms)
2015-10-13 13:50:46 [E] exception when talking to a remote client: Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit ed149cbb-9dd9-4dd4-acb2-56ac6393b872 has been deleted","requestUuid":"17eb9c0d-7765-4cdf-b3d8-4423313d0904","deletedId":"ed149cbb-9dd9-4dd4-acb2-56ac6393b872"} (GET /candlepin/consumers/ed149cbb-9dd9-4dd4-acb2-56ac6393b872) RestClient::Gone: Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit ed149cbb-9dd9-4dd4-acb2-56ac6393b872 has been deleted","requestUuid":"17eb9c0d-7765-4cdf-b3d8-4423313d0904","deletedId":"ed149cbb-9dd9-4dd4-acb2-56ac6393b872"} (GET /c
Body: {"displayMessage":"Unit ed149cbb-9dd9-4dd4-acb2-56ac6393b872 has been deleted","requestUuid":"17eb9c0d-7765-4cdf-b3d8-4423313d0904","deletedId":"ed149
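The page fails because a single consumer lookup returns HTTP 410 Gone and that error aborts the whole listing. Below is a minimal Python sketch of the more tolerant behaviour requested in this report. It is not Katello's code (Katello is Ruby). `fetch_consumer` is a hypothetical callable that looks up a consumer in Candlepin, and `ConsumerGone` is a hypothetical exception standing in for the 410 reply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ConsumerGone(Exception):
    """Hypothetical stand-in for a 410 Gone reply from Candlepin."""

    def __init__(self, uuid: str) -> None:
        super().__init__(f"Unit {uuid} has been deleted")
        self.uuid = uuid


@dataclass
class HostRow:
    uuid: str
    name: str
    status: str  # "ok" or "deleted in Candlepin"


def list_content_hosts(
    hosts: Dict[str, str],
    fetch_consumer: Callable[[str], dict],
) -> List[HostRow]:
    """Build the Content Hosts listing without letting one 410 abort it.

    `hosts` maps Katello-side consumer UUIDs to host names.
    `fetch_consumer` is hypothetical and raises ConsumerGone on a 410.
    """
    rows: List[HostRow] = []
    for uuid, name in hosts.items():
        try:
            fetch_consumer(uuid)
            rows.append(HostRow(uuid, name, "ok"))
        except ConsumerGone:
            # Flag the out-of-sync record instead of failing the request.
            rows.append(HostRow(uuid, name, "deleted in Candlepin"))
    return rows
```

With this approach, a host such as `li-hc-1005` whose consumer was deleted still shows up in the list, marked as deleted, and the remaining hosts render normally.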

Even running katello:reindex does not correct this:

# foreman-rake katello:reindex
API controllers newer than Apipie cache! Run apipie:cache rake task to regenerate cache.
Elasticsearch Indices cleared.
Re-indexing Katello::ContentViewErratumFilterRule
Re-indexing Katello::HostCollection
Re-indexing Katello::System
Re-indexing Katello::ContentViewHistory
Re-indexing Katello::ContentViewFilter
Re-indexing Katello::TaskStatus
Re-indexing Katello::Job
Re-indexing Katello::ContentViewPuppetModule
Re-indexing Katello::Repository
Re-indexing Katello::ContentView
Re-indexing Katello::Distributor
Re-indexing Katello::ActivationKey
Re-indexing Katello::Provider
Re-indexing Katello::ContentViewPackageGroupFilterRule
Re-indexing Katello::ContentViewPackageFilterRule
Re-indexing Katello::Product
Re-indexing Katello::ContentViewPuppetEnvironment
Re-indexing Katello::Notice
Re-indexing Katello::Hypervisor
The following Katello::Hypervisor items could not be indexed due to various reasons.
Please check /usr/share/foreman/log/reindex.log for more detailed information.
Object: #<Katello::Hypervisor id: 9, uuid: "ed149cbb-9dd9-4dd4-acb2-56ac6393b872", name: "li-hc-1005", description: "Initial Registration Params", location: "None", environment_id: 2, created_at: "2015-09-21 10:41:02", updated_at: "2015-09-29 14:37:05", type: "Katello::Hypervisor", content_view_id: 52, host_id: nil>
rake aborted!
Hypervisor does not support this action

Tasks: TOP => katello:reindex

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Simulate an unexpected reboot, e.g. delete a subscription in Candlepin without deleting it in Katello.

Actual results:
Unable to recover without manually deleting a record in the Katello database.

Expected results:
Katello recovers. Possible scenarios:
- It reports that the Content Host has been deleted
- It deletes the Content Host implicitly
katello:reindex succeeds:
- It deletes the out-of-sync content host record from Katello
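The reindex expectation is essentially a reconciliation pass. Any consumer UUID that Katello knows about but Candlepin no longer has is an orphan, and it should be removed rather than allowed to abort the task. Below is a minimal sketch that assumes both sides can be reduced to lists of UUIDs. The function names and the `index`/`delete` callbacks are hypothetical, and this does not describe how katello:reindex is actually implemented.

```python
from typing import Callable, Iterable, List, Set, Tuple


def find_out_of_sync(
    katello_uuids: Iterable[str], candlepin_uuids: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (orphaned in Katello, unknown to Katello).

    The first set holds content hosts whose Candlepin consumer is gone,
    i.e. candidates for deletion from Katello. The second set holds
    consumers that exist only in Candlepin.
    """
    katello = set(katello_uuids)
    candlepin = set(candlepin_uuids)
    return katello - candlepin, candlepin - katello


def reindex_with_cleanup(
    katello_uuids: List[str],
    candlepin_uuids: Iterable[str],
    index: Callable[[str], None],
    delete: Callable[[str], None],
) -> Set[str]:
    """Index every consistent record and delete the orphans.

    Returns the set of deleted (orphaned) UUIDs.
    """
    orphans, _ = find_out_of_sync(katello_uuids, candlepin_uuids)
    for uuid in katello_uuids:
        if uuid in orphans:
            delete(uuid)  # drop the stale Katello record instead of aborting
        else:
            index(uuid)
    return orphans
```

Using set differences keeps the comparison linear in the number of records. A single bad record then becomes a cleanup action, not a fatal error for the whole task.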

Additional info:
Comment 1 Peter Vreman 2016-03-16 09:21:51 EDT
I expect that https://github.com/Katello/katello/pull/5536 addresses this issue.
Comment 2 Tom McKay 2016-03-16 09:33:01 EDT
To test, I think you can simply delete a consumer record in Candlepin manually and see what blows up. If things recover nicely and/or katello:reindex resolves the mismatch, then I would mark this BZ verified.
Comment 3 Andrea Perotti 2016-05-11 05:50:59 EDT
(In reply to Peter Vreman from comment #1)
> I expect that https://github.com/Katello/katello/pull/5536 addresses this
> issue.

That has been merged upstream.
Any news on the topic?

Customer is asking for an update.
Comment 5 Chris Duryee 2016-07-11 10:52:37 EDT
Tested with snap 19.1.

Verified; steps were:

* set up a Satellite 6 server and register the system to itself
* edit /etc/candlepin/candlepin.conf:
 ** comment out "module.config.adapter_module=org.candlepin.katello.KatelloModule"
 ** change "candlepin.auth.basic.enable=false" to true
* restart tomcat
* run "curl -k https://localhost:8443/candlepin/admin/init" to create candlepin admin user
* run "curl -k -u admin:admin https://localhost:8443/candlepin/consumers/" to grab the UUID of the consumer to delete
* run "curl -k -u admin:admin -X DELETE 'https://localhost:8443/candlepin/consumers/<UUID>'"

* undo your changes to candlepin.conf
* katello-service restart
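You can script the candlepin.conf edits and the UUID lookup from these steps. Below is a minimal Python sketch that makes three assumptions: candlepin.conf is a plain key=value file with `#` comments, the key/value pairs have no spaces around `=`, and GET /candlepin/consumers/ returns a JSON array of objects with `uuid` and `name` fields. The helper names are hypothetical. The HTTP calls themselves are still left to curl, as in the steps.

```python
import json
from typing import List

# Lines taken from the verification steps above.
ADAPTER_LINE = "module.config.adapter_module=org.candlepin.katello.KatelloModule"
BASIC_AUTH_KEY = "candlepin.auth.basic.enable"


def enable_test_mode(conf_text: str) -> str:
    """Comment out the Katello adapter module and enable basic auth."""
    out: List[str] = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped == ADAPTER_LINE:
            out.append("#" + stripped)
        elif stripped.startswith(BASIC_AUTH_KEY + "="):
            out.append(BASIC_AUTH_KEY + "=true")
        else:
            out.append(line)  # leave every other setting untouched
    return "\n".join(out) + "\n"


def consumer_uuid(consumers_json: str, name: str) -> str:
    """Pick the UUID of the consumer called `name` from the /consumers/ reply."""
    for consumer in json.loads(consumers_json):
        if consumer.get("name") == name:
            return consumer["uuid"]
    raise KeyError(name)
```

If you keep a copy of the original file before writing the output of `enable_test_mode`, the "undo your changes" step becomes a simple restore before running katello-service restart.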

After this, main content host page with list of content hosts was still operational.

NOTE: there is a smaller bug in https://bugzilla.redhat.com/show_bug.cgi?id=1354555, which is related to the host not disappearing after a reindex and backend object cleaning. However, the overall content host page is not broken, so I am marking this as VERIFIED.
Comment 6 Bryan Kearney 2016-07-27 07:36:51 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

