Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1320557 - unregistering a Content Host can pause ListenOnCandlepinEvents with Candlepin::Consumer: 410 Gone error
Summary: unregistering a Content Host can pause ListenOnCandlepinEvents with Candlepin...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Tasks Plugin
Version: 6.1.7
Hardware: x86_64
OS: Linux
high
high
Target Milestone: Unspecified
Assignee: Tomas Strachota
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks: 1325879
TreeView+ depends on / blocked
 
Reported: 2016-03-23 13:44 UTC by Pavel Moravec
Modified: 2019-11-14 07:39 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1325879 (view as bug list)
Environment:
Last Closed: 2016-05-17 05:33:01 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
hotfix patch (1.63 KB, application/mbox)
2016-05-11 21:59 UTC, Mike McCune
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2209461 0 None None None 2016-03-24 14:20:01 UTC
Red Hat Product Errata RHBA-2016:1084 0 normal SHIPPED_LIVE Satellite 6.1.9 bug fix update 2016-05-17 09:32:24 UTC

Description Pavel Moravec 2016-03-23 13:44:03 UTC
Description of problem:
When unregistering a content host (either by subscription-manager unregister, or deleting Content Host, or deleting Host), Actions::Katello::System::Destroy task is being processed.

This task has a concurrency bug that with some probability causes Actions::Candlepin::ListenOnCandlepinEvents task to be paused with error:

Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca)

(UUID matches the UUID of just being deleted Content Host)

sequence of steps leading to the bug:
- Actions::Candlepin::Consumer::Destroy sub-task executed
  - it deletes the consumer from candlepin and recalculates compliance for it (imho redundant step when we delete the consumer, but katello is aware of it)
  - it announces the compliance.create event to ListenOnCandlepinEvents task
  - katello finds the katello_system is present so it runs "re-indexing content host .."
- the parent task Actions::Katello::System::Destroy even _now_ enters Actions::Katello::System::Destroy
  - this subtask deletes the system from katello_systems _after_ the check before "re-indexing content host .." is made, so the re-index is _not_ skipped
- the re-index of the content host calls GET consumer/<uuid> on candlepin, what triggers the 410 Gone in ListenOnCandlepinEvents task


Version-Release number of selected component (if applicable):
Sat 6.1.7


How reproducible:
100% within few minutes


Steps to Reproduce:
1. tail -f /var/log/foreman/production.log | grep -e "in phase Finalize Actions::Katello::System::Destroy" -e "skip re-indexing of non-existent content host" -e "re-indexing content host"

2. Have opened Actions::Candlepin::ListenOnCandlepinEvents task in WebUI

3. On some Content Host, register and unregister it in a loop (here RHEL7 used, update if using RHEL5 or 6 accordingly):

while true; do
  subscription-manager register --force --org="Default_Organization" --environment="Library" --username=admin --password=faYakexMm5XN543x
  subscription-manager subscribe --pool=8aa2d415526494380152732fc8d20dd7
  subscription-manager repos --enable rhel-7-server-rpms --enable rhel-7-server-satellite-tools-6.1-rpms
  date
  subscription-manager unregister
  date
  sleep 5
done

4. monitor the tail -f output and ListenOnCandlepinEvents task


Actual results:
- tail shows:

2016-03-23 14:17:05 [D] re-indexing content host pmoravec-rhel7.gsslab.brq.redhat.com
2016-03-23 14:17:05 [D]          Step f5705893-2577-41c4-9af6-9b7c10ccb646: 6   running >>   success in phase Finalize Actions::Katello::System::Destroy

- ListenOnCandlepinEvents task is paused/error with error:

Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca)


Expected results:
- tail can be ok (rather symptom for devels), just the ListenOnCandlepinEvents task needs to be running without an error


Additional info:
seems like a lack of locking / concurrency bug where reindex_pool_subscription_handler.rb:

      def reindex_consumer(message)
        if message.content['newEntity']
          uuid = JSON.parse(message.content['newEntity'])['consumer']['uuid']
          system = ::Katello::System.find_by_uuid(uuid)
          if system.nil?
            @logger.debug "skip re-indexing of non-existent content host #{uuid}"
          else
            @logger.debug "re-indexing content host #{system.name}"
            system.update_index
          end

needs to be executed as atomic operation (not concurrently with Actions::Katello::System::Destroy)

Comment 2 Pavel Moravec 2016-03-23 14:43:39 UTC
Present also in 6.1.8

Comment 3 Pavel Moravec 2016-03-24 11:12:34 UTC
An alternative way to fixing the race/concurrency condition in katello would be to prevent candlepin in generating and sending the (just deleted) consumer compliance. Consumer deletion triggers entitlements deletion what triggers back compliance computation for the just deleted consumer - the latest is ridiculous and sending the compliance.create message to katello in fact triggers the race condition there.

Comment 9 Mike McCune 2016-05-11 21:59:16 UTC
Created attachment 1156316 [details]
hotfix patch

Comment 10 Mike McCune 2016-05-11 22:05:32 UTC
== HOTFIX INSTRUCTIONS ==

1) Download 1320557-001.patch from this bug

2) Copy to:

cp 1320557-001.patch /opt/rh/ruby193/root/usr/share/gems/gems/katello-`rpm -q --queryformat '%{VERSION}\n' ruby193-rubygem-katello`

3) Patch:

patch -p1 < 1320557-001.patch

4) restart:

katello-service restart

Comment 11 Kedar Bidarkar 2016-05-12 20:02:47 UTC
I ran the script in loop for almost an hour and nothing seen, also the dynflow tasks was clean.

VERIFIED With sat6.1.9 compose1

Comment 12 Kedar Bidarkar 2016-05-12 20:15:34 UTC
Need to test is properly, re-opening this bug.

Comment 13 Kedar Bidarkar 2016-05-12 21:48:53 UTC
Also tested by grepping for "410 Gone" 

This issue is no longer seen with sat6.1.9 compose1

Comment 15 errata-xmlrpc 2016-05-17 05:33:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1084


Note You need to log in before you can comment on or make changes to this bug.