Red Hat Bugzilla – Bug 1320557
unregistering a Content Host can pause ListenOnCandlepinEvents with Candlepin::Consumer: 410 Gone error
Last modified: 2017-02-23 14:19:49 EST
Description of problem: When unregistering a content host (either by subscription-manager unregister, or deleting Content Host, or deleting Host), Actions::Katello::System::Destroy task is being processed. This task has a concurrency bug that with some probability causes Actions::Candlepin::ListenOnCandlepinEvents task to be paused with error: Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca) (UUID matches the UUID of just being deleted Content Host) sequence of steps leading to the bug: - Actions::Candlepin::Consumer::Destroy sub-task executed - it deletes the consumer from candlepin and recalculates compliance for it (imho redundant step when we delete the consumer, but katello is aware of it) - it announces the compliance.create event to ListenOnCandlepinEvents task - katello finds the katello_system is present so it runs "re-indexing content host .." - the parent task Actions::Katello::System::Destroy even _now_ enters Actions::Katello::System::Destroy - this subtask deletes the system from katello_systems _after_ the check before "re-indexing content host .." is made, so the re-index is _not_ skipped - the re-index of the content host calls GET consumer/<uuid> on candlepin, what triggers the 410 Gone in ListenOnCandlepinEvents task Version-Release number of selected component (if applicable): Sat 6.1.7 How reproducible: 100% within few minutes Steps to Reproduce: 1. tail -f /var/log/foreman/production.log | grep -e "in phase Finalize Actions::Katello::System::Destroy" -e "skip re-indexing of non-existent content host" -e "re-indexing content host" 2. Have opened Actions::Candlepin::ListenOnCandlepinEvents task in WebUI 3. On some Content Host, register and unregister it in a loop (here RHEL7 used, update if using RHEL5 or 6 accordingly): while true; do subscription-manager register --force --org="Default_Organization" --environment="Library" --username=admin --password=faYakexMm5XN543x subscription-manager subscribe --pool=8aa2d415526494380152732fc8d20dd7 subscription-manager repos --enable rhel-7-server-rpms --enable rhel-7-server-satellite-tools-6.1-rpms date subscription-manager unregister date sleep 5 done 4. monitor the tail -f output and ListenOnCandlepinEvents task Actual results: - tail shows: 2016-03-23 14:17:05 [D] re-indexing content host pmoravec-rhel7.gsslab.brq.redhat.com 2016-03-23 14:17:05 [D] Step f5705893-2577-41c4-9af6-9b7c10ccb646: 6 running >> success in phase Finalize Actions::Katello::System::Destroy - ListenOnCandlepinEvents task is paused/error with error: Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca) Expected results: - tail can be ok (rather symptom for devels), just the ListenOnCandlepinEvents task needs to be running without an error Additional info: seems like a lack of locking / concurrency bug where reindex_pool_subscription_handler.rb: def reindex_consumer(message) if message.content['newEntity'] uuid = JSON.parse(message.content['newEntity'])['consumer']['uuid'] system = ::Katello::System.find_by_uuid(uuid) if system.nil? @logger.debug "skip re-indexing of non-existent content host #{uuid}" else @logger.debug "re-indexing content host #{system.name}" system.update_index end needs to be executed as atomic operation (not concurrently with Actions::Katello::System::Destroy)
Present also in 6.1.8
An alternative way to fixing the race/concurrency condition in katello would be to prevent candlepin in generating and sending the (just deleted) consumer compliance. Consumer deletion triggers entitlements deletion what triggers back compliance computation for the just deleted consumer - the latest is ridiculous and sending the compliance.create message to katello in fact triggers the race condition there.
Created attachment 1156316 [details] hotfix patch
== HOTFIX INSTRUCTIONS == 1) Download 1320557-001.patch from this bug 2) Copy to: cp 1320557-001.patch /opt/rh/ruby193/root/usr/share/gems/gems/katello-`rpm -q --queryformat '%{VERSION}\n' ruby193-rubygem-katello` 3) Patch: patch -p1 < 1320557-001.patch 4) restart: katello-service restart
I ran the script in loop for almost an hour and nothing seen, also the dynflow tasks was clean. VERIFIED With sat6.1.9 compose1
Need to test is properly, re-opening this bug.
Also tested by grepping for "410 Gone" This issue is no longer seen with sat6.1.9 compose1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1084