Bug 1643649
Summary: | Sequential Actions::Katello::Host::Update calls from subscription-manager can fail under load | |||
---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Mike McCune <mmccune> | |
Component: | API - Content | Assignee: | Mike McCune <mmccune> | |
Status: | CLOSED ERRATA | QA Contact: | Jitendra Yejare <jyejare> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 6.3.4 | CC: | ajambhul, bwallace, egolov, jyejare, ktordeur, mjia, pdwyer, pmoravec, ptrivedi, roarora, smajumda, wclark | |
Target Milestone: | 6.6.0 | Keywords: | Triaged | |
Target Release: | Unused | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | tfm-rubygem-katello-3.12.0.9-1 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1731277 (view as bug list) | Environment: | ||
Last Closed: | 2019-10-22 12:46:44 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: |
Description
Mike McCune
2018-10-26 21:22:41 UTC
*** Bug 1644342 has been marked as a duplicate of this bug. *** Comments from an another scenario that can cause this from the dupe we closed: https://bugzilla.redhat.com/show_bug.cgi?id=1644342 """ Description of problem: In case a hypervisor is registered as a Host and is running virt-who, a task to update the Host can fail on acquired lock. The reason is, both rhsmcertd-worker AND virt-who can fire "PUT /rhsm/consumers/${UUID}" requests concurrently and independently, what leads to concurrent Actions::Katello::Host::Update tasks where either of them fails with "Required lock is already taken by other running tasks" error. We should handle this better in our API, that e.g. should handle the 2nd request, wait till the 1st is completed and then continue in the 2nd one. Version-Release number of selected component (if applicable): Sat 6.3, expected in 6.4 as well. How reproducible: 100% with some effort / proper timing Steps to Reproduce: 1. Have a hypervisor registered as a Host, with virt-who installed & configured 2. Populate or update some host's facts, just to have Host update crunching the facts longer. 3. Enable rhsm debugs and restart rhsmcertd service 4. Check rhsm logs to spot time when "PUT /rhsm/consumers/${UUID}" is triggered. 5. At that time, run virt-who -o. 6. After a while, check for status of 2 Host update tasks for this Host. Actual results: 2nd Host update fails with "Required lock is already taken by other running tasks" error. Expected results: 2nd Host update task proceeds well. """ Created redmine issue https://projects.theforeman.org/issues/25422 from this bug Upstream bug assigned to mmccune Upstream bug assigned to mmccune Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/25422 has been resolved. Tested! @Satellite 6.6 Snap 16 Steps: ------------- 1. Registered 5 RHEL7 systems to the satellite. 2. Ran 100 times disable and enable for all 5 hosts asynchronously. using script [1]. 3. Monitor clients and satellite if fails. Scripts: -------------- [1] : ``` for i in {1..100} do for i in c72588fe5db6 31ce842fcf7c 41c5a036ef47 1549600b5af5 4ebacb14a9b5 do echo -e "\nResult for container: ${i}:\n" docker exec -d $i subscription-manager repos --disable rhel-7-server-rpms done for i in c72588fe5db6 31ce842fcf7c 41c5a036ef47 1549600b5af5 4ebacb14a9b5 do echo -e "\nResult for container: ${i}:\n" docker exec -d $i subscription-manager repos --enable rhel-7-server-rpms done done time docker exec 4ebacb14a9b5 subscription-manager repos --disable rhel-7-server-rpms ``` Observation: --------------- 1. During execution, I never saw 'Required Lock already taken error!' 2. At 1001st time, when I execute 'subscription-manager disable' where the satellite was having an almost good chunk of 'Host Update' task yet to execute, the execution wait for 1minute 48 seconds, but did not fail with 'required lock' error. Script stdout: --------------- ``` . . . Result for container: 41c5a036ef47: Result for container: 1549600b5af5: Result for container: 4ebacb14a9b5: Repository 'rhel-7-server-rpms' is disabled for this system. <---------------- This 1001st request which wait for 1m48s and executed without error. real 1m48.293s user 0m0.021s sys 0m0.015s ``` Is this what we expect, to verify this bug? Did you change the Administer -> Settings -> Content -> "Allow multiple concurrent Actions::Katello::Host::Update calls for one host to be processed at the same time" to 'TRUE'? If not and it was false then you did not have a concurrency level high enough to introduce the Lock contention. If you had that set to 'TRUE' then yes, this bug is VERIFIED. Verified! @Satellite 6.6 Snap 17 Steps: ------------- 1. As per comment 14, I set setting "Allow multiple concurrent Actions::Katello::Host::Update calls for one host to be processed at the same time" to True/yes. 2. Registered 5 RHEL7 systems to the satellite. 3. Ran 100 times disable and enable for all 5 hosts asynchronously. using script [1]. 4. Monitor clients and satellite if fails. Scripts: -------------- [1] : ``` for i in {1..100} do for i in c72588fe5db6 31ce842fcf7c 41c5a036ef47 1549600b5af5 4ebacb14a9b5 do echo -e "\nResult for container: ${i}:\n" docker exec -d $i subscription-manager repos --disable rhel-7-server-rpms done for i in c72588fe5db6 31ce842fcf7c 41c5a036ef47 1549600b5af5 4ebacb14a9b5 do echo -e "\nResult for container: ${i}:\n" docker exec -d $i subscription-manager repos --enable rhel-7-server-rpms done done time docker exec 4ebacb14a9b5 subscription-manager repos --disable rhel-7-server-rpms ``` Observation: --------------- 1. During execution, I never saw 'Required Lock already taken error!' in the satellite. 2. At 1001st time, when I execute 'subscription-manager disable' where the satellite was having an almost good chunk of 'Host Update' task yet to execute, the execution was waiting for 5minute 44 seconds but did not fail with 'required lock' error. Script stdout: --------------- ``` . . . Result for container: 41c5a036ef47: Result for container: 1549600b5af5: Result for container: 4ebacb14a9b5: Repository 'rhel-7-server-rpms' is disabled for this system. <---------------- This 1001st request which wait for 1m48s and executed without error. real 5m44.023s user 0m0.021s sys 0m0.015s ``` Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:3172 |