Description of problem:
Updating a client's content view fails when the client is registered to a capsule.

Version-Release number of selected component (if applicable):
Tested against:
* A satellite running Satellite-6.1.0-RHEL-7-20150317.0
* A capsule running RHEL 7.1
* Clients running RHEL 5.11, 6.6 and 7.1. The RHEL 5.11 and 7.1 clients were affected, but the RHEL 6.6 client was not.

How reproducible:
Unknown.

Steps to Reproduce:
1. Set up a satellite and capsule.
2. Register clients to the capsule with the usual commands:
   1. rpm -Uvh http://capsule.example.com/pub/katello-ca-consumer-latest.noarch.rpm
   2. subscription-manager register --org="Default_Organization" --environment="Library"
   3. (not sure if required) subscription-manager attach --pool …
3. Create a content view and make it available to the clients. (Following the example above, it should be in the Library lifecycle environment and the Default Organization organization.)
4. Log in to the satellite web UI. Go to Hosts → Content Hosts. Select a host, and change the "content view" field.

Actual results:
The task fails. The failed task is "Actions::Candlepin::Consumer::Update". The most interesting line from the exception is:

caused by: (RestClient::BadRequest) Katello::Resources::Candlepin::Consumer: 400 Bad Request {"displayMessage":"Problem updating unit Consumer [id = null, type = null, getName() = null]","requestUuid":"a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f"} (PUT /candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f)

Expected results:
The task succeeds.

Additional info:
No firewall was running on the capsule or clients when this bug was noticed. It is unknown whether the presence of a firewall affects how hard it is to reproduce this bug.
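The requestUuid in the exception above is the same value that candlepin.log records as req=..., so it can be used to find the matching server-side entries. A minimal sketch of pulling it out of the exception text, assuming the message always has the shape shown in this report (the function name parse_candlepin_error is made up for illustration and is not part of Katello or Candlepin):

```python
import json
import re

# Exception text copied from the failed task in this report.
ERROR_TEXT = (
    "Katello::Resources::Candlepin::Consumer: 400 Bad Request "
    '{"displayMessage":"Problem updating unit Consumer '
    '[id = null, type = null, getName() = null]",'
    '"requestUuid":"a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f"} '
    "(PUT /candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f)"
)


def parse_candlepin_error(text):
    """Return (request_uuid, verb, path) from an error string shaped like ERROR_TEXT.

    Hypothetical helper: the message layout is assumed from this report only.
    """
    # Decode the JSON object that starts at the first '{' and note where it ends.
    start = text.index("{")
    body, end = json.JSONDecoder().raw_decode(text, start)
    # The HTTP verb and path follow the JSON body in parentheses.
    match = re.search(r"\((\w+) (/\S+)\)", text[end:])
    verb, path = match.groups() if match else (None, None)
    return body.get("requestUuid"), verb, path


request_uuid, verb, path = parse_candlepin_error(ERROR_TEXT)
print(f"grep 'req={request_uuid}' candlepin.log   # {verb} {path}")
```

The printed grep pattern is only a suggestion for where to look next; the location of candlepin.log depends on the installation.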
It's worth noting that this issue breaks other functionality. For example, after a client is affected by this issue, it is no longer possible to manage the packages on that client via katello-agent.
Note this occurred at 12:30 EDT (16:30 UTC).

Relevant candlepin error:

at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-03-23 12:30:16,570 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] ERROR org.candlepin.resource.ConsumerResource - Problem updating unit:
java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 27244 waits for ShareLock on transaction 306663; blocked by process 25271.
Process 25271 waits for ExclusiveLock on tuple (0,15) of relation 21551 of database 21516; blocked by process 27244.
  Hint: See server log for query details.
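The Detail line above describes a two-process cycle: 27244 waits on 25271, and 25271 waits on 27244. A small sketch of turning that text into a wait-for graph and confirming the cycle, assuming the "Process N waits for ... blocked by process M." wording seen in this log (the helper names are made up for illustration):

```python
import re

# Detail text copied from the PSQLException in this report.
DETAIL = (
    "Process 27244 waits for ShareLock on transaction 306663; "
    "blocked by process 25271. "
    "Process 25271 waits for ExclusiveLock on tuple (0,15) of relation 21551 "
    "of database 21516; blocked by process 27244."
)

WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on (.+?); blocked by process (\d+)\."
)


def wait_for_graph(detail):
    """Map each waiting process ID to the process ID that blocks it."""
    return {
        int(waiter): int(blocker)
        for waiter, _lock, _target, blocker in WAIT_RE.findall(detail)
    }


def find_cycle(graph, start):
    """Follow 'blocked by' edges from start; return the cycle through start, or []."""
    seen = []
    node = start
    while node in graph and node not in seen:
        seen.append(node)
        node = graph[node]
    return seen if node == start else []


graph = wait_for_graph(DETAIL)
cycle = find_cycle(graph, 27244)
print(graph)
print(cycle)
```

A non-empty result from find_cycle means the processes are waiting on each other, which is the condition PostgreSQL reports as a deadlock.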
NOTE: This may have nothing to do with the fact that the Content Host is registered via a Capsule; that may be a red herring.
Does it happen repeatedly or does a re-try succeed?
The bad request:

$ cat candlepin.log | grep req=519aa1fa-6b11-4819-839a-566237f81428
2015-03-23 12:30:15,481 [req=519aa1fa-6b11-4819-839a-566237f81428, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3
2015-03-23 12:30:15,551 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO org.candlepin.resource.ConsumerResource - Capabilities changed.
2015-03-23 12:30:15,552 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO org.candlepin.resource.ConsumerResource - Updating 0 guest IDs.
2015-03-23 12:30:15,552 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO org.candlepin.resource.ConsumerResource - removing IDs.
2015-03-23 12:30:15,554 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO org.candlepin.resource.ConsumerResource - Updating environment to: 1-6
2015-03-23 12:30:15,560 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO org.candlepin.controller.CandlepinPoolManager - Regenerating #0 entitlement certificates for consumer: Consumer [id = 8a9084f34c39150b014c47592b4802d9, type = ConsumerType [id=1000, label=system], getName() = mgmt4.rhq.lab.eng.bos.redhat.com]
2015-03-23 12:30:15,560 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO org.candlepin.resource.ConsumerResource - Updating to specific last checkin time: Mon Mar 23 12:22:43 EDT 2015
2015-03-23 12:30:16,570 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] ERROR org.candlepin.resource.ConsumerResource - Problem updating unit:
2015-03-23 12:30:16,596 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO org.candlepin.common.filter.LoggingFilter - Response: status=400, content-type="application/json", time=1154

Note the consumer ID in question is: ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3

Milliseconds before the bad request is an autobind request for this system:

2015-03-23 12:30:15,442 [req=6480131e-a050-4978-822a-786142286b4e, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3/entitlements

And it has not completed by the time this update request arrives.

A similar thing happens 5 seconds later but for a different system:

2015-03-23 12:30:22,972 [req=00c99c5b-d549-44d2-9d3b-02a412869487, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f/entitlements
2015-03-23 12:30:23,006 [req=a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f

And this triggers a second deadlock.

Given the original bug report I'm assuming this is being driven by something happening in Satellite web UI and not something from subscription-manager. Is there anything in the katello code for changing a content view that would simultaneously try to initiate an autobind request, and update the consumer itself at virtually the exact same time?
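The pattern described above (an autobind POST followed within milliseconds by a PUT for the same consumer) can be searched for mechanically. A rough sketch, assuming the LoggingFilter request-line format shown in these excerpts; the one-second window and the function name close_requests are arbitrary choices for illustration. The sample lines are the four request lines quoted above, with the wrapped consumer UUIDs rejoined.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Request lines quoted in this comment.
LOG_LINES = [
    "2015-03-23 12:30:15,442 [req=6480131e-a050-4978-822a-786142286b4e, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3/entitlements",
    "2015-03-23 12:30:15,481 [req=519aa1fa-6b11-4819-839a-566237f81428, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3",
    "2015-03-23 12:30:22,972 [req=00c99c5b-d549-44d2-9d3b-02a412869487, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f/entitlements",
    "2015-03-23 12:30:23,006 [req=a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f, org=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f",
]

REQUEST_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[req=(?P<req>[0-9a-f-]+), org=[^\]]*\] .*"
    r"Request: verb=(?P<verb>\w+), "
    r"uri=/candlepin/consumers/(?P<consumer>[0-9a-f-]+)(?:/\S*)?$"
)


def close_requests(lines, window=timedelta(seconds=1)):
    """Return (consumer, first_verb, second_verb, gap) for requests on the
    same consumer that start within `window` of each other."""
    by_consumer = defaultdict(list)
    for line in lines:
        match = REQUEST_RE.match(line)
        if match:
            when = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
            by_consumer[match["consumer"]].append((when, match["verb"], match["req"]))
    pairs = []
    for consumer, events in by_consumer.items():
        events.sort()
        # Compare each request with the next one for the same consumer.
        for (t1, verb1, _), (t2, verb2, _) in zip(events, events[1:]):
            if t2 - t1 <= window:
                pairs.append((consumer, verb1, verb2, t2 - t1))
    return pairs


pairs = close_requests(LOG_LINES)
for consumer, first, second, gap in pairs:
    print(f"{consumer}: {first} then {second} after {gap.total_seconds() * 1000:.0f} ms")
```

This only shows that the two requests started close together. Confirming that they actually overlapped would also require the matching Response lines for each req= value.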
Devan, now that you mention it, yes!

https://bugzilla.redhat.com/show_bug.cgi?id=1204949
https://github.com/Katello/katello/commit/84c44cf6e47fc754cbe7a2640c98ab5c5bd02dbb

I think the thought was to re-run auto-attach when changing a system's environment. The way it was implemented, though, the auto-attach runs concurrently with the environment modification rather than after it. Should be an easy fix.
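To illustrate the difference between the two orderings, here is a language-neutral sketch in Python. It is not the Katello code (Katello is Ruby); FakeCandlepin and both change_environment_* functions are invented names, and the stand-in client only records calls instead of talking to Candlepin.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeCandlepin:
    """Stand-in client that only records the order of calls (hypothetical)."""

    def __init__(self):
        self.calls = []

    def update_consumer(self, uuid, environment):
        # Corresponds to PUT /candlepin/consumers/<uuid> in the logs.
        self.calls.append(("PUT", uuid, environment))

    def auto_attach(self, uuid):
        # Corresponds to POST /candlepin/consumers/<uuid>/entitlements.
        self.calls.append(("POST", uuid))


def change_environment_concurrent(client, uuid, environment):
    """Problematic ordering: both requests are in flight at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(client.update_consumer, uuid, environment),
            pool.submit(client.auto_attach, uuid),
        ]
        for future in futures:
            future.result()


def change_environment_sequential(client, uuid, environment):
    """Suggested ordering: auto-attach starts only after the update returns."""
    client.update_consumer(uuid, environment)
    client.auto_attach(uuid)


UUID = "ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3"
sequential = FakeCandlepin()
change_environment_sequential(sequential, UUID, "1-6")
print(sequential.calls)
```

With the sequential version the second request cannot start until the first has finished, so the two requests never hold locks on the same consumer rows at the same time.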
> NOTE: This may have nothing to do with the fact that the Content Host is registered via a Capsule, that may be a red-herring.

Definitely. So far, I've only produced this on a single specific setup that included a satellite and capsule. It may be possible to produce this on a different setup that does not include a capsule at all.

> Does it happen repeatedly or does a re-try succeed?

As demonstrated in #c13, at least two separate deadlocks were produced. However, I believe jsherrill was able to unblock one of the failed tasks by clicking "resume task" at some later point in time.
Created redmine issue http://projects.theforeman.org/issues/9883 from this bug
Upstream PR: https://github.com/Katello/katello/pull/5134
I could not reproduce this anymore on build Satellite-6.1.0-RHEL-7-20150324.0. I was able to:

* Register a random RHEL 6 content host against my satellite.
* Subscribe it against the 'Default Organization' and 'Library' lifecycle.
* Then, using the web UI, select this content host (Hosts > Content Hosts) and move it to an existing content view.

BONUS ROUND

* Since the content view I associated with this content host had RPMs for the OS and katello-agent, I was able to install the katello-agent and then install Firefox.
Verified by QE on Satellite-6.1.0-RHEL-7-20150324.0 build.
This bug is slated to be released with Satellite 6.1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:1592