Bug 1204949 - Updating client content view fails when client is registered to capsule
Summary: Updating client content view fails when client is registered to capsule
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: Unspecified
Assignee: Devan Goodwin
QA Contact: Og Maciel
URL: http://projects.theforeman.org/issues...
Whiteboard:
Depends On:
Blocks:
Reported: 2015-03-23 20:55 UTC by jaudet
Modified: 2019-04-01 20:26 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-12 05:30:33 UTC
Target Upstream Version:
Embargoed:

Links
* Foreman Issue Tracker 9883 (Private: 0; Priority: None; Status: None; Summary: None; Last Updated: 2016-04-22 15:09:58 UTC)
* Red Hat Product Errata RHSA-2015:1592 (Private: 0; Priority: normal; Status: SHIPPED_LIVE; Summary: "Important: Red Hat Satellite 6.1.1 on RHEL 6"; Last Updated: 2015-08-12 09:04:35 UTC)

Description jaudet 2015-03-23 20:55:16 UTC
Description of problem:
Updating a client's content view fails when the client is registered to a capsule.

Version-Release number of selected component (if applicable):

Tested against:

* A satellite running Satellite-6.1.0-RHEL-7-20150317.0
* A capsule running RHEL 7.1
* Clients running RHEL 5.11, 6.6 and 7.1.

The RHEL 5.11 and 7.1 clients were affected, but the RHEL 6.6 client was not.

How reproducible:
Unknown.

Steps to Reproduce:
1. Set up a satellite and capsule.
2. Register clients to the capsule with the usual commands:

    1. rpm -Uvh http://capsule.example.com/pub/katello-ca-consumer-latest.noarch.rpm
    2. subscription-manager register --org="Default_Organization" --environment="Library"
    3. (not sure if required) subscription-manager attach --pool …

3. Create a content view and make it available to the clients. (Following the example above, it should be in the Library lifecycle environment of the "Default Organization" organization.)
4. Log in to the satellite web UI. Go to Hosts → Content Hosts. Select a host, and change the "content view" field.

Actual results:
The task fails. The failed task is "Actions::Candlepin::Consumer::Update". The most interesting line from the exception is:

    caused by: (RestClient::BadRequest) Katello::Resources::Candlepin::Consumer: 400 Bad Request {"displayMessage":"Problem updating unit Consumer [id = null, type = null, getName() = null]","requestUuid":"a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f"} (PUT /candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f)

Expected results:
The task succeeds.

Additional info:
No firewall was running on the capsule or clients when this bug was noticed. It is unknown if the presence of a firewall affects how hard it is to reproduce this bug.

Comment 3 jaudet 2015-03-23 20:57:59 UTC
It's worth noting that this issue breaks other functionality. For example, after a client is affected by this issue, it is no longer possible to manage the packages on that client via katello-agent.

Comment 7 Justin Sherrill 2015-03-23 21:43:58 UTC
Note this occurred at 12:30 EDT (16:30 UTC)

Relevant Candlepin error:

        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-03-23 12:30:16,570 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] ERROR org.candlepin.resource.ConsumerResource - Problem updating unit:
java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 27244 waits for ShareLock on transaction 306663; blocked by process 25271.
Process 25271 waits for ExclusiveLock on tuple (0,15) of relation 21551 of database 21516; blocked by process 27244.
  Hint: See server log for query details.
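
The error above describes a two-process lock cycle: process 27244 waits on 25271, and 25271 waits on 27244. The following minimal Python sketch shows how such a cycle appears in a wait-for graph. It is a conceptual illustration only, not PostgreSQL's actual deadlock detector; it assumes each blocked process waits on exactly one other process (true for this log), and the function name `find_wait_cycle` is hypothetical.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the processes forming a cycle in a wait-for graph, or None.

    Simplifying assumption: each blocked process waits on exactly one other
    process, so the graph is a plain mapping from waiter to blocker.
    """
    for start in waits_for:
        path: List[int] = []
        node = start
        # Follow "waits for" edges until reaching a process that is not
        # blocked (no deadlock on this path) or revisiting one on the path.
        while node in waits_for:
            if node in path:
                return path[path.index(node):]
            path.append(node)
            node = waits_for[node]
    return None


# Edges taken from the PostgreSQL error quoted above:
# process 27244 is blocked by 25271, and 25271 is blocked by 27244.
edges = {27244: 25271, 25271: 27244}
print(find_wait_cycle(edges))  # [27244, 25271]
```

Neither transaction can proceed until the other releases its lock, which is why PostgreSQL aborts one of them and Candlepin surfaces the failure as a 400 response.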

Comment 9 Mike McCune 2015-03-23 22:31:42 UTC
NOTE: This may have nothing to do with the fact that the Content Host is registered via a Capsule, that may be a red-herring.

Comment 12 Devan Goodwin 2015-03-24 12:00:13 UTC
Does it happen repeatedly or does a re-try succeed?

Comment 13 Devan Goodwin 2015-03-24 12:46:42 UTC
The bad request:

$ grep req=519aa1fa-6b11-4819-839a-566237f81428 candlepin.log
2015-03-23 12:30:15,481 [req=519aa1fa-6b11-4819-839a-566237f81428, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3
2015-03-23 12:30:15,551 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Capabilities changed.
2015-03-23 12:30:15,552 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Updating 0 guest IDs.
2015-03-23 12:30:15,552 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - removing IDs.
2015-03-23 12:30:15,554 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Updating environment to: 1-6
2015-03-23 12:30:15,560 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.controller.CandlepinPoolManager - Regenerating #0 entitlement certificates for consumer: Consumer [id = 8a9084f34c39150b014c47592b4802d9, type = ConsumerType [id=1000, label=system], getName() = mgmt4.rhq.lab.eng.bos.redhat.com]
2015-03-23 12:30:15,560 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Updating to specific last checkin time: Mon Mar 23 12:22:43 EDT 2015
2015-03-23 12:30:16,570 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] ERROR org.candlepin.resource.ConsumerResource - Problem updating unit:
2015-03-23 12:30:16,596 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=400, content-type="application/json", time=1154


Note the consumer ID in question is: ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3

Milliseconds before the bad request is an autobind request for this system:

2015-03-23 12:30:15,442 [req=6480131e-a050-4978-822a-786142286b4e, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3/entitlements

And it has not completed by the time this update request arrives. 

A similar thing happens about 7 seconds later, but for a different system:

2015-03-23 12:30:22,972 [req=00c99c5b-d549-44d2-9d3b-02a412869487, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f/entitlements
2015-03-23 12:30:23,006 [req=a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f

And this triggers a second deadlock.
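
The overlap described here (an entitlements POST for a consumer still in flight when a PUT for the same consumer arrives) can be checked mechanically. This Python sketch assumes the LoggingFilter line format shown in the excerpts above, with each log entry on a single line; the regular expressions and the `find_overlapping_updates` name are illustrative assumptions, not part of Candlepin.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Request lines, e.g. "... [req=<id>, org=] INFO  org.candlepin.common.filter.LoggingFilter
# - Request: verb=PUT, uri=/candlepin/consumers/<uuid>"
REQUEST_RE = re.compile(
    r"\[req=(?P<req>[0-9a-f-]+),.*LoggingFilter - Request: "
    r"verb=(?P<verb>\w+), uri=(?P<uri>\S+)"
)
# Response lines carry only the request ID, not the URI.
RESPONSE_RE = re.compile(r"\[req=(?P<req>[0-9a-f-]+),.*LoggingFilter - Response:")
# Consumer URIs, with an optional "/entitlements" suffix (the autobind call).
CONSUMER_URI_RE = re.compile(
    r"^/candlepin/consumers/(?P<uuid>[0-9a-f-]+)(?P<entitlements>/entitlements)?$"
)


def find_overlapping_updates(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (autobind_req, update_req, consumer_uuid) for every consumer PUT
    that arrived while an entitlements POST for the same consumer was in flight."""
    in_flight_autobinds: Dict[str, str] = {}  # request ID -> consumer UUID
    overlaps: List[Tuple[str, str, str]] = []
    for line in lines:
        request = REQUEST_RE.search(line)
        if request:
            uri = CONSUMER_URI_RE.match(request.group("uri"))
            if not uri:
                continue
            uuid = uri.group("uuid")
            if request.group("verb") == "POST" and uri.group("entitlements"):
                in_flight_autobinds[request.group("req")] = uuid
            elif request.group("verb") == "PUT" and not uri.group("entitlements"):
                for autobind_req, autobind_uuid in in_flight_autobinds.items():
                    if autobind_uuid == uuid:
                        overlaps.append((autobind_req, request.group("req"), uuid))
            continue
        response = RESPONSE_RE.search(line)
        if response:
            # A response means that request is no longer in flight.
            in_flight_autobinds.pop(response.group("req"), None)
    return overlaps
```

Passing an open handle for candlepin.log (for example `find_overlapping_updates(open("candlepin.log"))`) should flag the request pairs discussed in this comment, provided the log lines are not wrapped.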

Given the original bug report, I'm assuming this is driven by something happening in the Satellite web UI and not by subscription-manager.

Is there anything in the Katello code for changing a content view that would initiate an autobind request and update the consumer itself at virtually the same time?

Comment 14 Justin Sherrill 2015-03-24 13:07:09 UTC
Devan, now that you mention it, yes!

https://bugzilla.redhat.com/show_bug.cgi?id=1204949

https://github.com/Katello/katello/commit/84c44cf6e47fc754cbe7a2640c98ab5c5bd02dbb

I think the intent was to re-run auto-attach when changing a system's environment. As implemented, though, the auto-attach runs concurrently with the environment modification rather than afterwards. Should be an easy fix.
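
The difference between the reported behaviour and the proposed fix is one of ordering. The Python sketch below illustrates it with a hypothetical `FakeCandlepin` stand-in that only records call order and flags a request that starts while another request for the same consumer is still running. It does not model the database deadlock, and it is not the Katello code (Katello is Ruby; the actual change is in the upstream PR referenced in this bug). The method and function names are assumptions for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class FakeCandlepin:
    """Hypothetical stand-in for the Candlepin consumer API.

    Records the start order of calls and flags any call that begins while
    another call for the same consumer is still in progress.
    """

    def __init__(self, work_seconds: float = 0.05) -> None:
        self._lock = threading.Lock()
        self._in_progress: Dict[str, int] = {}  # consumer UUID -> calls in flight
        self._work_seconds = work_seconds
        self.calls: List[Tuple[str, str]] = []     # (operation, uuid) in start order
        self.overlaps: List[Tuple[str, str]] = []  # calls that started on a busy consumer

    def _call(self, operation: str, uuid: str) -> None:
        with self._lock:
            self.calls.append((operation, uuid))
            if self._in_progress.get(uuid, 0) > 0:
                self.overlaps.append((operation, uuid))
            self._in_progress[uuid] = self._in_progress.get(uuid, 0) + 1
        time.sleep(self._work_seconds)  # stands in for the database transaction
        with self._lock:
            self._in_progress[uuid] -= 1

    def update_consumer_environment(self, uuid: str, environment_id: str) -> None:
        self._call(f"PUT environment={environment_id}", uuid)

    def autobind(self, uuid: str) -> None:
        self._call("POST entitlements", uuid)


def change_environment_concurrently(cp: FakeCandlepin, uuid: str, env: str) -> None:
    # Reported behaviour: both requests are dispatched at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        update = pool.submit(cp.update_consumer_environment, uuid, env)
        attach = pool.submit(cp.autobind, uuid)
        update.result()
        attach.result()


def change_environment_then_autobind(cp: FakeCandlepin, uuid: str, env: str) -> None:
    # Proposed ordering: auto-attach starts only after the update has returned.
    cp.update_consumer_environment(uuid, env)
    cp.autobind(uuid)
```

With the concurrent version, `overlaps` will usually be non-empty (thread scheduling makes this likely rather than guaranteed); with the sequential version it stays empty, because the second request cannot start until the first has finished.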

Comment 15 jaudet 2015-03-24 13:13:23 UTC
> NOTE: This may have nothing to do with the fact that the Content Host is registered via a Capsule, that may be a red-herring.

Definitely. So far, I've only reproduced this on a single specific setup that included a satellite and a capsule. It may be possible to reproduce this on a different setup that does not include a capsule at all.

> Does it happen repeatedly or does a re-try succeed?

As demonstrated in #c13, at least two separate deadlocks were produced. However, I believe jsherrill was able to unblock one of the failed tasks by clicking "resume task" at some later point.

Comment 16 Justin Sherrill 2015-03-24 13:15:39 UTC
Created redmine issue http://projects.theforeman.org/issues/9883 from this bug

Comment 17 Mike McCune 2015-03-24 14:12:50 UTC
Upstream PR:

https://github.com/Katello/katello/pull/5134

Comment 20 Og Maciel 2015-03-26 22:12:15 UTC
I could not reproduce this anymore on build Satellite-6.1.0-RHEL-7-20150324.0.

I was able to:

* Register a random RHEL 6 content host against my satellite.
* Subscribe it to the 'Default Organization' organization and the 'Library' lifecycle environment.
* Then, using the web UI, I selected this content host (Hosts > Content Hosts) and moved it to an existing content view.

BONUS ROUND

* Since the content view I associated with this content host had RPMs for the OS and katello-agent, I was able to install the katello-agent and then install Firefox.

Comment 21 Og Maciel 2015-03-26 22:12:53 UTC
Verified by QE on Satellite-6.1.0-RHEL-7-20150324.0 build.

Comment 22 Bryan Kearney 2015-08-11 13:26:21 UTC
This bug is slated to be released with Satellite 6.1.

Comment 23 errata-xmlrpc 2015-08-12 05:30:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1592
