1204949 – Updating client content view fails when client is registered to capsule

Bug 1204949 - Updating client content view fails when client is registered to capsule

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Candlepin
Sub Component:
Version:	6.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	Unspecified
Assignee:	Devan Goodwin
QA Contact:	Og Maciel
Docs Contact:
URL:	http://projects.theforeman.org/issues...
Whiteboard:
Depends On:
Blocks:

Reported:	2015-03-23 20:55 UTC by jaudet
Modified:	2019-04-01 20:26 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-08-12 05:30:33 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	9883	0	None	None	None	2016-04-22 15:09:58 UTC
Red Hat Product Errata	RHSA-2015:1592	0	normal	SHIPPED_LIVE	Important: Red Hat Satellite 6.1.1 on RHEL 6	2015-08-12 09:04:35 UTC

Description jaudet 2015-03-23 20:55:16 UTC

Description of problem:
Updating a client's content view fails when the client is registered to a capsule.

Version-Release number of selected component (if applicable):

Tested against:

* A satellite running Satellite-6.1.0-RHEL-7-20150317.0
* A capsule running RHEL 7.1
* Clients running RHEL 5.11, 6.6 and 7.1.

The RHEL 5.11 and 7.1 clients are affected, but the RHEL 6.6 client was not affected.

How reproducible:
Unknown.

Steps to Reproduce:
1. Set up a satellite and capsule.
2. Register clients to the capsule with the usual commands:

   a. rpm -Uvh http://capsule.example.com/pub/katello-ca-consumer-latest.noarch.rpm
   b. subscription-manager register --org="Default_Organization" --environment="Library"
   c. (not sure if required) subscription-manager attach --pool …

3. Create a content view and make it available to the clients. (Following the example above, it should be in the Library lifecycle environment and the Default Organization organization.)
4. Log in to the satellite web UI. Go to Hosts → Content Hosts. Select a host, and change the "content view" field.

Actual results:
The task fails. The failed task is "Actions::Candlepin::Consumer::Update". The most interesting line from the exception is "caused by: (RestClient::BadRequest) Katello::Resources::Candlepin::Consumer: 400 Bad Request {"displayMessage":"Problem updating unit Consumer [id = null, type = null, getName() = null]","requestUuid":"a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f"} (PUT /candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f)".
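
The "requestUuid" in the error body appears to correspond to the "req=" field in candlepin.log: the PUT request logged for this consumer carries the same value. A minimal Python sketch of using it to pull the matching log lines, assuming the error text has the shape quoted above. The helper names extract_error_body and lines_for_request are hypothetical, and the sample strings are copied from this report:

```python
import json

# Error text as quoted in the failed task (copied from this report).
ERROR_TEXT = (
    'Katello::Resources::Candlepin::Consumer: 400 Bad Request '
    '{"displayMessage":"Problem updating unit Consumer '
    '[id = null, type = null, getName() = null]",'
    '"requestUuid":"a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f"} '
    '(PUT /candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f)'
)

# Two request lines from candlepin.log for the same consumer.
PREFIX = "INFO  org.candlepin.common.filter.LoggingFilter - Request:"
CONSUMER_URI = "/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f"
SAMPLE_LOG = [
    "2015-03-23 12:30:22,972 [req=00c99c5b-d549-44d2-9d3b-02a412869487, org=] "
    f"{PREFIX} verb=POST, uri={CONSUMER_URI}/entitlements",
    "2015-03-23 12:30:23,006 [req=a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f, org=] "
    f"{PREFIX} verb=PUT, uri={CONSUMER_URI}",
]


def extract_error_body(text):
    """Return the JSON object embedded in the error text as a dict."""
    start = text.index("{")
    end = text.rindex("}") + 1
    return json.loads(text[start:end])


def lines_for_request(log_lines, request_uuid):
    """Keep only the log lines tagged with the given request id."""
    marker = f"req={request_uuid},"
    return [line for line in log_lines if marker in line]


body = extract_error_body(ERROR_TEXT)
matches = lines_for_request(SAMPLE_LOG, body["requestUuid"])
for line in matches:
    print(line)
```

On a real system the same filter would be applied to the full candlepin.log rather than to two sample lines.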

Expected results:
The task succeeds.

Additional info:
No firewall was running on the capsule or clients when this bug was noticed. It is unknown if the presence of a firewall affects how hard it is to reproduce this bug.

Comment 3 jaudet 2015-03-23 20:57:59 UTC

It's worth noting that this issue breaks other functionality. For example, after a client is affected by this issue, it is no longer possible to manage the packages on that client via katello-agent.

Comment 7 Justin Sherrill 2015-03-23 21:43:58 UTC

Note this occurred at 12:30 EDT (16:30 UTC)

relevant candlepin error:

        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-03-23 12:30:16,570 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] ERROR org.candlepin.resource.ConsumerResource - Problem updating unit:
java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 27244 waits for ShareLock on transaction 306663; blocked by process 25271.
Process 25271 waits for ExclusiveLock on tuple (0,15) of relation 21551 of database 21516; blocked by process 27244.
  Hint: See server log for query details.
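
The Detail lines describe a cycle: each process waits on a lock that the other one holds. A small Python sketch that reads those two lines into a wait-for graph and walks it to find the cycle. The function names are illustrative, and the parsing assumes the message format shown in this comment:

```python
import re

# Deadlock detail copied from the candlepin error above.
DETAIL = """\
Detail: Process 27244 waits for ShareLock on transaction 306663; blocked by process 25271.
Process 25271 waits for ExclusiveLock on tuple (0,15) of relation 21551 of database 21516; blocked by process 27244.
"""

WAIT_RE = re.compile(r"Process (\d+) waits for (.+?); blocked by process (\d+)\.")


def parse_wait_graph(detail):
    """Map each waiting process id to the process id that blocks it."""
    return {int(waiter): int(blocker)
            for waiter, _lock, blocker in WAIT_RE.findall(detail)}


def find_cycle(waits_for):
    """Follow 'blocked by' edges; return the first cycle found, or None."""
    for start in waits_for:
        path = []
        node = start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            # Walked back onto a process already on the path: a cycle.
            return path[path.index(node):]
    return None


graph = parse_wait_graph(DETAIL)
print(find_cycle(graph))
```

Neither process in the cycle can proceed on its own, so PostgreSQL reports the deadlock and aborts one of the transactions.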

Comment 9 Mike McCune 2015-03-23 22:31:42 UTC

NOTE: This may have nothing to do with the fact that the Content Host is registered via a Capsule, that may be a red-herring.

Comment 12 Devan Goodwin 2015-03-24 12:00:13 UTC

Does it happen repeatedly or does a re-try succeed?

Comment 13 Devan Goodwin 2015-03-24 12:46:42 UTC

The bad request:

$ cat candlepin.log | grep req=519aa1fa-6b11-4819-839a-566237f81428
2015-03-23 12:30:15,481 [req=519aa1fa-6b11-4819-839a-566237f81428, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3
2015-03-23 12:30:15,551 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Capabilities changed.
2015-03-23 12:30:15,552 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Updating 0 guest IDs.
2015-03-23 12:30:15,552 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - removing IDs.
2015-03-23 12:30:15,554 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Updating environment to: 1-6
2015-03-23 12:30:15,560 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.controller.CandlepinPoolManager - Regenerating #0 entitlement certificates for consumer: Consumer [id = 8a9084f34c39150b014c47592b4802d9, type = ConsumerType [id=1000, label=system], getName() = mgmt4.rhq.lab.eng.bos.redhat.com]
2015-03-23 12:30:15,560 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.resource.ConsumerResource - Updating to specific last checkin time: Mon Mar 23 12:22:43 EDT 2015
2015-03-23 12:30:16,570 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] ERROR org.candlepin.resource.ConsumerResource - Problem updating unit:
2015-03-23 12:30:16,596 [req=519aa1fa-6b11-4819-839a-566237f81428, org=Default_Organization] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=400, content-type="application/json", time=1154


Note the consumer ID in question is: ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3

Milliseconds before the bad request, there is an autobind request for this system:

2015-03-23 12:30:15,442 [req=6480131e-a050-4978-822a-786142286b4e, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3/entitlements

And it has not completed by the time this update request arrives. 

A similar thing happens 5 seconds later but for a different system:

2015-03-23 12:30:22,972 [req=00c99c5b-d549-44d2-9d3b-02a412869487, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f/entitlements
2015-03-23 12:30:23,006 [req=a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f

And this triggers a second deadlock.
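
The pattern in both cases is two requests for the same consumer arriving within a few tens of milliseconds of each other. A rough Python sketch for spotting such pairs in LoggingFilter "Request:" lines, assuming the lines are in chronological order. The one-second window and the function name find_close_requests are assumptions made for illustration, not anything Candlepin defines:

```python
import re
from datetime import datetime, timedelta

REQUEST_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[req=(?P<req>[0-9a-f-]+),[^\]]*\] .*?"
    r"Request: verb=(?P<verb>[A-Z]+), "
    r"uri=/candlepin/consumers/(?P<consumer>[0-9a-f-]{36})"
)

# The four request lines quoted in this comment.
PREFIX = "INFO  org.candlepin.common.filter.LoggingFilter - Request:"
FIRST = "/candlepin/consumers/ddb7dc41-2cf1-4acc-904e-e3c1fa3847e3"
SECOND = "/candlepin/consumers/ddc94272-eb37-40ab-8d36-1d4380875f9f"
SAMPLE_LOG = [
    "2015-03-23 12:30:15,442 [req=6480131e-a050-4978-822a-786142286b4e, org=] "
    f"{PREFIX} verb=POST, uri={FIRST}/entitlements",
    "2015-03-23 12:30:15,481 [req=519aa1fa-6b11-4819-839a-566237f81428, org=] "
    f"{PREFIX} verb=PUT, uri={FIRST}",
    "2015-03-23 12:30:22,972 [req=00c99c5b-d549-44d2-9d3b-02a412869487, org=] "
    f"{PREFIX} verb=POST, uri={SECOND}/entitlements",
    "2015-03-23 12:30:23,006 [req=a1a2b811-4ef2-4a0c-91eb-c7bd274b9d3f, org=] "
    f"{PREFIX} verb=PUT, uri={SECOND}",
]


def find_close_requests(lines, window=timedelta(seconds=1)):
    """Return (consumer, earlier verb, later verb, gap) for back-to-back requests."""
    previous = {}  # consumer UUID -> (timestamp, verb) of the last request seen
    pairs = []
    for line in lines:
        match = REQUEST_RE.match(line)
        if match is None:
            continue  # not a LoggingFilter request line for a consumer
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        consumer = match["consumer"]
        earlier = previous.get(consumer)
        if earlier is not None and ts - earlier[0] <= window:
            pairs.append((consumer, earlier[1], match["verb"], ts - earlier[0]))
        previous[consumer] = (ts, match["verb"])
    return pairs


pairs = find_close_requests(SAMPLE_LOG)
for consumer, first_verb, second_verb, gap in pairs:
    print(consumer, first_verb, second_verb, gap)
```

This only flags requests that start close together; it does not check whether the first request had finished, which would also require the matching "Response:" lines.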

Given the original bug report I'm assuming this is being driven by something happening in Satellite web UI and not something from subscription-manager.

Is there anything in the katello code for changing a content view that would simultaneously try to initiate an autobind request, and update the consumer itself at virtually the exact same time?

Comment 14 Justin Sherrill 2015-03-24 13:07:09 UTC

Devan, now that you mention it yes!

https://bugzilla.redhat.com/show_bug.cgi?id=1204949

https://github.com/Katello/katello/commit/84c44cf6e47fc754cbe7a2640c98ab5c5bd02dbb

I think the thought was to re-run auto attach when changing a system's environment. As implemented, though, the auto-attach runs concurrently with the environment modification rather than after it. Should be an easy fix.
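
To illustrate the difference between the two orderings, here is a toy Python model. It is not Katello's code or the upstream fix, and it does not reproduce the PostgreSQL deadlock: a single in-process lock that fails fast stands in for contention on the consumer record, and the class and function names are made up for the example.

```python
import threading


class ConsumerRecord:
    """Toy stand-in for a consumer row; not Candlepin's real data model."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    def modify(self, label, started=None, hold=None):
        # Fail fast instead of blocking, so a clash is reported, not hung on.
        if not self._lock.acquire(blocking=False):
            self.log.append(f"{label}: conflict")
            return False
        try:
            if started is not None:
                started.set()          # tell the caller the lock is held
            if hold is not None:
                hold.wait(timeout=5)   # keep the operation "in flight"
            self.log.append(f"{label}: ok")
            return True
        finally:
            self._lock.release()


def run_concurrently(record):
    """Start auto-attach, then update the environment while it is in flight."""
    started, release = threading.Event(), threading.Event()
    worker = threading.Thread(
        target=record.modify,
        args=("auto-attach",),
        kwargs={"started": started, "hold": release},
    )
    worker.start()
    started.wait(timeout=5)
    ok = record.modify("update-environment")
    release.set()
    worker.join()
    return ok


def run_sequentially(record):
    """Update the environment first, and only then run auto-attach."""
    return record.modify("update-environment") and record.modify("auto-attach")


concurrent_record = ConsumerRecord()
print(run_concurrently(concurrent_record), concurrent_record.log)

sequential_record = ConsumerRecord()
print(run_sequentially(sequential_record), sequential_record.log)
```

In the concurrent run the environment update collides with the in-flight auto-attach; in the sequential run both steps complete.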

Comment 15 jaudet 2015-03-24 13:13:23 UTC

> NOTE: This may have nothing to do with the fact that the Content Host is registered via a Capsule, that may be a red-herring.

Definitely. So far, I've only produced this on a single specific setup that included a satellite and capsule. It may be possible to produce this on a different setup that does not include a capsule at all.

> Does it happen repeatedly or does a re-try succeed?

As demonstrated in #c13, at least two separate deadlocks were produced. However, I believe jsherrill was able to unblock one of the failed tasks by clicking "resume task" at some later point in time.

Comment 16 Justin Sherrill 2015-03-24 13:15:39 UTC

Created redmine issue http://projects.theforeman.org/issues/9883 from this bug

Comment 17 Mike McCune 2015-03-24 14:12:50 UTC

Upstream PR:

https://github.com/Katello/katello/pull/5134

Comment 20 Og Maciel 2015-03-26 22:12:15 UTC

I could not reproduce this anymore on build Satellite-6.1.0-RHEL-7-20150324.0.

I was able to:

* Register a random RHEL 6 content host against my satellite.
* Subscribe it against the 'Default Organization' and 'Library' lifecycle.
* Then, using the web UI, I selected this content host (Hosts > Content Hosts) and moved it to an existing content view.

BONUS ROUND

* Since the content view I associated with this content host had RPMs for the OS and katello-agent, I was able to install the katello-agent and then install Firefox.

Comment 21 Og Maciel 2015-03-26 22:12:53 UTC

Verified by QE on Satellite-6.1.0-RHEL-7-20150324.0 build.

Comment 22 Bryan Kearney 2015-08-11 13:26:21 UTC

This bug is slated to be released with Satellite 6.1.

Comment 23 errata-xmlrpc 2015-08-12 05:30:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1592
