Bug 1317922

Summary: deleting a content view with >40 repos fails on candlepin "400 Bad Request" error
Product: Red Hat Satellite Reporter: Pavel Moravec <pmoravec>
Component: Content Views Assignee: Christine Fouant <cfouant>
Status: CLOSED ERRATA QA Contact: Roman Plevka <rplevka>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.1.6 CC: bbuckingham, cfouant, ealcaniz, ehelms, fnguyen, kshirsal, peter.vreman, rplevka, sauchter, sthirugn, tstrachota
Target Milestone: Unspecified Keywords: Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1327343 Environment:
Last Closed: 2016-07-27 11:03:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1317008, 1327343    
Attachments:
Description Flags
candlepin.lgo none

Description Pavel Moravec 2016-03-15 14:03:12 UTC
Description of problem:
When a content view (CV) published with >40 repositories is deleted (that is, a version promoted to a lifecycle environment, i.e. Library, is removed), the deletion can fail; the more repositories the CV has, the more likely the failure. It is caused by concurrent candlepin activity that fails and returns:

Katello::Resources::Candlepin::Environment: 400 Bad Request  (DELETE /candlepin/environments/1-35/content?content=2413)

The root cause appears to be that candlepin is asked to remove one piece of content (a repo, in katello terms) after another from its environment (a candlepin environment is a CV + lifecycle environment pair) and then to delete the environment at the end. Removing a repo triggers a recalculation of certificates/subscriptions, which apparently can interfere with the removal of the next piece of content (the next repo).
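
For illustration only, the call pattern described above can be sketched as follows. The helper name `removal_requests` is hypothetical and not part of katello; only the two URI shapes are taken from the logs in this report.

```python
def removal_requests(env_id, content_ids):
    """List the candlepin calls described above, in the order they are issued.

    Hypothetical helper for illustration; it builds the requests but does not
    send them.
    """
    # One DELETE per piece of content (repo) in the environment ...
    calls = [
        ("DELETE", f"/candlepin/environments/{env_id}/content?content={cid}")
        for cid in content_ids
    ]
    # ... followed by a single DELETE of the environment itself.
    calls.append(("DELETE", f"/candlepin/environments/{env_id}"))
    return calls


if __name__ == "__main__":
    for verb, uri in removal_requests("1-35", [2413, 2414]):
        print(verb, uri)
```

With N repos this yields N + 1 requests against the same environment, so any overlap between them gets more likely as the CV grows.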


Version-Release number of selected component (if applicable):
Sat6.1.7
katello-2.2.0.18-1.el7sat.noarch
candlepin-0.9.49.11-1.el7.noarch


How reproducible:
50% for CV with 40 repos
(the more repos in the CV, the more probable it is)


Steps to Reproduce:
1. Create a CV with >40 repos
2. publish it
3. Delete the CV / remove version promoted to Library


Actual results:
With some probability, the error above is hit.


Expected results:
No such error


Additional info:

Comment 1 Pavel Moravec 2016-03-15 14:05:32 UTC
Created attachment 1136610 [details]
candlepin.lgo

Comment 2 Pavel Moravec 2016-03-15 14:06:30 UTC
(In reply to Pavel Moravec from comment #1)
> Created attachment 1136610 [details]
> candlepin.lgo

See timestamp:

2016-03-14 17:01:06,808 [req=b678a64f-6db1-4f61-8411-762bc4b13379, org=Default_Organization] ERROR org.candlepin.common.exceptions.mappers.CandlepinExceptionMapper - Runtime Error Error while committing the transaction at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched:81

and an attempt to delete environment 1-35 (first by removing individual contents one by one from it).

Comment 4 Filip Nguyen 2016-03-16 08:02:37 UTC
From the Candlepin logs, I can see that there are interleaved calls [1][3] to DELETE the environment content [0]. Because request [1] finishes at [4], request [3] is trying to delete something that is already deleted.

In other words, the problem is that the caller is issuing duplicate deletes to [0]. These deletes must be issued concurrently by the client (or by multiple clients); otherwise it would be impossible to interleave them (the client would wait for one of [1] and [3] to finish).


[0] 1-35/content?content=2413

[1]
2016-03-14 17:01:06,692 [req=2e896897-5dcf-4432-952c-8f318b70cca4, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=DELETE, uri=/candlepin/environments/1-35/content?content=2413

[2]
2016-03-14 17:01:06,692 [req=b98b8e29-e4f0-4069-893e-46818b9147e9, org=Default_Organization] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=24
[3]
2016-03-14 17:01:06,713 [req=b678a64f-6db1-4f61-8411-762bc4b13379, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=DELETE, uri=/candlepin/environments/1-35/content?content=2413
[4]
2016-03-14 17:01:06,792 [req=2e896897-5dcf-4432-952c-8f318b70cca4, org=Default_Organization] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=202, content-type="application/json", time=100
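
A rough sketch of how such interleaved duplicates could be spotted in a candlepin log automatically. It assumes the lines are in chronological order and follow the LoggingFilter format shown above; the function name `overlapping_duplicates` and the regexes are my own, not part of candlepin.

```python
import re

# Patterns based only on the LoggingFilter lines quoted in this report.
REQ_ID = re.compile(r"\[req=([^,\]]+)")
REQUEST = re.compile(r"- Request: verb=(\w+), uri=(\S+)")
RESPONSE = re.compile(r"- Response: status=(\d+)")


def overlapping_duplicates(lines):
    """Return (first_req, second_req, (verb, uri)) for every request that
    starts while an identical request is still waiting for its response."""
    in_flight = {}  # req id -> (verb, uri) of requests without a response yet
    overlaps = []
    for line in lines:
        rid = REQ_ID.search(line)
        if not rid:
            continue
        req = rid.group(1)
        request = REQUEST.search(line)
        if request:
            call = request.groups()
            for other, other_call in in_flight.items():
                if other_call == call:
                    overlaps.append((other, req, call))
            in_flight[req] = call
        elif RESPONSE.search(line):
            # The request finished; it can no longer overlap with later ones.
            in_flight.pop(req, None)
    return overlaps


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:
        for first, second, (verb, uri) in overlapping_duplicates(log):
            print(f"{verb} {uri}: {second} started before {first} finished")
```

Run against the candlepin log, this should report every pair of requests like [1] and [3].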

Comment 5 Christine Fouant 2016-04-15 18:13:19 UTC
fixed by https://github.com/Katello/katello/pull/5937/

Comment 6 Roman Plevka 2016-06-28 12:20:31 UTC
VERIFIED
on sat6.2.0 beta snap17.0

- removed CV version promoted to Library containing 40 repositories.
- candlepin log looks much better:

<pre>
2016-06-28 08:09:43,001 [thread=http-8443-2] [req=ca9d4, org=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=DELETE, uri=/candlepin/environments/3-5
2016-06-28 08:09:43,019 [thread=http-8443-2] [req=ca9d4, org=bz1317922] INFO  org.candlepin.resource.EnvironmentResource - Deleting consumers in environment Environment [id: 3-5, name: Library/foo_cv, owner: Owner [id: 9200a5, key: bz1317922]]
2016-06-28 08:09:43,025 [thread=http-8443-2] [req=ca9d4, org=bz1317922] INFO  org.candlepin.resource.EnvironmentResource - Deleting environment: Environment [id: 3-5, name: Library/foo_cv, owner: Owner [id: 9200a5, key: bz1317922]]
2016-06-28 08:09:43,061 [thread=http-8443-2] [req=ca9d4, org=bz1317922] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=204, content-type="null", time=60

# hashes shortened for better readability
</pre>

no errors in foreman/product.log

btw this seems to be a subscenario of the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1344257

Comment 7 Bryan Kearney 2016-07-27 11:03:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1501