Bug 1609543 - concurrently creating repositories causes most of them are not visible for consumer at the end
Summary: concurrently creating repositories causes most of them are not visible for co...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Candlepin
Version: 6.3.2
Hardware: x86_64
OS: Linux
medium
high vote
Target Milestone: Unspecified
Assignee: Chris Duryee
QA Contact: jcallaha
URL:
Whiteboard:
Depends On: 1656597
Blocks: 1122832 1656611 1656612 1656613
TreeView+ depends on / blocked
 
Reported: 2018-07-29 07:43 UTC by Pavel Moravec
Modified: 2020-02-03 03:01 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1656597 (view as bug list)
Environment:
Last Closed: 2020-02-03 03:01:26 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Foreman Issue Tracker 25626 Normal Closed race condition when creating multiple repos in same product 2020-03-25 18:24:17 UTC

Description Pavel Moravec 2018-07-29 07:43:37 UTC
Description of problem:
When one concurrently creates repositories - does not matter if RH or custom ones - then syncing the repos, not all of them are visible for consumers after enabling relevant subscription.

The cause (kudos to the customer behind!) is, Actions::Candlepin::Environment::SetContent step in Actions::Katello::Repository::Create is called concurrently based on data collected in planning phase. If that planning phase is run concurrently as well, SetContent is called concurrently like:

"set content to <original> + <1st-repo>" &
"set content to <original> + <2nd-repo>" &
..

That ends up in deleting many contents.

As a result, consumers / Content Hosts attaching the relevant subscription do not see all repos, just the latest ones enabled by latest concurrently called SetContent.


Version-Release number of selected component (if applicable):
Sat 6.3.2


How reproducible:
100% (with a bit of luck)

Steps to Reproduce:
1. run this script to create a ZOO product, associate pulp example zoo repo many times, and sync them (well that is already optional):

hmr="hammer -u admin -p redhat"

org="RedHat"
$hmr product create --name=ZOO --label=ZOO --organization=$org

from=${1:-1}
to=${2:-10}
for i in $(seq $from $to); do
        $hmr repository create --product=ZOO --organization=$org --name=zoo_${i} --content-type=yum --url=https://repos.fedorapeople.org/repos/pulp/pulp/demo_repos/zoo/ &
done
wait

for i in $(seq $from $to); do
        $hmr repository synchronize --organization=$org --product=ZOO --name=zoo_${i} --async &
done

while [ $($hmr task list --search "state = running" | grep -ci sync) -gt 0 ]; do
        sleep 5
done

2. on a Content Host associated to Library LE, attach subscription for the ZOO product.
3. Run there:
subscription-manager repos --list | grep ID | grep zoo


Actual results:
3. shows less than 10 repos


Expected results:
3. to show all 10 repos


Additional info:
another check: in WebUI, see the 10 concurrent "Actions::Katello::Repository::Create" tasks. Click to their dynflow step "12: Actions::Candlepin::Environment::SetContent" and see if "delete_ids: []" is everywhere. If some ID is deleted, we have this problem.

technically, the problem is in /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/actions/katello/content_view/update_environment.rb (again kudos to the customer pointing there, I just agree with his findings):

module Actions
  module Katello
    module ContentView
      class UpdateEnvironment < Actions::Base
        def plan(content_view, environment, new_content_id = nil)
          view_env = content_view.content_view_environment(environment)
          content_ids = content_view.repos(environment).map(&:content_id).uniq.compact
          # in case we create new custom repository that doesn't have the
          # content_id set yet in the plan phase, we allow to pass it as
          # additional argument
          content_ids << new_content_id if new_content_id && !content_ids.include?(new_content_id)
          plan_action(Candlepin::Environment::SetContent,
                      cp_environment_id: view_env.cp_id,
                      content_ids:       content_ids)

          plan_self(:environment_id => environment.id)
        end
      end
    end
  end
end

here content_ids = .. is called concurrently for all repo create tasks, and plan_action prepared to add just the one new environment, ignoring those from concurrently run planning phases

Comment 1 Pavel Moravec 2018-07-29 08:16:31 UTC
btw the bug is on katello (and not candlepin) side, since katello adds and then deletes the content to/from the product, within Actions::Candlepin::Environment::SetContent / in app/lib/actions/candlepin/environment/set_content.rb :

        def run
          saved_cp_ids = existing_ids
          output[:add_ids] = input[:content_ids] - saved_cp_ids
          output[:delete_ids] = saved_cp_ids - input[:content_ids]
..
          until output[:delete_ids].empty?
            begin
              output[:delete_response] = ::Katello::Resources::Candlepin::Environment.
                  delete_content(input[:cp_environment_id], output[:delete_ids])
              break


We really need AddContent action here.

Comment 2 Pavel Moravec 2018-07-29 10:07:27 UTC
I tried patch:

--- /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/actions/katello/content_view/update_environment.rb.orig	2018-07-29 11:35:43.661209506 +0200
+++ /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/actions/katello/content_view/update_environment.rb	2018-07-29 11:22:03.261976079 +0200
@@ -11,7 +11,8 @@ module Actions
           content_ids << new_content_id if new_content_id && !content_ids.include?(new_content_id)
           plan_action(Candlepin::Environment::SetContent,
                       cp_environment_id: view_env.cp_id,
-                      content_ids:       content_ids)
+                      content_ids:       content_ids,
+                      add_only: true)
 
           plan_self(:environment_id => environment.id)
         end

--- /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/actions/candlepin/environment/set_content.rb.orig	2018-07-29 10:54:30.089505492 +0200
+++ /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/actions/candlepin/environment/set_content.rb	2018-07-29 11:15:32.601817157 +0200
@@ -5,6 +5,7 @@ module Actions
         input_format do
           param :cp_environment_id
           param :content_ids, Array
+	  param :add_only
         end
 
         def existing_ids
@@ -15,7 +16,11 @@ module Actions
         end
 
         def run
-          saved_cp_ids = existing_ids
+          if input[:add_only]
+            saved_cp_ids = []
+          else
+            saved_cp_ids = existing_ids
+          end
           output[:add_ids] = input[:content_ids] - saved_cp_ids
           output[:delete_ids] = saved_cp_ids - input[:content_ids]
           max_retries = 4

but...
- katello PUT all requests as it should
- katello did not DELETE the environments from products as before
- BUT a Content host was missing some repos, still :(

Here I blame candlepin, since I have:

2018-07-29 11:52:19,212 [thread=http-bio-8443-exec-4] [req=1b052ea6-ff6e-4afe-b6b5-9757b9aafb55, org=, csid=cbbaaa57] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857938059?enabled=true
2018-07-29 11:52:19,241 [thread=http-bio-8443-exec-10] [req=1ba3ef32-4423-4c72-8591-b71b4661e572, org=, csid=253bfd88] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857938146?enabled=true
2018-07-29 11:52:19,414 [thread=http-bio-8443-exec-5] [req=0a7172c0-4334-4486-9dc7-3d2b0f02a2ef, org=, csid=622dad44] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939237?enabled=true
2018-07-29 11:52:19,465 [thread=http-bio-8443-exec-9] [req=bf9fddd2-7ca3-4312-b744-96d6621e8bad, org=, csid=9248d87d] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939318?enabled=true
2018-07-29 11:52:19,773 [thread=http-bio-8443-exec-10] [req=083949f8-7c65-4149-b625-ef40477b244b, org=, csid=1bd0db1c] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939383?enabled=true
2018-07-29 11:52:19,866 [thread=http-bio-8443-exec-2] [req=d7c25aa6-5885-49a1-82cf-5f2ef4b9e188, org=, csid=7cc72cae] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939665?enabled=true
2018-07-29 11:52:19,888 [thread=http-bio-8443-exec-10] [req=1128e6b5-4c7a-4f3e-8126-bf6ed7e1f51e, org=, csid=2abb7c6d] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939742?enabled=true
2018-07-29 11:52:19,960 [thread=http-bio-8443-exec-5] [req=d99f19b6-40fb-412f-ab1b-3ffb05aa7025, org=, csid=9abdf1fd] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939373?enabled=true
2018-07-29 11:52:19,970 [thread=http-bio-8443-exec-6] [req=e97dd993-bc07-4771-961b-5ac2ad0fbbba, org=, csid=cc206aff] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939580?enabled=true
2018-07-29 11:52:20,097 [thread=http-bio-8443-exec-9] [req=093d1d7b-67a8-4944-a30a-a4a238caff0b, org=, csid=82148849] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939715?enabled=true
2018-07-29 11:52:27,470 [thread=http-bio-8443-exec-10] [req=7d37e676-0089-411a-9b71-93df38350d2f, org=, csid=27f4ab4b] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857943950?enabled=true
2018-07-29 11:52:27,762 [thread=http-bio-8443-exec-6] [req=6ac74bf0-1738-47ab-99cb-5c5d0a03e621, org=, csid=4ffdf15b] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857947309?enabled=true
2018-07-29 11:52:29,740 [thread=http-bio-8443-exec-4] [req=484f125c-5704-4cc3-84e3-43d8a600f35f, org=, csid=12e59b83] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857949572?enabled=true
2018-07-29 11:52:29,788 [thread=http-bio-8443-exec-9] [req=61bfee9f-57db-49e0-b18b-72f860b5931b, org=, csid=eec0202e] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857949573?enabled=true
2018-07-29 11:52:30,116 [thread=http-bio-8443-exec-10] [req=694216e1-e4ce-47ab-bab3-6eb3c6fbb6de, org=, csid=c93e4150] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857949959?enabled=true
2018-07-29 11:52:31,490 [thread=http-bio-8443-exec-1] [req=cdd339d6-0242-401c-b287-a3ef31666409, org=, csid=b0ace6e3] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857950349?enabled=true
2018-07-29 11:52:31,523 [thread=http-bio-8443-exec-8] [req=24bfd552-ed3b-4fbd-b4ff-b3d113616a34, org=, csid=a6d049f9] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857949903?enabled=true
2018-07-29 11:52:31,868 [thread=http-bio-8443-exec-7] [req=714b8bb4-f0e9-42ff-b7a5-a933333b1ed0, org=, csid=67ef464b] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857950033?enabled=true
2018-07-29 11:52:32,046 [thread=http-bio-8443-exec-2] [req=df35ed6e-6b83-48d0-b45f-880529c267a3, org=, csid=32fd02bb] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857951734?enabled=true
2018-07-29 11:52:33,545 [thread=http-bio-8443-exec-7] [req=20e3c262-45cb-43c5-9f4a-a21649e455b4, org=, csid=41042f22] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857951973?enabled=true

(i.e. 20 requests made)

while:

# subscription-manager repos --list | grep ID | grep zoo_66 | sort -n | nl
     1	Repo ID:   RedHat_ZOO_zoo_6652
     2	Repo ID:   RedHat_ZOO_zoo_6653
     3	Repo ID:   RedHat_ZOO_zoo_6654
     4	Repo ID:   RedHat_ZOO_zoo_6655
     5	Repo ID:   RedHat_ZOO_zoo_6657
     6	Repo ID:   RedHat_ZOO_zoo_6661
     7	Repo ID:   RedHat_ZOO_zoo_6662
     8	Repo ID:   RedHat_ZOO_zoo_6664
     9	Repo ID:   RedHat_ZOO_zoo_6665
    10	Repo ID:   RedHat_ZOO_zoo_6666
    11	Repo ID:   RedHat_ZOO_zoo_6667
    12	Repo ID:   RedHat_ZOO_zoo_6668
    13	Repo ID:   RedHat_ZOO_zoo_6669
#

where e.g. RedHat_ZOO_zoo_6663 has content id 1532857939318 that is present in postgres only in:

cp2_content.txt: 8aac022564e56a350164e573f56c028a | 1532857939318 | 2018-07-29 11:52:19.308+02 | 2018-07-29 11:52:19.308+02 | /custom/ZOO/zoo_6663                                                                            |                                                                                                                          | RedHat_ZOO_zoo_6663                                                   |              1 | zoo_6663                                                                                                                     |            |                                | yum            | Custom  |                 |      -49928151 |      0
cp2_environment_content.txt: 8aac022564e56a350164e573f85f02b1 | 2018-07-29 11:52:20.063+02 | 2018-07-29 11:52:20.063+02 | 8aac022564e56a350164e573f56c028a | 1              | 
cp2_owner_content.txt: 8aac022561bcb74a0161bcb7d1570001 | 8aac022564e56a350164e573f56c028a

and the content was NOT deleted from candlepin:

# grep 1532857939318 /var/log/candlepin/candlepin.log
2018-07-29 11:52:19,465 [thread=http-bio-8443-exec-9] [req=bf9fddd2-7ca3-4312-b744-96d6621e8bad, org=, csid=9248d87d] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/125756553231/content/1532857939318?enabled=true
#

It seems that candlepin do NOT enable all contents when requested concurrently a short time after previous one :(

Comment 4 Peter Vreman 2018-10-15 10:59:44 UTC
I have hit this issue also on Sat6.4(Beta)

Comment 6 Peter Vreman 2018-10-19 12:00:55 UTC
The depends on 1122832 shall be a blocks 1122832

Comment 9 Pavel Moravec 2018-11-14 13:38:11 UTC
(In reply to Pavel Moravec from comment #2)
> I tried patch:
..
> It seems that candlepin do NOT enable all contents when requested
> concurrently a short time after previous one :(

Same situation in Sat6.4. This patch decreases probability of hitting the concurrency bug, but does not fully prevent it. Once candlepin has concurrent requests like:

2018-11-14 13:50:53,069 [thread=http-bio-8443-exec-3] [req=3ccabadb-b775-439b-aeb1-212734bd1c51, org=, csid=65e39ff5] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/702732614081/content/1542199852802?enabled=true
2018-11-14 13:50:53,085 [thread=http-bio-8443-exec-2] [req=4f509296-d059-47b5-92b6-6681aee7004b, org=, csid=248c0c8c] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/RedHat/products/702732614081/content/1542199852951?enabled=true
2018-11-14 13:50:53,208 [thread=http-bio-8443-exec-3] [req=3ccabadb-b775-439b-aeb1-212734bd1c51, org=, csid=65e39ff5] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=140
2018-11-14 13:50:53,256 [thread=http-bio-8443-exec-2] [req=4f509296-d059-47b5-92b6-6681aee7004b, org=, csid=248c0c8c] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=171

either of them is ignored - "addContent" method cant be run concurrently :-/

However, one can re-play those POST methods once again (manually) to ensure they will be really executed.

Anyway, this is kind of nontrivial workaround..

Comment 10 Pavel Moravec 2018-11-15 13:51:42 UTC
candlepin bug:

calling:

verb=POST, uri=/candlepin/owners/RedHat/products/702732614081/content/1542199852802?enabled=true
verb=POST, uri=/candlepin/owners/RedHat/products/702732614081/content/1542199852951?enabled=true

for the same owner and product (for different content to be enabled) ends up by enabling just one of them - usually or always the later one.

I suspect the problem is in calling

https://github.com/candlepin/candlepin/blob/master/server/src/main/java/org/candlepin/resource/OwnerProductResource.java#L353

that ends up in concurrent call of:

https://github.com/candlepin/candlepin/blob/master/server/src/main/java/org/candlepin/dto/api/v1/ProductDTO.java#L532

that needs to be called sequentially.


Chris, could you please confirm my assumption is correct? And how to efficiently implement such a lock on the addProductContent method?

Comment 11 Pavel Moravec 2018-11-15 17:13:47 UTC
(In reply to Pavel Moravec from comment #2)
> I tried patch:
> 
> ---
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/
> actions/katello/content_view/update_environment.rb.orig	2018-07-29
> 11:35:43.661209506 +0200
> +++
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/
> actions/katello/content_view/update_environment.rb	2018-07-29
> 11:22:03.261976079 +0200
> @@ -11,7 +11,8 @@ module Actions
>            content_ids << new_content_id if new_content_id &&
> !content_ids.include?(new_content_id)
>            plan_action(Candlepin::Environment::SetContent,
>                        cp_environment_id: view_env.cp_id,
> -                      content_ids:       content_ids)
> +                      content_ids:       content_ids,
> +                      add_only: true)
>  
>            plan_self(:environment_id => environment.id)
>          end
> 
> ---
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/
> actions/candlepin/environment/set_content.rb.orig	2018-07-29
> 10:54:30.089505492 +0200
> +++
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5.74/app/lib/
> actions/candlepin/environment/set_content.rb	2018-07-29 11:15:32.601817157
> +0200
> @@ -5,6 +5,7 @@ module Actions
>          input_format do
>            param :cp_environment_id
>            param :content_ids, Array
> +	  param :add_only
>          end
>  
>          def existing_ids
> @@ -15,7 +16,11 @@ module Actions
>          end
>  
>          def run
> -          saved_cp_ids = existing_ids
> +          if input[:add_only]
> +            saved_cp_ids = []
> +          else
> +            saved_cp_ids = existing_ids
> +          end
>            output[:add_ids] = input[:content_ids] - saved_cp_ids
>            output[:delete_ids] = saved_cp_ids - input[:content_ids]
>            max_retries = 4
> 
> but...
> - katello PUT all requests as it should
> - katello did not DELETE the environments from products as before
> - BUT a Content host was missing some repos, still :(
> 
> Here I blame candlepin, since I have:
> 
> 2018-07-29 11:52:19,212 [thread=http-bio-8443-exec-4]
> [req=1b052ea6-ff6e-4afe-b6b5-9757b9aafb55, org=, csid=cbbaaa57] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857938059?enabled=true
> 2018-07-29 11:52:19,241 [thread=http-bio-8443-exec-10]
> [req=1ba3ef32-4423-4c72-8591-b71b4661e572, org=, csid=253bfd88] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857938146?enabled=true
> 2018-07-29 11:52:19,414 [thread=http-bio-8443-exec-5]
> [req=0a7172c0-4334-4486-9dc7-3d2b0f02a2ef, org=, csid=622dad44] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939237?enabled=true
> 2018-07-29 11:52:19,465 [thread=http-bio-8443-exec-9]
> [req=bf9fddd2-7ca3-4312-b744-96d6621e8bad, org=, csid=9248d87d] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939318?enabled=true
> 2018-07-29 11:52:19,773 [thread=http-bio-8443-exec-10]
> [req=083949f8-7c65-4149-b625-ef40477b244b, org=, csid=1bd0db1c] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939383?enabled=true
> 2018-07-29 11:52:19,866 [thread=http-bio-8443-exec-2]
> [req=d7c25aa6-5885-49a1-82cf-5f2ef4b9e188, org=, csid=7cc72cae] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939665?enabled=true
> 2018-07-29 11:52:19,888 [thread=http-bio-8443-exec-10]
> [req=1128e6b5-4c7a-4f3e-8126-bf6ed7e1f51e, org=, csid=2abb7c6d] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939742?enabled=true
> 2018-07-29 11:52:19,960 [thread=http-bio-8443-exec-5]
> [req=d99f19b6-40fb-412f-ab1b-3ffb05aa7025, org=, csid=9abdf1fd] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939373?enabled=true
> 2018-07-29 11:52:19,970 [thread=http-bio-8443-exec-6]
> [req=e97dd993-bc07-4771-961b-5ac2ad0fbbba, org=, csid=cc206aff] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939580?enabled=true
> 2018-07-29 11:52:20,097 [thread=http-bio-8443-exec-9]
> [req=093d1d7b-67a8-4944-a30a-a4a238caff0b, org=, csid=82148849] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939715?enabled=true
> 2018-07-29 11:52:27,470 [thread=http-bio-8443-exec-10]
> [req=7d37e676-0089-411a-9b71-93df38350d2f, org=, csid=27f4ab4b] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857943950?enabled=true
> 2018-07-29 11:52:27,762 [thread=http-bio-8443-exec-6]
> [req=6ac74bf0-1738-47ab-99cb-5c5d0a03e621, org=, csid=4ffdf15b] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857947309?enabled=true
> 2018-07-29 11:52:29,740 [thread=http-bio-8443-exec-4]
> [req=484f125c-5704-4cc3-84e3-43d8a600f35f, org=, csid=12e59b83] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857949572?enabled=true
> 2018-07-29 11:52:29,788 [thread=http-bio-8443-exec-9]
> [req=61bfee9f-57db-49e0-b18b-72f860b5931b, org=, csid=eec0202e] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857949573?enabled=true
> 2018-07-29 11:52:30,116 [thread=http-bio-8443-exec-10]
> [req=694216e1-e4ce-47ab-bab3-6eb3c6fbb6de, org=, csid=c93e4150] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857949959?enabled=true
> 2018-07-29 11:52:31,490 [thread=http-bio-8443-exec-1]
> [req=cdd339d6-0242-401c-b287-a3ef31666409, org=, csid=b0ace6e3] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857950349?enabled=true
> 2018-07-29 11:52:31,523 [thread=http-bio-8443-exec-8]
> [req=24bfd552-ed3b-4fbd-b4ff-b3d113616a34, org=, csid=a6d049f9] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857949903?enabled=true
> 2018-07-29 11:52:31,868 [thread=http-bio-8443-exec-7]
> [req=714b8bb4-f0e9-42ff-b7a5-a933333b1ed0, org=, csid=67ef464b] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857950033?enabled=true
> 2018-07-29 11:52:32,046 [thread=http-bio-8443-exec-2]
> [req=df35ed6e-6b83-48d0-b45f-880529c267a3, org=, csid=32fd02bb] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857951734?enabled=true
> 2018-07-29 11:52:33,545 [thread=http-bio-8443-exec-7]
> [req=20e3c262-45cb-43c5-9f4a-a21649e455b4, org=, csid=41042f22] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857951973?enabled=true
> 
> (i.e. 20 requests made)
> 
> while:
> 
> # subscription-manager repos --list | grep ID | grep zoo_66 | sort -n | nl
>      1	Repo ID:   RedHat_ZOO_zoo_6652
>      2	Repo ID:   RedHat_ZOO_zoo_6653
>      3	Repo ID:   RedHat_ZOO_zoo_6654
>      4	Repo ID:   RedHat_ZOO_zoo_6655
>      5	Repo ID:   RedHat_ZOO_zoo_6657
>      6	Repo ID:   RedHat_ZOO_zoo_6661
>      7	Repo ID:   RedHat_ZOO_zoo_6662
>      8	Repo ID:   RedHat_ZOO_zoo_6664
>      9	Repo ID:   RedHat_ZOO_zoo_6665
>     10	Repo ID:   RedHat_ZOO_zoo_6666
>     11	Repo ID:   RedHat_ZOO_zoo_6667
>     12	Repo ID:   RedHat_ZOO_zoo_6668
>     13	Repo ID:   RedHat_ZOO_zoo_6669
> #
> 
> where e.g. RedHat_ZOO_zoo_6663 has content id 1532857939318 that is present
> in postgres only in:
> 
> cp2_content.txt: 8aac022564e56a350164e573f56c028a | 1532857939318 |
> 2018-07-29 11:52:19.308+02 | 2018-07-29 11:52:19.308+02 |
> /custom/ZOO/zoo_6663                                                        
> |                                                                           
> | RedHat_ZOO_zoo_6663                                                   |   
> 1 | zoo_6663                                                                
> |            |                                | yum            | Custom  |  
> |      -49928151 |      0
> cp2_environment_content.txt: 8aac022564e56a350164e573f85f02b1 | 2018-07-29
> 11:52:20.063+02 | 2018-07-29 11:52:20.063+02 |
> 8aac022564e56a350164e573f56c028a | 1              | 
> cp2_owner_content.txt: 8aac022561bcb74a0161bcb7d1570001 |
> 8aac022564e56a350164e573f56c028a
> 
> and the content was NOT deleted from candlepin:
> 
> # grep 1532857939318 /var/log/candlepin/candlepin.log
> 2018-07-29 11:52:19,465 [thread=http-bio-8443-exec-9]
> [req=bf9fddd2-7ca3-4312-b744-96d6621e8bad, org=, csid=9248d87d] INFO 
> org.candlepin.common.filter.LoggingFilter - Request: verb=POST,
> uri=/candlepin/owners/RedHat/products/125756553231/content/
> 1532857939318?enabled=true
> #
> 
> It seems that candlepin do NOT enable all contents when requested
> concurrently a short time after previous one :(


The patch isnt complete even on katello side as it raises redundant but harm-less "add environment content" requests like:

2018-11-15 17:57:46,051 [thread=http-bio-8443-exec-5] [req=2d23727a-b728-4914-bdb7-20baff9ea588, org=, csid=97dabfa4] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/environments/a82007c47b25723539a00ff94e8b3d27/content
2018-11-15 17:57:46,060 [thread=http-bio-8443-exec-5] [req=2d23727a-b728-4914-bdb7-20baff9ea588, org=RedHat, csid=97dabfa4] WARN  org.candlepin.audit.EventSinkImpl - Rolling back ActiveMQ transaction.
2018-11-15 17:57:46,061 [thread=http-bio-8443-exec-5] [req=2d23727a-b728-4914-bdb7-20baff9ea588, org=RedHat, csid=97dabfa4] DEBUG org.candlepin.servlet.filter.EventFilter - Request failed, skipping event sending, status=Conflict
2018-11-15 17:57:46,061 [thread=http-bio-8443-exec-5] [req=2d23727a-b728-4914-bdb7-20baff9ea588, org=RedHat, csid=97dabfa4] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=409, content-type="application/json", time=10
2018-11-15 17:57:46,109 [thread=http-bio-8443-exec-7] [req=eb890b5a-bc72-42ca-8573-c2d18c5ccb59, org=, csid=aab0955e] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/environments/a82007c47b25723539a00ff94e8b3d27/content
2018-11-15 17:57:46,134 [thread=http-bio-8443-exec-7] [req=eb890b5a-bc72-42ca-8573-c2d18c5ccb59, org=RedHat, csid=aab0955e] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=202, content-type="application/json", time=25

Comment 12 Chris Duryee 2018-12-04 18:26:04 UTC
NOTE: attached katello issue only fixes 1/2 of this bug. A candlepin fix is also needed.

Comment 13 pm-sat@redhat.com 2018-12-07 21:05:01 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/25626 has been resolved.

Comment 14 Chris "Ceiu" Rog 2018-12-14 14:06:57 UTC
@Pavel:
That sounds about right. We've a couple options to ensure we don't clobber simultaneous updates that are going through review now, so this should be fixed on the CP side soon.

Comment 15 Justin Sherrill 2019-01-09 21:49:51 UTC
Chatted with the candlepin team.  It will take until Satellite 6.7 or 6.8 to get a fix in at the earliest to properly resolve this issue, as its not a small change.  So any real fix will likely wait for then.

If we need a shorter term fix, i suggest one solution.  We don't prevent the problem from occuring, but instead provide a way to correct the problem after the fact, either automatically or manually.

I could imagine either:
a) some script the user runs to correct all repositories
b) on each sync or content upload katello checks to verify that a repositories content is correctly assigned to a product and if not, does the assignment

Comment 16 Brad Buckingham 2019-01-09 21:54:36 UTC
Since additional changes are needed, we are going to mark this as 'failedQA' so that it won't be moved to POST by automation.  Once the additional changes are in, we can remove the failedQA.

Comment 18 Pavel Moravec 2019-04-03 12:04:28 UTC
(In reply to Justin Sherrill from comment #15)
> Chatted with the candlepin team.  It will take until Satellite 6.7 or 6.8 to
> get a fix in at the earliest to properly resolve this issue, as its not a
> small change.  So any real fix will likely wait for then.
> 
> If we need a shorter term fix, i suggest one solution.  We don't prevent the
> problem from occuring, but instead provide a way to correct the problem
> after the fact, either automatically or manually.
> 
> I could imagine either:
> a) some script the user runs to correct all repositories
> b) on each sync or content upload katello checks to verify that a
> repositories content is correctly assigned to a product and if not, does the
> assignment

Is the option b) implemented via https://github.com/Katello/katello/pull/8034 or is that an independent retry mechanism (I think so)?

Comment 19 Peter Vreman 2019-04-03 12:12:13 UTC
Feedback from an user perspecitive: Things are not fixed but already improved a lot with only having https://github.com/Katello/katello/pull/7867.


On Sat6.5.0Beta I have also applied the Katello fix from https://github.com/Katello/katello/pull/7867.

With Sat65 I then get now Candlepin failures that stop the task with an error instead of corrupting data:

production.log-20190320.gz:2019-03-19T11:27:59 [E|app|ed0b49cb] ForemanTasks::TaskError: Task 80587a4e-d87f-4835-888d-4e554a704509: Katello::Errors::CandlepinError: Runtime Error Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.candlepin.model.OwnerProduct#org.candlepin.model.OwnerProductKey@d40d0129] at org.hibernate.dialect.lock.PessimisticWriteSelectLockingStrategy.lock:76

I can restart the Tasks and it finishes correctly.

In my tests sofar i did not notice any issue on the clients sofar that content was unavailable


At least for me already having https://github.com/Katello/katello/pull/7867 is still an improvement to be able to restart the task only and not having to need 30min indentification of the problem and 30mins repairing using the 'foreman-rake console' with ruby commands to fix the assigned content ids.

Comment 20 Jonathon Turel 2019-04-09 15:49:12 UTC
Changing BZ component to Candlepin since this has been addressed from the Katello side.

Comment 21 Bryan Kearney 2019-04-09 16:03:58 UTC
Upstream bug assigned to cduryee@redhat.com

Comment 22 Pavel Moravec 2019-07-25 07:26:30 UTC
(In reply to Peter Vreman from comment #19)
> Feedback from an user perspecitive: Things are not fixed but already
> improved a lot with only having https://github.com/Katello/katello/pull/7867.
> 
> 
> On Sat6.5.0Beta I have also applied the Katello fix from
> https://github.com/Katello/katello/pull/7867.
> 
> With Sat65 I then get now Candlepin failures that stop the task with an
> error instead of corrupting data:
> 
> production.log-20190320.gz:2019-03-19T11:27:59 [E|app|ed0b49cb]
> ForemanTasks::TaskError: Task 80587a4e-d87f-4835-888d-4e554a704509:
> Katello::Errors::CandlepinError: Runtime Error Row was updated or deleted by
> another transaction (or unsaved-value mapping was incorrect) :
> [org.candlepin.model.OwnerProduct#org.candlepin.model.
> OwnerProductKey@d40d0129] at
> org.hibernate.dialect.lock.PessimisticWriteSelectLockingStrategy.lock:76
> 
> I can restart the Tasks and it finishes correctly.
> 
> In my tests sofar i did not notice any issue on the clients sofar that
> content was unavailable
> 
> 
> At least for me already having https://github.com/Katello/katello/pull/7867
> is still an improvement to be able to restart the task only and not having
> to need 30min indentification of the problem and 30mins repairing using the
> 'foreman-rake console' with ruby commands to fix the assigned content ids.


I can confirm the same candlepin lock bug appears in 6.5 also when deleting a product with many repos inside (like 100). Then many Actions::Candlepin::Product::ContentRemove dynflow steps fail due to that.

Comment 23 Justin Sherrill 2020-02-03 03:01:26 UTC
The upstream fix didn't appear to make 6.5, but did make 6.6.  We'll leave this open based on pavel's comments for possible improvement.


Note You need to log in before you can comment on or make changes to this bug.