Bug 1369463

Summary: multiple manifest uploads are deadlocking
Product: Red Hat Satellite Reporter: Lukas Pramuk <lpramuk>
Component: Subscription ManagementAssignee: Justin Sherrill <jsherril>
Status: CLOSED ERRATA QA Contact: Lukas Pramuk <lpramuk>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2.13CC: andrew.schofield, bbuckingham, bcourt, jsherril, rplevka, wpinheir
Target Milestone: UnspecifiedKeywords: Triaged
Target Release: Unused   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: tfm-rubygem-katello-3.4.4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-21 16:54:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Lukas Pramuk 2016-08-23 13:24:31 UTC
Description of problem:
multiple manifest uploads are deadlocking, we have race condition here.

Version-Release number of selected component (if applicable):
6.2.1

How reproducible:
reliably

Steps to Reproduce:
1. Prepare multiple organizations for manifest uploads
# for i in {1..9} ; do hammer organization create --name "Organization #$i" ; done

2. Simultaneously upload multiple manifests 
# for i in {1..9} ; do hammer subscription upload --file `clone_manifest manifest.zip fake_manifest.key` --organization "Organization #$i" & done
[10] 31917
[11] 31918
[12] 31919
[13] 31921
[14] 31923
[15] 31926
[16] 31928
[17] 31931
[18] 31936
[..............................................................................................................] [100%]
Error: Runtime Error ERROR: deadlock detected
  Detail: Process 31377 waits for ShareLock on transaction 709166; blocked by process 31889.
Process 31889 waits for ExclusiveLock on tuple (0,19) of relation 16580 of database 16386; blocked by process 31377.
  Hint: See server log for query details. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,102
[..............................................................................................................] [100%]
Error: Runtime Error Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1 at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched:81
[..............................................................................................................] [100%]
Error: Runtime Error ERROR: deadlock detected
  Detail: Process 29937 waits for ExclusiveLock on tuple (0,19) of relation 16580 of database 16386; blocked by process 31889.
Process 31889 waits for ShareLock on transaction 709188; blocked by process 29937.
  Hint: See server log for query details. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,102
[..............................................................................................................] [100%]
Error: Runtime Error ERROR: deadlock detected
  Detail: Process 32379 waits for ExclusiveLock on tuple (0,19) of relation 16580 of database 16386; blocked by process 31889.
Process 31889 waits for ShareLock on transaction 709188; blocked by process 29937.
Process 29937 waits for ExclusiveLock on tuple (0,19) of relation 16580 of database 16386; blocked by process 32379.
  Hint: See server log for query details. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,102
[..............................................................................................................] [100%]
Error: Runtime Error ERROR: deadlock detected
  Detail: Process 32432 waits for ShareLock on transaction 709329; blocked by process 31889.
Process 31889 waits for ExclusiveLock on tuple (0,50) of relation 16580 of database 16386; blocked by process 32432.
  Hint: See server log for query details. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,102
[..............................................................................................................] [100%]
Error: Runtime Error ERROR: deadlock detected
  Detail: Process 32433 waits for ShareLock on transaction 709329; blocked by process 31889.
Process 31889 waits for ExclusiveLock on tuple (0,50) of relation 16580 of database 16386; blocked by process 32433.
  Hint: See server log for query details. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,102
[..............................................................................................................] [100%]
Error: Runtime Error Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1 at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched:81
[                                                                                                                ] [0%][[                                                                                                                ] [0%][[                                                                                                                ] [0%][[                                                                                                                ] [0%][[                                                                                                                ] [0%][[                                                                                                                ] [0%][[                                                                                                                ] [0%][[..............................................................................................................] [100%][..............................................................................................................] [100%]
[..............................................................................................................] [100%]
Error: PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_on_cp_id"
DETAIL:  Key (cp_id)=(ff80808156a722610156b7811da51b17) already exists.
: INSERT INTO "katello_subscriptions" ("cp_id", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"


Actual results:
deadlocking tasks (successful 1/9)
(With good timing you can hit it with just 2 manifests)


Expected results:
no deadlocks

Additional info:
I dont expect CUs uploading 10 manifest at the same time, but we aim to improve backend, right?
Consider 10 manifests upload as means of exposing the backend flaw...not a real scenario.

Comment 1 Lukas Pramuk 2016-08-23 13:34:53 UTC
3 different errors occurred (3 beasts captured):

1)
Runtime Error ERROR: deadlock detected
  Detail: Process 31377 waits for ShareLock on transaction 709166; blocked by process 31889.
Process 31889 waits for ExclusiveLock on tuple (0,19) of relation 16580 of database 16386; blocked by process 31377.
  Hint: See server log for query details. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,102

2)
Runtime Error Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1 at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched:81

3)
PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_on_cp_id"
DETAIL:  Key (cp_id)=(ff80808156a722610156b7811da51b17) already exists.
: INSERT INTO "katello_subscriptions" ("cp_id", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"

Comment 5 Barnaby Court 2016-12-16 19:47:53 UTC
This has been fixed in Candlepin 2.0.

Comment 6 Lukas Pramuk 2017-06-19 12:34:29 UTC
FailedQA.

@satellite-6.3.0-15.0.beta.el7sat.noarch
foreman-1.15.0-1.el7sat.noarch
katello-3.4.1-1.el7sat.noarch
candlepin-2.0.35-1.el7.noarch

2.
# for i in {1..9} ; do hammer subscription upload --file `clone_manifest manifest.zip fake_manifest.key` --organization "Organization #$i" & done
[1] 29123
[2] 29124
[3] 29125
[4] 29126
[5] 29127
[6] 29128
[7] 29129
[8] 29130
[9] 29131
[............................                                                                                    ] [25%]
Failed to import archive
[........................................................                                                        ] [50%]
Failed to import archive
PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_products_on_subs_id_prod_id"
DETAIL:  Key (subscription_id, product_id)=(4, 1) already exists.
: INSERT INTO "katello_subscription_products" ("subscription_id", "product_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"
[...............................................................................................................] [100%]
PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_products_on_subs_id_prod_id"
DETAIL:  Key (subscription_id, product_id)=(7, 140) already exists.
: INSERT INTO "katello_subscription_products" ("subscription_id", "product_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"
[........................................................                                                        ] [50%]
Failed to import archive
PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_products_on_subs_id_prod_id"
DETAIL:  Key (subscription_id, product_id)=(12, 21) already exists.
: INSERT INTO "katello_subscription_products" ("subscription_id", "product_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"
[...............................................................................................................] [100%]
PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_products_on_subs_id_prod_id"
DETAIL:  Key (subscription_id, product_id)=(10, 139) already exists.
: INSERT INTO "katello_subscription_products" ("subscription_id", "product_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"
[...............................................................................................................] [100%]
PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_products_on_subs_id_prod_id"
DETAIL:  Key (subscription_id, product_id)=(10, 139) already exists.
: INSERT INTO "katello_subscription_products" ("subscription_id", "product_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"
[........................................................                                                        ] [50%]
Failed to import archive
no implicit conversion of String into Integer
[...............................................................................................................] [100%]
PG::Error: ERROR:  duplicate key value violates unique constraint "index_katello_subscriptions_products_on_subs_id_prod_id"
DETAIL:  Key (subscription_id, product_id)=(16, 85) already exists.
: INSERT INTO "katello_subscription_products" ("subscription_id", "product_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id"
[...............................................................................................................] [100%]


4 paused error tasks / 4 stopped with warning / 1 success only

Comment 7 Barnaby Court 2017-06-19 14:56:30 UTC
re-routing to subscription management as the QA failure is in the katello_subscription_products table which is not controlled by candlepin.

Comment 8 Lukas Pramuk 2017-08-30 19:44:59 UTC
VERIFIED.

@satellite-6.3.0-16.0.beta.el7sat.noarch
foreman-1.15.3-2.el7sat.noarch
katello-3.4.4-2.el7sat.noarch
candlepin-2.0.40-1.el7.noarch

2.
# for i in {1..9} ; do hammer subscription upload --file `clone_manifest manifest.zip fake_manifest.key` --organization "Organization #$i" & done
[1] 18374
[2] 18375
[3] 18377
[4] 18379
[5] 18381
[6] 18384
[7] 18386
[8] 18390
[9] 18395
[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]

[...................................................................................................................................] [100%]


>>> all concurrent tasks finished successfully (9 of 9)

Comment 9 Satellite Program 2018-02-21 16:54:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
> 
> For information on the advisory, and where to find the updated files, follow the link below.
> 
> If the solution does not work for you, open a new bug report.
> 
> https://access.redhat.com/errata/RHSA-2018:0336