Description of problem: With a little whitebox knowledge from ckozak, we now have a scenario by which two consumers can somewhat reliably cause a deadlock situation. The scenario has been automated in our test bucket and the manual description is below. Version-Release number of selected component (if applicable): [root@jsefler-7server ~]# subscription-manager version server type: Red Hat Subscription Management subscription management server: 0.9.9-1 <======= subscription-manager: 1.10.14-7.el7 python-rhsm: 1.10.12-2.el7 How reproducible: often, but not absolutely every time Steps to Reproduce: 1. from two separate systems, register consumer1 and consumer2 within the same org 2. from each system, list --all --available pools (the more the better) 3. from consumer1 attach all the pools 4. from consumer2 attach all the pools in the reverse order attached by consumer 1 (a different order is crucial) 5 simultaneously unsubscribe --all from each consumer Actual results (taken from an automated test log): 201405081727:34.995 - INFO: ssh root.redhat.com subscription-manager unsubscribe --all 201405081727:35.127 - INFO: ssh root.redhat.com subscription-manager unsubscribe --all 201405081727:43.152 - INFO: SSHCommandResult from an attempt to unsubscribe all on 'jsefler-7server.usersys.redhat.com': exitCode=0 stdout='55 subscriptions removed at the server. 55 local certificates have been deleted.' stderr='' 201405081727:43.153 - INFO: SSHCommandResult from an attempt to unsubscribe all on 'jsefler-7workstation.usersys.redhat.com': exitCode=255 stdout='' stderr='Runtime Error ERROR: deadlock detected Detail: Process 20383 waits for ShareLock on transaction 49111010; blocked by process 24227. Process 24227 waits for ShareLock on transaction 49111011; blocked by process 20383. Hint: See server log for query details. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,102' From the log above, you will see that consumer1 was able to successfully unsubscribe from all 55 attached subscriptions. However, consumer 2 encountered a deadlock runtime error.
commit 334bcceab9708c9e6457618afe1e581726d89cde Author: ckozak <ckozak> Date: Thu May 8 13:14:50 2014 -0400 Some preliminary code to avoid deadlock situations This is not a complete sollution, but does reduce the chances of deadlocks in the most common places. Deadlocks may still occur when removing entitlements that create subpools while subpools are being bound (with other pools) As well as updating subpools. We need to lock the source pool before updating derived pools, which makes enforcing order much harder. We might be able to get around that by leveraging the "created" date (breaking ties with id)
Verifying Candlepin Version 0.9.19-1.... [root@jsefler-7server ~]# subscription-manager version server type: This system is currently not registered. subscription management server: 0.9.19-1 <======= subscription-manager: 1.10.14-7.el7 python-rhsm: 1.10.12-2.el7 [root@jsefler-7workstation ~]# subscription-manager version server type: This system is currently not registered. subscription management server: 0.9.19-1 <======= subscription-manager: 1.10.14-7.el7 python-rhsm: 1.10.12-2.el7 The automated test that was implemented to catch the failure in comment 0 is now passing (rhsm.cli.tests.UnsubscribeTests.MultiClientAttemptToDeadLockOnUnsubscribeAll_Test). Below is a verification excerpt from the automated test log corresponding to the failure point in comment 0. 201406170838:38.857 - INFO: Simultaneously attempting to unsubscribe all on 'jsefler-7server.usersys.redhat.com' and 'jsefler-7workstation.usersys.redhat.com'... 201406170838:38.857 - INFO: ssh root.redhat.com subscription-manager unsubscribe --all 201406170838:38.946 - INFO: ssh root.redhat.com subscription-manager unsubscribe --all 201406170838:56.981 - INFO: SSHCommandResult from an attempt to unsubscribe all on 'jsefler-7server.usersys.redhat.com': exitCode=0 stdout='52 subscriptions removed at the server. 52 local certificates have been deleted. ' stderr='' 201406170838:56.981 - INFO: SSHCommandResult from an attempt to unsubscribe all on 'jsefler-7workstation.usersys.redhat.com': exitCode=0 stdout='52 subscriptions removed at the server. 52 local certificates have been deleted. ' stderr='' 201406170838:56.982 - INFO: Asserted: Actual value of '0' matches expected value: The exit code from the unsubscribe all command on 'jsefler-7server.usersys.redhat.com'. 201406170838:56.982 - INFO: Asserted: Actual value of '' matches expected value: Stderr from the unsubscribe all on 'jsefler-7server.usersys.redhat.com'. 201406170838:56.982 - INFO: Asserted: Actual value of '0' matches expected value: The exit code from the unsubscribe all command on 'jsefler-7workstation.usersys.redhat.com'. 201406170838:56.982 - INFO: Asserted: Actual value of '' matches expected value: Stderr from the unsubscribe all on 'jsefler-7workstation.usersys.redhat.com'. Moving to status: VERIFIED
Cleaning up old bugs. These were verified but never closed.