+++ This bug was initially created as a clone of Bug #1023334 +++

Description of problem:
When cloning a channel using spacewalk-clone-by-date the command ends with deadlock error.

Version-Release number of selected component (if applicable):
Spacewalk 2.0

How reproducible:
Randomly

Steps to Reproduce:
1. Fresh install of Satellite 2.0
2. run
   spacewalk-clone-by-date -g \
       -y --channels=rhel-x86_64-server-6 stable-rhel-x86_64-server-6-20131015 \
       -d 2013-10-15

Actual results:
Traceback (most recent call last):
  File "/usr/bin/spacewalk-clone-by-date", line 251, in <module>
    sys.exit(abs(main() or 0))
  File "/usr/bin/spacewalk-clone-by-date", line 241, in main
    return cloneByDate.main(args)
  File "/usr/share/rhn/utils/cloneByDate.py", line 191, in main
    cloner.clone(options.skip_depsolve)
  File "/usr/share/rhn/utils/cloneByDate.py", line 356, in clone
    self.dep_solve([pkg['nvrea'] for pkg in added_pkgs])
  File "/usr/share/rhn/utils/cloneByDate.py", line 379, in dep_solve
    self.process_deps(dep_results)
  File "/usr/share/rhn/utils/cloneByDate.py", line 418, in process_deps
    cloner.process_deps(needed)
  File "/usr/share/rhn/utils/cloneByDate.py", line 500, in process_deps
    self.remote_api.add_packages(self.to_label, needed_ids)
  File "/usr/share/rhn/utils/cloneByDate.py", line 689, in add_packages
    self.client.channel.software.addPackages(self.auth_token, label, pkg_set)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault -1: 'redstone.xmlrpc.XmlRpcFault: unhandled internal exception: ERROR: deadlock detected\n Detail: Process 2624 waits for ShareLock on transaction 3064927; blocked by process 2627.\nProcess 2627 waits for ExclusiveLock on tuple (8,27) of relation 17242 of database 16384; blocked by process 2624.\n Hint: See server log for query details.\n Where: SQL statement "update rhnChannel set last_modified = $1 where id = $2 "\nPL/pgSQL function "update_channel" line 23 at SQL statement'>

Expected results:
Channel cloned without error

Additional info:
This problem is triggered by two API calls when run in parallel:

* errata.cloneAsOriginalAsync() - this function (among other things) calls the following PL/SQL function:
    rhn_channel.refresh_newest_package()
* channel.software.addPackages() - this function calls the following PL/SQL functions in the following order:
    rhn_channel.update_channel()
    rhn_channel.refresh_newest_package()

The race condition triggering this deadlock:

1. errata.cloneAsOriginalAsync() is executed. It calls rhn_channel.refresh_newest_package(), which does the following:
   - insert into rhnChannelNewestPackage
   - delete from rhnChannelNewestPackage
   - insert into rhnChannelNewestPackageAudit
   At this point, the procedure would like to do
   - update rhnChannel
   but before it does so, the other API call kicks in.
2. channel.software.addPackages() is executed. It calls rhn_channel.update_channel(), which does the following:
   - update rhnChannel
   This update acquires lock on the rhnChannel table.
3. errata.cloneAsOriginalAsync() would like to do
   - update rhnChannel
   but is unable to, b/c it has to wait for the rhnChannel table lock to be released.
4. channel.software.addPackages() wants to execute rhn_channel.refresh_newest_package(), which would like to do the inserts and delete as described in step 1, but it has to wait, b/c the locks from step 1 are not released yet.

At this point, errata.cloneAsOriginalAsync() waits for channel.software.addPackages() and channel.software.addPackages() waits for errata.cloneAsOriginalAsync() -> deadlock.
Also, this scenario is possible only when spacewalk-clone-by-date is used with the -g option.
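The Detail part of the fault in the description already names both sides of the cycle. As a rough illustration only, the Python sketch below pulls the "waits for ... blocked by" pairs out of such a message and checks that they close a loop. The names parse_wait_edges and has_cycle and the regular expression are assumptions made up for this example; they are not part of Spacewalk or PostgreSQL. The sample text is copied from the traceback, with the \n escapes turned into real newlines.

```python
import re

# Matches sentences such as:
#   "Process 2624 waits for ShareLock on transaction 3064927; blocked by process 2627."
WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on (.+?); blocked by process (\d+)\."
)


def parse_wait_edges(detail):
    """Return (waiter_pid, lock_mode, target, blocker_pid) tuples found in the text."""
    return [(int(waiter), mode, target, int(blocker))
            for waiter, mode, target, blocker in WAIT_RE.findall(detail)]


def has_cycle(edges):
    """Return True if following 'waiter -> blocker' links leads back to the start."""
    blocked_by = {waiter: blocker for waiter, _, _, blocker in edges}
    for start in blocked_by:
        seen = set()
        pid = start
        while pid in blocked_by and pid not in seen:
            seen.add(pid)
            pid = blocked_by[pid]
            if pid == start:
                return True
    return False


# Detail text from the fault in the description (hypothetical variable name).
DETAIL = (
    "Process 2624 waits for ShareLock on transaction 3064927; "
    "blocked by process 2627.\n"
    "Process 2627 waits for ExclusiveLock on tuple (8,27) of relation 17242 "
    "of database 16384; blocked by process 2624."
)

edges = parse_wait_edges(DETAIL)
for waiter, mode, target, blocker in edges:
    print("%d waits for %s on %s, blocked by %d" % (waiter, mode, target, blocker))
print("cycle: %s" % has_cycle(edges))
```

Note that one side waits on a transaction and the other on a tuple, so the message alone says which processes block each other but not which statements took the earlier locks; the server log is still needed for that.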
spacewalk.git master: d0815058c8892c8f5a0642cc592ff97fcaa49374
Switching MODIFIED Spacewalk bugs to ON_QA before 2.1 release.
Previous commits reverted in spacewalk.git master:
ab519082232175cba498721360298ddd80e458ab
aabceb78217f6402558bde106d886511ebacb027

Locking the rhnChannel table at the beginning of rhn_channel.refresh_newest_package won't suffice, since the resources over which the two deadlocking transactions compete were already locked before entry to the procedure.
Fixed in spacewalk.git master: bded6355c0ef72f980cbedac7ede350bf6b81fc6
The race condition described in the initial comment was not valid.

Race condition leading to the deadlock:

* Transaction 1:
    insert into rhnChannelPackage (channel_id, package_id) values (1, 1)
* Transaction 2:
    insert into rhnChannelPackage (channel_id, package_id) values (1, 2)
* Transaction 1:
    rhn_channel.update_channel(1)
  Part of this procedure is:
    update rhnChannel set ... where id = 1
  PostgreSQL grants transaction 1 an exclusive lock on the rhnChannel table, but at this point transaction 1 has to wait until transaction 2 finishes, since both transactions 1 and 2 hold a RowShareLock on the rhnChannel row where rhnChannel.id = 1.
* Transaction 2:
    rhn_channel.refresh_newest_package(1)
  Part of this procedure is:
    update rhnChannel set ... where id = 1
  At this point transaction 2 won't even be able to acquire an ExclusiveLock on the rhnChannel table, so it has to wait until transaction 1 finishes.

Transaction 1 waits for transaction 2 to finish and transaction 2 waits for transaction 1 to finish -> deadlock.
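To make the ordering above easier to follow, here is a small Python model of it. It is a sketch under stated assumptions, not how PostgreSQL implements locking: it assumes, as described above, that each insert into rhnChannelPackage leaves a share-type lock on the referenced rhnChannel row, and that an update of that row has to wait for every other holder. RowLock, replay, is_deadlocked and the action names are hypothetical and exist only for this illustration.

```python
class RowLock:
    """Toy lock state for one row: many share holders or one exclusive holder."""

    def __init__(self):
        self.share_holders = set()
        self.exclusive_holder = None

    def acquire_share(self, txn):
        # Assumed to happen implicitly when a child row referencing this row is inserted.
        if self.exclusive_holder not in (None, txn):
            return {self.exclusive_holder}
        self.share_holders.add(txn)
        return set()

    def request_exclusive(self, txn):
        # An update of the row waits for every *other* transaction holding a lock on it.
        blockers = self.share_holders - {txn}
        if self.exclusive_holder not in (None, txn):
            blockers.add(self.exclusive_holder)
        if not blockers:
            self.exclusive_holder = txn
        return blockers


def replay(steps):
    """Replay (txn, action) steps against one row and return the wait-for graph."""
    row = RowLock()
    waits_for = {}
    for txn, action in steps:
        if txn in waits_for:
            continue  # a blocked transaction cannot issue further statements
        if action == "insert_child":
            blockers = row.acquire_share(txn)
        else:  # "update_parent"
            blockers = row.request_exclusive(txn)
        if blockers:
            waits_for[txn] = blockers
    return waits_for


def is_deadlocked(waits_for):
    """Return True if some transaction transitively waits for itself."""
    def reaches_itself(start):
        stack, seen = list(waits_for.get(start, ())), set()
        while stack:
            txn = stack.pop()
            if txn == start:
                return True
            if txn not in seen:
                seen.add(txn)
                stack.extend(waits_for.get(txn, ()))
        return False
    return any(reaches_itself(txn) for txn in waits_for)


# The interleaving from the list above, all against rhnChannel.id = 1.
STEPS = [
    ("T1", "insert_child"),   # insert into rhnChannelPackage ... values (1, 1)
    ("T2", "insert_child"),   # insert into rhnChannelPackage ... values (1, 2)
    ("T1", "update_parent"),  # rhn_channel.update_channel(1)
    ("T2", "update_parent"),  # rhn_channel.refresh_newest_package(1)
]

graph = replay(STEPS)
print(graph)
print("deadlock: %s" % is_deadlocked(graph))
```

In this model a single transaction running the same insert and update alone never blocks, which matches the observation that the deadlock only appears when the two calls overlap.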
Spacewalk 2.1 has been released. https://fedorahosted.org/spacewalk/wiki/ReleaseNotes21