Bug 1053591 - Cloning channel with "spacewalk-clone-by-date" ends with deadlock error
Summary: Cloning channel with "spacewalk-clone-by-date" ends with deadlock error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 2.1
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Milan Zázrivec
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: space21
Reported: 2014-01-15 13:19 UTC by Milan Zázrivec
Modified: 2018-12-04 16:59 UTC
CC List: 7 users

Fixed In Version: spacewalk-java-2.1.124-1
Clone Of: 1023334
Environment:
Last Closed: 2014-03-04 13:06:39 UTC
Embargoed:


Description Milan Zázrivec 2014-01-15 13:19:43 UTC
+++ This bug was initially created as a clone of Bug #1023334 +++

Description of problem:
When cloning a channel using spacewalk-clone-by-date, the command ends with a deadlock error.

Version-Release number of selected component (if applicable):
Spacewalk 2.0

How reproducible:
Randomly


Steps to Reproduce:
1. Fresh install of Satellite 2.0
2. Run spacewalk-clone-by-date -g \
     -y --channels=rhel-x86_64-server-6 stable-rhel-x86_64-server-6-20131015 \
     -d 2013-10-15

Actual results:
Traceback (most recent call last):
  File "/usr/bin/spacewalk-clone-by-date", line 251, in <module>
    sys.exit(abs(main() or 0))
  File "/usr/bin/spacewalk-clone-by-date", line 241, in main
    return cloneByDate.main(args)
  File "/usr/share/rhn/utils/cloneByDate.py", line 191, in main
    cloner.clone(options.skip_depsolve)
  File "/usr/share/rhn/utils/cloneByDate.py", line 356, in clone
    self.dep_solve([pkg['nvrea'] for pkg in added_pkgs])
  File "/usr/share/rhn/utils/cloneByDate.py", line 379, in dep_solve
    self.process_deps(dep_results)
  File "/usr/share/rhn/utils/cloneByDate.py", line 418, in process_deps
    cloner.process_deps(needed)
  File "/usr/share/rhn/utils/cloneByDate.py", line 500, in process_deps
    self.remote_api.add_packages(self.to_label, needed_ids)
  File "/usr/share/rhn/utils/cloneByDate.py", line 689, in add_packages
    self.client.channel.software.addPackages(self.auth_token, label, pkg_set)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault -1: 'redstone.xmlrpc.XmlRpcFault: unhandled internal exception: ERROR: deadlock detected\n  Detail: Process 2624 waits for ShareLock on transaction 3064927; blocked by process 2627.\nProcess 2627 waits for ExclusiveLock on tuple (8,27) of relation 17242 of database 16384; blocked by process 2624.\n  Hint: See server log for query details.\n  Where: SQL statement "update rhnChannel set last_modified =  $1  where id =  $2 "\nPL/pgSQL function "update_channel" line 23 at SQL statement'>
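The Detail part of the fault above is PostgreSQL's description of a wait-for cycle: process 2624 waits for process 2627, and process 2627 waits for process 2624. Below is a minimal sketch of reading the message that way. It assumes the message format shown in this traceback. The regular expression and the helper names parse_wait_for and find_cycle are hypothetical and are not part of Spacewalk or PostgreSQL.

```python
import re

# Detail text copied from the xmlrpclib.Fault above.
DETAIL = (
    "Process 2624 waits for ShareLock on transaction 3064927; blocked by process 2627.\n"
    "Process 2627 waits for ExclusiveLock on tuple (8,27) of relation 17242 of database 16384; "
    "blocked by process 2624."
)

# Assumed format: "Process <pid> waits for <lock>; blocked by process <pid>."
EDGE_RE = re.compile(r"Process (\d+) waits for (.+?); blocked by process (\d+)\.")


def parse_wait_for(detail):
    """Return {waiting_pid: (blocking_pid, lock_description)}."""
    return {int(w): (int(b), lock) for w, lock, b in EDGE_RE.findall(detail)}


def find_cycle(edges):
    """Follow wait-for edges from each pid; return the first cycle as a list of pids."""
    for start in edges:
        path, pid = [], start
        while pid in edges and pid not in path:
            path.append(pid)
            pid = edges[pid][0]
        if pid in path:
            return path[path.index(pid):]
    return []


edges = parse_wait_for(DETAIL)
for waiter, (blocker, lock) in edges.items():
    print(f"{waiter} -> {blocker}: {lock}")
print("cycle:", find_cycle(edges))  # cycle: [2624, 2627]
```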


Expected results:
The channel is cloned without error.

Additional info:

This problem is triggered by two API calls when run in parallel:
* errata.cloneAsOriginalAsync()
  - this function (among other things) calls the following PL/SQL function:
    rhn_channel.refresh_newest_package()
* channel.software.addPackages()
  - this function calls the following PL/SQL functions in the following order:
    rhn_channel.update_channel()
    rhn_channel.refresh_newest_package()

The race condition triggering this deadlock:

1. errata.cloneAsOriginalAsync() is executed. It calls
rhn_channel.refresh_newest_package(), which does the following:
   - insert into rhnChannelNewestPackage
   - delete from rhnChannelNewestPackage
   - insert into rhnChannelNewestPackageAudit

At this point, the procedure would like to do
   - update rhnChannel
but before it does so, the other API call kicks in.

2. channel.software.addPackages() is executed. It calls
rhn_channel.update_channel(), which does the following:
    - update rhnChannel
This update acquires a lock on the rhnChannel table.

3. errata.cloneAsOriginalAsync() would like to do
    - update rhnChannel
but is unable to, because it has to wait for the rhnChannel table
lock to be released.

4. channel.software.addPackages() wants to execute
rhn_channel.refresh_newest_package(), which would like to
do the inserts and delete as described in step 1, but it
has to wait, because the locks from step 1 have not been released yet.

At this point, errata.cloneAsOriginalAsync() waits for
channel.software.addPackages() and channel.software.addPackages()
waits for errata.cloneAsOriginalAsync() -> deadlock.

Also, this scenario is possible only when spacewalk-clone-by-date
is used with the -g option.
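The interleaving in steps 1-4 is a lock-order inversion: each API call holds one resource and waits for the one the other call holds. Comment 5 below states that this particular race was not the real cause, so this sketch only models the hypothesis as written. It is a hypothetical illustration. Python threading locks stand in for the database locks, and a 0.5 s acquire timeout stands in for PostgreSQL's deadlock detector.

```python
import threading

# Stand-ins for the two resources in the hypothesis (illustrative names only).
newest_package_rows = threading.Lock()  # rows touched by refresh_newest_package()
channel_row = threading.Lock()          # rhnChannel row updated by both calls

holding_first = threading.Barrier(2)  # both calls hold their first resource
both_tried = threading.Barrier(2)     # both calls have tried for their second one
results = {}


def api_call(name, first, second):
    with first:
        holding_first.wait()
        # A database session would block here; the timeout lets the demo finish.
        got_second = second.acquire(timeout=0.5)
        if got_second:
            second.release()
        results[name] = got_second
        both_tried.wait()  # keep holding `first` until the other call has tried


calls = [
    threading.Thread(target=api_call,
                     args=("cloneAsOriginalAsync", newest_package_rows, channel_row)),
    threading.Thread(target=api_call,
                     args=("addPackages", channel_row, newest_package_rows)),
]
for t in calls:
    t.start()
for t in calls:
    t.join()

print(sorted(results.items()))  # [('addPackages', False), ('cloneAsOriginalAsync', False)]
```

The second barrier keeps each thread holding its first lock until both acquire attempts have timed out. Without it, one thread could release its lock early and let the other succeed, which would make the outcome timing-dependent.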

Comment 1 Milan Zázrivec 2014-01-15 13:28:57 UTC
spacewalk.git master: d0815058c8892c8f5a0642cc592ff97fcaa49374

Comment 2 Matej Kollar 2014-01-17 12:15:32 UTC
Switching MODIFIED Spacewalk bugs to ON_QA before 2.1 release.

Comment 3 Milan Zázrivec 2014-01-22 10:42:13 UTC
Previous commits reverted in spacewalk.git master:

ab519082232175cba498721360298ddd80e458ab
aabceb78217f6402558bde106d886511ebacb027

Locking the rhnChannel table at the beginning of rhn_channel.refresh_newest_package
won't suffice, since the resources over which the two deadlocking
transactions compete were already locked before entry to the procedure.
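This argument can be checked with a two-mode lock compatibility table. The sketch below is simplified. It assumes, as Comment 5 later explains, that the competing locks are share locks on the rhnChannel row, taken by the inserts into rhnChannelPackage before either procedure starts. The CONFLICTS table and the blockers() helper are illustrative, and PostgreSQL's real lock modes are more fine-grained.

```python
# Simplified two-mode model of row locks on rhnChannel (id = 1).
CONFLICTS = {
    ("share", "share"): False,
    ("share", "exclusive"): True,
    ("exclusive", "share"): True,
    ("exclusive", "exclusive"): True,
}


def blockers(txn, mode, held):
    """Return, sorted, the other transactions whose held lock conflicts with `mode`."""
    return sorted(
        other for other, held_mode in held.items()
        if other != txn and CONFLICTS[(held_mode, mode)]
    )


# Locks already held when each transaction enters its PL/pgSQL procedure:
# both inserts into rhnChannelPackage have run by then.
held_at_entry = {"T1": "share", "T2": "share"}

# An explicit exclusive lock taken at procedure entry is checked against the
# same held locks as the later UPDATE would be, so it blocks the same way.
waits_for = {txn: blockers(txn, "exclusive", held_at_entry) for txn in ("T1", "T2")}
print(waits_for)  # {'T1': ['T2'], 'T2': ['T1']}
```

Moving the exclusive request to the top of the procedure does not change the set of held locks it is checked against, so each transaction still waits for the other.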

Comment 4 Milan Zázrivec 2014-01-22 15:42:57 UTC
Fixed in spacewalk.git master: bded6355c0ef72f980cbedac7ede350bf6b81fc6

Comment 5 Milan Zázrivec 2014-01-22 15:55:21 UTC
The race condition described in the initial comment was not valid.

The actual race condition leading to the deadlock:

* Transaction 1:
insert into rhnChannelPackage (channel_id, package_id) values (1, 1)

* Transaction 2:
insert into rhnChannelPackage (channel_id, package_id) values (1, 2)

* Transaction 1:
rhn_channel.update_channel(1)

Part of this procedure is: update rhnChannel set ... where id = 1

PostgreSQL grants transaction 1 an exclusive lock on the rhnChannel table, but
at this point transaction 1 still has to wait until transaction 2 finishes, since
both transactions 1 and 2 hold a RowShareLock on the rhnChannel row where
rhnChannel.id = 1 (acquired by their inserts into rhnChannelPackage above).

* Transaction 2:
rhn_channel.refresh_newest_package(1)

Part of this procedure is: update rhnChannel set ... where id = 1

At this point, transaction 2 cannot even acquire the ExclusiveLock on the
rhnChannel table, so it has to wait until transaction 1 finishes.

Transaction 1 waits for transaction 2 to finish and transaction 2
waits for transaction 1 to finish -> deadlock.
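The sequence above can be replayed against a toy lock table to see the cycle form at the fourth step. This is a hypothetical model, not Spacewalk code. It tracks only the rhnChannel row with id = 1. Each insert into rhnChannelPackage is treated as taking a share lock on that row, and each update of rhnChannel as needing an exclusive lock. After every step it checks for a two-transaction wait cycle; PostgreSQL's detector handles cycles of any length.

```python
# Toy lock table for the rhnChannel row with id = 1 (hypothetical model).
held = {}      # transaction -> lock mode it holds on the row
waiting = {}   # transaction -> set of transactions it is blocked by


def request(txn, mode):
    """Grant `mode` to `txn`, or record which transactions it must wait for."""
    blockers = {
        other for other, other_mode in held.items()
        if other != txn and "exclusive" in (mode, other_mode)
    }
    if blockers:
        waiting[txn] = blockers
        return False
    held[txn] = mode
    return True


def deadlocked():
    """True if two transactions wait for each other."""
    return any(a in waiting.get(b, ()) for a in waiting for b in waiting[a])


# Steps from the comment; "..." stands for the elided SET clause.
steps = [
    ("T1", "share", "insert into rhnChannelPackage (channel_id, package_id) values (1, 1)"),
    ("T2", "share", "insert into rhnChannelPackage (channel_id, package_id) values (1, 2)"),
    ("T1", "exclusive", "rhn_channel.update_channel(1): update rhnChannel set ... where id = 1"),
    ("T2", "exclusive", "rhn_channel.refresh_newest_package(1): update rhnChannel set ... where id = 1"),
]

log = []
detected = False
for txn, mode, stmt in steps:
    if request(txn, mode):
        log.append(f"{txn} granted {mode}: {stmt}")
    else:
        log.append(f"{txn} waits for {', '.join(sorted(waiting[txn]))}: {stmt}")
    if deadlocked():
        detected = True
        log.append("deadlock detected")
        break

print("\n".join(log))
```

The first two steps are granted because share locks are compatible. The third step leaves T1 waiting for T2, and the fourth closes the cycle, matching the comment's conclusion.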

Comment 6 Matej Kollar 2014-03-04 13:06:39 UTC
Spacewalk 2.1 has been released.
https://fedorahosted.org/spacewalk/wiki/ReleaseNotes21

Comment 7 Matej Kollar 2014-03-04 13:08:35 UTC
Spacewalk 2.1 has been released.
https://fedorahosted.org/spacewalk/wiki/ReleaseNotes21

