Bug 702643 - Canceling a task could result in a deadlock from logging
Summary: Canceling a task could result in a deadlock from logging
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: z_other
Version: unspecified
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: ---
: ---
Assignee: Jason Connor
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: 673206
TreeView+ depends on / blocked
 
Reported: 2011-05-06 12:58 UTC by John Matthews
Modified: 2014-03-31 01:39 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-07-11 20:13:23 UTC
Embargoed:


Attachments (Terms of Use)

Description John Matthews 2011-05-06 12:58:33 UTC
Description of problem:

Running "pulp/playpen/perftests/loop_sync_cancel.sh" and noticing on some runs pulp appears hung.  Below are the stack traces from 'debugging'.

Issue is that python logging uses a lock.
One thread is logging a statement, while it is performing a release of the lock we interrupt with a CancelException.

Next thread tries to log something and it's hung on acquiring the lock.  

The timing of this is difficult, I don't see this scenario a lot when running loop_cancel_sync, but I have seen it a few times over several days of testing.

Thread was in the process of releasing lock and was interrupted with a CancelException

[Fri May 06 08:36:34 2011] [error] Exception in thread Thread-486:
[Fri May 06 08:36:34 2011] [error] Traceback (most recent call last):
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri May 06 08:36:34 2011] [error]     self.run()
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri May 06 08:36:34 2011] [error]     self.__target(*self.__args, **self.__kwargs)
[Fri May 06 08:36:34 2011] [error]   File "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 343, in run
[Fri May 06 08:36:34 2011] [error]     _log.error("Task id:%s, method_name:%s CancelException" % (self.id, self.method_name))
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1154, in error
[Fri May 06 08:36:34 2011] [error]     self._log(ERROR, msg, args, **kwargs)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1246, in _log
[Fri May 06 08:36:34 2011] [error]     self.handle(record)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1256, in handle
[Fri May 06 08:36:34 2011] [error]     self.callHandlers(record)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1293, in callHandlers
[Fri May 06 08:36:34 2011] [error]     hdlr.handle(record)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 742, in handle
[Fri May 06 08:36:34 2011] [error]     self.release()
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 696, in release
[Fri May 06 08:36:34 2011] [error]     self.lock.release()
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/threading.py", line 138, in release
[Fri May 06 08:36:34 2011] [error]     self.__count = count = self.__count - 1
[Fri May 06 08:36:34 2011] [error] CancelException


Thread blocked trying to acquire lock
[Fri May 06 08:42:58 2011] [error] # ProcessId: 11598
[Fri May 06 08:42:58 2011] [error] # ThreadID: 140472190457600
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri May 06 08:42:58 2011] [error]   self.__bootstrap_inner()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri May 06 08:42:58 2011] [error]   self.run()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri May 06 08:42:58 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 116, in _dispatch
[Fri May 06 08:42:58 2011] [error]   self._cancel_tasks()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 158, in _cancel_tasks
[Fri May 06 08:42:58 2011] [error]   task.cancel()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/repo_sync_task.py", line 48, in cancel
[Fri May 06 08:42:58 2011] [error]   super(RepoSyncTask, self).cancel()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 417, in cancel
[Fri May 06 08:42:58 2011] [error]   self.thread.cancel()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 243, in cancel
[Fri May 06 08:42:58 2011] [error]   self.raise_exception(CancelException)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 231, in raise_exception
[Fri May 06 08:42:58 2011] [error]   deliver_exception(self, test_exception_event, self.__exception_event.wait)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 215, in deliver_exception
[Fri May 06 08:42:58 2011] [error]   _raise_exception_in_thread(_tid(thread), exc_type)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 165, in _raise_exception_in_thread
[Fri May 06 08:42:58 2011] [error]   _log.error("in _raise_exception_in_thread.  long_tid = [%s], num = [%s]" % (long_tid, num))
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1154, in error
[Fri May 06 08:42:58 2011] [error]   self._log(ERROR, msg, args, **kwargs)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1246, in _log
[Fri May 06 08:42:58 2011] [error]   self.handle(record)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1256, in handle
[Fri May 06 08:42:58 2011] [error]   self.callHandlers(record)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1293, in callHandlers
[Fri May 06 08:42:58 2011] [error]   hdlr.handle(record)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 738, in handle
[Fri May 06 08:42:58 2011] [error]   self.acquire()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 689, in acquire
[Fri May 06 08:42:58 2011] [error]   self.lock.acquire()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Fri May 06 08:42:58 2011] [error]   rc = self.__block.acquire(blocking)

Comment 1 Jay Dobies 2012-07-11 20:13:23 UTC
Closing as this has been rewritten in v2. If it becomes an issue in the new code base we can refile it.


Note You need to log in before you can comment on or make changes to this bug.