Bug 702643

Summary: Canceling a task could result in a deadlock from logging
Product: [Retired] Pulp Reporter: John Matthews <jmatthew>
Component: z_otherAssignee: Jason Connor <jconnor>
Status: CLOSED NEXTRELEASE QA Contact: Preethi Thomas <pthomas>
Severity: low Docs Contact:
Priority: low    
Version: unspecifiedCC: jason.dobies, mmccune, skarmark
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-11 20:13:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 673206    

Description John Matthews 2011-05-06 12:58:33 UTC
Description of problem:

Running "pulp/playpen/perftests/loop_sync_cancel.sh" and noticing on some runs pulp appears hung.  Below are the stack traces from 'debugging'.

Issue is that python logging uses a lock.
One thread is logging a statement, while it is performing a release of the lock we interrupt with a CancelException.

Next thread tries to log something and it's hung on acquiring the lock.  

The timing of this is difficult, I don't see this scenario a lot when running loop_cancel_sync, but I have seen it a few times over several days of testing.

Thread was in the process of releasing lock and was interrupted with a CancelException

[Fri May 06 08:36:34 2011] [error] Exception in thread Thread-486:
[Fri May 06 08:36:34 2011] [error] Traceback (most recent call last):
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri May 06 08:36:34 2011] [error]     self.run()
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri May 06 08:36:34 2011] [error]     self.__target(*self.__args, **self.__kwargs)
[Fri May 06 08:36:34 2011] [error]   File "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 343, in run
[Fri May 06 08:36:34 2011] [error]     _log.error("Task id:%s, method_name:%s CancelException" % (self.id, self.method_name))
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1154, in error
[Fri May 06 08:36:34 2011] [error]     self._log(ERROR, msg, args, **kwargs)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1246, in _log
[Fri May 06 08:36:34 2011] [error]     self.handle(record)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1256, in handle
[Fri May 06 08:36:34 2011] [error]     self.callHandlers(record)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 1293, in callHandlers
[Fri May 06 08:36:34 2011] [error]     hdlr.handle(record)
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 742, in handle
[Fri May 06 08:36:34 2011] [error]     self.release()
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/logging/__init__.py", line 696, in release
[Fri May 06 08:36:34 2011] [error]     self.lock.release()
[Fri May 06 08:36:34 2011] [error]   File "/usr/lib64/python2.7/threading.py", line 138, in release
[Fri May 06 08:36:34 2011] [error]     self.__count = count = self.__count - 1
[Fri May 06 08:36:34 2011] [error] CancelException


Thread blocked trying to acquire lock
[Fri May 06 08:42:58 2011] [error] # ProcessId: 11598
[Fri May 06 08:42:58 2011] [error] # ThreadID: 140472190457600
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri May 06 08:42:58 2011] [error]   self.__bootstrap_inner()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri May 06 08:42:58 2011] [error]   self.run()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri May 06 08:42:58 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 116, in _dispatch
[Fri May 06 08:42:58 2011] [error]   self._cancel_tasks()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 158, in _cancel_tasks
[Fri May 06 08:42:58 2011] [error]   task.cancel()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/repo_sync_task.py", line 48, in cancel
[Fri May 06 08:42:58 2011] [error]   super(RepoSyncTask, self).cancel()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 417, in cancel
[Fri May 06 08:42:58 2011] [error]   self.thread.cancel()
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 243, in cancel
[Fri May 06 08:42:58 2011] [error]   self.raise_exception(CancelException)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 231, in raise_exception
[Fri May 06 08:42:58 2011] [error]   deliver_exception(self, test_exception_event, self.__exception_event.wait)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 215, in deliver_exception
[Fri May 06 08:42:58 2011] [error]   _raise_exception_in_thread(_tid(thread), exc_type)
[Fri May 06 08:42:58 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 165, in _raise_exception_in_thread
[Fri May 06 08:42:58 2011] [error]   _log.error("in _raise_exception_in_thread.  long_tid = [%s], num = [%s]" % (long_tid, num))
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1154, in error
[Fri May 06 08:42:58 2011] [error]   self._log(ERROR, msg, args, **kwargs)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1246, in _log
[Fri May 06 08:42:58 2011] [error]   self.handle(record)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1256, in handle
[Fri May 06 08:42:58 2011] [error]   self.callHandlers(record)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 1293, in callHandlers
[Fri May 06 08:42:58 2011] [error]   hdlr.handle(record)
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 738, in handle
[Fri May 06 08:42:58 2011] [error]   self.acquire()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/logging/__init__.py", line 689, in acquire
[Fri May 06 08:42:58 2011] [error]   self.lock.acquire()
[Fri May 06 08:42:58 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Fri May 06 08:42:58 2011] [error]   rc = self.__block.acquire(blocking)

Comment 1 Jay Dobies 2012-07-11 20:13:23 UTC
Closing as this has been rewritten in v2. If it becomes an issue in the new code base we can refile it.