Bug 700508

Summary: fast sync/cancel_sync locks up task subsystem
Product: [Retired] Pulp Reporter: dgao
Component: z_otherAssignee: John Matthews <jmatthew>
Status: CLOSED CURRENTRELEASE QA Contact: dgao
Severity: high Docs Contact:
Priority: medium    
Version: unspecifiedCC: pthomas, skarmark
Target Milestone: ---Keywords: Triaged
Target Release: Sprint 23   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-02-24 20:16:09 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 673206    

Description dgao 2011-04-28 14:50:23 UTC
curl -k -u "admin:admin" --request POST --data '{}' --header 'accept: application/json' --header 'content-type: application/json'  https://pulp-qe-sandbox.usersys.redhat.com/pulp/api/repositories/api_multisync_repo_1/sync/

curl -k -u "admin:admin" https://pulp-qe-sandbox.usersys.redhat.com/pulp/api/repositories/api_multisync_repo_1/sync/6744f4a3-71a3-11e0-8d19-545200104e49/

# at this point, it's verified by repo status that the sync is underway

curl -k -u "admin:admin" --request DELETE https://pulp-qe-sandbox.usersys.redhat.com/pulp/api/repositories/api_multisync_repo_1/sync/6744f4a3-71a3-11e0-8d19-545200104e49/


-------task subsystem hangs----

curl -k -u "admin:admin" --request POST --data '{}' --header 'accept: application/json' --header 'content-type: application/json'  https://pulp-qe-sandbox.usersys.redhat.com/pulp/api/repositories/api_multisync_repo_2/sync/

the last sync call will never return...

Comment 1 John Matthews 2011-04-28 21:15:51 UTC
Added a script to pulp:
playpen/stresstests/loop_sync_cancel.sh  to demonstrate this problem

Usage:
./loop_sync_cancel.sh test_repo_id

Comment 2 John Matthews 2011-04-29 18:31:55 UTC
Debug info from locked up pulp:

Fri Apr 29 14:31:05 2011] [error] 
[Fri Apr 29 14:31:05 2011] [error] # ProcessId: 14397
[Fri Apr 29 14:31:05 2011] [error] # ThreadID: 140638056650496
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 14:31:05 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 14:31:05 2011] [error]   self.run()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 14:31:05 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 335, in run
[Fri Apr 29 14:31:05 2011] [error]   result = self.callable(*self.args, **self.kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/api/repo.py", line 1499, in _sync
[Fri Apr 29 14:31:05 2011] [error]   if not self.set_sync_in_progress(id, True):
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/api/repo.py", line 2068, in set_sync_in_progress
[Fri Apr 29 14:31:05 2011] [error]   self.__sync_lock.acquire()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Fri Apr 29 14:31:05 2011] [error]   rc = self.__block.acquire(blocking)
[Fri Apr 29 14:31:05 2011] [error] 
[Fri Apr 29 14:31:05 2011] [error] # ProcessId: 14397
[Fri Apr 29 14:31:05 2011] [error] # ThreadID: 140638116878080
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 279, in wsgi
[Fri Apr 29 14:31:05 2011] [error]   result = self.handle_with_processors()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 252, in handle_with_processors
[Fri Apr 29 14:31:05 2011] [error]   return process(self.processors)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 558, in processor
[Fri Apr 29 14:31:05 2011] [error]   return handler()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 573, in processor
[Fri Apr 29 14:31:05 2011] [error]   result = handler()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 242, in process
[Fri Apr 29 14:31:05 2011] [error]   return self.handle()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 233, in handle
[Fri Apr 29 14:31:05 2011] [error]   return self._delegate(fn, self.fvars, args)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 414, in _delegate
[Fri Apr 29 14:31:05 2011] [error]   return f()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 422, in <lambda>
[Fri Apr 29 14:31:05 2011] [error]   f = lambda: self._delegate_sub_application(pat, what)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 447, in _delegate_sub_application
[Fri Apr 29 14:31:05 2011] [error]   return app.handle_with_processors()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 252, in handle_with_processors
[Fri Apr 29 14:31:05 2011] [error]   return process(self.processors)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 558, in processor
[Fri Apr 29 14:31:05 2011] [error]   return handler()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 573, in processor
[Fri Apr 29 14:31:05 2011] [error]   result = handler()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 14:31:05 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 242, in process
[Fri Apr 29 14:31:05 2011] [error]   return self.handle()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 233, in handle
[Fri Apr 29 14:31:05 2011] [error]   return self._delegate(fn, self.fvars, args)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 412, in _delegate
[Fri Apr 29 14:31:05 2011] [error]   return handle_class(cls)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 387, in handle_class
[Fri Apr 29 14:31:05 2011] [error]   return tocall(*args)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 52, in report_error
[Fri Apr 29 14:31:05 2011] [error]   return method(self, *args, **kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 131, in _auth_decorator
[Fri Apr 29 14:31:05 2011] [error]   value = method(self, *args, **kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/repositories.py", line 1213, in POST
[Fri Apr 29 14:31:05 2011] [error]   return action(id)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/repositories.py", line 585, in sync
[Fri Apr 29 14:31:05 2011] [error]   task = api.sync(id, timeout, skip, max_speed=limit, threads=threads)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/auditing.py", line 207, in _audit
[Fri Apr 29 14:31:05 2011] [error]   result = method(*args, **kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/api/repo.py", line 1585, in sync
[Fri Apr 29 14:31:05 2011] [error]   task_type=repo_sync.RepoSyncTask)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/async.py", line 70, in run_async
[Fri Apr 29 14:31:05 2011] [error]   return enqueue(task, unique)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/async.py", line 45, in enqueue
[Fri Apr 29 14:31:05 2011] [error]   _queue.enqueue(task, unique)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 211, in enqueue
[Fri Apr 29 14:31:05 2011] [error]   self.__lock.acquire()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Fri Apr 29 14:31:05 2011] [error]   rc = self.__block.acquire(blocking)
[Fri Apr 29 14:31:05 2011] [error] 
[Fri Apr 29 14:31:05 2011] [error] # ProcessId: 14397
[Fri Apr 29 14:31:05 2011] [error] # ThreadID: 140638389262080
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 14:31:05 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 14:31:05 2011] [error]   self.run()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 65, in run
[Fri Apr 29 14:31:05 2011] [error]   msg = self.__fetch()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 86, in __fetch
[Fri Apr 29 14:31:05 2011] [error]   return self.receiver.fetch(timeout=self.WAIT)
[Fri Apr 29 14:31:05 2011] [error] File: "<string>", line 6, in fetch
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1003, in fetch
[Fri Apr 29 14:31:05 2011] [error]   msg = self.session._get(self, timeout=timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "<string>", line 6, in _get
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 636, in _get
[Fri Apr 29 14:31:05 2011] [error]   timeout):
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
[Fri Apr 29 14:31:05 2011] [error]   result = self.connection._ewait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 195, in _ewait
[Fri Apr 29 14:31:05 2011] [error]   result = self._wait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 180, in _wait
[Fri Apr 29 14:31:05 2011] [error]   return self._waiter.wait(predicate, timeout=timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 59, in wait
[Fri Apr 29 14:31:05 2011] [error]   self.condition.wait(timeout - passed)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 96, in wait
[Fri Apr 29 14:31:05 2011] [error]   sw.wait(timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/compat.py", line 53, in wait
[Fri Apr 29 14:31:05 2011] [error]   ready, _, _ = select([self], [], [], timeout)
[Fri Apr 29 14:31:05 2011] [error] 
[Fri Apr 29 14:31:05 2011] [error] # ProcessId: 14397
[Fri Apr 29 14:31:05 2011] [error] # ThreadID: 140638397654784
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 14:31:05 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 14:31:05 2011] [error]   self.run()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 65, in run
[Fri Apr 29 14:31:05 2011] [error]   msg = self.__fetch()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 86, in __fetch
[Fri Apr 29 14:31:05 2011] [error]   return self.receiver.fetch(timeout=self.WAIT)
[Fri Apr 29 14:31:05 2011] [error] File: "<string>", line 6, in fetch
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1003, in fetch
[Fri Apr 29 14:31:05 2011] [error]   msg = self.session._get(self, timeout=timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "<string>", line 6, in _get
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 636, in _get
[Fri Apr 29 14:31:05 2011] [error]   timeout):
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
[Fri Apr 29 14:31:05 2011] [error]   result = self.connection._ewait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 195, in _ewait
[Fri Apr 29 14:31:05 2011] [error]   result = self._wait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 180, in _wait
[Fri Apr 29 14:31:05 2011] [error]   return self._waiter.wait(predicate, timeout=timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 59, in wait
[Fri Apr 29 14:31:05 2011] [error]   self.condition.wait(timeout - passed)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 96, in wait
[Fri Apr 29 14:31:05 2011] [error]   sw.wait(timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/compat.py", line 53, in wait
[Fri Apr 29 14:31:05 2011] [error]   ready, _, _ = select([self], [], [], timeout)
[Fri Apr 29 14:31:05 2011] [error] 
[Fri Apr 29 14:31:05 2011] [error] # ProcessId: 14397
[Fri Apr 29 14:31:05 2011] [error] # ThreadID: 140637965907712
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 14:31:05 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 14:31:05 2011] [error]   self.run()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 14:31:05 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/debugging.py", line 66, in _monitor
[Fri Apr 29 14:31:05 2011] [error]   self._stacktraces()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/debugging.py", line 48, in _stacktraces
[Fri Apr 29 14:31:05 2011] [error]   for filename, lineno, name, line in traceback.extract_stack(stack):
[Fri Apr 29 14:31:05 2011] [error] 
[Fri Apr 29 14:31:05 2011] [error] # ProcessId: 14397
[Fri Apr 29 14:31:05 2011] [error] # ThreadID: 140638406047488
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 14:31:05 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 14:31:05 2011] [error]   self.run()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 14:31:05 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/selector.py", line 119, in run
[Fri Apr 29 14:31:05 2011] [error]   rd, wr, ex = select(self.reading, self.writing, (), timeout)
[Fri Apr 29 14:31:05 2011] [error] 
[Fri Apr 29 14:31:05 2011] [error] # ProcessId: 14397
[Fri Apr 29 14:31:05 2011] [error] # ThreadID: 140638083958528
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 14:31:05 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 14:31:05 2011] [error]   self.run()
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 14:31:05 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 116, in _dispatch
[Fri Apr 29 14:31:05 2011] [error]   self._cancel_tasks()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 158, in _cancel_tasks
[Fri Apr 29 14:31:05 2011] [error]   task.cancel()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/api/repo_sync.py", line 72, in cancel
[Fri Apr 29 14:31:05 2011] [error]   super(RepoSyncTask, self).cancel()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 414, in cancel
[Fri Apr 29 14:31:05 2011] [error]   self.thread.cancel()
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 239, in cancel
[Fri Apr 29 14:31:05 2011] [error]   self.raise_exception(CancelException)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 227, in raise_exception
[Fri Apr 29 14:31:05 2011] [error]   deliver_exception(self, test_exception_event, self.__exception_event.wait)
[Fri Apr 29 14:31:05 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 212, in deliver_exception
[Fri Apr 29 14:31:05 2011] [error]   wait(self.__default_timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 394, in wait
[Fri Apr 29 14:31:05 2011] [error]   self.__cond.wait(timeout)
[Fri Apr 29 14:31:05 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 257, in wait
[Fri Apr 29 14:31:05 2011] [error]   _sleep(delay)

Comment 3 John Matthews 2011-04-29 20:28:08 UTC
Retry
[Fri Apr 29 16:25:53 2011] [notice] caught SIGTERM, shutting down
[Fri Apr 29 16:26:04 2011] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Apr 29 16:26:04 2011] [notice] SSL FIPS mode disabled
[Fri Apr 29 16:26:04 2011] [notice] Digest: generating secret for digest authentication ...
[Fri Apr 29 16:26:04 2011] [notice] Digest: done
[Fri Apr 29 16:26:04 2011] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Fri Apr 29 16:26:04 2011] [notice] mod_python: using mutex_directory /tmp 
[Fri Apr 29 16:26:04 2011] [notice] SSL FIPS mode disabled
[Fri Apr 29 16:26:04 2011] [notice] Apache/2.2.17 (Unix) DAV/2 mod_python/3.3.1 Python/2.7 mod_ssl/2.2.17 OpenSSL/1.0.0d-fips mod_wsgi/3.2 configured -- resuming normal operations
[Fri Apr 29 16:26:15 2011] [error] Traceback (most recent call last):
[Fri Apr 29 16:26:15 2011] [error]   File "/usr/lib/python2.7/site-packages/urlgrabber/grabber.py", line 1106, in _hdr_retrieve
[Fri Apr 29 16:26:15 2011] [error]     max_size=self.opts.max_header_size):
[Fri Apr 29 16:26:15 2011] [error]   File "/usr/lib/python2.7/site-packages/urlgrabber/grabber.py", line 762, in __getattr__
[Fri Apr 29 16:26:15 2011] [error]     if self.delegate and hasattr(self.delegate, name):
[Fri Apr 29 16:26:15 2011] [error]   File "/usr/lib/python2.7/site-packages/urlgrabber/grabber.py", line 764, in __getattr__
[Fri Apr 29 16:26:15 2011] [error]     raise AttributeError, name
[Fri Apr 29 16:26:15 2011] [error] pulp.server.tasking.exception.CancelException
[Fri Apr 29 16:26:15 2011] [error] Traceback (most recent call last):
[Fri Apr 29 16:26:15 2011] [error]   File "/usr/lib/python2.7/site-packages/urlgrabber/grabber.py", line 1125, in _hdr_retrieve
[Fri Apr 29 16:26:15 2011] [error]     if buf.lower().find('location') != -1:
[Fri Apr 29 16:26:15 2011] [error] pulp.server.tasking.exception.CancelException
[Fri Apr 29 16:26:15 2011] [error] Traceback (most recent call last):
[Fri Apr 29 16:26:15 2011] [error]   File "/usr/lib/python2.7/site-packages/urlgrabber/grabber.py", line 1579, in _progress_update
[Fri Apr 29 16:26:15 2011] [error]     if self._over_max_size(cur=self._amount_read-self._reget_length):
[Fri Apr 29 16:26:15 2011] [error]   File "/usr/lib/python2.7/site-packages/urlgrabber/grabber.py", line 1597, in _over_max_size
[Fri Apr 29 16:26:15 2011] [error]     if not max_size: return False # if we have None for all of the Max then this is dumb
[Fri Apr 29 16:26:15 2011] [error] pulp.server.tasking.exception.CancelException

Saw above after 1 or 2 loops.  Now locked again





[Fri Apr 29 16:26:39 2011] [error] 
[Fri Apr 29 16:26:39 2011] [error] # ProcessId: 20902
[Fri Apr 29 16:26:39 2011] [error] # ThreadID: 139817090283264
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 16:26:39 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 16:26:39 2011] [error]   self.run()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 16:26:39 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 348, in run
[Fri Apr 29 16:26:39 2011] [error]   self.failed(e)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 390, in failed
[Fri Apr 29 16:26:39 2011] [error]   self._complete()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 400, in _complete
[Fri Apr 29 16:26:39 2011] [error]   self.complete_callback(self)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 265, in complete
[Fri Apr 29 16:26:39 2011] [error]   self.__lock.acquire()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Fri Apr 29 16:26:39 2011] [error]   rc = self.__block.acquire(blocking)
[Fri Apr 29 16:26:39 2011] [error] 
[Fri Apr 29 16:26:39 2011] [error] # ProcessId: 20902
[Fri Apr 29 16:26:39 2011] [error] # ThreadID: 139817300231936
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 16:26:39 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 16:26:39 2011] [error]   self.run()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 16:26:39 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/debugging.py", line 66, in _monitor
[Fri Apr 29 16:26:39 2011] [error]   self._stacktraces()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/debugging.py", line 48, in _stacktraces
[Fri Apr 29 16:26:39 2011] [error]   for filename, lineno, name, line in traceback.extract_stack(stack):
[Fri Apr 29 16:26:39 2011] [error] 
[Fri Apr 29 16:26:39 2011] [error] # ProcessId: 20902
[Fri Apr 29 16:26:39 2011] [error] # ThreadID: 139817629726464
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 16:26:39 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 16:26:39 2011] [error]   self.run()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 16:26:39 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/selector.py", line 119, in run
[Fri Apr 29 16:26:39 2011] [error]   rd, wr, ex = select(self.reading, self.writing, (), timeout)
[Fri Apr 29 16:26:39 2011] [error] 
[Fri Apr 29 16:26:39 2011] [error] # ProcessId: 20902
[Fri Apr 29 16:26:39 2011] [error] # ThreadID: 139817621333760
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 16:26:39 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 16:26:39 2011] [error]   self.run()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 65, in run
[Fri Apr 29 16:26:39 2011] [error]   msg = self.__fetch()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 86, in __fetch
[Fri Apr 29 16:26:39 2011] [error]   return self.receiver.fetch(timeout=self.WAIT)
[Fri Apr 29 16:26:39 2011] [error] File: "<string>", line 6, in fetch
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1003, in fetch
[Fri Apr 29 16:26:39 2011] [error]   msg = self.session._get(self, timeout=timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "<string>", line 6, in _get
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 636, in _get
[Fri Apr 29 16:26:39 2011] [error]   timeout):
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
[Fri Apr 29 16:26:39 2011] [error]   result = self.connection._ewait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 195, in _ewait
[Fri Apr 29 16:26:39 2011] [error]   result = self._wait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 180, in _wait
[Fri Apr 29 16:26:39 2011] [error]   return self._waiter.wait(predicate, timeout=timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 59, in wait
[Fri Apr 29 16:26:39 2011] [error]   self.condition.wait(timeout - passed)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 96, in wait
[Fri Apr 29 16:26:39 2011] [error]   sw.wait(timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/compat.py", line 53, in wait
[Fri Apr 29 16:26:39 2011] [error]   ready, _, _ = select([self], [], [], timeout)
[Fri Apr 29 16:26:39 2011] [error] 
[Fri Apr 29 16:26:39 2011] [error] # ProcessId: 20902
[Fri Apr 29 16:26:39 2011] [error] # ThreadID: 139817587762944
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 279, in wsgi
[Fri Apr 29 16:26:39 2011] [error]   result = self.handle_with_processors()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 252, in handle_with_processors
[Fri Apr 29 16:26:39 2011] [error]   return process(self.processors)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 558, in processor
[Fri Apr 29 16:26:39 2011] [error]   return handler()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 573, in processor
[Fri Apr 29 16:26:39 2011] [error]   result = handler()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 242, in process
[Fri Apr 29 16:26:39 2011] [error]   return self.handle()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 233, in handle
[Fri Apr 29 16:26:39 2011] [error]   return self._delegate(fn, self.fvars, args)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 414, in _delegate
[Fri Apr 29 16:26:39 2011] [error]   return f()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 422, in <lambda>
[Fri Apr 29 16:26:39 2011] [error]   f = lambda: self._delegate_sub_application(pat, what)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 447, in _delegate_sub_application
[Fri Apr 29 16:26:39 2011] [error]   return app.handle_with_processors()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 252, in handle_with_processors
[Fri Apr 29 16:26:39 2011] [error]   return process(self.processors)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 558, in processor
[Fri Apr 29 16:26:39 2011] [error]   return handler()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in process
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 573, in processor
[Fri Apr 29 16:26:39 2011] [error]   result = handler()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 240, in <lambda>
[Fri Apr 29 16:26:39 2011] [error]   return p(lambda: process(processors))
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 242, in process
[Fri Apr 29 16:26:39 2011] [error]   return self.handle()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 233, in handle
[Fri Apr 29 16:26:39 2011] [error]   return self._delegate(fn, self.fvars, args)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 412, in _delegate
[Fri Apr 29 16:26:39 2011] [error]   return handle_class(cls)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/web.py-0.32-py2.7.egg/web/application.py", line 387, in handle_class
[Fri Apr 29 16:26:39 2011] [error]   return tocall(*args)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 52, in report_error
[Fri Apr 29 16:26:39 2011] [error]   return method(self, *args, **kwargs)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 131, in _auth_decorator
[Fri Apr 29 16:26:39 2011] [error]   value = method(self, *args, **kwargs)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/repositories.py", line 1213, in POST
[Fri Apr 29 16:26:39 2011] [error]   return action(id)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/repositories.py", line 585, in sync
[Fri Apr 29 16:26:39 2011] [error]   task = api.sync(id, timeout, skip, max_speed=limit, threads=threads)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/auditing.py", line 207, in _audit
[Fri Apr 29 16:26:39 2011] [error]   result = method(*args, **kwargs)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/api/repo.py", line 1585, in sync
[Fri Apr 29 16:26:39 2011] [error]   task_type=repo_sync.RepoSyncTask)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/async.py", line 70, in run_async
[Fri Apr 29 16:26:39 2011] [error]   return enqueue(task, unique)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/async.py", line 45, in enqueue
[Fri Apr 29 16:26:39 2011] [error]   _queue.enqueue(task, unique)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 211, in enqueue
[Fri Apr 29 16:26:39 2011] [error]   self.__lock.acquire()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Fri Apr 29 16:26:39 2011] [error]   rc = self.__block.acquire(blocking)
[Fri Apr 29 16:26:39 2011] [error] 
[Fri Apr 29 16:26:39 2011] [error] # ProcessId: 20902
[Fri Apr 29 16:26:39 2011] [error] # ThreadID: 139817308624640
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 16:26:39 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 16:26:39 2011] [error]   self.run()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Fri Apr 29 16:26:39 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 116, in _dispatch
[Fri Apr 29 16:26:39 2011] [error]   self._cancel_tasks()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 158, in _cancel_tasks
[Fri Apr 29 16:26:39 2011] [error]   task.cancel()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/api/repo_sync.py", line 72, in cancel
[Fri Apr 29 16:26:39 2011] [error]   super(RepoSyncTask, self).cancel()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 414, in cancel
[Fri Apr 29 16:26:39 2011] [error]   self.thread.cancel()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 239, in cancel
[Fri Apr 29 16:26:39 2011] [error]   self.raise_exception(CancelException)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 227, in raise_exception
[Fri Apr 29 16:26:39 2011] [error]   deliver_exception(self, test_exception_event, self.__exception_event.wait)
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 212, in deliver_exception
[Fri Apr 29 16:26:39 2011] [error]   wait(self.__default_timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 394, in wait
[Fri Apr 29 16:26:39 2011] [error]   self.__cond.wait(timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 257, in wait
[Fri Apr 29 16:26:39 2011] [error]   _sleep(delay)
[Fri Apr 29 16:26:39 2011] [error] 
[Fri Apr 29 16:26:39 2011] [error] # ProcessId: 20902
[Fri Apr 29 16:26:39 2011] [error] # ThreadID: 139817612941056
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Fri Apr 29 16:26:39 2011] [error]   self.__bootstrap_inner()
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Fri Apr 29 16:26:39 2011] [error]   self.run()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 65, in run
[Fri Apr 29 16:26:39 2011] [error]   msg = self.__fetch()
[Fri Apr 29 16:26:39 2011] [error] File: "/shared/repo/gofer/src/gofer/messaging/consumer.py", line 86, in __fetch
[Fri Apr 29 16:26:39 2011] [error]   return self.receiver.fetch(timeout=self.WAIT)
[Fri Apr 29 16:26:39 2011] [error] File: "<string>", line 6, in fetch
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1003, in fetch
[Fri Apr 29 16:26:39 2011] [error]   msg = self.session._get(self, timeout=timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "<string>", line 6, in _get
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 636, in _get
[Fri Apr 29 16:26:39 2011] [error]   timeout):
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
[Fri Apr 29 16:26:39 2011] [error]   result = self.connection._ewait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 195, in _ewait
[Fri Apr 29 16:26:39 2011] [error]   result = self._wait(lambda: self.error or predicate(), timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 180, in _wait
[Fri Apr 29 16:26:39 2011] [error]   return self._waiter.wait(predicate, timeout=timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 59, in wait
[Fri Apr 29 16:26:39 2011] [error]   self.condition.wait(timeout - passed)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 96, in wait
[Fri Apr 29 16:26:39 2011] [error]   sw.wait(timeout)
[Fri Apr 29 16:26:39 2011] [error] File: "/usr/lib/python2.7/site-packages/qpid/compat.py", line 53, in wait
[Fri Apr 29 16:26:39 2011] [error]   ready, _, _ = select([self], [], [], timeout)

Comment 4 John Matthews 2011-04-29 20:55:58 UTC
Below are my thoughts:

Scenario for Comment #2 logs:
There are 3 threads we are interested in:
 1 Thread waiting on "self._sync_lock.acquire()" from repo.py::_sync()
 1 Thread waiting to enqueue a new sync
  It is stuck on waiting to acquire TaskQueue.__lock
 1 Thread HOLDING TaskQueue.__lock is spinning in thread.cancel()

Scenario for Comment #3 logs:
There are 3 threads we are interested in:
 1 Thread waiting on TaskQueue.__lock
  It is trying to mark a task as failed
 1 Thread is trying to enqueue a new sync and is waiting on TaskQueue.__lock
 1 Thread is trying to cancel a sync task, yet it is spinning and HOLDING TaskQueue.__lock

For both scenarios a thread is in a spinning wait and it is holding the TaskQueue.__lock

A) Do we really want to be in a wait while we have the TaskQueue.__lock held?  Thinking of opening a new bug to address this and rework the logic so while we spin on cancelling a task, the whole task subsystem is not locked up.

B) Why is cancel not being updated?

Comment 5 John Matthews 2011-05-03 21:30:21 UTC
Issue shows:
Thread-1: Blocked waiting on lock in queue.py:complete()
Thread-2: Blocked waiting on lock in find_async()
Thread-3: Waiting on a threading.Event for task to cancel

Thread-1 and Thread-2 are blocked because Thread-3 has the shared lock acquired.


[Tue May 03 16:52:50 2011] [error] # ProcessId: 15814
[Tue May 03 16:52:50 2011] [error] # ThreadID: 140412345906944
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Tue May 03 16:52:50 2011] [error]   self.__bootstrap_inner()
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Tue May 03 16:52:50 2011] [error]   self.run()
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Tue May 03 16:52:50 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 348, in run
[Tue May 03 16:52:50 2011] [error]   self.failed(e)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 390, in failed
[Tue May 03 16:52:50 2011] [error]   self._complete()
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 400, in _complete
[Tue May 03 16:52:50 2011] [error]   self.complete_callback(self)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 265, in complete
[Tue May 03 16:52:50 2011] [error]   self.__lock.acquire()
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Tue May 03 16:52:50 2011] [error]   rc = self.__block.acquire(blocking)
[Tue May 03 16:52:50 2011] [error] 


6:52:50 2011] [error]   return tocall(*args)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 52, in report_error
[Tue May 03 16:52:50 2011] [error]   return method(self, *args, **kwargs)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 131, in _auth_decorator
[Tue May 03 16:52:50 2011] [error]   value = method(self, *args, **kwargs)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/repositories.py", line 1271, in GET
[Tue May 03 16:52:50 2011] [error]   task_info = self.task_status(action_id)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 411, in task_status
[Tue May 03 16:52:50 2011] [error]   task = self.find_task(id)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/webservices/controllers/base.py", line 425, in find_task
[Tue May 03 16:52:50 2011] [error]   tasks = async.find_async(id=id)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/async.py", line 74, in find_async
[Tue May 03 16:52:50 2011] [error]   return _queue.find(**kwargs)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 300, in find
[Tue May 03 16:52:50 2011] [error]   self.__lock.acquire()
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 122, in acquire
[Tue May 03 16:52:50 2011] [error]   rc = self.__block.acquire(blocking)



[Tue May 03 16:52:50 2011] [error] # ProcessId: 15814
[Tue May 03 16:52:50 2011] [error] # ThreadID: 140412639487744
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 503, in __bootstrap
[Tue May 03 16:52:50 2011] [error]   self.__bootstrap_inner()
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
[Tue May 03 16:52:50 2011] [error]   self.run()
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 483, in run
[Tue May 03 16:52:50 2011] [error]   self.__target(*self.__args, **self.__kwargs)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 116, in _dispatch
[Tue May 03 16:52:50 2011] [error]   self._cancel_tasks()
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 158, in _cancel_tasks
[Tue May 03 16:52:50 2011] [error]   task.cancel()
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/repo_sync_task.py", line 48, in cancel
[Tue May 03 16:52:50 2011] [error]   super(RepoSyncTask, self).cancel()
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 414, in cancel
[Tue May 03 16:52:50 2011] [error]   self.thread.cancel()
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 240, in cancel
[Tue May 03 16:52:50 2011] [error]   self.raise_exception(CancelException)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 228, in raise_exception
[Tue May 03 16:52:50 2011] [error]   deliver_exception(self, test_exception_event, self.__exception_event.wait)
[Tue May 03 16:52:50 2011] [error] File: "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/thread.py", line 213, in deliver_exception
[Tue May 03 16:52:50 2011] [error]   wait(self.__default_timeout)
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 394, in wait
[Tue May 03 16:52:50 2011] [error]   self.__cond.wait(timeout)
[Tue May 03 16:52:50 2011] [error] File: "/usr/lib64/python2.7/threading.py", line 257, in wait
[Tue May 03 16:52:50 2011] [error]   _sleep(delay)

Comment 6 John Matthews 2011-05-10 15:58:04 UTC
Issue is that injecting a CancelException into a thread can result in killing the thread and leaving process in a bad state.

We attempted to cleanup when a CancelException is injected, yet the CancelException is continuously sent into the thread.  The first time this happens we go into a cleanup method, yet sometimes the timing will cause us to receive a second CancelException in the cleanup method.  This means our cleanup is being interrupted.  We don't have a means of guaranteeing that we can cleanup.

We've also seen that sometimes the timing of a Cancel can cause a regular 'log.info' statement to lockup pulp.  Problem is that 'log' uses a lock, sometimes when the log method is releasing the lock it may be cancelled.

Basically there is a common assumption that the critical section in code is wrapped in a try/finally, that is good but not enough.  The issue seen here is that in the 'finally' block, which is generally the cleanup sometimes the CancelException is resent and then this 'finally' block is interrupted. 

Our approach to fix this is to stop sending the CancelException into repo sync tasks.  In repo sync we added a 'stopped' flag, we will check this at safe points during the sync.  When a sync is canceled, the 'stopped' flag is set and we wait for the sync to move into a safe spot to end, then the sync thread raises a CancelException itself and cleanup proceeds.


Commits below.
http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=b8e900f9b75f2eeacfb97e6f801891d88423d589

http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=cac8c4ec03cd58c225564bd8ea1e0227113b58b6

Comment 7 Jeff Ortel 2011-05-12 18:45:34 UTC
build: 0.176

Comment 8 Preethi Thomas 2011-09-01 18:29:19 UTC
verified
[root@preethi stresstests]# rpm -q pulp
pulp-0.0.229-1.fc15.noarch
[root@preethi stresstests]# 

ran  ./loop_sync_cancel.sh pulp without locking up

Comment 9 Preethi Thomas 2012-02-24 20:16:09 UTC
Pulp v1.0 is released
Closed Current Release.

Comment 10 Preethi Thomas 2012-02-24 20:17:36 UTC
Pulp v1.0 is released.