Bug 715091 - CDS syncs can get stuck as in progress
Summary: CDS syncs can get stuck as in progress
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: z_other
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jason Connor
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: 688298
Reported: 2011-06-21 20:25 UTC by Jay Dobies
Modified: 2014-03-31 01:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-11 20:16:43 UTC
Embargoed:



Description Jay Dobies 2011-06-21 20:25:22 UTC
During a sync, goferd crashed (due to qpid issues filed in other bugs). The task in Pulp for that sync is stuck in a running state and no other syncs for that CDS can be triggered.

Comment 1 Jay Dobies 2011-06-21 20:35:55 UTC
Also, the task list output needs to show arguments to the calls. Otherwise, it's impossible to differentiate between sync calls:

Task: 366c15b3-9c44-11e0-bccb-00508d977dff
    Scheduler: immediate
    Call: CdsApi.cds_sync
    State: running
    Start time: 2011-06-21T16:22:40-04:00
    Finish time: None
    Scheduled time: 2011-06-21T20:22:40Z
    Result: None
    Exception: None
    Traceback: None


The argument to cds_sync identifies the CDS being synced.
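
As a rough illustration of the request above, a task could be rendered with an extra "Arguments" line. This is only a sketch: `format_task`, the dictionary keys, and the hostname are hypothetical and are not taken from the Pulp code base.

```python
def format_task(task):
    """Render one task in the style of the task list output, with arguments."""
    # Join positional arguments so that otherwise identical calls can be told apart.
    arguments = ", ".join(str(arg) for arg in task.get("args", ()))
    lines = [
        "Task: %s" % task["id"],
        "    Scheduler: %s" % task["scheduler"],
        "    Call: %s" % task["call"],
        "    Arguments: %s" % arguments,
        "    State: %s" % task["state"],
    ]
    return "\n".join(lines)


# Hypothetical task record; the hostname is a placeholder.
example = {
    "id": "366c15b3-9c44-11e0-bccb-00508d977dff",
    "scheduler": "immediate",
    "call": "CdsApi.cds_sync",
    "args": ["pulp-cds.example.com"],
    "state": "running",
}
print(format_task(example))
```

With the argument shown, two running CdsApi.cds_sync tasks for different CDS instances would no longer look the same in the listing.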

Comment 2 Jay Dobies 2011-06-21 20:37:46 UTC
One more note: I was unable to use the task CLI commands to kill the task. The "remove" command had no effect. I had to shut down apache, delete the contents of task_snapshot, and restart it.

Since I had the task ID from the task list command, I probably could have just deleted that row, but with the mongo shell issues we've been seeing it was simpler to just kill the whole collection.

Comment 3 Jason Connor 2011-06-24 04:05:37 UTC
Added a task cancel REST API and CLI command to cancel any task in a generic manner.
This is dangerous, but it can possibly unstick a bad situation.
Pushed in 1be66992a2e337c6cb3399ef3324ac7eac5e2939

Comment 4 Jeff Ortel 2011-06-24 22:18:23 UTC
build: 0.198

Comment 5 Jason Connor 2011-06-29 19:27:18 UTC
In order to truly unstick a task, the following steps must be taken:

1) delete the task snapshot (if one exists)
2) remove the task
3) cancel the task

This will cause the task to forcefully go away without the tasking system trying to restart it.
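
The three steps above can be sketched as a small helper that issues the CLI calls in that order. The subcommand names (delete_snapshot, remove, cancel) and the --id option are the ones used in the pulp-admin transcript in Comment 6 of this report; the helper names `unstick_commands` and `unstick_task` are hypothetical, and this is a sketch rather than a supported tool.

```python
import subprocess


def unstick_commands(task_id):
    """Build the pulp-admin invocations in the order listed above."""
    return [
        ["pulp-admin", "task", action, "--id=%s" % task_id]
        for action in ("delete_snapshot", "remove", "cancel")
    ]


def unstick_task(task_id, run=subprocess.check_call):
    """Run each step in order; check_call stops at the first failing step."""
    for argv in unstick_commands(task_id):
        run(argv)


# Print the commands instead of running them (dry run).
for argv in unstick_commands("80558a63-b3aa-11e0-b191-002564a85a58"):
    print(" ".join(argv))
```

Passing a different `run` callable (for example, one that only records the commands) makes it possible to check the ordering without a Pulp server at hand.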

Comment 6 Preethi Thomas 2011-07-21 15:03:32 UTC
fails_qa
[root@preethi ~]# rpm -q pulp
pulp-0.0.212-1.fc14.noarch

Here is the use case I followed:

1. Ran CDS sync.
2. While the CDS sync was running, stopped goferd on the CDS.
3. Restarted pulp-cds.
4. Ran CDS sync:
[root@preethi ~]# pulp-admin cds sync --hostname=pulp-cds.usersys.redhat.com
error: operation failed: Sync already in process for CDS [pulp-cds.usersys.redhat.com]

4. Ran task list
[root@preethi ~]# pulp-admin task list
Task: 73d0dc59-b3aa-11e0-9f87-002564a85a58
    Scheduler: interval
    Call: cull_history
    Arguments: 
    State: waiting
    Start time: None
    Finish time: None
    Scheduled time: 2011-07-22T05:00:00Z
    Result: None
    Exception: None
    Traceback: None

Task: 73d0c466-b3aa-11e0-9f86-002564a85a58
    Scheduler: interval
    Call: cull_audited_events
    Arguments: 
    State: waiting
    Start time: None
    Finish time: None
    Scheduled time: 2011-07-22T01:00:00Z
    Result: None
    Exception: None
    Traceback: None

Task: 80558a63-b3aa-11e0-b191-002564a85a58
    Scheduler: immediate
    Call: CdsApi.cds_sync
    Arguments: pulp-cds.usersys.redhat.com
    State: running
    Start time: 2011-07-21T11:02:50-04:00
    Finish time: None
    Scheduled time: 2011-07-21T15:02:50Z
    Result: None
    Exception: None
    Traceback: None



 
5. Delete task snapshot

[root@preethi ~]# pulp-admin task delete_snapshot --id=80558a63-b3aa-11e0-b191-002564a85a58
Snapshot for task [80558a63-b3aa-11e0-b191-002564a85a58] deleted

6. Remove task

[root@preethi ~]# pulp-admin task remove --id=80558a63-b3aa-11e0-b191-002564a85a58
Task [80558a63-b3aa-11e0-b191-002564a85a58] set for removal


7. Task cancel

[root@preethi ~]# pulp-admin task cancel --id=80558a63-b3aa-11e0-b191-002564a85a58
Task [80558a63-b3aa-11e0-b191-002564a85a58] canceled

8. Ran task list; the command hangs.

From pulp.log


2011-07-21 11:04:55,033 28824:140175678088960: pulp.server.tasking.task:WARNING: task:404 Deprecated base class Task.cancel() called for [Task 80558a63-b3aa-11e0-b191-002564a85a58: CdsApi.cds_sync(pulp-cds.usersys.redhat.com, )]
2011-07-21 11:04:57,739 28824:140175583282944: gofer.messaging.consumer:ERROR: consumer:387 aa8b56b8-9c33-4dde-aa08-3fb997f6b3e6
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 382, in __fetch
    return self.__receiver.fetch(timeout=timeout)
  File "<string>", line 8, in fetch
  File "/usr/lib64/python2.7/threading.py", line 137, in release
    raise RuntimeError("cannot release un-acquired lock")
RuntimeError: cannot release un-acquired lock
2011-07-21 11:04:57,783 28824:140175583282944: pulp.server.api.cds:ERROR: cds:585 CDS threw an error during sync to CDS [pulp-cds.usersys.redhat.com]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/server/api/cds.py", line 568, in cds_sync
    self.dispatcher.sync(cds, payload)
  File "/usr/lib/python2.7/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync
    self._send(stub.sync, data)
  File "/usr/lib/python2.7/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send
    result = func(*args)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/stub.py", line 71, in __call__
    return self.stub._send(request, opts)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/stub.py", line 142, in _send
    any=opts.any)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/policy.py", line 123, in send
    reader.close()
  File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 316, in close
    self.__receiver.close()
  File "<string>", line 6, in close
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1040, in close
    try:
CdsMethodException
2011-07-21 11:04:57,788 28824:140175583282944: pulp.server.tasking.task:ERROR: task:381 Task failed: Task 80558a63-b3aa-11e0-b191-002564a85a58: CdsApi.cds_sync(pulp-cds.usersys.redhat.com, )
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/server/tasking/task.py", line 330, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/api/cds.py", line 568, in cds_sync
    self.dispatcher.sync(cds, payload)
  File "/usr/lib/python2.7/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync
    self._send(stub.sync, data)
  File "/usr/lib/python2.7/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send
    result = func(*args)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/stub.py", line 71, in __call__
    return self.stub._send(request, opts)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/stub.py", line 142, in _send
    any=opts.any)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/policy.py", line 123, in send
    reader.close()
  File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 316, in close
    self.__receiver.close()
  File "<string>", line 6, in close
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1040, in close
    try:
PulpException: 'Error on the CDS during sync; check the server log for more information'
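
For readers unfamiliar with the first error in the log above: "cannot release un-acquired lock" is the message Python's re-entrant lock raises when release() is called by a thread that does not hold it. The snippet below only reproduces that message in isolation, as a minimal sketch. It is not a reproduction or diagnosis of the gofer/qpid failure, and how the lock ended up un-acquired there is not established in this report.

```python
import threading

lock = threading.RLock()
message = None
try:
    # The lock was never acquired by this thread, so release() is an error.
    lock.release()
except RuntimeError as exc:
    message = str(exc)

print(message)
```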

Comment 7 Jay Dobies 2012-07-11 20:16:43 UTC
CDS code is being rewritten entirely in v2 and this is no longer applicable.

