Bug 1098734

Summary: No available Queue exception when running pulp commands
Product: [Retired] Pulp Reporter: Preethi Thomas <pthomas>
Component: z_otherAssignee: Brian Bouterse <bmbouter>
Status: CLOSED DUPLICATE QA Contact: pulp-qe-list
Severity: high Docs Contact:
Priority: high    
Version: 2.4 BetaCC: bmbouter, ipanova, mhrivnak
Target Milestone: ---Keywords: Triaged
Target Release: 2.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-05-25 18:20:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
admin.log
none
log none

Description Preethi Thomas 2014-05-17 19:55:28 UTC
Created attachment 896667 [details]
admin.log

Description of problem:

Not sure what is causing this. I have run into this exception a couple of times on 2 different servers. 



May 17 15:47:52 dell-pe2950-02 pulp: celery.worker.job:ERROR: NoAvailableQueues: There are no available queues in the system for reserved task work.


[root@dell-pe2950-02 ~]# pulp-admin rpm repo sync run --repo-id errata
+----------------------------------------------------------------------+
                   Synchronizing Repository [errata]
+----------------------------------------------------------------------+

An internal error occurred on the Pulp server. More information can be found in
the client log file ~/.pulp/admin.log.

[root@dell-pe2950-02 ~]# 



[root@pulp-24-server ~]# pulp-admin rpm repo copy  all -f zoo -t zoo-copy
An internal error occurred on the Pulp server. More information can be found in
the client log file ~/.pulp/admin.log.




Version-Release number of selected component (if applicable):

[root@dell-pe2950-02 ~]# rpm -qa pulp-server
pulp-server-2.4.0-0.16.beta.el6.noarch
[root@dell-pe2950-02 ~]# 

How reproducible:
Random

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Preethi Thomas 2014-05-17 19:56:40 UTC
Created attachment 896668 [details]
log

Comment 2 Ina Panova 2014-05-19 08:16:36 UTC
I already encountered this issue but in the older builds, since build 2.4.0-0.10.beta have never seen it again... maybe it can be related to this bug https://bugzilla.redhat.com/show_bug.cgi?id=1088060

Comment 3 Brian Bouterse 2014-05-19 13:58:14 UTC
babysit() was removed from the pulp code with beta 16, so this can't be related to the issue:

https://bugzilla.redhat.com/show_bug.cgi?id=1088060

I'm investigating the root cause today.  I'll post my findings back on this ticket.

Comment 4 Brian Bouterse 2014-05-19 15:03:26 UTC
In experimenting on the system, I believe that the mongo exception below is occurring some of the time and is causing celerybeat to die.  If celerybeat dies then celery events stop being processed, which causes Pulp to believe (in 5 minutes) that workers have gone offline because heartbeats stop arriving.  Once new work arrives at the system, the resource manager believes there are no workers, and raises a the exception in the but report.


May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: beat raised exception <class 'pymongo.errors.AutoReconnect'>: AutoReconnect('[Errno 9] Bad file descriptor',)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: Traceback (most recent call last):
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/apps/beat.py", line 112, in start_scheduler
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     beat.start()
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/beat.py", line 462, in start
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     interval = self.scheduler.tick()
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib/python2.6/site-packages/pulp/server/async/scheduler.py", line 303, in tick
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     ret = super(Scheduler, self).tick()
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/beat.py", line 219, in tick
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     for entry in values(self.schedule):
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib/python2.6/site-packages/pulp/server/async/scheduler.py", line 366, in schedule
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     if self._schedule is None or self.schedule_changed:
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib/python2.6/site-packages/pulp/server/async/scheduler.py", line 348, in schedule_changed
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     if utils.get_enabled().count() != self._loaded_from_db_count:
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 566, in count
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     **command)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/database.py", line 388, in command
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     result = self["$cmd"].find_one(command, **extra_opts)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 596, in find_one
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 814, in next
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     if len(self.__data) or self._refresh():
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 763, in _refresh
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     self.__uuid_subtype))
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 700, in __send_message
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     **kwargs)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib/python2.6/site-packages/pulp/server/db/connection.py", line 166, in _with_end_request
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     return method(*args, **kwargs)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 994, in _send_message_with_response
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL:     raise AutoReconnect(str(e))
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: AutoReconnect: [Errno 9] Bad file descriptor

Comment 5 Brian Bouterse 2014-05-20 21:12:01 UTC
PR available at:   https://github.com/pulp/pulp/pull/979

Comment 6 Brian Bouterse 2014-05-20 21:12:21 UTC
Merged

Comment 7 Randy Barlow 2014-05-20 23:45:41 UTC
Fixed in 2.4.0-0.17.beta.

Comment 8 Preethi Thomas 2014-05-25 18:20:47 UTC
Closing this one because 

1100005 is the same issue and has more details.

*** This bug has been marked as a duplicate of bug 1100005 ***