Bug 1098734
| Summary: | No available Queue exception when running pulp commands | | |
|---|---|---|---|
| Product: | [Retired] Pulp | Reporter: | Preethi Thomas <pthomas> |
| Component: | z_other | Assignee: | Brian Bouterse <bmbouter> |
| Status: | CLOSED DUPLICATE | QA Contact: | pulp-qe-list |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.4 Beta | CC: | bmbouter, ipanova, mhrivnak |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 2.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-05-25 18:20:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Created attachment 896668 [details]
log
I already encountered this issue, but only in older builds; since build 2.4.0-0.10.beta I have never seen it again. Maybe it is related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1088060

babysit() was removed from the pulp code with beta 16, so this can't be related to that issue: https://bugzilla.redhat.com/show_bug.cgi?id=1088060

I'm investigating the root cause today. I'll post my findings back on this ticket.

In experimenting on the system, I believe that the mongo exception below occurs some of the time and causes celerybeat to die. If celerybeat dies, celery events stop being processed, which causes Pulp to believe (after 5 minutes) that the workers have gone offline because heartbeats stop arriving. Once new work arrives at the system, the resource manager believes there are no workers and raises the exception in the bug report.

May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: beat raised exception <class 'pymongo.errors.AutoReconnect'>: AutoReconnect('[Errno 9] Bad file descriptor',)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: Traceback (most recent call last):
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib/python2.6/site-packages/celery/apps/beat.py", line 112, in start_scheduler
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: beat.start()
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib/python2.6/site-packages/celery/beat.py", line 462, in start
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: interval = self.scheduler.tick()
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib/python2.6/site-packages/pulp/server/async/scheduler.py", line 303, in tick
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: ret = super(Scheduler, self).tick()
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib/python2.6/site-packages/celery/beat.py", line 219, in tick
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: for entry in values(self.schedule):
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib/python2.6/site-packages/pulp/server/async/scheduler.py", line 366, in schedule
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: if self._schedule is None or self.schedule_changed:
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib/python2.6/site-packages/pulp/server/async/scheduler.py", line 348, in schedule_changed
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: if utils.get_enabled().count() != self._loaded_from_db_count:
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 566, in count
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: **command)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/database.py", line 388, in command
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: result = self["$cmd"].find_one(command, **extra_opts)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 596, in find_one
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 814, in next
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: if len(self.__data) or self._refresh():
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 763, in _refresh
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: self.__uuid_subtype))
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 700, in __send_message
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: **kwargs)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib/python2.6/site-packages/pulp/server/db/connection.py", line 166, in _with_end_request
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: return method(*args, **kwargs)
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 994, in _send_message_with_response
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: raise AutoReconnect(str(e))
May 19 10:37:10 dell-pe2950-02 pulp: celery.beat:CRITICAL: AutoReconnect: [Errno 9] Bad file descriptor
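
The failure chain described in the comment before the traceback can be modelled with a small sketch. This is not Pulp's actual code: `WORKER_TIMEOUT`, `online_workers`, `pick_queue`, and the local `NoAvailableQueues` class are hypothetical names used only for illustration, and the 5-minute window is taken from the comment above. It only shows why a silent heartbeat stream ends in "no available queues" once new work arrives.

```python
from datetime import datetime, timedelta

# Assumption: the 5-minute window comes from the comment above.
WORKER_TIMEOUT = timedelta(minutes=5)


class NoAvailableQueues(Exception):
    """Local stand-in for the exception named in the bug report."""


def online_workers(last_heartbeat, now):
    """Return the names of workers whose last heartbeat is recent enough."""
    return sorted(
        name for name, seen in last_heartbeat.items()
        if now - seen <= WORKER_TIMEOUT
    )


def pick_queue(last_heartbeat, now):
    """Pick a worker queue for reserved work, or fail if none look alive."""
    workers = online_workers(last_heartbeat, now)
    if not workers:
        # If heartbeats stopped being recorded (for example because the
        # process handling them died), every worker eventually ends up here.
        raise NoAvailableQueues(
            "There are no available queues in the system for reserved task work."
        )
    return workers[0]
```

In this model the workers themselves can be perfectly healthy: if nothing updates `last_heartbeat` any more, `pick_queue` starts failing as soon as the newest recorded heartbeat is older than the timeout.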

PR available at: https://github.com/pulp/pulp/pull/979

Merged.

Fixed in 2.4.0-0.17.beta.

Closing this one because 1100005 is the same issue and has more details.

*** This bug has been marked as a duplicate of bug 1100005 ***

Created attachment 896667 [details]
admin.log

Description of problem:
Not sure what is causing this. I have run into this exception a couple of times on 2 different servers.

May 17 15:47:52 dell-pe2950-02 pulp: celery.worker.job:ERROR: NoAvailableQueues: There are no available queues in the system for reserved task work.

[root@dell-pe2950-02 ~]# pulp-admin rpm repo sync run --repo-id errata
+----------------------------------------------------------------------+
Synchronizing Repository [errata]
+----------------------------------------------------------------------+

An internal error occurred on the Pulp server. More information can be found in the client log file ~/.pulp/admin.log.

[root@dell-pe2950-02 ~]#

[root@pulp-24-server ~]# pulp-admin rpm repo copy all -f zoo -t zoo-copy
An internal error occurred on the Pulp server. More information can be found in the client log file ~/.pulp/admin.log.

Version-Release number of selected component (if applicable):
[root@dell-pe2950-02 ~]# rpm -qa pulp-server
pulp-server-2.4.0-0.16.beta.el6.noarch
[root@dell-pe2950-02 ~]#

How reproducible:
Random

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: