Bug 1129768

Summary: pulp_celerybeat Mongodb Reconnect Support Fails After Three Tries
Product: Red Hat Satellite 6 Reporter: Brian Bouterse <bmbouter>
Component: OtherAssignee: Jason Montleon <jmontleo>
Status: CLOSED CURRENTRELEASE QA Contact: Katello QA List <katello-qa-list>
Severity: high Docs Contact:
Priority: unspecified    
Version: UnspecifiedCC: bbuckingham, cwelton, pulp-bugs, pulp-qe-list, sthirugn
Target Milestone: UnspecifiedKeywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1129488 Environment:
Last Closed: 2014-09-11 12:23:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On: 1129488    
Bug Blocks: 950743    

Description Brian Bouterse 2014-08-13 15:34:10 UTC
+++ This bug was initially created as a clone of Bug #1129488 +++

Description of problem: If pulp_celerybeat is running and correctly connected to mongod, and then mongod becomes unreachable for some reason, pulp_celerybeat will try to reconnect for 3 times, but will fail on the 4th time. Once the failure occurs it will die, and will never recover.

NOTE: Reconnect support does work properly on pulp_celerybeat if mongod is not available while starting. This BZ only affects reconnect support after an initial successful connection to mongod is made.


Version-Release number of selected component (if applicable): Pulp 2.4.0-1


How reproducible:
Always


Steps to Reproduce:
1. Stop pulp_celerybeat, pulp_resource_manager, pulp_workers, and httpd
2. Start mongod and qpidd
3. Start pulp_celerybeat and verify that the logs show that it starts correctly.
4. Stop mongod
5. Wait 2 minutes (ish)
6. Observe the following traceback:

Aug 12 16:42:41 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused
Aug 12 16:42:51 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused
Aug 12 16:42:56 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini
ng: 2
Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini
ng: 1
Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini
ng: 0
Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: create_index operation failed on pulp_database.workers: database conn
ection still down after 3 tries
Aug 12 16:43:01 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused
Aug 12 16:43:11 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused
Aug 12 16:43:21 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: beat raised exception <class 'pymongo.errors.AutoReconnect'>: AutoReconnect('could
 not connect to localhost:27017: [Errno 111] Connection refused',)
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: Traceback (most recent call last):
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/home/bmbouter/celery/celery/apps/beat.py", line 112, in start_scheduler
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     beat.start()
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/home/bmbouter/celery/celery/beat.py", line 462, in start
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     interval = self.scheduler.tick()
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line
 317, in tick
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     ret = super(Scheduler, self).tick()
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/home/bmbouter/celery/celery/beat.py", line 219, in tick
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     for entry in values(self.schedule):
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 387, in schedule
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     if self.schedule_changed:
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 366, in schedule_changed
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     if utils.get_enabled().count() != self._loaded_from_db_count:
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 566, in count
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     **command)
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/database.py", line 388, in command
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     result = self["$cmd"].find_one(command, **extra_opts)
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 596, in find_one
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 814, in next
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     if len(self.__data) or self._refresh():
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 763, in _refresh
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     self.__uuid_subtype))
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 700, in __send_message
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     **kwargs)
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/home/bmbouter/Documents/pulp/server/pulp/server/db/connection.py", line 171, in _with_end_request
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     return method(*args, **kwargs)
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 985, in _send_message_with_response
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     sock_info = self.__socket()
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 720, in __socket
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     host, port = self.__find_node()
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 713, in __find_node
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL:     raise AutoReconnect(', '.join(errors))
Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused


Expected results:
pulp_celerybeat to wait indefinitely for mongod to become available again.

Comment 2 Corey Welton 2014-08-14 14:23:54 UTC
Possibly root cause of bug #1128299?

Comment 7 Bryan Kearney 2014-09-11 12:23:32 UTC
This was delivered with Satellite 6.0 which was released on 10 September 2014.