Description of problem: If pulp_celerybeat is running and correctly connected to mongod, and then mongod becomes unreachable for some reason, pulp_celerybeat will try to reconnect for 3 times, but will fail on the 4th time. Once the failure occurs it will die, and will never recover. NOTE: Reconnect support does work properly on pulp_celerybeat if mongod is not available while starting. This BZ only affects reconnect support after an initial successful connection to mongod is made. Version-Release number of selected component (if applicable): Pulp 2.4.0-1 How reproducible: Always Steps to Reproduce: 1. Stop pulp_celerybeat, pulp_resource_manager, pulp_workers, and httpd 2. Start mongod and qpidd 3. Start pulp_celerybeat and verify that the logs show that it starts correctly. 4. Stop mongod 5. Wait 2 minutes (ish) 6. Observe the following traceback: Aug 12 16:42:41 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:42:51 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:42:56 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini ng: 2 Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini ng: 1 Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini ng: 0 Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: create_index operation failed on pulp_database.workers: database conn ection still down after 3 tries Aug 12 16:43:01 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:43:11 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:43:21 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: beat raised exception <class 'pymongo.errors.AutoReconnect'>: AutoReconnect('could not connect to localhost:27017: [Errno 111] Connection refused',) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: Traceback (most recent call last): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/celery/celery/apps/beat.py", line 112, in start_scheduler Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: beat.start() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/celery/celery/beat.py", line 462, in start Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: interval = self.scheduler.tick() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 317, in tick Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: ret = super(Scheduler, self).tick() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/celery/celery/beat.py", line 219, in tick Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: for entry in values(self.schedule): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 387, in schedule Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: if self.schedule_changed: Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 366, in schedule_changed Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: if utils.get_enabled().count() != self._loaded_from_db_count: Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 566, in count Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: **command) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/database.py", line 388, in command Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: result = self["$cmd"].find_one(command, **extra_opts) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 596, in find_one Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: for result in self.find(spec_or_id, *args, **kwargs).limit(-1): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 814, in next Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: if len(self.__data) or self._refresh(): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 763, in _refresh Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: self.__uuid_subtype)) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 700, in __send_message Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: **kwargs) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/db/connection.py", line 171, in _with_end_request Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: return method(*args, **kwargs) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 985, in _send_message_with_response Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: sock_info = self.__socket() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 720, in __socket Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: host, port = self.__find_node() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 713, in __find_node Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: raise AutoReconnect(', '.join(errors)) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused Expected results: pulp_celerybeat to wait indefinitely for mongod to become available again.
PR available at: https://github.com/pulp/pulp/pull/1109
Steps to reproduce: 1. Start all the pulp services 2. Perform a sync to ensure everything is running normally 3. Stop mongod 4. Wait a two minutes (or as long as you like) 5. Start mongod 6. Perform a sync to ensure everything is running normally
Actually the PR (against the correct branch this time) is here: https://github.com/pulp/pulp/pull/1111
https://github.com/pulp/pulp/pull/1118 PR #1111 was closed and reopened against the new 2.4.1-dev branch.
PR 1118 was merged into 2.4.1-dev branch
This PR is merged to the 2.4.1-dev branch, but the commits were not included in the 2.4-dev branch or the master branch. Please merge these commits to 2.4-dev and master, and then move this bug back to MODIFIED.
I've merged this PR into 2.4-dev and master. I also merged it into pulp-2.4 by mistake, but that will not cause any issues.
This was fixed in 2.4.1-0.2.alpha, available in Pulp's testing repository.
verified on rhel6.5 & 7 [root@qe-blade-08 module]# rpm -qa pulp-server pulp-server-2.4.1-0.2.alpha.el6.noarch [root@qe-blade-08 module]# [root@qe-blade-13 ~]# rpm -qa pulp-server pulp-server-2.4.1-0.2.alpha.el7.noarch [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# pulp-admin rpm repo create --repo-id zoo --feed https://repos.fedorapeople.org/repos/pulp/pulp/demo_repos/zoo/ Successfully created repository [zoo] [root@qe-blade-13 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [|] ... completed Downloading repository content... [==================================================] 100% RPMs: 32/32 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [==================================================] 100% 32 of 32 items ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# systemctl stop mongo Failed to issue method call: Unit mongo.service not loaded. [root@qe-blade-13 ~]# systemctl stop mongod [root@qe-blade-13 ~]# systemctl start mongod [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [|] ... completed Downloading repository content... [==================================================] 100% RPMs: 0/0 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [==================================================] 100% 32 of 32 items ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@qe-blade-13 ~]#
This is fixed in Pulp-2.4.1-1.
This was regressed in 2.5.0 so I'm reopening this again.
Another PR to fix the regression: https://github.com/pulp/pulp/pull/1302
Merged to 2.5-testing -> 2.5-dev -> master
verified [root@hp-dl380pgen8-02-vm-2 ~]# rpm -qa pulp-server pulp-server-2.5.0-0.19.rc.el7.noarch [root@hp-dl380pgen8-02-vm-2 ~]# [root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin login -u admin -p admin Successfully logged in. Session certificate will expire at Nov 25 13:38:36 2014 GMT. [root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin rpm repo create --repo-id zoo --feed https://repos.fedorapeople.org/repos/pulp/pulp/demo_repos/zoo/ Successfully created repository [zoo] [root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [\] ... completed Downloading repository content... [==================================================] 100% RPMs: 32/32 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [==================================================] 100% 32 of 32 items ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@hp-dl380pgen8-02-vm-2 ~]# systemctl stop mongod [root@hp-dl380pgen8-02-vm-2 ~]# systemctl start mongod[root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [\] ... completed Downloading repository content... [==================================================] 100% RPMs: 0/0 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Copying files [-] ... completed Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [-] ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@hp-dl380pgen8-02-vm-2 ~]#