Bug 1129488
| Summary: | pulp_celerybeat Mongodb Reconnect Support Fails After Three Tries | |||
|---|---|---|---|---|
| Product: | [Retired] Pulp | Reporter: | Brian Bouterse <bmbouter> | |
| Component: | async/tasks | Assignee: | Brian Bouterse <bmbouter> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Preethi Thomas <pthomas> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | medium | |||
| Version: | Master | CC: | austin, bmbouter, pthomas, rbarlow, skarmark | |
| Target Milestone: | --- | Keywords: | Reopened, Triaged | |
| Target Release: | 2.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1129768 (view as bug list) | Environment: | ||
| Last Closed: | 2014-11-24 21:33:22 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1129768, 1131719 | |||
PR available at: https://github.com/pulp/pulp/pull/1109 Steps to reproduce: 1. Start all the pulp services 2. Perform a sync to ensure everything is running normally 3. Stop mongod 4. Wait a two minutes (or as long as you like) 5. Start mongod 6. Perform a sync to ensure everything is running normally Actually the PR (against the correct branch this time) is here: https://github.com/pulp/pulp/pull/1111 https://github.com/pulp/pulp/pull/1118 PR #1111 was closed and reopened against the new 2.4.1-dev branch. PR 1118 was merged into 2.4.1-dev branch This PR is merged to the 2.4.1-dev branch, but the commits were not included in the 2.4-dev branch or the master branch. Please merge these commits to 2.4-dev and master, and then move this bug back to MODIFIED. I've merged this PR into 2.4-dev and master. I also merged it into pulp-2.4 by mistake, but that will not cause any issues. This was fixed in 2.4.1-0.2.alpha, available in Pulp's testing repository. verified on rhel6.5 & 7 [root@qe-blade-08 module]# rpm -qa pulp-server pulp-server-2.4.1-0.2.alpha.el6.noarch [root@qe-blade-08 module]# [root@qe-blade-13 ~]# rpm -qa pulp-server pulp-server-2.4.1-0.2.alpha.el7.noarch [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# pulp-admin rpm repo create --repo-id zoo --feed https://repos.fedorapeople.org/repos/pulp/pulp/demo_repos/zoo/ Successfully created repository [zoo] [root@qe-blade-13 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [|] ... completed Downloading repository content... [==================================================] 100% RPMs: 32/32 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [==================================================] 100% 32 of 32 items ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# systemctl stop mongo Failed to issue method call: Unit mongo.service not loaded. [root@qe-blade-13 ~]# systemctl stop mongod [root@qe-blade-13 ~]# systemctl start mongod [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# [root@qe-blade-13 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [|] ... completed Downloading repository content... [==================================================] 100% RPMs: 0/0 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [==================================================] 100% 32 of 32 items ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@qe-blade-13 ~]# This is fixed in Pulp-2.4.1-1. This was regressed in 2.5.0 so I'm reopening this again. Another PR to fix the regression: https://github.com/pulp/pulp/pull/1302 Merged to 2.5-testing -> 2.5-dev -> master verified [root@hp-dl380pgen8-02-vm-2 ~]# rpm -qa pulp-server pulp-server-2.5.0-0.19.rc.el7.noarch [root@hp-dl380pgen8-02-vm-2 ~]# [root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin login -u admin -p admin Successfully logged in. Session certificate will expire at Nov 25 13:38:36 2014 GMT. [root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin rpm repo create --repo-id zoo --feed https://repos.fedorapeople.org/repos/pulp/pulp/demo_repos/zoo/ Successfully created repository [zoo] [root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [\] ... completed Downloading repository content... [==================================================] 100% RPMs: 32/32 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [==================================================] 100% 32 of 32 items ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@hp-dl380pgen8-02-vm-2 ~]# systemctl stop mongod [root@hp-dl380pgen8-02-vm-2 ~]# systemctl start mongod[root@hp-dl380pgen8-02-vm-2 ~]# pulp-admin rpm repo sync run --repo-id zoo +----------------------------------------------------------------------+ Synchronizing Repository [zoo] +----------------------------------------------------------------------+ This command may be exited via ctrl+c without affecting the request. Downloading metadata... [\] ... completed Downloading repository content... [==================================================] 100% RPMs: 0/0 items Delta RPMs: 0/0 items ... completed Downloading distribution files... [==================================================] 100% Distributions: 0/0 items ... completed Importing errata... [-] ... completed Importing package groups/categories... [-] ... completed Task Succeeded Copying files [-] ... completed Initializing repo metadata [-] ... completed Publishing Distribution files [-] ... completed Publishing RPMs [-] ... completed Publishing Delta RPMs ... skipped Publishing Errata [==================================================] 100% 4 of 4 items ... completed Publishing Comps file [==================================================] 100% 3 of 3 items ... completed Publishing Metadata. [-] ... completed Closing repo metadata [-] ... completed Generating sqlite files ... skipped Publishing files to web [-] ... completed Writing Listings File [-] ... completed Task Succeeded [root@hp-dl380pgen8-02-vm-2 ~]# |
Description of problem: If pulp_celerybeat is running and correctly connected to mongod, and then mongod becomes unreachable for some reason, pulp_celerybeat will try to reconnect for 3 times, but will fail on the 4th time. Once the failure occurs it will die, and will never recover. NOTE: Reconnect support does work properly on pulp_celerybeat if mongod is not available while starting. This BZ only affects reconnect support after an initial successful connection to mongod is made. Version-Release number of selected component (if applicable): Pulp 2.4.0-1 How reproducible: Always Steps to Reproduce: 1. Stop pulp_celerybeat, pulp_resource_manager, pulp_workers, and httpd 2. Start mongod and qpidd 3. Start pulp_celerybeat and verify that the logs show that it starts correctly. 4. Stop mongod 5. Wait 2 minutes (ish) 6. Observe the following traceback: Aug 12 16:42:41 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:42:51 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:42:56 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini ng: 2 Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini ng: 1 Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.db.connection:WARNING: create_index operation failed on pulp_database.workers: tries remaini ng: 0 Aug 12 16:42:57 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: create_index operation failed on pulp_database.workers: database conn ection still down after 3 tries Aug 12 16:43:01 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:43:11 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:43:21 dhcp129-138.rdu.redhat.com pulp[25595]: pulp.server.async.scheduler:ERROR: [Errno 111] Connection refused Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: beat raised exception <class 'pymongo.errors.AutoReconnect'>: AutoReconnect('could not connect to localhost:27017: [Errno 111] Connection refused',) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: Traceback (most recent call last): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/celery/celery/apps/beat.py", line 112, in start_scheduler Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: beat.start() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/celery/celery/beat.py", line 462, in start Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: interval = self.scheduler.tick() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 317, in tick Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: ret = super(Scheduler, self).tick() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/celery/celery/beat.py", line 219, in tick Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: for entry in values(self.schedule): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 387, in schedule Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: if self.schedule_changed: Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/async/scheduler.py", line 366, in schedule_changed Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: if utils.get_enabled().count() != self._loaded_from_db_count: Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 566, in count Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: **command) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/database.py", line 388, in command Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: result = self["$cmd"].find_one(command, **extra_opts) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 596, in find_one Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: for result in self.find(spec_or_id, *args, **kwargs).limit(-1): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 814, in next Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: if len(self.__data) or self._refresh(): Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 763, in _refresh Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: self.__uuid_subtype)) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 700, in __send_message Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: **kwargs) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/home/bmbouter/Documents/pulp/server/pulp/server/db/connection.py", line 171, in _with_end_request Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: return method(*args, **kwargs) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 985, in _send_message_with_response Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: sock_info = self.__socket() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 720, in __socket Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: host, port = self.__find_node() Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 713, in __find_node Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: raise AutoReconnect(', '.join(errors)) Aug 12 16:43:27 dhcp129-138.rdu.redhat.com pulp[25595]: celery.beat:CRITICAL: AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused Expected results: pulp_celerybeat to wait indefinitely for mongod to become available again.