Description of problem:
If pulp seeds from a mongodb replica set, and the mongo PRIMARY is re-elected, pulp fails to reconnect to the new PRIMARY.

Version-Release number of selected component (if applicable):
pulp-server-2.2.0-0.20.beta.git.0.d54a854.el6eng.cdn.1.noarch
mongo server buildinfo - 2.4.6 (EPEL)
pymongo-2.1.1-1.el6.x86_64

How reproducible:
very

Steps to Reproduce:
1. Have a mongo replica set, where mongodb01.web.stage.our.domain.com is PRIMARY.
2. Set up pulp to seed from a host in the mongodb replica set:
   /etc/pulp/server.conf
   [database]
   seeds: mongodb01.web.stage.our.domain.com:27017
3. Start the pulp server (and pulp-manage-db) and ensure it's functioning correctly.
4. On the mongodb rs, have mongodb01 step down from PRIMARY ( rs.stepDown() ).
5. Make calls to the pulp server ( `pulp-admin login -u admin -p S3krit` ).

Actual results:
An internal error occurred on the Pulp server. More information can be found in the client log file ~/.pulp/admin.log.

=== START ~/.pulp/admin.log =====
2013-10-16 10:22:45,466 - ERROR - Client-side exception occurred
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/client/extensions/core.py", line 478, in run
    exit_code = Cli.run(self, args)
  File "/usr/lib/python2.6/site-packages/okaara/cli.py", line 974, in run
    exit_code = command_or_section.execute(self.prompt, remaining_args)
  File "/usr/lib/python2.6/site-packages/pulp/client/extensions/extensions.py", line 224, in execute
    return self.method(*arg_list, **clean_kwargs)
  File "/usr/lib/pulp/admin/extensions/pulp_server_info/pulp_cli.py", line 35, in types
    all_types = self.context.server.server_info.get_types()
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server_info.py", line 33, in get_types
    return self.server.GET(path)
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server.py", line 84, in GET
    return self._request('GET', path, queries)
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server.py", line 142, in _request
    self._handle_exceptions(response_code, response_body)
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server.py", line 183, in _handle_exceptions
    raise code_class_mappings[response_code](response_body)
PermissionsException: RequestException: GET request on /pulp/api/v2/plugins/types/ failed with 401 - Pulp exception occurred: AuthenticationFailed

2013-10-16 10:23:05,446 - ERROR - Exception occurred:
  href: /pulp/api/v2/actions/login/
  method: POST
  status: 500
  error: create_index operation failed on pulp2_database.users: database connection still down after 3 tries
  traceback: None
  data: {u'args': [u'create_index operation failed on pulp2_database.users: database connection still down after 3 tries']}
=== END ~/.pulp/admin.log =====

Expected results:
Successfully logged in. Session certificate will expire at Oct 23 14:24:33 2013 GMT.

Additional info:
If I can get the original PRIMARY node elected back as PRIMARY, then everything on pulp begins working again.

This is enough of an issue to block us from promoting pulp v2 to production.
This is likely a regression. Unless it's particularly inconvenient, I think it makes sense to fix this in 2.2 and get it into our 2.2.1 release.
https://github.com/pulp/pulp/pull/672
For QE:

Check out http://docs.mongodb.org/manual/tutorial/deploy-replica-set-for-testing/ for information on setting up a replica set. Here's what I did:

- Set up a replica set with three mongod processes. I connected a mongo shell to each so that I could see who the primary was. It's pretty simple: the prompt in the console will indicate if it's a primary or secondary.

Unconfigured Test:
- Leave the Pulp configuration at the default (i.e. no replica set configured but one in use).
- Point Pulp at the replica set primary DB.
- Run a watch on `pulp-admin rpm repo list` (or some other cheap command that hits the DB) and kill the primary database.
- /var/log/pulp/pulp.log will spam messages about not being able to connect, even though another database has been elected primary.

Environment Reset:
Stop the watch and Apache. Restart the killed Mongo DB process. At this point, it actually doesn't matter which is the primary for the purposes of Pulp server configuration; it can continue to point at the port used in the previous run even though it's very likely to be a secondary (when it comes back up it doesn't replace the newly elected primary).

Configured Test:
- Edit /etc/pulp/server.conf to configure it for your replica set. The comments in there should be enough to guide you, so I won't say any more.
- Restart Apache.
- Restart the watch.
- Kill the primary (remember to check the mongo shells to see which is the primary). The server logs will complain for a bit about not being able to connect (the sleep on the retry is super quick and mongo typically takes a bit longer than that to reorient itself). The CLI command on the watch should show errors too.
- After a very short amount of time (~2 seconds), the pulp log should stop showing connection errors and the CLI should show the results of the command correctly.

You can restart the killed instance, but again, it won't be made primary unless there's a need, so don't expect it to start fielding requests again immediately.
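
For anyone reading along, the retry behaviour the configured test exercises (a few quick retries while the replica set elects a new primary, then either success or a "still down after N tries" error) can be sketched roughly like this. This is not Pulp's actual code: `ConnectionDown`, `retry_on_reconnect`, and the `retries`/`delay` parameters are hypothetical names for illustration only. With a real driver the exception caught would presumably be pymongo's `AutoReconnect`.

```python
import functools
import time


class ConnectionDown(Exception):
    """Hypothetical stand-in for a driver error such as pymongo's AutoReconnect."""


def retry_on_reconnect(retries=3, delay=0.5):
    """Retry the wrapped call while the connection is down (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except ConnectionDown:
                    if attempt == retries:
                        # Out of attempts: surface an error like the one in the report.
                        raise RuntimeError(
                            "%s operation failed: database connection still "
                            "down after %d tries" % (func.__name__, retries))
                    # Give the replica set a moment to elect a new primary.
                    time.sleep(delay)
        return wrapper
    return decorator


# Simulated election: the first two calls fail, the third succeeds.
calls = {"count": 0}


@retry_on_reconnect(retries=3, delay=0.01)
def list_repos():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionDown("no primary available")
    return ["repo-a", "repo-b"]


result = list_repos()
```

If the total retry window (`retries * delay`) is shorter than the election time, the caller still sees an error, which would match the brief burst of errors before the ~2 second recovery described above.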
build: 2.2.1-0.1.beta
verified

[root@pulp-v2-server ~]# rpm -qa pulp-server
pulp-server-2.2.1-0.2.beta.el6.noarch
[root@pulp-v2-server ~]#

Set up a replica set as per above and made sure it's working well, reconnecting to the new primary.

[database]
name: pulp_database
seeds: localhost:27017,localhost:27018,localhost:27019
operation_retries: 2
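
As a rough illustration of what the `seeds` setting carries, here is a minimal sketch that reads the `[database]` section above and splits `seeds` into (host, port) pairs. It is an assumption about how such a value could be parsed, not how Pulp does it; `parse_seeds` is a hypothetical helper, and the sketch uses Python 3's `configparser` (Pulp 2.2 itself ran on Python 2.6). Falling back to 27017 when no port is given is also an assumption, based on that being MongoDB's default port.

```python
import configparser

# Same settings as in the verification comment above.
SERVER_CONF = """\
[database]
name: pulp_database
seeds: localhost:27017,localhost:27018,localhost:27019
operation_retries: 2
"""


def parse_seeds(conf_text):
    """Return the [database] seeds as a list of (host, port) tuples."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    seeds = []
    for entry in parser.get("database", "seeds").split(","):
        host, _, port = entry.strip().partition(":")
        # Assumption: fall back to MongoDB's default port if none is given.
        seeds.append((host, int(port) if port else 27017))
    return seeds


seeds = parse_seeds(SERVER_CONF)
```

Listing every member of the replica set, rather than only the current primary, means the seed list still contains a reachable host when the primary goes away.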
Released pulp 2.2.1