Bug 1021011

Summary: mongo seeds are not reconnecting to new PRIMARY of a replica set
Product: [Retired] Pulp
Reporter: Michael Hrivnak <mhrivnak>
Component: z_other
Assignee: Jay Dobies <jason.dobies>
Status: CLOSED CURRENTRELEASE
QA Contact: pulp-qe-list
Severity: high
Priority: urgent
Version: 2.2 Beta
CC: dgregor, mhrivnak, pthomas, skarmark, vbatts
Keywords: Triaged
Target Release: 2.3.0
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Clone Of: 1019909
Last Closed: 2013-12-09 14:31:03 UTC
Type: Bug
Bug Depends On: 1019909

Description Michael Hrivnak 2013-10-18 18:46:22 UTC
+++ This bug was initially created as a clone of Bug #1019909 +++

Description of problem:
If pulp seeds from a mongodb replica set and a new mongo PRIMARY is elected, pulp fails to reconnect to the new PRIMARY.

Version-Release number of selected component (if applicable):
pulp-server-2.2.0-0.20.beta.git.0.d54a854.el6eng.cdn.1.noarch
mongo server buildinfo - 2.4.6 (EPEL)
pymongo-2.1.1-1.el6.x86_64

How reproducible:
very

Steps to Reproduce:
1. have a mongo replica set, where mongodb01.web.stage.our.domain.com is PRIMARY
2. set up pulp to seed from a host in the mongodb replica set
 /etc/pulp/server.conf [database] seeds: mongodb01.web.stage.our.domain.com:27017
3. start the pulp server (and pulp-manage-db) and ensure it's functioning correctly
4. on the mongodb rs, have mongodb01 step down from PRIMARY ( rs.stepDown() )
5. make calls to the pulp server ( `pulp-admin login -u admin -p S3krit` )

Actual results:
   An internal error occurred on the Pulp server. More information can be found in the client log file ~/.pulp/admin.log.
=== START ~/.pulp/admin.log =====
2013-10-16 10:22:45,466 - ERROR - Client-side exception occurred
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/client/extensions/core.py", line 478, in run
    exit_code = Cli.run(self, args)
  File "/usr/lib/python2.6/site-packages/okaara/cli.py", line 974, in run
    exit_code = command_or_section.execute(self.prompt, remaining_args)
  File "/usr/lib/python2.6/site-packages/pulp/client/extensions/extensions.py", line 224, in execute
    return self.method(*arg_list, **clean_kwargs)
  File "/usr/lib/pulp/admin/extensions/pulp_server_info/pulp_cli.py", line 35, in types
    all_types = self.context.server.server_info.get_types()
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server_info.py", line 33, in get_types
    return self.server.GET(path)
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server.py", line 84, in GET
    return self._request('GET', path, queries)
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server.py", line 142, in _request
    self._handle_exceptions(response_code, response_body)
  File "/usr/lib/python2.6/site-packages/pulp/bindings/server.py", line 183, in _handle_exceptions
    raise code_class_mappings[response_code](response_body)
PermissionsException: RequestException: GET request on /pulp/api/v2/plugins/types/ failed with 401 - Pulp exception occurred: AuthenticationFailed
2013-10-16 10:23:05,446 - ERROR - Exception occurred:
        href:      /pulp/api/v2/actions/login/
        method:    POST
        status:    500
        error:     create_index operation failed on pulp2_database.users: database connection still down after 3 tries
        traceback: None
        data:      {u'args': [u'create_index operation failed on pulp2_database.users: database connection still down after 3 tries']}
=== END ~/.pulp/admin.log =====

Expected results:
   Successfully logged in. Session certificate will expire at Oct 23 14:24:33 2013 GMT.

Additional info:
If I can get the original PRIMARY node elected back as PRIMARY, then everything on pulp begins working again.

This is enough of an issue to block us from promoting pulp v2 to production.

--- Additional comment from Michael Hrivnak on 2013-10-18 14:45:48 EDT ---

This is likely a regression. Unless it's particularly inconvenient, I think it makes sense to fix this in 2.2 and get it into our 2.2.1 release.

Comment 1 Jeff Ortel 2013-10-29 15:29:33 UTC
build: 2.3.0-0.26.beta

Comment 2 Preethi Thomas 2013-11-01 01:28:55 UTC
I followed the steps from 

https://bugzilla.redhat.com/show_bug.cgi?id=1019909#c3

So the first part of the test gave the following result in the log

PulpCollectionFailure: find_one operation failed on pulp_database.users: database connection still down after 3 tries
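
The "still down after 3 tries" wording suggests a bounded retry around each database operation. The following is a minimal sketch of that pattern, not Pulp's actual code: the names `ConnectionDown`, `CollectionFailure`, and `retry` are hypothetical and exist only for illustration.

```python
import functools


class ConnectionDown(Exception):
    """Hypothetical stand-in for a driver-level connection error."""


class CollectionFailure(Exception):
    """Hypothetical error raised once all retries are exhausted."""


def retry(tries=3):
    """Retry the wrapped operation up to `tries` times on ConnectionDown."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(tries):
                try:
                    return func(*args, **kwargs)
                except ConnectionDown:
                    # The connection is unavailable; try the call again.
                    continue
            raise CollectionFailure(
                "%s operation failed: database connection still down "
                "after %d tries" % (func.__name__, tries)
            )
        return wrapper
    return decorator
```

A retry of this kind only helps if a later attempt can reach a usable PRIMARY; retrying against a node that has stepped down fails every time.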


Now I configured server.conf


[root@hp-dl120g5-01 ~]# cat /etc/pulp/server.conf |grep seed
# seeds: comma-separated list of hostname:port of database replica seed hosts
seeds: localhost:27018,localhost:27019,localhost:27020
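
For reference, the seeds value is a comma-separated list of hostname:port entries, as the config comment says. A minimal sketch of splitting such a value into (host, port) pairs follows; `parse_seeds` is a hypothetical helper, not a Pulp function, and the fallback to port 27017 for entries without a port is an assumption.

```python
def parse_seeds(value, default_port=27017):
    """Split a comma-separated 'host:port' string into (host, port) tuples."""
    seeds = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep:
            # No port given: assume the default (an assumption of this sketch).
            host, port = item, default_port
        seeds.append((host, int(port)))
    return seeds


seeds = parse_seeds("localhost:27018,localhost:27019,localhost:27020")
```

Listing every replica set member, rather than a single host, gives the client more than one node to try when the first seed is unavailable.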


First I killed 27018

So 27019 became primary and it continued to work.

Then I killed 27019, so the only one running was localhost:27020, but it did not work after that. localhost:27020 stayed as SECONDARY

Comment 3 Preethi Thomas 2013-11-01 13:25:20 UTC
moving to verified.
[root@hp-sl2x170zg6-01 ~]# rpm -qa pulp-server
pulp-server-2.3.0-0.26.beta.el6.noarch
[root@hp-sl2x170zg6-01 ~]# 

The last statement in the above comment is mongo behavior.
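
The mongo behavior in question is the election rule: a replica set elects a PRIMARY only when a majority of its voting members can reach each other. A minimal sketch of that arithmetic, assuming every member has one vote (`can_elect_primary` is a hypothetical helper for illustration):

```python
def can_elect_primary(total_members, reachable_members):
    """Return True if the reachable members form a strict majority."""
    majority = total_members // 2 + 1
    return reachable_members >= majority


# Three-member set from the test above.
after_first_kill = can_elect_primary(3, 2)   # 27019 and 27020 still up
after_second_kill = can_elect_primary(3, 1)  # only 27020 still up
```

With two of three members down, the survivor cannot reach a majority, so it stays SECONDARY and no client can write until another member returns.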

Comment 4 Preethi Thomas 2013-12-09 14:31:03 UTC
Pulp 2.3 released.