Bug 1135321

Summary: common_secret.pem.pub handling prevents multiple geo-replication sessions
Product: [Community] GlusterFS Reporter: nathan r. hruby <nhruby>
Component: geo-replication Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: medium    
Version: 3.5.2 CC: avishwan, bugs, gluster-bugs, nhruby
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-30 05:30:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1183229    
Bug Blocks:    

Description nathan r. hruby 2014-08-29 05:01:28 UTC
Description of problem:

Assume a geo-replication topology where two different hosts, each with a volume, both replicate to a central remote host that has two volumes, and the central host then re-mirrors each origin node's data to the opposing node on a separate volume in a chained fashion.

Like so, (capital letter is a host, lowercase is a volume)
A(a) -> C(a) -> B(a)
B(b) -> C(b) -> A(b)

So far this seems to work, except in the case of initial creation:

- The push-pem option of create replaces common_secret.pem.pub on the target host.

- When that target host is also a replication master, this overwrites the host's own master common_secret.pem.pub.

- Create commands run after this event on the second host will then push a common_secret.pem.pub containing the keys of that host's own master (i.e., the keys from step 1) instead of its own keys.

- This breaks start since the pubkeys are wrong.

- Start reports a success, even though the session doesn't start and the only thing in the log is a permission denied error from using the wrong key.
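
The sequence above can be sketched with a toy model. This is only an illustration of the overwrite as described in this report, not GlusterFS code: `Host`, `create_push_pem`, and `start` are hypothetical names, each host's keys are reduced to a single string, and the real key files and authorized_keys handling are assumed to behave roughly like this.

```python
# Toy model of the key overwrite described above. NOT GlusterFS code;
# Host, create_push_pem and start are hypothetical names for illustration.

class Host:
    def __init__(self, name):
        self.name = name
        self.own_key = f"pubkey-{name}"    # stands in for the host's real public key(s)
        self.common_secret = self.own_key  # what gsec_create would generate
        self.authorized_keys = set()       # keys this host accepts as a slave


def create_push_pem(master, slave):
    """Model 'create push-pem': push the master's common_secret.pem.pub to the slave."""
    pushed = master.common_secret
    slave.authorized_keys.add(pushed)
    slave.common_secret = pushed           # the overwrite reported in this bug


def start(master, slave):
    """In this model a session only works if the slave trusts the master's own key."""
    return master.own_key in slave.authorized_keys


a, b, c = Host("A"), Host("B"), Host("C")
create_push_pem(a, c)  # A(a) -> C(a): C's common_secret now holds A's key
create_push_pem(c, b)  # C(a) -> B(a): C pushes A's key, not its own

print(start(a, c))  # first hop
print(start(c, b))  # second hop
```

In this model the first hop works and the second does not, because B only ever receives A's key from C.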


Version-Release number of selected component (if applicable):
3.5.2

How reproducible:
Always

Steps to Reproduce:
1. Create chained replication as described above
2. Note that common_secret.pem.pub has the "wrong" keys for a replication master on some subset of hosts.
3. Note that start doesn't start

Actual results:
A subset of nodes needs its keys regenerated and pushed again in order to function correctly.

Expected results:
All session creation and starting works as expected.

Additional info:

It looks like common_secret.pem.pub is a shortcut for only having to copy around one file instead of two.  Probably the easiest fix would be to change the name of this file to contain a host string, and modify the commands that use it to require the file name instead of assuming it.
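
A minimal sketch of that naming idea, assuming the pushed file is simply prefixed with the master's host name. The helper `pushed_key_filename` and the naming scheme are hypothetical and are not taken from the actual fix.

```python
# Hypothetical sketch of the naming idea; not the actual GlusterFS fix.

def pushed_key_filename(master_host):
    """File name under which a slave would store keys pushed by master_host."""
    return f"{master_host}_common_secret.pem.pub"


# Files on host C, keyed by file name. C's own file keeps the original name.
files_on_c = {"common_secret.pem.pub": "pubkey-C"}

# A pushes its keys to C: they land in a separate, host-specific file,
# so C's own common_secret.pem.pub is left untouched.
files_on_c[pushed_key_filename("A")] = "pubkey-A"

print(sorted(files_on_c))
print(files_on_c["common_secret.pem.pub"])
```

With distinct names, a later create from C would still find C's own keys in common_secret.pem.pub.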

For future folks: The workaround is to run "gluster system:: execute gsec_create" again, which will regenerate common_secret.pem.pub for the host without changing the keys.  You can then re-run the "create push-pem" step to recopy the keys, after which a start will work correctly.  It seems like you could also use OOB management (ansible, puppet, etc.) to copy the keys around and skip the push-pem business.
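
The workaround steps could be scripted roughly as follows. This sketch only builds and prints the command lines and does not run them. The helper `workaround_commands`, the volume and host names, and the optional `force` argument are assumptions for illustration (re-running create for an existing session may need `force`); check the syntax against your gluster version before use.

```python
import shlex


def workaround_commands(master_vol, slave_host, slave_vol, force=False):
    """Build the command lines for the workaround, to be run on the affected master."""
    session = ["gluster", "volume", "geo-replication",
               master_vol, f"{slave_host}::{slave_vol}"]
    create = session + ["create", "push-pem"]
    if force:
        create.append("force")  # assumption: may be needed if the session already exists
    return [
        ["gluster", "system::", "execute", "gsec_create"],  # regenerate common_secret.pem.pub
        create,                                             # recopy the keys to the slave
        session + ["start"],                                # start the session
    ]


# Example for the C(a) -> B(a) hop from the description (names are placeholders).
for cmd in workaround_commands("a", "B", "a"):
    print(shlex.join(cmd))
```

To execute the steps instead of printing them, each list could be passed to `subprocess.run(cmd, check=True)` on the affected host.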

Comment 1 Aravinda VK 2015-12-29 10:37:04 UTC
Patch merged in mainline. Do we need this fix for 3.5.x?
https://bugzilla.redhat.com/show_bug.cgi?id=1183229

We can close this bug if not required for 3.5.x. Please confirm.

Comment 2 nathan r. hruby 2015-12-29 15:33:27 UTC
Fix for this is in 3.7 so OK to close.

Comment 3 Aravinda VK 2015-12-30 05:30:44 UTC
BZ 1183229 fixes this issue in gluster 3.7. Closing this bug based on Comment 2.

Please reopen if required again in 3.5.x

Thanks.