Bug 1510752 - [geo-rep]: Failover / Failback shows fault status in a non-root setup
Summary: [geo-rep]: Failover / Failback shows fault status in a non-root setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 4
Assignee: Kotresh HR
QA Contact: Rochelle
URL:
Whiteboard:
Depends On:
Blocks: 1651498 1654117 1654118
 
Reported: 2017-11-08 07:20 UTC by Rochelle
Modified: 2019-03-27 03:43 UTC
CC: 11 users

Fixed In Version: glusterfs-3.12.2-41
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1651498
Environment:
Last Closed: 2019-03-27 03:43:36 UTC
Embargoed:



Description Rochelle 2017-11-08 07:20:36 UTC
Description of problem:
=======================

While executing a failover/failback scenario on a non-root geo-rep setup, the session goes Faulty when the original non-root session between the master and the slave is started again.

The logs show the following:


[2017-11-08 06:52:08.899] E [resource(/rhs/brick1/b1):234:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-ozvxWN/ab5534f3bb3f74602da3c8c3068a4aa5.sock geoaccount.43.175 /nonexistent/gsyncd --session-owner b4645ef5-836f-4605-98b3-207abd550fc0 --local-id .%2Frhs%2Fbrick1%2Fb1 --local-node 10.70.43.14 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2017-11-08 06:52:08.1159] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1418] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> @         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
[2017-11-08 06:52:08.1662] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1856] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are too open.
[2017-11-08 06:52:08.2038] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:08.2216] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> This private key will be ignored.
[2017-11-08 06:52:08.2465] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:08.2824] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:08.3571] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>: exiting.
[2017-11-08 06:52:08.7929] I [monitor(monitor):347:monitor] Monitor: worker(/rhs/brick1/b1) died before establishing connection
[2017-11-08 06:52:08.8866] I [repce(/rhs/brick1/b1):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:08.9479] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>: exiting.
[2017-11-08 06:52:17.353275] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/rhs/brick2/b4). Slave node: ssh://geoaccount.43.175:gluster://localhost:slave
[2017-11-08 06:52:17.565080] I [resource(/rhs/brick2/b4):1684:connect_remote] SSH: Initializing SSH connection between master and slave...
[2017-11-08 06:52:17.567466] I [changelogagent(/rhs/brick2/b4):73:__init__] ChangelogAgent: Agent listining...
[2017-11-08 06:52:17.714301] E [syncdutils(/rhs/brick2/b4):269:log_raise_exception] <top>: connection to peer is broken
[2017-11-08 06:52:17.715086] E [resource(/rhs/brick2/b4):234:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-S6S9iP/ab5534f3bb3f74602da3c8c3068a4aa5.sock geoaccount.43.175 /nonexistent/gsyncd --session-owner b4645ef5-836f-4605-98b3-207abd550fc0 --local-id .%2Frhs%2Fbrick2%2Fb4 --local-node 10.70.43.14 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2017-11-08 06:52:17.715459] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.715709] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> @         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
[2017-11-08 06:52:17.715914] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.716105] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are too open.
[2017-11-08 06:52:17.716289] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:17.716600] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> This private key will be ignored.
[2017-11-08 06:52:17.716799] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:17.717060] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:17.717834] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>: exiting.
[2017-11-08 06:52:17.721502] I [repce(/rhs/brick2/b4):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:17.722084] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>: exiting.
[2017-11-08 06:52:17.721748] I [monitor(monitor):347:monitor] Monitor: worker(/rhs/brick2/b4) died before establishing connection
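The Popen errors above are OpenSSH rejecting the session's identity file: its mode (0770) grants group access, so ssh refuses to load it, gsyncd dies before establishing a connection, and the session shows Faulty. A minimal sketch of the check and the correction, using a throwaway file in place of the real /var/lib/glusterd/geo-replication/secret.pem (whether tightening the real file's mode by hand is an acceptable workaround in a given deployment is an assumption, not a statement of the shipped fix):

```shell
# Stand-in for /var/lib/glusterd/geo-replication/secret.pem
key=$(mktemp)

chmod 0770 "$key"                 # the bad mode reported in the log
mode=$(stat -c '%a' "$key")
echo "before: $mode"              # ssh would say "Permissions 0770 ... are too open"

chmod 0600 "$key"                 # owner-only read/write; ssh accepts this for identity files
mode=$(stat -c '%a' "$key")
echo "after: $mode"

rm -f "$key"
```

Running the same chmod 0600 against the path from the log on the affected node would stop the "bad permissions" rejection; the log indicates the non-root failback flow left secret.pem at 0770 in the first place.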


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-51.el7rhgs.x86_64


Steps to Reproduce:
===================
1. Created a non-root session between the master and the slave 
2. Stopped the master volume with the force option
3. Promoted slave to master
4. Brought master back online and stopped original geo-rep session between original master and slave
5. Set up non-root session from original slave to original master and wrote some data
6. Stopped IO and set checkpoint
7. Waited for checkpoint to complete
8. Stopped and deleted geo-rep session between original slave to original master
9. Reset the options that promoted slave volume as master volume
10. Resumed the original session between the original master and the original slave
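For reference, the steps above correspond roughly to the following geo-rep CLI invocations. The volume names (master, slave), the hosts, and the unprivileged user geoaccount are placeholders; this is an outline under those assumptions, not the exact commands used in the run:

```shell
# 2. Stop the master volume forcefully
gluster volume stop master force

# 4. Stop the original master -> slave geo-rep session
gluster volume geo-replication master geoaccount@slave.example.com::slave stop

# 6. Set a checkpoint on the reversed (slave -> master) session
gluster volume geo-replication slave geoaccount@master.example.com::master config checkpoint now

# 8. Stop and delete the reversed session once the checkpoint completes
gluster volume geo-replication slave geoaccount@master.example.com::master stop
gluster volume geo-replication slave geoaccount@master.example.com::master delete

# 10. Start the original session again and verify status
gluster volume geo-replication master geoaccount@slave.example.com::slave start
gluster volume geo-replication master geoaccount@slave.example.com::slave status
```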



Actual results:
===============
Geo-rep status was Faulty

Expected results:
================
Geo-rep status should be ACTIVE / PASSIVE

Comment 7 Sweta Anandpara 2018-12-11 07:13:09 UTC
At the release stakeholders meeting this morning, it was agreed to push this out of the proposed list for 3.4.3 and to consider it for a future batch update.

Comment 22 errata-xmlrpc 2019-03-27 03:43:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0658

