Description of problem:
=======================
While executing a failover/failback scenario on a non-root geo-replication setup, restarting the original non-root session between the master and the slave leaves the session status Faulty. The logs show the following:

[2017-11-08 06:52:08.899] E [resource(/rhs/brick1/b1):234:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-ozvxWN/ab5534f3bb3f74602da3c8c3068a4aa5.sock geoaccount.43.175 /nonexistent/gsyncd --session-owner b4645ef5-836f-4605-98b3-207abd550fc0 --local-id .%2Frhs%2Fbrick1%2Fb1 --local-node 10.70.43.14 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2017-11-08 06:52:08.1159] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1418] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> @ WARNING: UNPROTECTED PRIVATE KEY FILE! @
[2017-11-08 06:52:08.1662] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1856] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are too open.
[2017-11-08 06:52:08.2038] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:08.2216] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> This private key will be ignored.
[2017-11-08 06:52:08.2465] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:08.2824] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:08.3571] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>: exiting.
[2017-11-08 06:52:08.7929] I [monitor(monitor):347:monitor] Monitor: worker(/rhs/brick1/b1) died before establishing connection
[2017-11-08 06:52:08.8866] I [repce(/rhs/brick1/b1):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:08.9479] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>: exiting.
[2017-11-08 06:52:17.353275] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/rhs/brick2/b4). Slave node: ssh://geoaccount.43.175:gluster://localhost:slave
[2017-11-08 06:52:17.565080] I [resource(/rhs/brick2/b4):1684:connect_remote] SSH: Initializing SSH connection between master and slave...
[2017-11-08 06:52:17.567466] I [changelogagent(/rhs/brick2/b4):73:__init__] ChangelogAgent: Agent listining...
[2017-11-08 06:52:17.714301] E [syncdutils(/rhs/brick2/b4):269:log_raise_exception] <top>: connection to peer is broken
[2017-11-08 06:52:17.715086] E [resource(/rhs/brick2/b4):234:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-S6S9iP/ab5534f3bb3f74602da3c8c3068a4aa5.sock geoaccount.43.175 /nonexistent/gsyncd --session-owner b4645ef5-836f-4605-98b3-207abd550fc0 --local-id .%2Frhs%2Fbrick2%2Fb4 --local-node 10.70.43.14 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2017-11-08 06:52:17.715459] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.715709] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> @ WARNING: UNPROTECTED PRIVATE KEY FILE! @
[2017-11-08 06:52:17.715914] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.716105] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are too open.
[2017-11-08 06:52:17.716289] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:17.716600] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> This private key will be ignored.
[2017-11-08 06:52:17.716799] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:17.717060] E [resource(/rhs/brick2/b4):238:logerr] Popen: ssh> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:17.717834] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>: exiting.
[2017-11-08 06:52:17.721502] I [repce(/rhs/brick2/b4):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:17.722084] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>: exiting.
[2017-11-08 06:52:17.721748] I [monitor(monitor):347:monitor] Monitor: worker(/rhs/brick2/b4) died before establishing connection

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-51.el7rhgs.x86_64

Steps to Reproduce:
===================
1. Created a non-root geo-rep session between the master and the slave
2. Stopped the master volume with the force option
3. Promoted the slave volume to master
4. Brought the original master back online and stopped the original geo-rep session between the original master and the slave
5. Set up a non-root session from the original slave to the original master and wrote some data
6. Stopped IO and set a checkpoint
7. Waited for the checkpoint to complete
8. Stopped and deleted the geo-rep session from the original slave to the original master
9. Reset the options that had promoted the slave volume to master
10. Resumed the original session between the original master and the original slave (a CLI sketch of this sequence is included below)

Actual results:
===============
Geo-rep session status was Faulty.

Expected results:
=================
Geo-rep session status should be ACTIVE / PASSIVE.
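For context, here is a minimal CLI sketch of the sequence in the steps above, assuming a mountbroker user geoaccount, volumes named master and slave, and hypothetical hosts masterhost/slavehost; the volume options used to promote and demote the slave follow the generally documented disaster-recovery flow and are assumptions here, not commands taken from this report:

# 1. Original non-root session (already created) is running
gluster volume geo-replication master geoaccount@slavehost::slave start

# 2-3. Fail over: stop the master volume and promote the slave (assumed promote options)
gluster volume stop master force
gluster volume set slave geo-replication.indexing on
gluster volume set slave changelog on

# 4-5. Master is back: stop the original session, then sync back over a reverse session
gluster volume geo-replication master geoaccount@slavehost::slave stop force
gluster volume geo-replication slave geoaccount@masterhost::master create push-pem force
gluster volume geo-replication slave geoaccount@masterhost::master start

# 6-7. Quiesce IO, set a checkpoint, and poll status until it reports completion
gluster volume geo-replication slave geoaccount@masterhost::master config checkpoint now
gluster volume geo-replication slave geoaccount@masterhost::master status detail

# 8-9. Tear down the reverse session and demote the slave again
gluster volume geo-replication slave geoaccount@masterhost::master stop
gluster volume geo-replication slave geoaccount@masterhost::master delete
gluster volume reset slave geo-replication.indexing force
gluster volume reset slave changelog

# 10. Resume the original session -- this is the point where the status goes Faulty
gluster volume geo-replication master geoaccount@slavehost::slave start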
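The ssh errors in the log point at the file mode on secret.pem (0770). For anyone triaging a similar Faulty state by hand, a minimal check/workaround sketch to run on the affected master nodes; the owner-only 600 mode is an assumption based on ssh's general requirement that private keys not be group- or world-accessible, not a fix prescribed by this report:

# Inspect mode and ownership of the geo-rep SSH key (path taken from the logs above)
stat -c '%a %U:%G %n' /var/lib/glusterd/geo-replication/secret.pem

# ssh ignores keys that others can access, so tighten the mode to owner-only
chmod 600 /var/lib/glusterd/geo-replication/secret.pem

This is only a manual triage aid; the product-level fix is delivered via the advisory referenced at the end of this report.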
At the release stakeholders meeting this morning, it was agreed to push this out of the proposed list for 3.4.3 and to consider it for a future batch update.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0658