Description of problem:
After creating the geo-replication session, one of the volumes goes faulty.

Version-Release number of selected component (if applicable):
RHS 3.0 EA

How reproducible:
Create a geo-replication session.

Steps to Reproduce:
My architecture is two RHS 3.0 EA nodes exporting a replicated volume and a third RHS 3.0 EA node acting as the slave. I set up passwordless SSH between all the nodes and created the volumes:

Master volume:

[root@rhs3ea-node2 ~]# gluster volume info vol_gluster
Volume Name: vol_gluster
Type: Replicate
Volume ID: 99454c86-5e50-4895-a542-5c6a07c925c1
Status: Started
Snap Volume: no
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs3ea-node1:/bricks/b-vol-gl
Brick2: rhs3ea-node2:/bricks/b-vol-gl
Options Reconfigured:
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

Slave volume:

[root@rhs3ea-node3 ~]# gluster volume create vol_geo_gluster rhs3ea-node3:/bricks/b-vol-gl force
[root@rhs3ea-node3 ~]# gluster volume status vol_geo_gluster
Status of volume: vol_geo_gluster
Gluster process                                 Port    Online    Pid
------------------------------------------------------------------------------
Brick rhs3ea-node3:/bricks/b-vol-gl             49152   Y         20156
NFS Server on localhost                         2049    Y         20169

Task Status of Volume vol_geo_gluster
------------------------------------------------------------------------------
There are no active volume tasks

But when I check its status, I can see the volume is in the faulty state:

[root@rhs3ea-node1 ~]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster status
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS     CHECKPOINT STATUS    CRAWL STATUS
---------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    faulty     N/A                  N/A
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Passive    N/A                  N/A

[root@rhs3ea-node1 ~]# gluster volume geo-replication vol_gluster status
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                                  STATUS     CHECKPOINT STATUS    CRAWL STATUS
---------------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    ssh://rhs3ea-node3::vol_geo_gluster    faulty     N/A                  N/A
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster          Passive    N/A                  N/A

So I checked the log file (/var/log/glusterfs/geo-replication/vol_gluster/ssh%3A%2F%2Froot%4010.33.11.171%3Agluster%3A%2F%2F127.0.0.1%3Avol_geo_gluster.log) and found:

[2014-07-29 13:13:49.577354] E [resource(/bricks/b-vol-gl):220:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-wQCqbR/2f6b7000343f40f5be2b9828056ab09f.sock root@rhs3ea-node3 /nonexistent/gsyncd --session-owner 99454c86-5e50-4895-a542-5c6a07c925c1 -N --listen --timeout 120 gluster://localhost:vol_geo_gluster" returned with 127, saying:
[2014-07-29 13:13:49.577475] E [resource(/bricks/b-vol-gl):224:logerr] Popen: ssh> bash: /nonexistent/gsyncd: No such file or directory
[2014-07-29 13:13:49.577889] I [syncdutils(/bricks/b-vol-gl):214:finalize] <top>: exiting.
[2014-07-29 13:13:49.579058] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-07-29 13:13:49.579267] I [syncdutils(agent):214:finalize] <top>: exiting.
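The failing command from the log can be replayed by hand from the faulty master node to confirm where it breaks. This is only a diagnostic sketch based on the SSH command quoted above (the temporary control-socket option is omitted); if the remote side starts gsyncd and waits for input, interrupt it with Ctrl-C:

# Run as root on rhs3ea-node1.
ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no \
    -i /var/lib/glusterd/geo-replication/secret.pem \
    root@rhs3ea-node3 /nonexistent/gsyncd
echo "exit code: $?"    # 127 reproduces the "No such file or directory" failure from the log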
The admin guide pointed me to the SSH configuration, but since passwordless SSH works between the nodes, I looked for the missing file instead:

[root@rhs3ea-node1 ~]# ll /usr/local/libexec/glusterfs/gsyncd
ls: cannot access /usr/local/libexec/glusterfs/gsyncd: No such file or directory
[root@rhs3ea-node1 ~]# find / -name gsyncd
/usr/libexec/glusterfs/gsyncd

Since the gluster man page says nothing about those options, I applied this workaround, which is not elegant at all but works:

[root@rhs3ea-node1 ~]# mkdir /nonexistent
[root@rhs3ea-node1 ~]# ln -s /usr/libexec/glusterfs/gsyncd /nonexistent/gsyncd

[root@rhs3ea-node1 glusterfs]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster status detail
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Stopped    N/A    N/A    N/A    N/A    N/A    N/A    N/A
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Stopped    N/A    N/A    0      0      0      0      0

[root@rhs3ea-node1 glusterfs]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster start
Starting geo-replication session between vol_gluster & rhs3ea-node3::vol_geo_gluster has been successful

[root@rhs3ea-node1 glusterfs]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster status detail
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS             CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Initializing...    N/A    N/A    0    0    0    0    0
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Initializing...    N/A    N/A    0    0    0    0    0

[root@rhs3ea-node1 glusterfs]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster status detail
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS     CHECKPOINT STATUS    CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Active     N/A    Changelog Crawl    0    0    0    0    0
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Passive    N/A    N/A                0    0    0    0    0

Actual results:

[root@rhs3ea-node1 ~]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster status
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS     CHECKPOINT STATUS    CRAWL STATUS
---------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    faulty     N/A                  N/A
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Passive    N/A                  N/A

Expected results:

MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS     CHECKPOINT STATUS    CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Active     N/A    Changelog Crawl    0    0    0    0    0
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Passive    N/A    N/A                0    0    0    0    0

Additional info:
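A possibly cleaner alternative to the /nonexistent symlink is geo-replication's per-session configuration option for the remote gsyncd path. Whether the option is exposed in this particular build is an assumption, so treat this as a sketch rather than a confirmed fix (the option may be spelled remote-gsyncd or remote_gsyncd depending on the release):

# Point the session at the real gsyncd path on the slave, then restart it.
gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster \
    config remote-gsyncd /usr/libexec/glusterfs/gsyncd
gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster stop
gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster start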
Was `gluster system:: execute gsec_create` run before the geo-rep create?
Yes, it is the first command I executed.

To create the pem pub file on the master node:

[root@rhs3ea-node1 ~]# gluster system:: execute gsec_create
Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub

Create the geo-replication session:

[root@rhs3ea-node1 ~]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster create push-pem force
Creating geo-replication session between vol_gluster & rhs3ea-node3::vol_geo_gluster has been successful

[root@rhs3ea-node1 ~]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster status
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS         CHECKPOINT STATUS    CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Not Started    N/A                  N/A
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Not Started    N/A                  N/A

[root@rhs3ea-node1 ~]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster start
Starting geo-replication session between vol_gluster & rhs3ea-node3::vol_geo_gluster has been successful

[root@rhs3ea-node1 ~]# gluster volume geo-replication vol_gluster rhs3ea-node3::vol_geo_gluster status
MASTER NODE     MASTER VOL     MASTER BRICK        SLAVE                            STATUS     CHECKPOINT STATUS    CRAWL STATUS
---------------------------------------------------------------------------------------------------------------------------------
rhs3ea-node1    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    faulty     N/A                  N/A
rhs3ea-node2    vol_gluster    /bricks/b-vol-gl    rhs3ea-node3::vol_geo_gluster    Passive    N/A                  N/A
Please copy the output of the following command from all the nodes:

grep "gsyncd" /root/.ssh/authorized_keys
Master Node 1:

[root@rhs3ea-node1 ~]# grep "gsyncd" /root/.ssh/authorized_keys
[root@rhs3ea-node1 ~]#

Master Node 2:

[root@rhs3ea-node2 ~]# grep "gsyncd" /root/.ssh/authorized_keys
[root@rhs3ea-node2 ~]#

Slave Node:

[root@rhs3ea-node3 ~]# grep "gsyncd" /root/.ssh/authorized_keys
command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ== root@rhs3ea-node1
command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAoQB28c+u32eqxOFrxGT2iuHT/C8Z+K/GbX3ewF+bJDQCVMEdzmu1K+YXr4miK2CfMgrprmjz5I2a1M7+jdgOJmWDnZIPY3ZsjGgwH/zJKPzQ3koy5jtHQtzFQIGMQ9d6jz85uGQpFxH9jQ908ksKYMtoI4Glofwxi9rCHDBkfI2d9YLZUjm0DMCdWHw11QPRmTVk/HGIG396mwD99h+fwXYgrx8BAwhM0kFk6IUCzoL2To3k4RhmyVjKWDHVHwHIPsV+Z8b7WU1bOUwKXd3yxq45XrauVmc56JsoogIdlSfgSJs9FXO5OiDCraz212gYEYX8sbZZD6cMzAzA0+DGcQ== root@rhs3ea-node2
[root@rhs3ea-node3 ~]#
command(command="/usr/libexec/glusterfs/gsyncd") in /root/.ssh/authorized_keys prevents running any other command in slave other than gsyncd. So with this entry in authorized keys even if we run `ssh root@rhs3ea-node3 /nonexisting/gsyncd` it will always run the command specified in the authorized keys.(/usr/libexec/glusterfs/gsyncd) But here it looks like their is one more entry in authorized_keys without the command. Please paste the content of /root/.ssh/authorized_keys in slave if possible(If no security issues) else run the following commands in slave and paste the output. `grep "AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ==" /root/.ssh/authorized_keys` and `grep "AAAAB3NzaC1yc2EAAAABIwAAAQEAoQB28c+u32eqxOFrxGT2iuHT/C8Z+K/GbX3ewF+bJDQCVMEdzmu1K+YXr4miK2CfMgrprmjz5I2a1M7+jdgOJmWDnZIPY3ZsjGgwH/zJKPzQ3koy5jtHQtzFQIGMQ9d6jz85uGQpFxH9jQ908ksKYMtoI4Glofwxi9rCHDBkfI2d9YLZUjm0DMCdWHw11QPRmTVk/HGIG396mwD99h+fwXYgrx8BAwhM0kFk6IUCzoL2To3k4RhmyVjKWDHVHwHIPsV+Z8b7WU1bOUwKXd3yxq45XrauVmc56JsoogIdlSfgSJs9FXO5OiDCraz212gYEYX8sbZZD6cMzAzA0+DGcQ==" /root/.ssh/authorized_keys`
[root@rhs3ea-node3 ~]# grep "AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ==" /root/.ssh/authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ== root@rhs3ea-node1 command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ== root@rhs3ea-node1 [root@rhs3ea-node3 ~]# [root@rhs3ea-node3 ~]# grep "AAAAB3NzaC1yc2EAAAABIwAAAQEAoQB28c+u32eqxOFrxGT2iuHT/C8Z+K/GbX3ewF+bJDQCVMEdzmu1K+YXr4miK2CfMgrprmjz5I2a1M7+jdgOJmWDnZIPY3ZsjGgwH/zJKPzQ3koy5jtHQtzFQIGMQ9d6jz85uGQpFxH9jQ908ksKYMtoI4Glofwxi9rCHDBkfI2d9YLZUjm0DMCdWHw11QPRmTVk/HGIG396mwD99h+fwXYgrx8BAwhM0kFk6IUCzoL2To3k4RhmyVjKWDHVHwHIPsV+Z8b7WU1bOUwKXd3yxq45XrauVmc56JsoogIdlSfgSJs9FXO5OiDCraz212gYEYX8sbZZD6cMzAzA0+DGcQ==" /root/.ssh/authorized_keys command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAoQB28c+u32eqxOFrxGT2iuHT/C8Z+K/GbX3ewF+bJDQCVMEdzmu1K+YXr4miK2CfMgrprmjz5I2a1M7+jdgOJmWDnZIPY3ZsjGgwH/zJKPzQ3koy5jtHQtzFQIGMQ9d6jz85uGQpFxH9jQ908ksKYMtoI4Glofwxi9rCHDBkfI2d9YLZUjm0DMCdWHw11QPRmTVk/HGIG396mwD99h+fwXYgrx8BAwhM0kFk6IUCzoL2To3k4RhmyVjKWDHVHwHIPsV+Z8b7WU1bOUwKXd3yxq45XrauVmc56JsoogIdlSfgSJs9FXO5OiDCraz212gYEYX8sbZZD6cMzAzA0+DGcQ== root@rhs3ea-node2 [root@rhs3ea-node3 ~]# Also, the entire contents of the /root/.ssh/authorized_keys file in the slave node are: [root@rhs3ea-node3 ~]# cat /root/.ssh/authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ== root@rhs3ea-node1 command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ== root@rhs3ea-node1 command="tar ${SSH_ORIGINAL_COMMAND#* }" ssh-rsa 
AAAAB3NzaC1yc2EAAAABIwAAAQEAujdaeH52IU+SQLOpnTbz61hqSAbhKhlTYtmWnNPgnBJqjdj2vPg9CM7zmYnKZdL4vlt9tNdT4ZR9sN+Cc9vcoWYVCUldbCzEsbFAlBCcYbqxqR/iqOq5g+plz9VGX+3hiZDqQZjjKPZODcW4/iMh/mDULVbIDFnmPn2o1rPg5+XP+wTnXx7ne+3AILZcv0P5PBbpqQiodl6/dhQnH/g+MK5dQcdqP9gKBKaH/rRzfA42FsZbrFW4RjhHV5Hz0nLXrXJzXp29RPFq/VTC3chQrdgvz5tRA4GNPFshOQYSBnw7hQJUhsMf2kK6yb6xSeS+1YwE+eAYRx+b+gEq683yNQ== root@rhs3ea-node1 command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAoQB28c+u32eqxOFrxGT2iuHT/C8Z+K/GbX3ewF+bJDQCVMEdzmu1K+YXr4miK2CfMgrprmjz5I2a1M7+jdgOJmWDnZIPY3ZsjGgwH/zJKPzQ3koy5jtHQtzFQIGMQ9d6jz85uGQpFxH9jQ908ksKYMtoI4Glofwxi9rCHDBkfI2d9YLZUjm0DMCdWHw11QPRmTVk/HGIG396mwD99h+fwXYgrx8BAwhM0kFk6IUCzoL2To3k4RhmyVjKWDHVHwHIPsV+Z8b7WU1bOUwKXd3yxq45XrauVmc56JsoogIdlSfgSJs9FXO5OiDCraz212gYEYX8sbZZD6cMzAzA0+DGcQ== root@rhs3ea-node2 command="tar ${SSH_ORIGINAL_COMMAND#* }" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA7V9dJ2TLSka8dAUpkFb8UTU2wdUJA+Vs4qwVIptNWLv1GTW/F+9foZBSqQCzaTwcvqk5U2LzvjJIRffz6f+g7TUQVZg+PRwRg6o1N6pupfMB2q5wcUrbrg0wlu8lcvWMquzLMvKgzqCUqJ/woj7JlThPN/FcY3qf489OLInkSpio5uQ7cq1ASYUPYpIOBl2AMt5oW7Fn/9/SwRWfrkEZBfKeMiEADMznIaU2EwPj7f7PKuQgEEI6nnOPFBou1h5jYxxUijaon5qaWeQtkPmaAlXOsj7jDArnMgCf6UGIKxos/b+ayF8dYSPkQyACqvAMFHnZi/itEdFSc3FdPKanuQ== root@rhs3ea-node2 [root@rhs3ea-node3 ~]#
I confirm this is a setup issue; it looks like the SSH key was added manually using the secret.pem file, for example with:

ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem root@rhs3ea-node3

Delete the first line from authorized_keys and everything will work fine:

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqxqMiZ8dyXUQq0pLVOYpRSsC+aYFn6pbPQZ3LtRPKGYfA63SNoYnifhnM2UR9fnZz3hisBUxIzcVrVux2y3ojI/vPFFi08tVtK8/rglJf5F83YS16a9yoqqh3HqlBnotY50H1/1qeco+71U9hy276fUONP64KoOZtme3MwYuoNz4z1NvCQFcEbXtPfHO5A9P3C+NuMhgNK8N63RSCzZ6dtO+wZygbVJlbPNQxp8Yc9Gs7eG4Lgb9fqsZNcBjmI5E8rbIuzRy6bD/0nmEKc/nqvEYTYgkckES0Xy92JVxbcwCOnZFNi4rT6+HarDIuFRB835I5ss+QBrT9SM09qmFuQ== root@rhs3ea-node1

Description: Two authorized_keys entries are present for node1 in the slave's authorized_keys file, the first without command= and the second with command= in it. When establishing the SSH connection, sshd uses the first matching entry, which allows any command to run; this is a security issue, and it is exactly why /nonexistent/gsyncd is used in the SSH command. When authorized_keys contains only the entry with command="/usr/libexec/glusterfs/gsyncd", only that command can run when the SSH connection is established by geo-rep, which is safe since no command other than gsyncd can be executed on the slave.
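A minimal cleanup sketch for the suggestion above, assuming the only plain entry is the manually added node1 key (take a backup first and adjust the pattern to your own key):

# Run on the slave (rhs3ea-node3).
cp /root/.ssh/authorized_keys /root/.ssh/authorized_keys.bak
# List the command-less entries for node1 (they start with "ssh-rsa" rather
# than with command=):
grep -n '^ssh-rsa .* root@rhs3ea-node1$' /root/.ssh/authorized_keys
# Remove them, leaving the command="..." entries untouched:
sed -i '/^ssh-rsa .* root@rhs3ea-node1$/d' /root/.ssh/authorized_keys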
Thanks for the info and the workaround, but why does the command end up like that? I haven't added that key manually, so this seems to be the product's default behaviour.
Hi,

I've redone all the configuration on my three new nodes from scratch, using your command instead of appending the keys manually. Now passwordless SSH only works in one direction:

[root@rhss1 ~]# ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem
Generating public/private rsa key pair.
/var/lib/glusterd/geo-replication/secret.pem already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/glusterd/geo-replication/secret.pem.
Your public key has been saved in /var/lib/glusterd/geo-replication/secret.pem.pub.
The key fingerprint is:
5c:a9:d7:0f:86:b7:c5:90:f1:49:b2:7a:73:d5:db:21 root.rh.hpintelco.org
The key's randomart image is:
+--[ RSA 2048]----+
| o . |
| . B ..|
| o +Eo.o|
| . o + o..+|
| S + B +..|
| . + B |
| . . |
| |
| |
+-----------------+
[root@rhss1 ~]# cat /var/lib/glusterd/geo-replication/secret.pem.pub > ~root/.ssh/authorized_keys
[root@rhss1 ~]# cat !$
cat ~root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAmvpMwDEv6c3DD6dKt5IkTIOch5FSZGiwzlOL/YouSnlmljBoEs2jnt6q/LhDLlSqUf19JSF3yZ/mI4QbZZsifhvCejBKt7HJI5EeoiTcBceyOVQ5resaoWMX1p00TZrMdI+9pDyQ1IvirupJ0NeRULf1G7p6iBksmE1WytQc22/gmi8wD5M91Mxa4H59f5sBxepvDzcOxDSy5mGawn1BmW30n0hivEgnpWNndjeSby6qj7mGTv8H5LpSekU9T4uEBPUkPAu6doe9bFegudRKrEvbRV5sIbWSs5l0ek4svv2XG26zrTrUsMfpRovWBJJvd6rBHkVl+8y/vLOSeAT5GQ== root.rh.hpintelco.org
[root@rhss1 ~]# cp /var/lib/glusterd/geo-replication/secret.pem .ssh/id_rsa
cp: overwrite `.ssh/id_rsa'? y
[root@rhss1 ~]# ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem root@rhss3
root@rhss3's password:
Now try logging into the machine, with "ssh 'root@rhss3'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[root@rhss1 ~]# ssh root@rhss3
root@rhss3's password:
[root@rhss1 ~]# ssh rhss3
root@rhss3's password:

[root@rhss2 ~]# ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem
Generating public/private rsa key pair.
/var/lib/glusterd/geo-replication/secret.pem already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/glusterd/geo-replication/secret.pem.
Your public key has been saved in /var/lib/glusterd/geo-replication/secret.pem.pub.
The key fingerprint is:
83:a9:45:94:52:82:9e:9e:5f:07:82:42:21:1d:a6:45 root.rh.hpintelco.org
The key's randomart image is:
+--[ RSA 2048]----+
|.+Eo..o. |
|.=o .o. |
|o. o .. |
|. + ...o |
| o . .+.S |
| o o. .. |
| ... . |
| . |
| |
+-----------------+
[root@rhss2 ~]# cat /var/lib/glusterd/geo-replication/secret.pem.pub > ~root/.ssh/authorized_keys
[root@rhss2 ~]# cp /var/lib/glusterd/geo-replication/secret.pem .ssh/id_rsa
cp: overwrite `.ssh/id_rsa'? y
[root@rhss2 ~]# ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem root@rhss3
root@rhss3's password:
Now try logging into the machine, with "ssh 'root@rhss3'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[root@rhss2 ~]# ssh root@rhss3
root@rhss3's password:

However, from node 3 to the other two it does work:

[root@rhss3 ~]# ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem
Generating public/private rsa key pair.
/var/lib/glusterd/geo-replication/secret.pem already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/glusterd/geo-replication/secret.pem.
Your public key has been saved in /var/lib/glusterd/geo-replication/secret.pem.pub.
The key fingerprint is:
71:44:b6:2a:f3:f4:ed:ea:6f:f9:af:a9:55:4d:8c:d2 root@rhss3
The key's randomart image is:
+--[ RSA 2048]----+
| .+ |
| o . . o |
| . o . E o|
| + . ..|
| o S o|
| = . . . |
| . . ... |
| .o. . |
| .o++o+o.|
+-----------------+
[root@rhss3 ~]# cat /var/lib/glusterd/geo-replication/secret.pem.pub > ~root/.ssh/authorized_keys
[root@rhss3 ~]# cp /var/lib/glusterd/geo-replication/secret.pem .ssh/id_rsa
cp: overwrite `.ssh/id_rsa'? y
[root@rhss3 ~]# ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem root@rhss1
root@rhss1's password:
Now try logging into the machine, with "ssh 'root@rhss1'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[root@rhss3 ~]# ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem root@rhss2
root@rhss2's password:
Now try logging into the machine, with "ssh 'root@rhss2'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[root@rhss3 ~]# root@rhss2
-bash: root@rhss2: command not found
[root@rhss3 ~]# ssh root@rhss2
Last login: Fri Aug 8 16:08:04 2014 from rhss3
[root@rhss2 ~]# logout
Connection to rhss2 closed.
[root@rhss3 ~]# ssh root@rhss1
Last login: Fri Aug 8 16:08:08 2014 from rhss3
[root@rhss1 ~]# logout
Connection to rhss1 closed.

This makes geo-replication fail to start:

[root@rhss1 ~]# gluster volume geo-replication vol_geo rhss3::vol_geo create push-pem force
Passwordless ssh login has not been setup with rhss3 for user root.
geo-replication command failed

I've checked the permissions on the .ssh directory and files; they are all the same and correct (700 and 640), and it still doesn't work. My guess is that the bug still stands. Could you please advise?
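A quick way to verify the passwordless prerequisite with the exact identity geo-replication uses, rather than an interactive login test, is to force key-only, non-interactive authentication. A small sketch using the host names from this setup:

# Run from each master node; exits 0 only if the slave accepts the
# geo-replication key without falling back to password authentication.
ssh -o PasswordAuthentication=no -o BatchMode=yes \
    -i /var/lib/glusterd/geo-replication/secret.pem \
    root@rhss3 true && echo "passwordless ssh OK" || echo "passwordless ssh FAILED"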
I fixed the issue.

Notice that on the non-working SSH clients the debug log says:

debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/identity
debug1: Offering public key: /root/.ssh/id_rsa    <<<---

Meanwhile, on the working machine:

debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/identity
debug1: Trying private key: /root/.ssh/id_rsa    <<<---

I also noticed that on the non-working machines there is an id_rsa.pub file, created at installation time (and not present on the third node, the working one), containing a pre-created public key:

-rw------- 1 root root 1220 Aug 8 17:02 authorized_keys
-rw------- 1 root root 1675 Aug 8 16:52 id_rsa
-rw-r--r-- 1 root root  414 Aug 3 23:26 id_rsa.pub    <<<<---
-rw-r--r-- 1 root root  800 Aug 8 17:02 known_hosts
-rw-r--r-- 1 root root 1181 Aug 8 12:22 known_hosts.kk

Since this stale public key was offered directly instead of just the private key, there was no match in authorized_keys (secure log):

Aug 8 18:10:18 rhss3 sshd[2784]: debug1: trying public key file /root/.ssh/authorized_keys
Aug 8 18:10:18 rhss3 sshd[2784]: debug1: fd 4 clearing O_NONBLOCK
Aug 8 18:10:18 rhss3 sshd[2784]: debug3: secure_filename: checking '/root/.ssh'
Aug 8 18:10:18 rhss3 sshd[2784]: debug3: secure_filename: checking '/root'
Aug 8 18:10:18 rhss3 sshd[2784]: debug3: secure_filename: terminating check at '/root'
Aug 8 18:10:18 rhss3 sshd[2784]: debug2: key not found    <<<---

Once id_rsa.pub was removed, it worked smoothly.

Surprisingly, this behaviour shows up on nodes 1 and 2 and not on node 3, even though all of them were installed from the same ISO in the same manner, no updates have been applied, and no additional software has been installed other than gtop (on nodes 1 and 2).
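The stale id_rsa.pub matters because, when that file exists, the SSH client offers the public key it contains for the id_rsa identity (as the debug output above shows); if it no longer corresponds to the regenerated private key, nothing in the server's authorized_keys matches. Besides deleting the file, a matching public key can be regenerated from the private key. A sketch, assuming the layout from this report:

# Run on the affected master nodes (rhss1/rhss2 here). Either remove the
# stale public key file...
rm /root/.ssh/id_rsa.pub
# ...or regenerate it so it matches the current private key:
ssh-keygen -y -f /root/.ssh/id_rsa > /root/.ssh/id_rsa.pub
# Then confirm with a verbose, key-only client run:
ssh -v -o PasswordAuthentication=no root@rhss3 true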
In addition to the above, and after using ssh-copy-id as you suggested, the issue still stands:

[root@rhss2 ~]# gluster volume geo-replication vol_geo rhss3::vol_geo status detail
MASTER NODE                    MASTER VOL    MASTER BRICK    SLAVE             STATUS    CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rhss2.rhev.rh.hpintelco.org    vol_geo       /brick/b-geo    rhss3::vol_geo    faulty    N/A    N/A    N/A    N/A    N/A    N/A    N/A
rhss1.rhev.rh.hpintelco.org    vol_geo       /brick/b-geo    rhss3::vol_geo    faulty    N/A    N/A    N/A    N/A    N/A    N/A    N/A

But now, even with the workaround I found earlier, nothing moves the session out of the faulty state.
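When the session stays faulty after the SSH side is fixed, the next place to look is usually the geo-replication logs on each faulty master node. The directory layout below follows the log path quoted in the description; the exact per-session file name is an assumption, so list the directory rather than guessing it:

# Run on each faulty master node (rhss1, rhss2). The per-session log file
# name encodes the slave URL.
ls /var/log/glusterfs/geo-replication/vol_geo/
tail -n 50 /var/log/glusterfs/geo-replication/vol_geo/*.log
# Slave-side errors are logged on the slave node under
# /var/log/glusterfs/geo-replication-slaves/ (assumed default location).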
Based on the comments, this looks like a setup issue. Please clean up the setup and try again. Closing this bug since the issue was not seen in the RHGS 3.0.4 and 3.1 regression runs. Please reopen this bug if the issue is seen again.