Bug 1260918

Summary: [BACKUP]: If more than one node in the cluster is not added to known_hosts, the glusterfind create command hangs
Product: [Community] GlusterFS
Component: glusterfind
Version: mainline
Hardware: x86_64
OS: Linux
Severity: urgent
Priority: unspecified
Status: CLOSED CURRENTRELEASE
Reporter: Aravinda VK <avishwan>
Assignee: Aravinda VK <avishwan>
QA Contact: bugs <bugs>
CC: avishwan, khiremat, rhinduja, rhs-bugs, sanandpa
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Type: Bug
Clone Of: 1260119
Last Closed: 2016-06-16 13:35:52 UTC
Bug Depends On: 1260119
Bug Blocks: 1284735

Description Aravinda VK 2015-09-08 08:44:10 UTC
+++ This bug was initially created as a clone of Bug #1260119 +++

Description of problem:
======================

If more than one node in the cluster is missing from the known_hosts file of the node creating the glusterfind session, the create command hangs forever.

[root@georep1 scripts]# glusterfind create s1 master
The authenticity of host '10.70.46.97 (10.70.46.97)' can't be established.
ECDSA key fingerprint is 76:e4:6d:07:1e:82:26:1c:0a:95:b2:4c:a3:3f:f1:e2.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.154 (10.70.46.154)' can't be established.
ECDSA key fingerprint is b4:a8:00:41:ec:f8:12:a9:89:88:cb:7a:20:a8:83:3c.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.97 (10.70.46.97)' can't be established.
ECDSA key fingerprint is 76:e4:6d:07:1e:82:26:1c:0a:95:b2:4c:a3:3f:f1:e2.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.154 (10.70.46.154)' can't be established.
ECDSA key fingerprint is b4:a8:00:41:ec:f8:12:a9:89:88:cb:7a:20:a8:83:3c.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.97 (10.70.46.97)' can't be established.
ECDSA key fingerprint is 76:e4:6d:07:1e:82:26:1c:0a:95:b2:4c:a3:3f:f1:e2.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.93 (10.70.46.93)' can't be established.
ECDSA key fingerprint is 0d:bc:e3:70:e0:86:65:5e:3e:d2:ea:9c:fb:a9:53:66.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.93 (10.70.46.93)' can't be established.
ECDSA key fingerprint is 0d:bc:e3:70:e0:86:65:5e:3e:d2:ea:9c:fb:a9:53:66.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.93 (10.70.46.93)' can't be established.
ECDSA key fingerprint is 0d:bc:e3:70:e0:86:65:5e:3e:d2:ea:9c:fb:a9:53:66.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.154 (10.70.46.154)' can't be established.
ECDSA key fingerprint is b4:a8:00:41:ec:f8:12:a9:89:88:cb:7a:20:a8:83:3c.
Are you sure you want to continue connecting (yes/no)? yes


[root@georep1 scripts]# cat /root/.ssh/known_hosts


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.1-14.el7rhgs.x86_64

How reproducible:
=================

Always

Steps to Reproduce:
===================
1. Flush the known_hosts file on the node, or remove the cluster host entries from it (see the example below)
2. Create a glusterfind session
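
For reference, one way to remove the cluster host entries non-interactively (a hypothetical illustration using the IPs from this setup; ssh-keygen -R deletes a host's entry from known_hosts):

[root@georep1 scripts]# for i in {97,93,154}; do ssh-keygen -R 10.70.46.$i; done
# or flush the whole file at once:
[root@georep1 scripts]# > /root/.ssh/known_hosts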


Actual results:
===============
glusterfind session creation hangs

Expected results:
================

The session should be created successfully.


Workaround:
===========

SSH to all the nodes in the cluster so that known_hosts gets updated.

[root@georep1 scripts]# cat /root/.ssh/known_hosts
[root@georep1 scripts]# for i in {97,93,154}; do ssh root@10.70.46.$i; done
The authenticity of host '10.70.46.97 (10.70.46.97)' can't be established.
ECDSA key fingerprint is 76:e4:6d:07:1e:82:26:1c:0a:95:b2:4c:a3:3f:f1:e2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.70.46.97' (ECDSA) to the list of known hosts.
root@10.70.46.97's password: 
Last login: Fri Sep  4 12:38:50 2015 from 10.70.6.115
[root@georep2 ~]# exit
logout
Connection to 10.70.46.97 closed.
The authenticity of host '10.70.46.93 (10.70.46.93)' can't be established.
ECDSA key fingerprint is 0d:bc:e3:70:e0:86:65:5e:3e:d2:ea:9c:fb:a9:53:66.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.70.46.93' (ECDSA) to the list of known hosts.
root@10.70.46.93's password: 
Last login: Fri Sep  4 12:38:50 2015 from 10.70.6.115
[root@georep3 ~]# exit
logout
Connection to 10.70.46.93 closed.
The authenticity of host '10.70.46.154 (10.70.46.154)' can't be established.
ECDSA key fingerprint is b4:a8:00:41:ec:f8:12:a9:89:88:cb:7a:20:a8:83:3c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.70.46.154' (ECDSA) to the list of known hosts.
root@10.70.46.154's password: 
Last login: Fri Sep  4 12:38:50 2015 from 10.70.6.115
[root@georep4 ~]# exit
logout
Connection to 10.70.46.154 closed.
[root@georep1 scripts]# cat /root/.ssh/known_hosts
10.70.46.97 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNDWD/hxpscM20kGEWOTsiIzgmnBd78d2uyQRI7AGIX2JRRr0hIoZPOGCrW/ytRpluPEnJVr7s+vAYglVYLZlOo=
10.70.46.93 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBGCE5KvbJtmgXmQXfVVjUVjG0bjkP7fb0v7owFJnzAxy5FKjtTDQSF+qVAHA17MBh9Br7KP+SZQOxSmHyY9Tq8s=
10.70.46.154 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNMULJn47vZ1Azq/SCi4i5VBSrLQAqs6sMZTSamzpwkhedtHrNhKe5QW7W5l+mirLJTIrLuqy8HQYSp5jDYyfrk=
[root@georep1 scripts]# glusterfind create s1 master
Session s1 created with volume master
[root@georep1 scripts]#

--- Additional comment from Rahul Hinduja on 2015-09-07 02:28:54 EDT ---

glusterfind pre also hangs on the local host check, as shown below:


[root@georep1 scripts]# glusterfind pre --output-prefix '/mnt/glusterfs/' s1 master /root/log2
10.70.46.97 - pre failed: /rhs/brick2/b6 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.97 - pre failed: /rhs/brick3/b10 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.97 - pre failed: /rhs/brick1/b2 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.93 - pre failed: /rhs/brick1/b3 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.93 - pre failed: /rhs/brick2/b7 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.93 - pre failed: /rhs/brick3/b11 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.154 - pre failed: /rhs/brick2/b8 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.154 - pre failed: /rhs/brick3/b12 Historical Changelogs not available: [Errno 2] No such file or directory

10.70.46.154 - pre failed: /rhs/brick1/b4 Historical Changelogs not available: [Errno 2] No such file or directory

The authenticity of host '10.70.46.96 (10.70.46.96)' can't be established.
ECDSA key fingerprint is 44:23:1a:4b:3c:78:63:a6:66:3d:18:01:8d:dd:17:74.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.96 (10.70.46.96)' can't be established.
ECDSA key fingerprint is 44:23:1a:4b:3c:78:63:a6:66:3d:18:01:8d:dd:17:74.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.70.46.96 (10.70.46.96)' can't be established.
ECDSA key fingerprint is 44:23:1a:4b:3c:78:63:a6:66:3d:18:01:8d:dd:17:74.
Are you sure you want to continue connecting (yes/no)? yes

--- Additional comment from Aravinda VK on 2015-09-07 06:52:20 EDT ---

RCA:

While connecting to other nodes programmatically, Geo-rep passes an additional option to ssh (-oStrictHostKeyChecking=no). We need to use the same option in glusterfind.

The other issue is the yes/no prompt appearing for localhost, which happens during the scp command. We need to pass the same option to scp as to ssh. A further fix is to skip the scp command entirely when the target is the local node.
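
A minimal sketch of the intended behavior (a shell illustration with hypothetical variable names such as $host, $local_host, $src_file, and $dst_file; the actual fix is in the glusterfind Python code, merged via http://review.gluster.org/12124 below):

# Never block on the interactive yes/no host-key prompt:
ssh -oStrictHostKeyChecking=no root@$host "$remote_command"

# Skip scp entirely when the target is the local node:
if [ "$host" = "$local_host" ]; then
    cp "$src_file" "$dst_file"     # plain local copy, no prompt at all
else
    scp -oStrictHostKeyChecking=no "$src_file" "root@$host:$dst_file"
fi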

Workaround:
Add all the hosts in the peer list, including the local node, to known_hosts (see the ssh-keyscan example below).
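
A hedged example using ssh-keyscan (not from the original report; ssh-keyscan does not verify host identity, so compare the fingerprints out of band if that matters in your environment):

[root@georep1 scripts]# for i in {97,93,154}; do ssh-keyscan 10.70.46.$i >> /root/.ssh/known_hosts; done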

Comment 1 Vijay Bellur 2015-09-08 08:46:19 UTC
REVIEW: http://review.gluster.org/12124 (tools/glusterfind: StrictHostKeyChecking=no for ssh/scp verification) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 2 Vijay Bellur 2015-11-19 05:08:11 UTC
REVIEW: http://review.gluster.org/12124 (tools/glusterfind: StrictHostKeyChecking=no for ssh/scp verification) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 3 Vijay Bellur 2015-11-21 14:19:28 UTC
REVIEW: http://review.gluster.org/12124 (tools/glusterfind: StrictHostKeyChecking=no for ssh/scp verification) posted (#3) for review on master by Aravinda VK (avishwan)

Comment 4 Vijay Bellur 2015-11-23 05:02:46 UTC
REVIEW: http://review.gluster.org/12124 (tools/glusterfind: StrictHostKeyChecking=no for ssh/scp verification) posted (#4) for review on master by Aravinda VK (avishwan)

Comment 5 Vijay Bellur 2015-11-23 17:29:27 UTC
COMMIT: http://review.gluster.org/12124 committed in master by Vijay Bellur (vbellur) 
------
commit d47323d0e6f543a8ece04c32b8d77d2785390c3c
Author: Aravinda VK <avishwan>
Date:   Mon Sep 7 14:18:45 2015 +0530

    tools/glusterfind: StrictHostKeyChecking=no for ssh/scp verification
    
    Also do not use the scp command when copying a file from the
    local node.
    
    Change-Id: Ie78c77eb0252945867173937391b82001f29c3b0
    Signed-off-by: Aravinda VK <avishwan>
    BUG: 1260918
    Reviewed-on: http://review.gluster.org/12124
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 6 Niels de Vos 2016-06-16 13:35:52 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user