Bug 1245036

Summary: glusterd fails to peer probe if one of the nodes is behind NAT.
Product: [Community] GlusterFS
Component: glusterd
Version: 3.7.1
Reporter: Humble Chirammal <hchiramm>
Assignee: bugs <bugs>
CC: amukherj, bugs
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2017-03-08 10:51:41 UTC

Description Humble Chirammal 2015-07-21 06:17:58 UTC
Description of problem:

Currently glusterd fails to establish a successful 'peer probe' if one of the participating nodes is behind NAT. For example, containers running on multiple hosts fail to peer probe each other to form a trusted pool.

The test setup is configured with Atomic Hosts and 'flannel' for overlay networking.
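For context, a flannel overlay that hands each host a /24 out of a shared /16 is typically configured in etcd before flanneld starts. The values below are an illustrative reconstruction chosen to match the 10.50.x.y addresses in this report; the actual configuration used was not captured:

# Hypothetical flannel network config (etcd v2 API):
etcdctl set /coreos.com/network/config '{
  "Network": "10.50.0.0/16",
  "SubnetLen": 24,
  "Backend": { "Type": "udp" }
}'

With this layout each host leases one subnet (10.50.97.0/24 and 10.50.72.0/24 here), and the flannel device takes the .0 address of its lease, which appears to be the "GW IP" seen later in the netstat output.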



Test Setup:

Container-1 IP : 10.50.97.2  (running on Worker-1, i.e. atomic host 1)
Container-2 IP : 10.50.72.2  (running on Worker-2, i.e. atomic host 2)


PING from Container-1 to Container-2 works.
SSH  from Container-1 to Container-2 works.
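
The probe itself is an ordinary one; a reconstruction of the invocation (the exact command is not captured in the report), run from Container-1, would be:

# Hypothetical reconstruction; run on Container-1 (10.50.97.2):
-bash-4.3# gluster peer probe 10.50.72.2
peer probe: success.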


The gluster pool list says:

 Container-1:
--------------------------------------------------------------------------------------
-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:61:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.97.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:6102/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# gluster pool list
UUID                                    Hostname        State
3c6bf65d-6a58-46ad-90d4-4e2d9b4dc80e    10.50.72.2      Connected
175daada-0ca4-4e18-b72b-460c9da19f96    localhost       Connected


As you can see above, Container-1 reports both gluster nodes as connected and the peer probe as successful. However, in Container-2 the remote node is in "Disconnected" state.



 Container-2:
--------------------------------------------------------------------------------------

-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:48:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.72.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:4802/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# gluster pool list
UUID                                    Hostname        State
175daada-0ca4-4e18-b72b-460c9da19f96    10.50.97.0      Disconnected
3c6bf65d-6a58-46ad-90d4-4e2d9b4dc80e    localhost       Connected
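
To see which address glusterd recorded for the disconnected peer, a standard check (not captured in the original report) would be:

# Hypothetical output shape on Container-2: the peer is listed by the
# address learned from the incoming connection, i.e. the flannel GW IP.
-bash-4.3# gluster peer status
Number of Peers: 1

Hostname: 10.50.97.0
Uuid: 175daada-0ca4-4e18-b72b-460c9da19f96
State: Peer in Cluster (Disconnected)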



The netstat output below (taken on Container-2) shows the flannel GW IP (10.50.97.0) as the source IP of the reverse connection, which is what causes glusterd to fail:

-bash-4.3# netstat -ntp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.50.72.2:45442        202.255.47.226:80       TIME_WAIT   -
tcp        1      1 10.50.72.2:41806        140.138.144.170:80      LAST_ACK    -
tcp        0      0 10.50.72.2:22           10.50.72.1:55834        ESTABLISHED 146/sshd: root@pts/
tcp        0      0 10.50.72.2:58350        192.26.91.193:80        TIME_WAIT   -
tcp        0      0 10.50.72.2:24007        10.50.97.0:1022         ESTABLISHED 35/glusterd          ---> flannel GW IP
tcp        1      1 10.50.72.2:49727        123.255.202.74:80       LAST_ACK    -
tcp        0      0 10.50.72.2:22           10.50.97.0:51955        ESTABLISHED 330/sshd: root@pts/  ---> flannel GW IP
tcp        1      1 10.50.72.2:49723        123.255.202.74:80       LAST_ACK    -
tcp        1      1 10.50.72.2:49734        123.255.202.74:80       LAST_ACK    -
tcp        0      0 10.50.72.2:44396        103.22.220.133:80       TIME_WAIT   -
tcp        0      0 10.50.72.2:37028        212.138.64.22:80        TIME_WAIT   -
tcp        0      1 10.50.72.2:58308        137.189.4.14:80         LAST_ACK    -
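
One way to confirm the SNAT at the packet level (not from the original report; assumes tcpdump is available inside the container) is to watch glusterd's port on the receiving side:

# On Container-2 (10.50.72.2): incoming segments show the flannel GW
# 10.50.97.0 as source, not the real peer address 10.50.97.2.
-bash-4.3# tcpdump -nn -i eth0 tcp port 24007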

As additional info, telnet from the containers to glusterd's port 24007 works in both directions (shown below first from Container-1 at 10.50.97.2, then from Container-2 at 10.50.72.2).

-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:32:61:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.97.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:6102/64 scope link 
       valid_lft forever preferred_lft forever


-bash-4.3# telnet 10.50.72.2 24007
Trying 10.50.72.2...
Connected to 10.50.72.2.
Escape character is '^]'.
^]
telnet> Connection closed.



-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:32:48:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.72.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:4802/64 scope link 
       valid_lft forever preferred_lft forever
-bash-4.3# telnet 10.50.97.2 24007
Trying 10.50.97.2...
Connected to 10.50.97.2.
Escape character is '^]'.
^]
telnet> Connection closed.
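
A possible workaround sketch, not part of the original report and unverified here: run the gluster containers with host networking, so that glusterd traffic is not SNATed by the overlay and the peer's source address matches the probed address. The image name is illustrative:

# Hypothetical: --net=host bypasses the flannel NAT for glusterd traffic.
docker run -d --net=host --name gluster gluster/gluster-centos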




Version-Release number of selected component (if applicable):

GlusterFS 3.7.2

How reproducible:

Always

Steps to Reproduce:

Same as above.
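
For completeness, a condensed reconstruction of the reproduction steps (addresses and image name are illustrative, not captured in the report):

# 1. Join two atomic hosts to the same flannel overlay.
# 2. Start one gluster container on each host:
docker run -d --name gluster gluster/gluster-centos
# 3. From Container-1 (10.50.97.2), probe Container-2 by its flannel IP:
gluster peer probe 10.50.72.2
# 4. Compare peer state on both sides:
gluster pool list    # Connected on the prober, Disconnected on the probed side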

Comment 1 Kaushal 2017-03-08 10:51:41 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.