Description of problem:

glusterd currently fails to establish a successful 'peer probe' when one of the participating nodes is behind NAT. For example, containers running on multiple hosts fail to form a trusted pool via peer probe.

The test setup uses Atomic Hosts with 'flannel' for overlay networking.

Test setup:
  Container-1 IP: 10.50.72.2 (running on Worker-1, where Worker-1 is atomic host 1)
  Container-2 IP: 10.50.97.2 (running on Worker-2, where Worker-2 is atomic host 2)

PING from Container-1 to Container-2 works.
SSH from Container-1 to Container-2 works.

The gluster pool list shows:

Container-1:
--------------------------------------------------------------------------------------
-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:61:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.97.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:6102/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# gluster pool list
UUID                                    Hostname        State
3c6bf65d-6a58-46ad-90d4-4e2d9b4dc80e    10.50.72.2      Connected
175daada-0ca4-4e18-b72b-460c9da19f96    localhost       Connected

As seen above, Container-1 reports both gluster nodes as Connected and the peer probe as successful. On Container-2, however, the remote node is in "Disconnected" state.
Container-2:
--------------------------------------------------------------------------------------
-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:48:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.72.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:4802/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# gluster pool list
UUID                                    Hostname        State
175daada-0ca4-4e18-b72b-460c9da19f96    10.50.97.0      Disconnected
3c6bf65d-6a58-46ad-90d4-4e2d9b4dc80e    localhost       Connected

The netstat output below shows the flannel gateway IP as the source IP of the reverse connection, which causes glusterd to fail:

-bash-4.3# netstat -ntp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.50.72.2:45442        202.255.47.226:80       TIME_WAIT   -
tcp        1      1 10.50.72.2:41806        140.138.144.170:80      LAST_ACK    -
tcp        0      0 10.50.72.2:22           10.50.72.1:55834        ESTABLISHED 146/sshd: root@pts/
tcp        0      0 10.50.72.2:58350        192.26.91.193:80        TIME_WAIT   -
tcp        0      0 10.50.72.2:24007        10.50.97.0:1022         ESTABLISHED 35/glusterd          ---> flannel GW IP
tcp        1      1 10.50.72.2:49727        123.255.202.74:80       LAST_ACK    -
tcp        0      0 10.50.72.2:22           10.50.97.0:51955        ESTABLISHED 330/sshd: root@pts/  ---> flannel GW IP
tcp        1      1 10.50.72.2:49723        123.255.202.74:80       LAST_ACK    -
tcp        1      1 10.50.72.2:49734        123.255.202.74:80       LAST_ACK    -
tcp        0      0 10.50.72.2:44396        103.22.220.133:80       TIME_WAIT   -
tcp        0      0 10.50.72.2:37028        212.138.64.22:80        TIME_WAIT   -
tcp        0      1 10.50.72.2:58308        137.189.4.14:80         LAST_ACK    -

As additional information, telnet from the containers to port 24007 works in both directions.
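The failure mode above can be sketched in a few lines. This is an illustrative toy, not glusterd's actual code: a server that validates peers by the source address of an incoming connection rejects a connection whose source IP was rewritten by NAT (here, 10.50.97.0 instead of the peer's real 10.50.97.2). The `TRUSTED_PEERS` allow-list is hypothetical; the demo runs over loopback, where 127.0.0.1 plays the role of the unexpected (translated) source address.

```python
import socket
import threading

# Hypothetical allow-list standing in for the trusted pool's peer list.
# The entry is the peer container's own IP; under flannel NAT the inbound
# connection instead carries the gateway IP (e.g. 10.50.97.0).
TRUSTED_PEERS = {"10.50.97.2"}

def serve_once(srv, result):
    conn, (peer_ip, _port) = srv.accept()
    # Source-address validation: this is the step that breaks behind NAT,
    # because peer_ip is the translated address, not the real peer's.
    result.append(peer_ip in TRUSTED_PEERS)
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
result = []
t = threading.Thread(target=serve_once, args=(srv, result))
t.start()

cli = socket.create_connection(srv.getsockname())
t.join()
cli.close()
srv.close()

# The connection arrives from 127.0.0.1, which is not in the allow-list,
# so validation fails -- analogous to glusterd seeing 10.50.97.0.
print("peer accepted:", result[0])
```

The TCP handshake itself succeeds (matching the working telnet below); only the address-based identity check fails, which is why the pool shows 10.50.97.0 as Disconnected.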
-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:61:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.97.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:6102/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# telnet 10.50.72.2 24007
Trying 10.50.72.2...
Connected to 10.50.72.2.
Escape character is '^]'.
^]
telnet> Connection closed.

-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:48:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.72.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:4802/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# telnet 10.50.97.2 24007
Trying 10.50.97.2...
Connected to 10.50.97.2.
Escape character is '^]'.
^]
telnet> Connection closed.

Version-Release number of selected component (if applicable):
GlusterFS 3.7.2

How reproducible:
Always

Steps to Reproduce:
Same as above.
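The telnet checks above amount to a plain TCP connect test. A programmatic equivalent (an illustrative helper, not part of any gluster tooling) shows why reachability alone is not sufficient evidence of a healthy peer connection: the socket connects fine even though glusterd later rejects the peer based on its translated source address. The local listener below is a stand-in for glusterd on port 24007.

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Plain TCP connect test -- the programmatic equivalent of
    'telnet <host> 24007' in the transcript above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example against a local listener (stand-in for glusterd on 24007):
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
host, port = srv.getsockname()
print(port_reachable(host, port))   # True: the port accepts connections
srv.close()
```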
This bug is being closed because GlusterFS 3.7 has reached its end of life.

Note: This bug is being closed using a script. No verification has been performed to check whether it still exists on newer releases of GlusterFS. If this bug still exists in a newer GlusterFS release, please reopen it against that release.