Bug 990330 - geo-replication fails for longer fqdn's
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.3.2
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Harshavardhana
Depends On:
Blocks: 990331
Reported: 2013-07-30 19:53 EDT by Harshavardhana
Modified: 2015-03-22 21:04 EDT

See Also:
Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 990331
Environment:
Last Closed: 2014-04-17 07:44:38 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Harshavardhana 2013-07-30 19:53:40 EDT
Description of problem:

Geo-replication fails when the slave host has a long FQDN. The workaround is to use the IP address instead, but this case should be handled properly, or at least documented.

[2013-07-30 19:21:42.168776] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-%r@%h:%p root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:42.168994] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:22.x9XCkMnhGWYQjcg7" too long for Unix domain socket
[2013-07-30 19:21:42.169449] I [syncdutils:142:finalize] <top>: exiting.
[2013-07-30 19:21:52.181953] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
[2013-07-30 19:21:52.182389] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2013-07-30 19:21:52.277359] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:iso -> ssh://root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:/mnt/geoslave
[2013-07-30 19:21:52.528315] E [syncdutils:173:log_raise_exception] <top>: connection to peer is broken
[2013-07-30 19:21:52.529310] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-%r@%h:%p root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:52.529525] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" too long for Unix domain socket
[2013-07-30 19:21:52.530002] I [syncdutils:142:finalize] <top>: exiting.


Version-Release number of selected component (if applicable):
3.3.2

How reproducible:
Always, whenever the resulting socket path is too long:

$ echo "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" | wc -c
109

The actual limit is:

/usr/include/linux/un.h:#define UNIX_PATH_MAX   108

Use a host whose FQDN is about 45 characters or longer (the one above has 45) and you should be able to see this issue.
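
The arithmetic above can be checked with a short Python sketch. It is only an illustration, not code from gsyncd or OpenSSH: the helper name `control_path_fits` is hypothetical, and the check assumes that the path plus its terminating NUL byte must fit into the 108-byte `sun_path` buffer. Note that `wc -c` also counts the newline added by `echo`, so the path itself is one byte shorter than the 109 reported.

```python
# Size of sun_path on Linux (UNIX_PATH_MAX in linux/un.h, as quoted above).
UNIX_PATH_MAX = 108


def control_path_fits(path: str, limit: int = UNIX_PATH_MAX) -> bool:
    """Hypothetical helper: True if path plus a NUL terminator fits in sun_path."""
    return len(path.encode("utf-8")) + 1 <= limit


# The socket path from the log above, without the newline that echo appends.
path = (
    "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-"
    "root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com"
    ":22.lb5rK1GpczmxJSDb"
)

print(len(path))                # 108: one less than the wc -c result
print(control_path_fits(path))  # False: no room left for the NUL byte
```

Under this assumption, a path of exactly 108 bytes already fails, which matches the error seen in the log.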

Expected results:
Handle this issue and document it.
Comment 2 Anand Avati 2013-08-02 03:49:31 EDT
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#1) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 3 Anand Avati 2013-08-02 03:54:26 EDT
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#2) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 4 Anand Avati 2013-08-02 04:47:26 EDT
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#3) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 5 Anand Avati 2013-08-08 23:09:54 EDT
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#4) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 6 Anand Avati 2013-08-09 21:04:33 EDT
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#5) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 7 Anand Avati 2013-08-13 20:49:38 EDT
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#6) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 8 Anand Avati 2013-08-21 19:36:13 EDT
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#1) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 9 Anand Avati 2013-08-22 14:49:11 EDT
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#2) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 10 Anand Avati 2013-08-22 15:39:43 EDT
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#3) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 11 Anand Avati 2013-08-27 19:19:23 EDT
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#4) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 12 Anand Avati 2013-08-28 10:10:55 EDT
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#5) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 13 Anand Avati 2013-09-03 05:32:49 EDT
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#6) for review on master by Harshavardhana (harsha@harshavardhana.net)
Comment 14 Anand Avati 2013-09-04 15:29:46 EDT
COMMIT: http://review.gluster.org/5681 committed in master by Anand Avati (avati@redhat.com) 
------
commit fa095c24979db2d0a3a6413aa431fe7256be5206
Author: Harshavardhana <harsha@harshavardhana.net>
Date:   Wed Aug 21 16:28:41 2013 -0700

    geo-replication: Use a md5 based unique control path
    
    A hostname fqdn can be of length 255 according to RFC1123
    ------------------------->
    /usr/include/bits/posix1_lim.h:#define _POSIX_HOST_NAME_MAX  255
    <-------------------------
    On linux this length is 64
    ------------------------->
    /usr/include/bits/local_lim.h:#define HOST_NAME_MAX 64
    <-------------------------
    
    When a given hostname is > 45 (characters) - SSH fails with
    
    -------------------------->
    "ControlPath too long for Unix domain socket".
    <--------------------------
    
    This indicates that the total length of ControlPath exceeds the
    limit, which on Linux is 108
    
    ------------------------->
    /usr/include/linux/un.h:#define UNIX_PATH_MAX   108
    <-------------------------
    
    This leads to "faulty" geo-replication status.
    
    This patch introduces a new file called manifest, which carries
    unique information for a given geo-rep session. From it, a unique
    `md5` digest of 32 characters is generated. This conservative
    approach ensures that we don't exceed UNIX_PATH_MAX limitations
    while still providing a unique socket path.
    
    Change-Id: I3a6a27d605d751a86e7c82eace4561d9b0134fe1
    BUG: 990330
    Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
    Reviewed-on: http://review.gluster.org/5681
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Csaba Henk <csaba@redhat.com>
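
The idea described in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: the function name `short_control_path`, the `.sock` suffix, and the choice of `user@host:port` as the session information are all hypothetical. It only shows that hashing session-specific information yields a fixed-length, 32-character name regardless of how long the FQDN is.

```python
import hashlib
import os

# Size of sun_path on Linux (UNIX_PATH_MAX in linux/un.h).
UNIX_PATH_MAX = 108


def short_control_path(tmpdir: str, session_info: str) -> str:
    """Hypothetical helper: build a fixed-length socket path from session info."""
    # An md5 hex digest is always 32 characters, whatever the input length.
    digest = hashlib.md5(session_info.encode("utf-8")).hexdigest()
    # The ".sock" suffix is an assumption made for this sketch.
    return os.path.join(tmpdir, digest + ".sock")


tmpdir = "/tmp/gsyncd-aux-ssh-YBDwod"
session_info = "root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:22"

path = short_control_path(tmpdir, session_info)
print(len(path))                       # 64, independent of the FQDN length
print(len(path) + 1 <= UNIX_PATH_MAX)  # True
```

The digest is deterministic, so the same session information always maps to the same socket name, while different hosts map to different names.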
Comment 15 Niels de Vos 2014-04-17 07:44:38 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
