Bug 990330 - geo-replication fails for longer fqdn's
Summary: geo-replication fails for longer fqdn's
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.3.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Harshavardhana
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 990331
 
Reported: 2013-07-30 23:53 UTC by Harshavardhana
Modified: 2015-03-23 01:04 UTC (History)

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 990331 (view as bug list)
Environment:
Last Closed: 2014-04-17 11:44:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Harshavardhana 2013-07-30 23:53:40 UTC
Description of problem:

Geo-replication fails when the slave host has a long FQDN. The workaround is to use an IP address instead, but this case should either be handled properly or documented.

[2013-07-30 19:21:42.168776] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-%r@%h:%p root.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:42.168994] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-root.lab.eng.rdu2.redhat.com:22.x9XCkMnhGWYQjcg7" too long for Unix domain socket
[2013-07-30 19:21:42.169449] I [syncdutils:142:finalize] <top>: exiting.
[2013-07-30 19:21:52.181953] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
[2013-07-30 19:21:52.182389] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2013-07-30 19:21:52.277359] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:iso -> ssh://root.lab.eng.rdu2.redhat.com:/mnt/geoslave
[2013-07-30 19:21:52.528315] E [syncdutils:173:log_raise_exception] <top>: connection to peer is broken
[2013-07-30 19:21:52.529310] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-%r@%h:%p root.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:52.529525] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" too long for Unix domain socket
[2013-07-30 19:21:52.530002] I [syncdutils:142:finalize] <top>: exiting.


Version-Release number of selected component (if applicable):
3.3.2

How reproducible:
Always, when the socket path is this long:

$ echo "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" | wc -c
109

The actual limit is:

/usr/include/linux/un.h:#define UNIX_PATH_MAX   108

Use a slave host whose FQDN is 46 characters or longer and you should be able to see this issue.
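The length check described above can be sketched in Python. This is only an illustration, not gsyncd code: the helper names `expand_control_path` and `fits_unix_socket` are hypothetical, and the 17-character temporary suffix is an assumption read off the logged paths (a "." followed by 16 random characters). It expands the `%r@%h:%p` template from the logged ssh command and tests the result against UNIX_PATH_MAX.

```python
# UNIX_PATH_MAX from /usr/include/linux/un.h; the path plus its trailing NUL
# byte must fit in this many bytes.
UNIX_PATH_MAX = 108
# Assumption: ssh appends "." plus 16 random characters to the control path,
# as seen in the logged paths (e.g. ".lb5rK1GpczmxJSDb").
SSH_TEMP_SUFFIX_LEN = 17


def expand_control_path(template, user, host, port=22):
    """Expand the %r, %h and %p tokens of an ssh ControlPath template."""
    return (template.replace("%r", user)
                    .replace("%h", host)
                    .replace("%p", str(port)))


def fits_unix_socket(control_path):
    """Return True if the path plus the temporary suffix fits in sun_path."""
    total = len(control_path.encode("utf-8")) + SSH_TEMP_SUFFIX_LEN
    return total < UNIX_PATH_MAX


if __name__ == "__main__":
    # Template taken from the logged ssh command; the hosts are made up.
    template = "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-%r@%h:%p"
    for host in ("10.0.0.5", "slave." + "x" * 50 + ".example.com"):
        path = expand_control_path(template, "root", host)
        print(host, len(path), fits_unix_socket(path))
```

A check like this shows why the IP-address workaround helps: the host part is the only variable-length component of the path, so a short address keeps the total under the limit.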

Expected results:
Handle this issue and document it.

Comment 2 Anand Avati 2013-08-02 07:49:31 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#1) for review on master by Harshavardhana (harsha)

Comment 3 Anand Avati 2013-08-02 07:54:26 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#2) for review on master by Harshavardhana (harsha)

Comment 4 Anand Avati 2013-08-02 08:47:26 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#3) for review on master by Harshavardhana (harsha)

Comment 5 Anand Avati 2013-08-09 03:09:54 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#4) for review on master by Harshavardhana (harsha)

Comment 6 Anand Avati 2013-08-10 01:04:33 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#5) for review on master by Harshavardhana (harsha)

Comment 7 Anand Avati 2013-08-14 00:49:38 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#6) for review on master by Harshavardhana (harsha)

Comment 8 Anand Avati 2013-08-21 23:36:13 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#1) for review on master by Harshavardhana (harsha)

Comment 9 Anand Avati 2013-08-22 18:49:11 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#2) for review on master by Harshavardhana (harsha)

Comment 10 Anand Avati 2013-08-22 19:39:43 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#3) for review on master by Harshavardhana (harsha)

Comment 11 Anand Avati 2013-08-27 23:19:23 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#4) for review on master by Harshavardhana (harsha)

Comment 12 Anand Avati 2013-08-28 14:10:55 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#5) for review on master by Harshavardhana (harsha)

Comment 13 Anand Avati 2013-09-03 09:32:49 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#6) for review on master by Harshavardhana (harsha)

Comment 14 Anand Avati 2013-09-04 19:29:46 UTC
COMMIT: http://review.gluster.org/5681 committed in master by Anand Avati (avati) 
------
commit fa095c24979db2d0a3a6413aa431fe7256be5206
Author: Harshavardhana <harsha>
Date:   Wed Aug 21 16:28:41 2013 -0700

    geo-replication: Use a md5 based unique control path
    
    A hostname fqdn can be of length 255 according to RFC1123
    ------------------------->
    /usr/include/bits/posix1_lim.h:#define _POSIX_HOST_NAME_MAX  255
    <-------------------------
    On linux this length is 64
    ------------------------->
    /usr/include/bits/local_lim.h:#define HOST_NAME_MAX 64
    <-------------------------
    
    When a given hostname is longer than 45 characters, SSH fails with
    
    -------------------------->
    "ControlPath too long for Unix domain socket".
    <--------------------------
    
    This indicates that the ControlPath has exceeded the maximum
    socket path length, which on Linux is 108
    
    ------------------------->
    /usr/include/linux/un.h:#define UNIX_PATH_MAX   108
    <-------------------------
    
    This leads to "faulty" geo-replication status.
    
    This patch brings in a new file called manifest, which carries
    some unique information for a given geo-rep session. From it a
    unique `md5` digest of length 32 is generated. This ensures
    that we don't exceed the UNIX_PATH_MAX limit: we take a
    conservative approach and are still able to provide a unique
    socket path.
    
    Change-Id: I3a6a27d605d751a86e7c82eace4561d9b0134fe1
    BUG: 990330
    Signed-off-by: Harshavardhana <harsha>
    Reviewed-on: http://review.gluster.org/5681
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Csaba Henk <csaba>
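The idea in the commit message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the committed patch: the function name `md5_control_path`, the `.sock` extension, and the content of the session string are all hypothetical, since the bug report does not show what the manifest actually contains. The point is only that an md5 hex digest is always 32 characters, so the socket path length no longer depends on the hostname.

```python
import hashlib
import os


def md5_control_path(tmpdir, session_info):
    """Build a fixed-length socket path from arbitrary session information.

    session_info stands in for the manifest content (hypothetical); the
    hex digest of md5 is always 32 characters, whatever the input length.
    """
    digest = hashlib.md5(session_info.encode("utf-8")).hexdigest()
    return os.path.join(tmpdir, digest + ".sock")


if __name__ == "__main__":
    tmpdir = "/tmp/gsyncd-aux-ssh-YBDwod"
    short = md5_control_path(tmpdir, "root@10.0.0.5:22")
    long_ = md5_control_path(tmpdir, "root@" + "x" * 250 + ":22")
    # Both paths have the same length regardless of hostname length.
    print(len(short), len(long_))
```

The trade-off is that the socket name is no longer human-readable, which is presumably why the patch keeps the identifying information in a separate manifest file.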

Comment 15 Niels de Vos 2014-04-17 11:44:38 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

