Description of problem:

Geo-replication fails with long FQDNs. The workaround is to use an IP address instead, but this case should be handled properly, or at least documented.

[2013-07-30 19:21:42.168776] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-%r@%h:%p root.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:42.168994] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-root.lab.eng.rdu2.redhat.com:22.x9XCkMnhGWYQjcg7" too long for Unix domain socket
[2013-07-30 19:21:42.169449] I [syncdutils:142:finalize] <top>: exiting.
[2013-07-30 19:21:52.181953] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
[2013-07-30 19:21:52.182389] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2013-07-30 19:21:52.277359] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:iso -> ssh://root.lab.eng.rdu2.redhat.com:/mnt/geoslave
[2013-07-30 19:21:52.528315] E [syncdutils:173:log_raise_exception] <top>: connection to peer is broken
[2013-07-30 19:21:52.529310] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-%r@%h:%p root.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:52.529525] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" too long for Unix domain socket
[2013-07-30 19:21:52.530002] I [syncdutils:142:finalize] <top>: exiting.

Version-Release number of selected component (if applicable):

3.3.2

How reproducible:

Always, with a socket path such as:

$ echo "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" | wc -c
109

The actual limit is:

/usr/include/linux/un.h:#define UNIX_PATH_MAX 108

Find a host whose FQDN is 46 characters long and you should be able to see this issue.

Expected results:

Handle this issue and document it.
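The length check described above can be approximated without running ssh. The following is a minimal sketch, not gsyncd code: the helper names are hypothetical, the 17-character temporary suffix is inferred from the ControlPath values in the log above, and it assumes that one byte of the 108-byte limit is reserved for the terminating NUL.

```python
# Limit quoted in the report from /usr/include/linux/un.h.
UNIX_PATH_MAX = 108

# Template taken from the -S option in the logged ssh command.
TEMPLATE = "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-%r@%h:%p"


def expanded_control_path(template, user, host, port, suffix_len=17):
    """Expand ssh's %r, %h and %p tokens and pad for the temporary suffix.

    suffix_len=17 is an assumption based on the ".lb5rK1GpczmxJSDb"
    style suffix visible in the log ('.' plus 16 characters).
    """
    path = (template.replace("%r", user)
                    .replace("%h", host)
                    .replace("%p", str(port)))
    return path + "." + "X" * (suffix_len - 1)


def fits_unix_socket(path, limit=UNIX_PATH_MAX):
    """Return True if the path fits, assuming one byte is kept for NUL."""
    return len(path.encode("utf-8")) < limit


if __name__ == "__main__":
    # Hypothetical hostnames, used only to exercise the check.
    for host in ("short.example.com", "h" * 50 + ".example.com"):
        path = expanded_control_path(TEMPLATE, "root", host, 22)
        print(len(path), fits_unix_socket(path), host)
```

A check like this could be run before starting a session to decide whether to fall back to an IP address, which is the workaround mentioned in the description.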
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#1) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#2) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#3) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#4) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#5) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#6) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#1) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#2) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#3) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#4) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#5) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#6) for review on master by Harshavardhana (harsha)
COMMIT: http://review.gluster.org/5681 committed in master by Anand Avati (avati)
------
commit fa095c24979db2d0a3a6413aa431fe7256be5206
Author: Harshavardhana <harsha>
Date: Wed Aug 21 16:28:41 2013 -0700

    geo-replication: Use a md5 based unique control path

    A hostname fqdn can be of length 255 according to RFC1123

        /usr/include/bits/posix1_lim.h:#define _POSIX_HOST_NAME_MAX 255

    On linux this length is 64

        /usr/include/bits/local_lim.h:#define HOST_NAME_MAX 64

    When a given hostname is > 45 (characters) - SSH fails with

        "ControlPath too long for Unix domain socket".

    Indicating that the total length of ControlPath which is on linux should be 108

        /usr/include/linux/un.h:#define UNIX_PATH_MAX 108

    This leads to "faulty" geo-replication status.

    This patch brings in a new file called manifest which carries given a geo-rep session some unique information - with which a unique `md5` is generated in a 32length digest, this ensures that we don't exceed UNIX_PATH_MAX limitations instead we use a conservative approach and still be able to provide a unique socket path.

    Change-Id: I3a6a27d605d751a86e7c82eace4561d9b0134fe1
    BUG: 990330
    Signed-off-by: Harshavardhana <harsha>
    Reviewed-on: http://review.gluster.org/5681
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Csaba Henk <csaba>
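The idea in the commit message, replacing the variable-length user/host/port portion of the socket name with a fixed-length md5 digest, can be sketched as follows. This is an illustration only and not the code from the patch: the function name, the `.sock` extension, and the contents of the session string are assumptions made for the example.

```python
import hashlib
import os


def md5_control_path(ctl_dir, session_info):
    """Build a control socket path whose file name has a fixed length.

    session_info stands in for whatever unique per-session data is
    hashed (hypothetical here); the hex digest is always 32 characters,
    so the file name no longer grows with the hostname.
    """
    digest = hashlib.md5(session_info.encode("utf-8")).hexdigest()
    return os.path.join(ctl_dir, digest + ".sock")


if __name__ == "__main__":
    # Hypothetical session description with a very long hostname.
    info = "gluster://localhost:iso ssh://root@" + "h" * 200 + ":/mnt/geoslave"
    path = md5_control_path("/tmp/gsyncd-aux-ssh-YBDwod", info)
    print(len(path), path)
```

With a temporary directory named like the ones in the log, the resulting path stays well below the 108-byte limit regardless of hostname length, even after ssh appends its temporary suffix.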
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user