Bug 765418 (GLUSTER-3686) - geo-replication fails with mesg "connection to peer is broken"
Summary: geo-replication fails with mesg "connection to peer is broken"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3686
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.3-beta
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Csaba Henk
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-03 11:29 UTC by Lakshmipathi G
Modified: 2011-12-20 18:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTNR
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
hotfix for the issue (1.94 KB, patch)
2011-10-03 10:03 UTC, Csaba Henk

Description Csaba Henk 2011-10-03 10:03:33 UTC
Created attachment 682

Comment 1 Csaba Henk 2011-10-03 10:08:30 UTC
There are some problems with logging to /dev/stderr by the slave-side CLI on CentOS 5.2 (kernel 2.6.18-238.el5).

Debugging with strace, we can see:

 open("/dev/stderr", O_WRONLY|O_CREAT|O_APPEND, 0666) = -1 ENXIO (No such device or address)

See the following test program:

# echo '#!/bin/sh
echo foo > "$1"' > /tmp/test.sh
# chmod a+x /tmp/test.sh
# /tmp/test.sh /dev/stderr
foo
# ssh localhost /tmp/test.sh /dev/stderr
/tmp/test.sh: line 2: /dev/stderr: No such device or address

Apparently, opening /dev/stderr for writing fails with ENXIO on this system if stderr is a socket. I don't know whether this is a bug or a feature.
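
The "stderr is a socket" hypothesis can be checked without ssh. The following is a minimal sketch, not part of gsyncd: the function name stderr_socket_open_errno is made up for illustration. It starts a child process whose stderr is one end of a socket pair and reports the errno the child gets when it opens /dev/stderr for appending (0 if the open succeeds).

```python
import socket
import subprocess
import sys

# Code run by the child: try to open /dev/stderr for appending and
# print the resulting errno (0 on success) to stdout.
CHILD_CODE = (
    "import os\n"
    "try:\n"
    "    os.close(os.open('/dev/stderr', os.O_WRONLY | os.O_APPEND))\n"
    "    print(0)\n"
    "except OSError as e:\n"
    "    print(e.errno)\n"
)


def stderr_socket_open_errno():
    """Hypothetical helper: return the errno seen by a child whose stderr is a socket."""
    parent, child = socket.socketpair()
    with parent, child:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            stderr=child,  # the child's fd 2 becomes a socket
        )
    return int(result.stdout.decode().strip())


if __name__ == "__main__":
    print(stderr_socket_open_errno())
```

Comparing the returned value with errno.ENXIO shows whether a given system behaves like the one described here. On Linux, /dev/stderr normally resolves through /proc/self/fd/2, so the result depends on how the kernel handles opening a socket through that path.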

It can be worked around with the attached hotfix, which sends those logs to /dev/null instead.
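
The general shape of such a workaround can be sketched as follows. This is only an illustration under assumptions, not the content of the attached hotfix: open_log is a hypothetical helper that tries the requested log path and falls back to /dev/null if the open fails for any OS-level reason, ENXIO included.

```python
import os


def open_log(path, fallback=os.devnull):
    """Hypothetical helper: open `path` for appending, else fall back.

    If the open fails with an OSError (for example ENXIO on /dev/stderr
    when stderr is a socket), the fallback path is opened instead.
    """
    try:
        return open(path, "a")
    except OSError:
        return open(fallback, "a")


if __name__ == "__main__":
    log = open_log("/dev/stderr")
    log.write("logging to %s\n" % log.name)
    log.close()
```

The trade-off is that messages written after a fallback are silently discarded, so this only suits logs that are not needed for diagnosis.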

Comment 2 Lakshmipathi G 2011-10-03 11:29:17 UTC
Starting geo-replication with glusterfs-3.3qa13 fails with the following error message. (On the same setup, if I install glfs-3.2.3, it works.)

# gluster volume geo-replication pythonchk root@10.1.11.140:/pychk2 start


# cat /usr/local/var/log/glusterfs/geo-replication/pythonchk/ssh%3A%2F%2Froot%4010.1.11.140%3Afile%3A%2F%2F%2Fpychk2.log
[2011-10-03 04:23:12.913885] I [monitor(monitor):22:set_state] Monitor: new state: starting...
[2011-10-03 04:23:12.918194] I [monitor(monitor):63:monitor] Monitor: ------------------------------------------------------------
[2011-10-03 04:23:12.918314] I [monitor(monitor):64:monitor] Monitor: starting gsyncd worker
[2011-10-03 04:23:12.964476] I [gsyncd:352:main_i] <top>: syncing: gluster://localhost:pythonchk -> ssh://root@10.1.11.140:/pychk2
[2011-10-03 04:23:13.182882] E [syncdutils:171:log_raise_exception] <top>: connection to peer is broken
[2011-10-03 04:23:13.183144] E [resource:166:errfail] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /etc/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-POjX7l/gsycnd-ssh-%r@%h:%p root@10.1.11.140 /usr/local/libexec/glusterfs/gsyncd --session-owner b4efd5e8-c72a-478b-88cb-0dad5298aeaf -N --listen --timeout 120 file:///pychk2" returned with 1, saying:
[2011-10-03 04:23:13.183264] E [resource:170:errfail] Popen: ssh> Warning: Identity file /etc/glusterd/geo-replication/secret.pem not accessible: No such file or directory.
[2011-10-03 04:23:13.183352] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile "/dev/stderr" (No such device or address)
[2011-10-03 04:23:13.183438] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile /dev/stderr
[2011-10-03 04:23:13.183521] E [resource:170:errfail] Popen: ssh> gsyncd initializaion failed
[2011-10-03 04:23:13.183667] I [syncdutils:140:finalize] <top>: exiting.
[2011-10-03 04:23:14.185211] I [monitor(monitor):22:set_state] Monitor: new state: faulty
[2011-10-03 04:23:24.188642] I [monitor(monitor):63:monitor] Monitor: ------------------------------------------------------------
[2011-10-03 04:23:24.188827] I [monitor(monitor):64:monitor] Monitor: starting gsyncd worker
[2011-10-03 04:23:24.235424] I [gsyncd:352:main_i] <top>: syncing: gluster://localhost:pythonchk -> ssh://root@10.1.11.140:/pychk2
[2011-10-03 04:23:24.388060] E [syncdutils:171:log_raise_exception] <top>: connection to peer is broken
[2011-10-03 04:23:24.388242] E [resource:166:errfail] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /etc/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-ISX8fc/gsycnd-ssh-%r@%h:%p root@10.1.11.140 /usr/local/libexec/glusterfs/gsyncd --session-owner b4efd5e8-c72a-478b-88cb-0dad5298aeaf -N --listen --timeout 120 file:///pychk2" returned with 1, saying:
[2011-10-03 04:23:24.388345] E [resource:170:errfail] Popen: ssh> Warning: Identity file /etc/glusterd/geo-replication/secret.pem not accessible: No such file or directory.
[2011-10-03 04:23:24.388432] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile "/dev/stderr" (No such device or address)
[2011-10-03 04:23:24.388550] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile /dev/stderr
[2011-10-03 04:23:24.388641] E [resource:170:errfail] Popen: ssh> gsyncd initializaion failed
[2011-10-03 04:23:24.388806] I [syncdutils:140:finalize] <top>: exiting.

Comment 3 Anand Avati 2011-11-20 12:36:05 UTC
CHANGE: http://review.gluster.com/560 (This works around broken /dev/stderr on some systems.) merged in master by Vijay Bellur (vijay)

