Bug 765418 (GLUSTER-3686)

Summary: geo-replication fails with mesg "connection to peer is broken"
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: geo-replication
Assignee: Csaba Henk <csaba>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Version: 3.3-beta
CC: gluster-bugs, rahulcs, vijay, vshankar
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Regression: RTNR
Attachments:
hotfix for the issue (Flags: none)

Description Csaba Henk 2011-10-03 10:03:33 UTC
Created attachment 682

Comment 1 Csaba Henk 2011-10-03 10:08:30 UTC
The slave-side CLI has problems logging to /dev/stderr on CentOS 5.2 (kernel 2.6.18-238.el5).

Debugging with strace shows:

 open("/dev/stderr", O_WRONLY|O_CREAT|O_APPEND, 0666) = -1 ENXIO (No such device or address)

The following test program reproduces the problem:

# echo '#!/bin/sh
echo foo > "$1"' > /tmp/test.sh
# chmod a+x /tmp/test.sh
# /tmp/test.sh /dev/stderr
foo
# ssh localhost /tmp/test.sh /dev/stderr
/tmp/test.sh: line 2: /dev/stderr: No such device or address
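For comparison, here is a sketch of the test program rewritten to write through the already-open descriptor instead of reopening a path; it was not tried on the affected system. `>&2` makes the shell duplicate fd 2 with dup2() rather than calling open() on /dev/stderr (typically a symlink to /proc/self/fd/2 on Linux), so it should not run into ENXIO when fd 2 is a socket. The function name `say_to_stderr` is hypothetical.

```sh
#!/bin/sh
# Hypothetical variant of /tmp/test.sh. Instead of taking a path and
# re-opening it (open("/dev/stderr", ...) is what returned ENXIO), it
# writes to the inherited descriptor 2 directly: ">&2" is implemented by
# the shell with dup2(), so no new open() of the socket is attempted.
say_to_stderr() {
    echo "$1" >&2
}

say_to_stderr foo
```

Saved under a hypothetical name such as /tmp/test2.sh, it could be checked the same way with `ssh localhost /tmp/test2.sh`, which should print foo instead of the ENXIO error.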

Apparently, on this system, opening /dev/stderr for writing fails with ENXIO when stderr is a socket. I don't know whether this is a bug or a feature.

It can be worked around with the attached hotfix, which sends those logs to /dev/null instead.
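A minimal shell sketch of the fallback idea behind the hotfix, assuming the caller only needs some writable log target: probe the requested path by opening it for appending, and use /dev/null if that fails. The helper `pick_logfile` and the `LOGFILE` variable are hypothetical; this is not the code in the attachment or in the change that was eventually merged.

```sh
#!/bin/sh
# Hypothetical helper, not GlusterFS code: fall back to /dev/null when the
# requested log file cannot be opened for appending (for instance
# /dev/stderr failing with ENXIO because fd 2 is a socket).
pick_logfile() {
    # "true" is a regular built-in, so a failed redirection only makes the
    # command return non-zero; with the special built-in ":" a POSIX shell
    # such as dash would exit instead. The shell's own error message, if
    # any, goes to the existing fd 2, which is still writable as a socket.
    if true >> "$1"; then
        printf '%s\n' "$1"
    else
        printf '%s\n' /dev/null
    fi
}

# LOGFILE is a hypothetical override; default to /dev/stderr as in the report.
logfile=$(pick_logfile "${LOGFILE:-/dev/stderr}")
echo "example log line" >> "$logfile"
```

Probing with an append-open does exactly what a logger would do with the path anyway. The trade-off, as with the hotfix, is that slave-side messages are discarded rather than showing up in the master's log the way the `ssh>` lines in Comment 2 do.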

Comment 2 Lakshmipathi G 2011-10-03 11:29:17 UTC
Starting geo-replication with glusterfs-3.3qa13 fails with the following error message. (On the same setup, it works if I install glfs-3.2.3.)

# gluster volume geo-replication pythonchk root@10.1.11.140:/pychk2 start


# cat /usr/local/var/log/glusterfs/geo-replication/pythonchk/ssh%3A%2F%2Froot%4010.1.11.140%3Afile%3A%2F%2F%2Fpychk2.log
[2011-10-03 04:23:12.913885] I [monitor(monitor):22:set_state] Monitor: new state: starting...
[2011-10-03 04:23:12.918194] I [monitor(monitor):63:monitor] Monitor: ------------------------------------------------------------
[2011-10-03 04:23:12.918314] I [monitor(monitor):64:monitor] Monitor: starting gsyncd worker
[2011-10-03 04:23:12.964476] I [gsyncd:352:main_i] <top>: syncing: gluster://localhost:pythonchk -> ssh://root@10.1.11.140:/pychk2
[2011-10-03 04:23:13.182882] E [syncdutils:171:log_raise_exception] <top>: connection to peer is broken
[2011-10-03 04:23:13.183144] E [resource:166:errfail] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /etc/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-POjX7l/gsycnd-ssh-%r@%h:%p root@10.1.11.140 /usr/local/libexec/glusterfs/gsyncd --session-owner b4efd5e8-c72a-478b-88cb-0dad5298aeaf -N --listen --timeout 120 file:///pychk2" returned with 1, saying:
[2011-10-03 04:23:13.183264] E [resource:170:errfail] Popen: ssh> Warning: Identity file /etc/glusterd/geo-replication/secret.pem not accessible: No such file or directory.
[2011-10-03 04:23:13.183352] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile "/dev/stderr" (No such device or address)
[2011-10-03 04:23:13.183438] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile /dev/stderr
[2011-10-03 04:23:13.183521] E [resource:170:errfail] Popen: ssh> gsyncd initializaion failed
[2011-10-03 04:23:13.183667] I [syncdutils:140:finalize] <top>: exiting.
[2011-10-03 04:23:14.185211] I [monitor(monitor):22:set_state] Monitor: new state: faulty
[2011-10-03 04:23:24.188642] I [monitor(monitor):63:monitor] Monitor: ------------------------------------------------------------
[2011-10-03 04:23:24.188827] I [monitor(monitor):64:monitor] Monitor: starting gsyncd worker
[2011-10-03 04:23:24.235424] I [gsyncd:352:main_i] <top>: syncing: gluster://localhost:pythonchk -> ssh://root@10.1.11.140:/pychk2
[2011-10-03 04:23:24.388060] E [syncdutils:171:log_raise_exception] <top>: connection to peer is broken
[2011-10-03 04:23:24.388242] E [resource:166:errfail] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /etc/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-ISX8fc/gsycnd-ssh-%r@%h:%p root@10.1.11.140 /usr/local/libexec/glusterfs/gsyncd --session-owner b4efd5e8-c72a-478b-88cb-0dad5298aeaf -N --listen --timeout 120 file:///pychk2" returned with 1, saying:
[2011-10-03 04:23:24.388345] E [resource:170:errfail] Popen: ssh> Warning: Identity file /etc/glusterd/geo-replication/secret.pem not accessible: No such file or directory.
[2011-10-03 04:23:24.388432] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile "/dev/stderr" (No such device or address)
[2011-10-03 04:23:24.388550] E [resource:170:errfail] Popen: ssh> ERROR: failed to open logfile /dev/stderr
[2011-10-03 04:23:24.388641] E [resource:170:errfail] Popen: ssh> gsyncd initializaion failed
[2011-10-03 04:23:24.388806] I [syncdutils:140:finalize] <top>: exiting.

Comment 3 Anand Avati 2011-11-20 12:36:05 UTC
CHANGE: http://review.gluster.com/560 (This works around broken /dev/stderr on some systems.) merged in master by Vijay Bellur (vijay)