Bug 895656

Summary: geo-replication problem (debian) [resource:194:logerr] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory
Product: [Community] GlusterFS
Reporter: xarlos
Component: geo-replication
Assignee: Csaba Henk <csaba>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: glusterbugs, gluster-bugs, jdarcy, kkeithle
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=764679
https://bugzilla.redhat.com/show_bug.cgi?id=764623
Whiteboard:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:44:57 EDT
Type: Bug
Bug Depends On:    
Bug Blocks: 895528    

Description xarlos 2013-01-15 13:05:12 EST
Description of problem:
Gluster looks for the gsyncd binary in a different place from where the Debian package installs it. The install location on Debian differs from the one used on Fedora/CentOS/RHEL etc.


Version-Release number of selected component (if applicable):
Debian .deb packages for 3.1.1
- from gluster.org
  &
- from experimental debian repo (before it switched to 3.4)

How reproducible:
Always. The file is not where gluster looks for it.

Steps to Reproduce:
1. Set up a geo-replication server -> client connection
gluster volume geo-replication VOLNAME IPADDRESS:MOUNTPOINT start
2. (copy ssh keys)
ssh-copy-id CLIENT
3. Check the /var/log/glusterfs/geo-replication/database-archive/<applicable> log
  
Actual results:
The status of the geo-replication is: faulty. 
[2013-01-15 17:31:16.708810] E [resource:194:logerr] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory  
[2013-01-15 17:31:16.709078] I [syncdutils:142:finalize] <top>: exiting.                                                               
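For step 3, the error can be picked out of the log mechanically. The following is a minimal sketch, assuming the line layout seen in the excerpts in this report ([timestamp] LEVEL [module:line:function] source: message); the regular expression and the helper names find_errors and missing_gsyncd_path are illustrative and not part of GlusterFS.

```python
import re

# Assumed line shape, inferred only from the log excerpts in this report:
# [timestamp] LEVEL [module:line:function] source: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<location>[^\]]+)\]\s+(?P<source>[^:]+):\s*(?P<message>.*)$"
)


def find_errors(lines):
    """Return the parsed fields of every line logged at level 'E'."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E":
            errors.append(match.groupdict())
    return errors


def missing_gsyncd_path(entry):
    """Return the gsyncd path bash failed to find, or None (hypothetical helper)."""
    found = re.search(
        r"bash: (\S+/gsyncd): No such file or directory", entry["message"]
    )
    return found.group(1) if found else None


# The two lines quoted under "Actual results".
sample = [
    "[2013-01-15 17:31:16.708810] E [resource:194:logerr] Popen: ssh> bash: "
    "/usr/local/libexec/glusterfs/gsyncd: No such file or directory",
    "[2013-01-15 17:31:16.709078] I [syncdutils:142:finalize] <top>: exiting.",
]

for entry in find_errors(sample):
    print(entry["timestamp"], missing_gsyncd_path(entry))
```

To scan a real log, pass an open file object to find_errors instead of the sample list.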

Expected results:
Should start to replicate properly, and work :-)
On Debian, gluster should look for "/usr/lib/glusterfs/glusterfs/gsyncd" and not "/usr/local/libexec/glusterfs/gsyncd".

Additional info:
Comment 1 Louis Zuckerman 2013-01-15 14:45:51 EST
I believe this can be solved by having the debian/ubuntu packages create a symlink.  I will try this out and post another update here when I have more information.
Comment 2 xarlos 2013-01-16 04:41:31 EST
I created a symlink on the master and the slave and this has rectified the problem. 
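The comment does not spell out the exact link that was created. A minimal sketch of the workaround, assuming the link is placed at the path gluster looks for and points at the path where the Debian package installs gsyncd (both paths are taken from the description above; the helper name ensure_symlink is hypothetical):

```python
import os

# Paths quoted in this report. The direction of the link is an assumption:
# the link lives where gluster looks, and points at the Debian location.
DEBIAN_GSYNCD = "/usr/lib/glusterfs/glusterfs/gsyncd"
EXPECTED_GSYNCD = "/usr/local/libexec/glusterfs/gsyncd"


def ensure_symlink(target, link):
    """Create `link` pointing at `target`; return True if it was created.

    An existing file or link at `link` is left untouched.
    """
    if os.path.lexists(link):
        return False
    os.makedirs(os.path.dirname(link), exist_ok=True)
    os.symlink(target, link)
    return True
```

Calling ensure_symlink(DEBIAN_GSYNCD, EXPECTED_GSYNCD) would need root privileges and, per the comment above, would have to be done on both the master and the slave.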

My geo-replication now has a status of OK, but the logs show a momentary FAIL. Is this expected under normal circumstances, or is it something to worry about?

[2013-01-16 09:32:19.169287] I [master:284:crawl] GMaster: new master is 5c08bddb-769e-40bb-9077-2590348b0c21                          
[2013-01-16 09:32:19.191009] I [master:288:crawl] GMaster: primary master with volume id 5c08bddb-769e-40bb-9077-2590348b0c21 ...      
[2013-01-16 09:32:46.625880] E [syncdutils:184:log_raise_exception] <top>: FAIL:                                                       
Traceback (most recent call last):                                                                                                     
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 210, in twrap                                              
    tf(*aa)                                                                                                                            
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 123, in tailer                                               
    poe, _ ,_ = select([po.stderr for po in errstore], [], [], 1)                                                                      
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 270, in select                                             
    return eintr_wrap(oselect.select, oselect.error, *a)                                                                               
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 263, in eintr_wrap                                         
    return func(*a)                                                                                                                    
error: (9, 'Bad file descriptor')                                                                                                      
[2013-01-16 09:32:46.643009] I [syncdutils:142:finalize] <top>: exiting.                                                               
[2013-01-16 09:32:57.62796] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------      
[2013-01-16 09:32:57.63226] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker                                            
[2013-01-16 09:32:57.103140] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:database-archive -> ssh://192.168.111.30:/mnt/database-archive
[2013-01-16 09:33:02.296112] I [master:284:crawl] GMaster: new master is 5c08bddb-769e-40bb-9077-2590348b0c21                          
[2013-01-16 09:33:02.296350] I [master:288:crawl] GMaster: primary master with volume id 5c08bddb-769e-40bb-9077-2590348b0c21 ...      
[2013-01-16 09:33:57.331610] I [monitor(monitor):21:set_state] Monitor: new state: OK                                                  
[2013-01-16 09:34:01.699632] I [master:272:crawl] GMaster: completed 59 crawls, 0 turns                                                
[2013-01-16 09:35:02.91568] I [master:272:crawl] GMaster: completed 60 crawls, 0 turns                                                 
[2013-01-16 09:36:02.481292] I [master:272:crawl] GMaster: completed 60 crawls, 0 turns
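The traceback above ends in a select() call that fails with errno 9 (EBADF), reached through a wrapper called eintr_wrap. The sketch below is not the syncdutils code; it assumes, from the name and the call shape in the traceback, that such a wrapper retries only on EINTR and lets every other error propagate. It then reproduces the same error by selecting on a descriptor that has already been closed. It does not establish why the descriptor was invalid in the reporter's run.

```python
import errno
import os
import select


def eintr_wrap(func, exc, *args):
    """Call func(*args), retrying only when it fails with EINTR.

    Illustrative stand-in named after the function in the traceback;
    the real implementation may differ.
    """
    while True:
        try:
            return func(*args)
        except exc as err:
            if err.args[0] == errno.EINTR:
                continue  # interrupted by a signal: retry the call
            raise  # any other error, such as EBADF, reaches the caller


# Demonstration: select() on a descriptor that was closed beforehand.
read_fd, write_fd = os.pipe()
os.close(read_fd)
os.close(write_fd)

try:
    eintr_wrap(select.select, OSError, [read_fd], [], [], 0)
    outcome = None
except OSError as err:
    outcome = err.errno

print("EBADF raised:", outcome == errno.EBADF)
```

In the log, the monitor starts a new gsyncd worker about ten seconds after the failure and the state returns to OK.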
Comment 3 Louis Zuckerman 2013-01-16 14:13:19 EST
Thanks for verifying the solution.  I'll work on updating the packages to create this symlink & follow up once more info is available.
Comment 4 Vijay Bellur 2013-01-18 16:18:08 EST
CHANGE: http://review.gluster.org/4392 (glusterd: replace obsolete /usr/local reference for remote ssh/gsyncd) merged in master by Anand Avati (avati@redhat.com)
Comment 5 Vijay Bellur 2013-03-14 03:23:39 EDT
CHANGE: http://review.gluster.org/4602 (geo-rep: retire old style ssh setup) merged in master by Anand Avati (avati@redhat.com)
Comment 6 Anand Avati 2013-04-27 06:19:12 EDT
REVIEW: http://review.gluster.org/4891 (glusterd: replace obsolete /usr/local reference for remote ssh/gsyncd) posted (#1) for review on release-3.3 by Csaba Henk (csaba@redhat.com)
Comment 7 Anand Avati 2013-04-27 06:19:33 EDT
REVIEW: http://review.gluster.org/4892 (geo-rep: retire old style ssh setup) posted (#1) for review on release-3.3 by Csaba Henk (csaba@redhat.com)
Comment 8 Anand Avati 2013-04-27 12:37:07 EDT
COMMIT: http://review.gluster.org/4891 committed in release-3.3 by Vijay Bellur (vbellur@redhat.com) 
------
commit 6113bfe8c9f5528f54b66d31a7bd0f1803ffe092
Author: Csaba Henk <csaba@redhat.com>
Date:   Thu Jan 17 13:54:36 2013 -0500

    glusterd: replace obsolete /usr/local reference for remote ssh/gsyncd
    
    See https://bugzilla.redhat.com/show_bug.cgi?id=895656
        https://bugzilla.redhat.com/show_bug.cgi?id=764679 (GLUSTER-2947)
        https://bugzilla.redhat.com/show_bug.cgi?id=764623 (GLUSTER-2891)
    
    The comments in the bzs are a bit obtuse and/or vague. As near as I
    can make out we had, for a while, a "convenience symlink" to or from
    /usr/local/libexec/gsyncd, which no longer exists.
    
    And, lacking any comments in the code, I gather this is some sort of
    fallback or failsafe logic: if the first, normal attempt to invoke gsyncd
    fails then an attempt is made to ssh to the box and invoke it.
    
    In any event, there's nothing in /usr/local/... so it's unquestionably
    wrong to try to invoke anything there.
    
    [Backporting Kaleb's patch]
    
    BUG: 895656
    Change-Id: I3b7ac7a049b91ce101b930599294830147cc60ad
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
    Signed-off-by: Csaba Henk <csaba@redhat.com>
    Reviewed-on: http://review.gluster.org/4891
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Comment 9 Anand Avati 2013-04-27 12:37:38 EDT
COMMIT: http://review.gluster.org/4892 committed in release-3.3 by Vijay Bellur (vbellur@redhat.com) 
------
commit 3490689f29342116cf78cc57bc90ad1979d9e1bb
Author: Csaba Henk <csaba@redhat.com>
Date:   Thu Feb 28 04:18:41 2013 +0100

    geo-rep: retire old style ssh setup
    
    Users are still using geo-rep with the old, deprecated, insecure, unsupported
    ssh setup. Not their fault -- the implementation of the new method had the
    following characteristics:
    - old method is possible, but with default settings it's not working
    - it can be made operational by fiddling with "remote-gsyncd" tunable
    - with default setting, an unhelpful, actually misleading error message is
      produced
    - the UI gave no hint to the changes in the ssh setup
    
    http://review.gluster.org/4392 tried to fix these; what it accomplished was
    unrestricted support to the bad practice (by making the default old setup
    operational).
    
    From now on:
    - we disable the old method by reserving the "remote-gsyncd" tunable
    - if the old method is attempted, give a hint what to do
    
    Change-Id: Icade94725d8d8d2d4c89cab992d4226351637b86
    BUG: 895656
    Signed-off-by: Csaba Henk <csaba@redhat.com>
    Reviewed-on: http://review.gluster.org/4892
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>