Bug 1388150 - geo-replica slave node goes faulty for non-root user session due to fail to locate gluster binary
Summary: geo-replica slave node goes faulty for non-root user session due to fail to l...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1382241 1383898 1386123 1399088
Blocks:
Reported: 2016-10-24 15:26 UTC by Aravinda VK
Modified: 2016-12-06 06:00 UTC (History)

Fixed In Version: glusterfs-3.9.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1386123
Environment:
Last Closed: 2016-12-06 06:00:36 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Aravinda VK 2016-10-24 15:26:59 UTC
+++ This bug was initially created as a clone of Bug #1386123 +++

+++ This bug was initially created as a clone of Bug #1383898 +++

+++ This bug was initially created as a clone of Bug #1382241 +++

Description of problem:
The slave nodes go to the faulty state because the popen command fails with the error "execution of "gluster" failed with ENOENT (No such file or directory)" when a geo-replication session is started for a non-root user.


How reproducible:
Frequently for non-root user

Steps to Reproduce:
1. Set up the master cluster
2. Set up the slave cluster
3. Create a volume on the master and slave clusters
4. Create a geo-replication session between the master and slave volumes for a non-root user
5. Start the geo-replication session
# gluster volume geo-replication geovol geouser@slave-node1::geovol start


Actual results:
On the master node, the geo-replication logs show the following error:
[2016-10-06 02:21:42.558072] E [resource(/brick/brick_georepl_01):226:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-2jmykC/57bcd6e6cf5884cd7845fa826e0cf3b5.sock geouser@dell10-pd-gluster-node2 /nonexistent/gsyncd --session-owner e2a48078-ba0d-4f22-9fbb-482c07654c09 -N --listen --timeout 120 gluster://localhost:geovol" returned with 1, saying:
[2016-10-06 02:21:42.558177] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> Warning: Permanently added 'dell10-pd-gluster-node2,10.74.130.162' (ECDSA) to the list of known hosts.^M
[2016-10-06 02:21:42.558247] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.305069] I [cli.c:721:main] 0-cli: Started running /usr/sbin/gluster with version 3.7.9
[2016-10-06 02:21:42.558303] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.305097] I [cli.c:608:cli_rpc_init] 0-cli: Connecting to remote glusterd at localhost
[2016-10-06 02:21:42.558374] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.376366] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-10-06 02:21:42.558440] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.376450] I [socket.c:2472:socket_event_handler] 0-transport: disconnecting now
[2016-10-06 02:21:42.558495] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.376966] I [cli-rpc-ops.c:6514:gf_cli_getwd_cbk] 0-cli: Received resp to getwd
[2016-10-06 02:21:42.558548] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.376998] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2016-10-06 02:21:42.558599] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.440936] I [gsyncd(slave):710:main_i] <top>: syncing: gluster://localhost:geovol
[2016-10-06 02:21:42.558651] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.445895] E [syncdutils(slave):247:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
[2016-10-06 02:21:42.558701] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> failure: execution of "gluster" failed with ENOENT (No such file or directory)
[2016-10-06 02:21:42.558753] E [resource(/brick/brick_georepl_01):230:logerr] Popen: ssh> [2016-10-06 02:21:29.446126] I [syncdutils(slave):220:finalize] <top>: exiting.

Expected results:
All slave nodes should be in Active/Passive state



--- Additional comment from Prashant Dhange on 2016-10-12 01:33:29 EDT ---

Adding /usr/sbin to the PATH environment variable in the non-root user's .bashrc file on all slave nodes resolves the issue.

1. Edit /home/geouser/.bashrc
2. Add the following lines to /home/geouser/.bashrc
PATH=/usr/sbin:$PATH
export PATH
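
To illustrate why this workaround helps, here is a minimal Python sketch of the PATH handling involved. The helper names prepend_to_path and find_gluster are hypothetical and are not part of gsyncd; shutil.which is used only to mimic how a bare command name is looked up on a search path.

```python
import os
import shutil


def prepend_to_path(path_env, directory):
    """Return path_env with directory in front, unless it is already listed."""
    entries = [entry for entry in path_env.split(os.pathsep) if entry]
    if directory in entries:
        return path_env
    return os.pathsep.join([directory] + entries)


def find_gluster(path_env):
    """Look up a bare "gluster" on the given search path; None if not found."""
    return shutil.which("gluster", path=path_env)
```

Running find_gluster(os.environ["PATH"]) as the non-root user before and after the .bashrc change should show whether the bare name "gluster" can be resolved in that session.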

--- Additional comment from Aravinda VK on 2016-10-13 06:27:33 EDT ---

Debugged this issue. If the geo-replication config file is not readable by the slave gsyncd, gconf.gluster_command_dir falls back to its default, which is an empty value.

gluster_bin_path = gluster_command_dir + "gluster"

If gluster_command_dir is empty, then gluster_bin_path will be set to "gluster" instead of "/usr/sbin/gluster".

The .bashrc step is not required if /var/lib/glusterd/geo-replication/gsyncd_template.conf is readable by the slave user.
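
The effect of the empty default can be sketched as follows. This is an illustration only, not the gsyncd source: the function names are hypothetical, and the configured directory is assumed to carry a trailing slash ("/usr/sbin/") so that plain concatenation yields a valid path.

```python
import os


def build_gluster_bin_path(gluster_command_dir):
    # Same concatenation as quoted in the comment above.
    return gluster_command_dir + "gluster"


def relies_on_path_lookup(command):
    # A command with no directory component has to be found via PATH
    # when it is executed without a shell.
    return os.path.dirname(command) == ""
```

With an empty gluster_command_dir the result is the bare name "gluster", so the lookup depends on the PATH of the non-root session, which matches the ENOENT seen in the logs when /usr/sbin is not on that PATH.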

--- Additional comment from Atin Mukherjee on 2016-10-13 12:47:05 EDT ---

(In reply to Aravinda VK from comment #3)
> Debugged this issue. If the geo-replication config file is not readable by
> the slave gsyncd, gconf.gluster_command_dir falls back to its default, which
> is an empty value.
> 
> gluster_bin_path = gluster_command_dir + "gluster"
> 
> If gluster_command_dir is empty, then gluster_bin_path will be set to
> "gluster" instead of "/usr/sbin/gluster".
> 
> The .bashrc step is not required if
> /var/lib/glusterd/geo-replication/gsyncd_template.conf is readable by the
> slave user.

If a slave user is expected to have read permission, which apparently was not the case here, is this a valid bug? If not, can this BZ be closed?

--- Additional comment from Prashant Dhange on 2016-10-14 02:57:04 EDT ---

For the non-root user, the gluster_command_dir value could not be read because of the permission issue on gsyncd_template.conf.

Would it be possible to set the default value of gluster_command_dir to '/usr/sbin', provided there is no harm in making this the default?
Are there any consequences if we do so?

I am suggesting this change based on the default installation path for gluster binaries.

--- Additional comment from Aravinda VK on 2016-10-17 08:19:40 EDT ---

(In reply to Prashant Dhange from comment #5)
> For the non-root user, the gluster_command_dir value could not be read
> because of the permission issue on gsyncd_template.conf.
> 
> Would it be possible to set the default value of gluster_command_dir to
> '/usr/sbin', provided there is no harm in making this the default?
> Are there any consequences if we do so?
> 
> I am suggesting this change based on the default installation path for
> gluster binaries.

Good suggestion to use default values in code instead of blanks. But we don't know what other problems exist with the default values of variables that can't be read from the default config. The proper fix is to raise an error when template.conf is not readable, instead of silently continuing without reading the config values and substituting other values.
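
The difference between silently continuing and raising an error can be sketched with Python's standard configparser module. This is a generic illustration under the assumption of an INI-style config file, not the actual gsyncd config loader; load_lenient and load_strict are hypothetical names.

```python
import configparser


def load_lenient(path):
    """Silent mode: ConfigParser.read() skips files it cannot open."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser


def load_strict(path):
    """Strict mode: open() raises OSError if the file is missing or unreadable."""
    parser = configparser.ConfigParser()
    with open(path) as handle:
        parser.read_file(handle)
    return parser
```

In the lenient variant an unreadable file simply produces an empty parser, so every later lookup falls back to defaults; the strict variant fails at load time, which is the behaviour argued for above.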

--- Additional comment from Aravinda VK on 2016-10-18 04:42:39 EDT ---

Posted Patch to Master http://review.gluster.org/15669

--- Additional comment from Worker Ant on 2016-10-18 08:07:57 EDT ---

REVIEW: http://review.gluster.org/15669 (geo-rep: Assert error if gsyncd conf file is not readable) posted (#2) for review on master by Aravinda VK (avishwan)

--- Additional comment from Worker Ant on 2016-10-19 02:03:58 EDT ---

REVIEW: http://review.gluster.org/15669 (geo-rep: Assert error if gsyncd conf file is not readable) posted (#3) for review on master by Aravinda VK (avishwan)

--- Additional comment from Worker Ant on 2016-10-19 03:12:16 EDT ---

REVIEW: http://review.gluster.org/15669 (geo-rep: Assert error if gsyncd conf file is not readable) posted (#4) for review on master by Aravinda VK (avishwan)

--- Additional comment from Worker Ant on 2016-10-21 06:40:39 EDT ---

REVIEW: http://review.gluster.org/15669 (geo-rep: Upgrade conf file only if it is session config) posted (#5) for review on master by Aravinda VK (avishwan)

--- Additional comment from Aravinda VK on 2016-10-21 06:42:52 EDT ---

This upstream patch already asserts if there is any issue while opening the config file:
http://review.gluster.org/14777

Modified patch http://review.gluster.org/15669 to skip upgrading the config file if it is not a session config.
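
The idea of the modified patch can be sketched roughly as below. This is not the code from the review: is_session_config and maybe_upgrade are hypothetical helpers, and identifying the template by its file name is an assumption made only for illustration.

```python
import os

# File name of the template config mentioned in this bug.
TEMPLATE_CONF_NAME = "gsyncd_template.conf"


def is_session_config(config_path):
    """Treat every config file except the template as a session config."""
    return os.path.basename(config_path) != TEMPLATE_CONF_NAME


def maybe_upgrade(config_path, upgrade):
    """Call upgrade(config_path) only for session configs; return whether it ran."""
    if not is_session_config(config_path):
        return False
    upgrade(config_path)
    return True
```

Skipping the upgrade for the template means a process that can only read the template is never asked to rewrite it.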

--- Additional comment from Worker Ant on 2016-10-24 11:26:29 EDT ---

COMMIT: http://review.gluster.org/15669 committed in master by Aravinda VK (avishwan) 
------
commit 1506c7a98d8d3b31e68d0f214ab331f28ffa9fb5
Author: Aravinda VK <avishwan>
Date:   Tue Oct 18 13:34:57 2016 +0530

    geo-rep: Upgrade conf file only if it is session config
    
    Ignore config upgrade if it is template config file present in
    /var/lib/glusterd/geo-replication/gsyncd_template.conf
    
    BUG: 1386123
    Change-Id: I2cbba3103b6801c16ff57f778a90b9a0bb2467cf
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/15669
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
    NetBSD-regression: NetBSD Build System <jenkins.org>

Comment 1 Worker Ant 2016-10-24 15:28:07 UTC
REVIEW: http://review.gluster.org/15715 (geo-rep: Upgrade conf file only if it is session config) posted (#1) for review on release-3.9 by Aravinda VK (avishwan)

Comment 2 Worker Ant 2016-10-25 07:22:21 UTC
COMMIT: http://review.gluster.org/15715 committed in release-3.9 by Aravinda VK (avishwan) 
------
commit 4a8a5e77cd9323fd2a4413ddc89575e146634e8e
Author: Aravinda VK <avishwan>
Date:   Tue Oct 18 13:34:57 2016 +0530

    geo-rep: Upgrade conf file only if it is session config
    
    Ignore config upgrade if it is template config file present in
    /var/lib/glusterd/geo-replication/gsyncd_template.conf
    
    > Reviewed-on: http://review.gluster.org/15669
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Kotresh HR <khiremat>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    
    BUG: 1388150
    Change-Id: I2cbba3103b6801c16ff57f778a90b9a0bb2467cf
    Signed-off-by: Aravinda VK <avishwan>
    (cherry picked from commit 1506c7a98d8d3b31e68d0f214ab331f28ffa9fb5)
    Reviewed-on: http://review.gluster.org/15715
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>

Comment 3 Aravinda VK 2016-10-27 05:23:08 UTC
glusterfs-3.9.0rc2 is released[1] and packages are available for different distributions[2] to test.

[1] http://www.gluster.org/pipermail/maintainers/2016-October/001601.html
[2] http://www.gluster.org/pipermail/maintainers/2016-October/001605.html and http://www.gluster.org/pipermail/maintainers/2016-October/001606.html

Comment 4 Aravinda VK 2016-12-06 06:00:36 UTC
Gluster 3.9 GA is released http://blog.gluster.org/2016/11/announcing-gluster-3-9/

