Bug 1663583 - Geo-replication fails to open logfile "/var/log/glusterfs/cli.log" on slave.
Summary: Geo-replication fails to open logfile "/var/log/glusterfs/cli.log" on slave.
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 5
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Sunny Kumar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-04 23:33 UTC by vnosov
Modified: 2019-06-18 09:02 UTC
CC: 4 users

Fixed In Version: glusterfs-7.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-18 09:02:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description vnosov 2019-01-04 23:33:54 UTC
Description of problem: When a non-root account on the slave system is used over SSH to set up geo-replication, initialization on the slave fails because it cannot open the logfile /var/log/glusterfs/cli.log. As a result, geo-replication is not active and remains in "Created" status.


Version-Release number of selected component (if applicable):
GlusterFS 5.2, manually installed from GlusterFS source code.

How reproducible: 100%

In our tests:
Master system IP address: 10.10.60.182.
Slave system DNS name: vn-sc-2.westus2.cloudapp.azure.com
Master volume: master-volume-0007
Slave volume:  slave-volume-0001

Steps to Reproduce:

1. Set up a non-privileged account on the slave system. In our case the account "nasgorep" was used. Test SSH from master to slave:

12592 01/04/2019 14:27:35.101560269 1546640855 command: /usr/bin/ssh nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd
12592 01/04/2019 14:27:35.839285578 1546640855 status=0 /usr/bin/ssh nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd
12640 01/04/2019 14:27:37.847050433 1546640857 command: /usr/bin/ssh -q -oConnectTimeout=5 nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd 2>&1
12640 01/04/2019 14:27:38.596877738 1546640858 status=0 /usr/bin/ssh -q -oConnectTimeout=5 nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd 2>&1
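For reference, a non-root geo-replication slave is normally prepared with the mountbroker helper before this SSH test is meaningful. A minimal sketch, assuming the "nasgorep" account and "slave-volume-0001" from this report; the group name "geogroup" and the mountbroker root path are illustrative:

```shell
# On the slave, as root (sketch; group and path names are illustrative):
groupadd geogroup
useradd -G geogroup nasgorep

# Configure glusterd's mountbroker for the unprivileged user
# (the gluster-mountbroker helper ships with recent glusterfs releases):
gluster-mountbroker setup /var/mountbroker-root geogroup
gluster-mountbroker add slave-volume-0001 nasgorep

# Restart glusterd so the mountbroker options take effect.
systemctl restart glusterd
```

These commands require a running gluster installation on the slave, so they are shown only as a setup sketch.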


2. Set up geo-replication:


[2019-01-04 22:27:52.449174]  : system:: execute gsec_create : SUCCESS
[2019-01-04 22:28:32.288278]  : volume geo-replication master-volume-0007 nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 create push-pem : SUCCESS
[2019-01-04 22:28:39.447194]  : volume geo-replication master-volume-0007 nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 start : SUCCESS
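The three SUCCESS entries above correspond to the usual master-side command sequence. A sketch of the commands as they would be typed, with the volume and host names taken from this report:

```shell
# On the master, as root:
gluster system:: execute gsec_create

gluster volume geo-replication master-volume-0007 \
    nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 \
    create push-pem

gluster volume geo-replication master-volume-0007 \
    nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 \
    start
```

These require the configured master/slave cluster, so they are a transcript sketch rather than a standalone example.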


Actual results: 

Check status of the geo-replication, it is in "Created" state:

[root@SC-10-10-63-182 log]# /usr/sbin/gluster volume geo-replication master-volume-0007 nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 status detail

MASTER NODE     MASTER VOL            MASTER BRICK                                       SLAVE USER    SLAVE                                                             SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.10.60.182    master-volume-0007    /exports/master-segment-0015/master-volume-0007    nasgorep      nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001    N/A           Created    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A


Expected results: the geo-replication state should be "Active".


Additional info:

On master system: the geo-replication logfile "gsyncd.log" contains the following error messages:

[root@SC-10-10-63-182 log]# vi /var/log/glusterfs/geo-replication/master-volume-0005_10.10.60.183_slave-volume-0001/gsyncd.log:


……….

[2019-01-04 22:28:40.276280] E [syncdutils(monitor):809:errlog] Popen: command returned error   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 nasgorep@vn-sc-2.westus2.cloudapp.azure.com /usr/sbin/gluster --xml --remote-host=localhost volume info slave-volume-0001  error=255
[2019-01-04 22:28:40.277328] E [syncdutils(monitor):813:logerr] Popen: ssh> ERROR: failed to create logfile "/var/log/glusterfs/cli.log" (Permission denied)
[2019-01-04 22:28:40.277438] E [syncdutils(monitor):813:logerr] Popen: ssh> ERROR: failed to open logfile /var/log/glusterfs/cli.log
…….
 
On slave: file "cli.log" has the following attributes:

[root@VN-SC-2 log]# ls -l /var/log/glusterfs/cli.log
-rw------- 1 root root 28552 Jan  4 14:35 /var/log/glusterfs/cli.log

It seems the slave system does not let user "nasgorep" open or create the logfile
"/var/log/glusterfs/cli.log" when the "gluster" command is invoked from the master system through SSH.
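The failure is ordinary POSIX permission semantics: cli.log is owned by root with mode 0600, so a process running as "nasgorep" gets EACCES when it tries to open the file for append. A minimal reproduction outside gluster (the /tmp path is illustrative):

```shell
# Create a file the way root's gluster CLI effectively did: mode 0600.
f=/tmp/cli.log.demo
install -m 600 /dev/null "$f"

# Owner has rw-; group and other have no permission bits at all.
stat -c '%a' "$f"

# Any non-owner process that opens this file for append fails with
# EACCES ("Permission denied"), matching the ssh> errors in gsyncd.log.
rm -f "$f"
```

This is why the error appears only with a non-root slave user; as root, the open succeeds regardless of the mode.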

On slave: command history log:

[root@VN-SC-2 log]# cat /var/log/glusterfs/cmd_history.log
[2019-01-04 22:22:49.893261]  : volume status : SUCCESS
[2019-01-04 22:22:59.929183]  : volume geo-replication status : SUCCESS
[2019-01-04 22:24:25.105256]  : volume create slave-volume-0001 transport tcp VN-SC-2.cxn2n2xii2hepedpq4siv2akfe.xx.internal.cloudapp.net:/exports/nas-segment-0015/slave-volume-0001 : SUCCESS
[2019-01-04 22:24:25.894525]  : volume set slave-volume-0001 nfs.addr-namelookup off : SUCCESS
[2019-01-04 22:24:26.418826]  : volume reset slave-volume-0001 network.ping-timeout : SUCCESS
[2019-01-04 22:24:27.086126]  : volume set slave-volume-0001 nfs.disable on : SUCCESS
[2019-01-04 22:24:28.082211]  : volume set slave-volume-0001 performance.stat-prefetch off : SUCCESS
[2019-01-04 22:24:28.873671]  : volume set slave-volume-0001 performance.quick-read off : SUCCESS
[2019-01-04 22:24:29.716433]  : volume set slave-volume-0001 allow-insecure on : SUCCESS
[2019-01-04 22:24:30.272508]  : volume reset slave-volume-0001 nfs.rpc-auth-allow : SUCCESS
[2019-01-04 22:24:31.303757]  : volume start slave-volume-0001 : SUCCESS
[2019-01-04 22:24:51.709793]  : volume geo-replication status : SUCCESS
[2019-01-04 22:24:52.575909]  : volume status slave-volume-0001 : SUCCESS
[2019-01-04 22:27:55.543280]  : system:: uuid get : SUCCESS
[2019-01-04 22:27:55.574006]  : system:: execute mountbroker.py node-add slave-volume-0001 nasgorep : SUCCESS
[2019-01-04 22:28:38.062065]  : system:: copy file /geo-replication/master-volume-0007_slave-volume-0001_common_secret.pem.pub : SUCCESS
[2019-01-04 22:28:38.296566]  : system:: execute add_secret_pub nasgorep geo-replication/master-volume-0007_slave-volume-0001_common_secret.pem.pub : SUCCESS
[2019-01-04 22:31:31.417565]  : volume geo-replication slave-volume-0001 status : SUCCESS

Comment 1 Sunny Kumar 2019-01-07 08:15:21 UTC
Hi,

Can you confirm whether the master and slave are on the same host?

- Sunny

Comment 2 Amar Tumballi 2019-06-18 09:02:28 UTC
https://review.gluster.org/22865 should fix the issue.

