Bug 1663583 - Geo-replication fails to open logfile "/var/log/glusterfs/cli.log" on slave.
Summary: Geo-replication fails to open logfile "/var/log/glusterfs/cli.log" on slave.
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 5
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Sunny Kumar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-04 23:33 UTC by vnosov
Modified: 2019-06-18 09:02 UTC
CC: 4 users

Fixed In Version: glusterfs-7.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-18 09:02:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description vnosov 2019-01-04 23:33:54 UTC
Description of problem: When a non-root account on the slave system is used over SSH to set up geo-replication, initialization on the slave fails because it cannot open the logfile /var/log/glusterfs/cli.log. As a result, geo-replication is not active and remains in "Created" status.


Version-Release number of selected component (if applicable):
GlusterFS 5.2, manually installed from GlusterFS source code.

How reproducible: 100%

In our tests:
Master system IP address: 10.10.60.182.
Slave system DNS name: vn-sc-2.westus2.cloudapp.azure.com
Master volume: master-volume-0007
Slave volume:  slave-volume-0001

Steps to Reproduce:

1. Set up a non-privileged account on the slave system. In our case the account "nasgorep" was used. Test SSH from master to slave:

12592 01/04/2019 14:27:35.101560269 1546640855 command: /usr/bin/ssh nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd
12592 01/04/2019 14:27:35.839285578 1546640855 status=0 /usr/bin/ssh nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd
12640 01/04/2019 14:27:37.847050433 1546640857 command: /usr/bin/ssh -q -oConnectTimeout=5 nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd 2>&1
12640 01/04/2019 14:27:38.596877738 1546640858 status=0 /usr/bin/ssh -q -oConnectTimeout=5 nasgorep@vn-sc-2.westus2.cloudapp.azure.com /bin/pwd 2>&1
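For reference, a non-root geo-replication slave is normally prepared with the mountbroker helper before this SSH test is meaningful. A minimal sketch, assuming the "nasgorep" account and "slave-volume-0001" from this report; the group name "geogroup" and the mountbroker root path are illustrative:

```shell
# On the slave, as root (sketch; group and path names are illustrative):
groupadd geogroup
useradd -G geogroup nasgorep

# Configure glusterd's mountbroker for the unprivileged user
# (the gluster-mountbroker helper ships with recent glusterfs releases):
gluster-mountbroker setup /var/mountbroker-root geogroup
gluster-mountbroker add slave-volume-0001 nasgorep

# Restart glusterd so the mountbroker options take effect.
systemctl restart glusterd
```

These commands require a running gluster installation on the slave, so they are shown only as a setup sketch.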


2. Set up geo-replication:


[2019-01-04 22:27:52.449174]  : system:: execute gsec_create : SUCCESS
[2019-01-04 22:28:32.288278]  : volume geo-replication master-volume-0007 nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 create push-pem : SUCCESS
[2019-01-04 22:28:39.447194]  : volume geo-replication master-volume-0007 nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 start : SUCCESS
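The three SUCCESS entries above correspond to the usual master-side command sequence. A sketch of the commands as they would be typed, with the volume and host names taken from this report:

```shell
# On the master, as root:
gluster system:: execute gsec_create

gluster volume geo-replication master-volume-0007 \
    nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 \
    create push-pem

gluster volume geo-replication master-volume-0007 \
    nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 \
    start
```

These require the configured master/slave cluster, so they are a transcript sketch rather than a standalone example.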


Actual results: 

Check status of the geo-replication, it is in "Created" state:

[root@SC-10-10-63-182 log]# /usr/sbin/gluster volume geo-replication master-volume-0007 nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001 status detail

MASTER NODE     MASTER VOL            MASTER BRICK                                       SLAVE USER    SLAVE                                                             SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.10.60.182    master-volume-0007    /exports/master-segment-0015/master-volume-0007    nasgorep      nasgorep@vn-sc-2.westus2.cloudapp.azure.com::slave-volume-0001    N/A           Created    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A


Expected results: the geo-replication state should be "Active".


Additional info:

On master system: the geo-replication logfile "gsyncd.log" contains the following error messages:

[root@SC-10-10-63-182 log]# vi /var/log/glusterfs/geo-replication/master-volume-0005_10.10.60.183_slave-volume-0001/gsyncd.log:


……….

[2019-01-04 22:28:40.276280] E [syncdutils(monitor):809:errlog] Popen: command returned error   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 nasgorep@vn-sc-2.westus2.cloudapp.azure.com /usr/sbin/gluster --xml --remote-host=localhost volume info slave-volume-0001  error=255
[2019-01-04 22:28:40.277328] E [syncdutils(monitor):813:logerr] Popen: ssh> ERROR: failed to create logfile "/var/log/glusterfs/cli.log" (Permission denied)
[2019-01-04 22:28:40.277438] E [syncdutils(monitor):813:logerr] Popen: ssh> ERROR: failed to open logfile /var/log/glusterfs/cli.log
…….
 
On slave: file "cli.log" has the following attributes:

[root@VN-SC-2 log]# ls -l /var/log/glusterfs/cli.log
-rw------- 1 root root 28552 Jan  4 14:35 /var/log/glusterfs/cli.log

It seems the slave system does not let user "nasgorep" open or create the logfile
"/var/log/glusterfs/cli.log" when the "gluster" command is invoked from the master system through SSH.
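The failure is ordinary POSIX permission semantics: cli.log is owned by root with mode 0600, so a process running as "nasgorep" gets EACCES when it tries to open the file for append. A minimal reproduction outside gluster (the /tmp path is illustrative):

```shell
# Create a file the way root's gluster CLI effectively did: mode 0600.
f=/tmp/cli.log.demo
install -m 600 /dev/null "$f"

# Owner has rw-; group and other have no permission bits at all.
stat -c '%a' "$f"

# Any non-owner process that opens this file for append fails with
# EACCES ("Permission denied"), matching the ssh> errors in gsyncd.log.
rm -f "$f"
```

This is why the error appears only with a non-root slave user; as root, the open succeeds regardless of the mode.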

On slave: command history log:

[root@VN-SC-2 log]# cat /var/log/glusterfs/cmd_history.log
[2019-01-04 22:22:49.893261]  : volume status : SUCCESS
[2019-01-04 22:22:59.929183]  : volume geo-replication status : SUCCESS
[2019-01-04 22:24:25.105256]  : volume create slave-volume-0001 transport tcp VN-SC-2.cxn2n2xii2hepedpq4siv2akfe.xx.internal.cloudapp.net:/exports/nas-segment-0015/slave-volume-0001 : SUCCESS
[2019-01-04 22:24:25.894525]  : volume set slave-volume-0001 nfs.addr-namelookup off : SUCCESS
[2019-01-04 22:24:26.418826]  : volume reset slave-volume-0001 network.ping-timeout : SUCCESS
[2019-01-04 22:24:27.086126]  : volume set slave-volume-0001 nfs.disable on : SUCCESS
[2019-01-04 22:24:28.082211]  : volume set slave-volume-0001 performance.stat-prefetch off : SUCCESS
[2019-01-04 22:24:28.873671]  : volume set slave-volume-0001 performance.quick-read off : SUCCESS
[2019-01-04 22:24:29.716433]  : volume set slave-volume-0001 allow-insecure on : SUCCESS
[2019-01-04 22:24:30.272508]  : volume reset slave-volume-0001 nfs.rpc-auth-allow : SUCCESS
[2019-01-04 22:24:31.303757]  : volume start slave-volume-0001 : SUCCESS
[2019-01-04 22:24:51.709793]  : volume geo-replication status : SUCCESS
[2019-01-04 22:24:52.575909]  : volume status slave-volume-0001 : SUCCESS
[2019-01-04 22:27:55.543280]  : system:: uuid get : SUCCESS
[2019-01-04 22:27:55.574006]  : system:: execute mountbroker.py node-add slave-volume-0001 nasgorep : SUCCESS
[2019-01-04 22:28:38.062065]  : system:: copy file /geo-replication/master-volume-0007_slave-volume-0001_common_secret.pem.pub : SUCCESS
[2019-01-04 22:28:38.296566]  : system:: execute add_secret_pub nasgorep geo-replication/master-volume-0007_slave-volume-0001_common_secret.pem.pub : SUCCESS
[2019-01-04 22:31:31.417565]  : volume geo-replication slave-volume-0001 status : SUCCESS

Comment 1 Sunny Kumar 2019-01-07 08:15:21 UTC
Hi,

Can you confirm whether the master and slave are on the same host?

- Sunny

Comment 2 Amar Tumballi 2019-06-18 09:02:28 UTC
https://review.gluster.org/22865 should fix the issue.

