Bug 1668118 - Failure to start geo-replication for tiered volume.
Summary: Failure to start geo-replication for tiered volume.
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-21 23:21 UTC by vnosov
Modified: 2019-05-27 16:20 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-27 16:20:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description vnosov 2019-01-21 23:21:09 UTC
Description of problem: The status of the geo-replication workers on the master node is "inconsistent" if the master volume is tiered.


Version-Release number of selected component (if applicable):

GlusterFS 5.2, installed from the source code tarball


How reproducible:  100%


Steps to Reproduce:

1. Set up two nodes. One hosts the geo-replication master volume, which must be tiered; the other hosts the geo-replication slave volume.

[root@SC-10-10-63-182 log]# glusterfsd --version
glusterfs 5.2

[root@SC-10-10-63-183 log]# glusterfsd --version
glusterfs 5.2

 
2. On the master node, create a tiered volume:
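
For context, a tiered volume like the one below can be created along these lines (a sketch only; the exact create and attach invocations are not recorded in this report, and the brick paths are taken from the volume info output that follows):

# On the master node (10.10.60.182): create and start the cold tier,
# then attach the hot tier brick.
gluster volume create master-volume-1 10.10.60.182:/exports/master-segment-1/master-volume-1
gluster volume start master-volume-1
gluster volume tier master-volume-1 attach 10.10.60.182:/exports/master-hot-tier/master-volume-1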

[root@SC-10-10-63-182 log]# gluster volume info master-volume-1

Volume Name: master-volume-1
Type: Tier
Volume ID: aa95df34-f181-456c-aa26-9756b68ed679
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 1
Brick1: 10.10.60.182:/exports/master-hot-tier/master-volume-1
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 1
Brick2: 10.10.60.182:/exports/master-segment-1/master-volume-1
Options Reconfigured:
features.ctr-sql-db-wal-autocheckpoint: 25000
features.ctr-sql-db-cachesize: 12500
cluster.tier-mode: cache
features.ctr-enabled: on
server.allow-insecure: on
performance.quick-read: off
performance.stat-prefetch: off
nfs.addr-namelookup: off
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: disable
snap-activate-on-create: enable

[root@SC-10-10-63-182 log]# gluster volume status master-volume-1
Status of volume: master-volume-1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.10.60.182:/exports/master-hot-tier
/master-volume-1                            62001     0          Y       15690
Cold Bricks:
Brick 10.10.60.182:/exports/master-segment-
1/master-volume-1                           62000     0          Y       9762
Tier Daemon on localhost                    N/A       N/A        Y       15713

Task Status of Volume master-volume-1
------------------------------------------------------------------------------
There are no active volume tasks

[root@SC-10-10-63-182 log]# gluster volume tier master-volume-1 status
Node                 Promoted files       Demoted files        Status               run time in h:m:s
---------            ---------            ---------            ---------            ---------
localhost            0                    0                    in progress          0:3:40
Tiering Migration Functionality: master-volume-1: success



3. On the slave node, create the slave volume:
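
The plain distribute slave volume below can be created with (again a sketch; the brick path is taken from the volume info output that follows):

# On the slave node (10.10.60.183):
gluster volume create slave-volume-1 10.10.60.183:/exports/slave-segment-1/slave-volume-1
gluster volume start slave-volume-1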

[root@SC-10-10-63-183 log]# gluster volume info slave-volume-1

Volume Name: slave-volume-1
Type: Distribute
Volume ID: 569a340b-35f8-4109-8816-720982b11806
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.60.183:/exports/slave-segment-1/slave-volume-1
Options Reconfigured:
server.allow-insecure: on
performance.quick-read: off
performance.stat-prefetch: off
nfs.addr-namelookup: off
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: disable
snap-activate-on-create: enable

[root@SC-10-10-63-183 log]# gluster volume status slave-volume-1
Status of volume: slave-volume-1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.60.183:/exports/slave-segment-1
/slave-volume-1                             62000     0          Y       2532

Task Status of Volume slave-volume-1
------------------------------------------------------------------------------
There are no active volume tasks

4. Set up SSH access to the slave node:

SSH from 182 to 183:

20660 01/21/2019 13:58:54.930122501 1548107934 command: /usr/bin/ssh nasgorep.60.183 /bin/pwd
20660 01/21/2019 13:58:55.021906148 1548107935 status=0 /usr/bin/ssh nasgorep.60.183 /bin/pwd
20694 01/21/2019 13:58:56.169890800 1548107936 command: /usr/bin/ssh -q -oConnectTimeout=5 nasgorep.60.183 /bin/pwd 2>&1
20694 01/21/2019 13:58:56.256032202 1548107936 status=0 /usr/bin/ssh -q -oConnectTimeout=5 nasgorep.60.183 /bin/pwd 2>&1
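
Because the session runs as the unprivileged user nasgorep (see the gluster-mountbroker status output under "Actual results" below), the slave side was presumably prepared with something like the following (a sketch; the actual setup commands are not recorded in this report):

# On the slave node: register the mountbroker root, group and volume.
gluster-mountbroker setup /var/mountbroker-root nasgorep
gluster-mountbroker add slave-volume-1 nasgorep
systemctl restart glusterd

# On the master node: passwordless SSH to the slave as nasgorep.
ssh-keygen
ssh-copy-id nasgorep@nasgorep.60.183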


5. Initialize geo-replication from the master volume to the slave volume:

[root@SC-10-10-63-182 log]# vi /var/log/glusterfs/cmd_history.log

[2019-01-21 21:59:08.942567]  : system:: execute gsec_create : SUCCESS
[2019-01-21 21:59:42.722194]  : volume geo-replication master-volume-1 nasgorep.60.183::slave-volume-1 create push-pem : SUCCESS
[2019-01-21 21:59:49.527353]  : volume geo-replication master-volume-1 nasgorep.60.183::slave-volume-1 start : SUCCESS
[2019-01-21 21:59:55.636198]  : volume geo-replication master-volume-1 nasgorep.60.183::slave-volume-1 status detail : SUCCESS
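
These history entries correspond to CLI invocations along the following lines (reconstructed from the log above; the nasgorep@ user prefix is an assumption inferred from the mountbroker setup):

gluster system:: execute gsec_create
gluster volume geo-replication master-volume-1 nasgorep@nasgorep.60.183::slave-volume-1 create push-pem
gluster volume geo-replication master-volume-1 nasgorep@nasgorep.60.183::slave-volume-1 start
gluster volume geo-replication master-volume-1 nasgorep@nasgorep.60.183::slave-volume-1 status detail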

6. Check the status of the geo-replication session:

Actual results:

[root@SC-10-10-63-183 log]# /usr/sbin/gluster-mountbroker status
+-----------+-------------+---------------------------+--------------+---------------------------+
|    NODE   | NODE STATUS |         MOUNT ROOT        |    GROUP     |           USERS           |
+-----------+-------------+---------------------------+--------------+---------------------------+
| localhost |          UP | /var/mountbroker-root(OK) | nasgorep(OK) | nasgorep(slave-volume-1)  |
+-----------+-------------+---------------------------+--------------+---------------------------+

[root@SC-10-10-63-182 log]# gluster volume geo-replication master-volume-1 nasgorep.60.183::slave-volume-1 status

MASTER NODE     MASTER VOL         MASTER BRICK                                 SLAVE USER    SLAVE                                    SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.10.60.182    master-volume-1    /exports/master-hot-tier/master-volume-1     nasgorep      nasgorep.60.183::slave-volume-1    N/A           Stopped    N/A             N/A
10.10.60.182    master-volume-1    /exports/master-segment-1/master-volume-1    nasgorep      nasgorep.60.183::slave-volume-1    N/A           Stopped    N/A             N/A


Expected results:

The status of the geo-replication workers on the master node should be "Active".


Additional info:

The contents of /var/log/glusterfs/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.log on the master node explain what went wrong: both per-brick workers die before establishing a connection, and the monitor then crashes with "TypeError: 'int' object is not iterable", leaving the worker status "inconsistent":

[root@SC-10-10-63-182 log]# vi /var/log/glusterfs/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.log

[2019-01-21 21:59:39.347943] W [gsyncd(config-get):304:main] <top>: Session config file not exists, using the default config    path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:42.438145] I [gsyncd(monitor-status):308:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:42.454929] I [subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change  status=Created
[2019-01-21 21:59:48.756702] I [gsyncd(config-get):308:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.4720] I [gsyncd(config-get):308:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.239733] I [gsyncd(config-get):308:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.475193] I [gsyncd(monitor):308:main] <top>: Using session config file  path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.868150] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2019-01-21 21:59:49.868396] I [monitor(monitor):157:monitor] Monitor: starting gsyncd worker   slave_node=10.10.60.183 brick=/exports/master-segment-1/master-volume-1
[2019-01-21 21:59:49.871593] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2019-01-21 21:59:49.871963] I [monitor(monitor):157:monitor] Monitor: starting gsyncd worker   slave_node=10.10.60.183 brick=/exports/master-hot-tier/master-volume-1
[2019-01-21 21:59:50.4395] I [monitor(monitor):268:monitor] Monitor: worker died before establishing connection brick=/exports/master-segment-1/master-volume-1
[2019-01-21 21:59:50.7447] I [monitor(monitor):268:monitor] Monitor: worker died before establishing connection brick=/exports/master-hot-tier/master-volume-1
[2019-01-21 21:59:50.8415] I [gsyncd(agent /exports/master-segment-1/master-volume-1):308:main] <top>: Using session config file    path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:50.10383] I [gsyncd(agent /exports/master-hot-tier/master-volume-1):308:main] <top>: Using session config file    path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:50.14039] I [repce(agent /exports/master-segment-1/master-volume-1):97:service_loop] RepceServer: terminating on reaching EOF.
[2019-01-21 21:59:50.15556] I [changelogagent(agent /exports/master-hot-tier/master-volume-1):72:__init__] ChangelogAgent: Agent listining...
[2019-01-21 21:59:50.15964] I [repce(agent /exports/master-hot-tier/master-volume-1):97:service_loop] RepceServer: terminating on reaching EOF.
[2019-01-21 21:59:55.141768] I [gsyncd(config-get):308:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:55.380496] I [gsyncd(status):308:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:55.625045] I [gsyncd(status):308:main] <top>: Using session config file   path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 22:00:00.66032] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change  status=inconsistent
[2019-01-21 22:00:00.66289] E [syncdutils(monitor):338:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 368, in twrap
    tf(*aargs)
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 339, in wmon
    slave_host, master, suuid, slavenodes)
TypeError: 'int' object is not iterable


Similar test on GlusterFS 3.12.14 does not show the same failure.

Comment 1 Amar Tumballi 2019-05-27 16:20:50 UTC
We have deprecated the 'tier' feature of GlusterFS, so it is not possible to fix this in a future release.

