Description of problem:
Detach-tier commit fails on a dist-rep volume, although detach-tier start succeeds.

Version-Release number of selected component (if applicable):

[root@rhsqa14-vm1 ~]# rpm -qa | grep gluster
glusterfs-3.7.0-2.el6rhs.x86_64
glusterfs-cli-3.7.0-2.el6rhs.x86_64
glusterfs-libs-3.7.0-2.el6rhs.x86_64
glusterfs-client-xlators-3.7.0-2.el6rhs.x86_64
glusterfs-api-3.7.0-2.el6rhs.x86_64
glusterfs-server-3.7.0-2.el6rhs.x86_64
glusterfs-fuse-3.7.0-2.el6rhs.x86_64

[root@rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7.0 built on May 15 2015 01:31:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

How reproducible:
Easily.

Steps to Reproduce:
1. Create a dist-rep volume and attach a dist-rep tier.
2. FUSE-mount the volume and add some directories and data.
3. Run detach-tier start (succeeds), then detach-tier commit (fails).

Actual results:
detach-tier commit failed.
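The steps above can be sketched as a CLI sequence. This is illustrative only: it assumes a live two-node Gluster 3.7 cluster, and the hostnames (node1, node2) and brick paths are placeholders modeled on the layout shown in the volume info below, not an exact copy of the original setup.

```shell
# 1. Create and start a 2x2 distributed-replicate volume (becomes the cold tier).
gluster volume create vol1 replica 2 \
    node1:/rhs/brick1/t2 node2:/rhs/brick1/t2 \
    node1:/rhs/brick2/t2 node2:/rhs/brick2/t2
gluster volume start vol1

# Attach a 2x2 distributed-replicate hot tier.
gluster volume attach-tier vol1 replica 2 \
    node2:/rhs/brick5/m1 node1:/rhs/brick5/m1 \
    node2:/rhs/brick3/m1 node1:/rhs/brick3/m1

# 2. FUSE-mount the volume and create some directories and data.
mount -t glusterfs node1:/vol1 /mnt/vol1
mkdir /mnt/vol1/triveni
cp linux-4.0.tar.xz /mnt/vol1/

# 3. Detach the tier: start succeeds, commit fails with the staging error
#    "Deleting all the bricks of the volume is not allowed".
gluster volume detach-tier vol1 start
gluster volume detach-tier vol1 commit
```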
Expected results:
The commit should not fail.

Additional info:

[root@rhsqa14-vm1 ~]# gluster v detach-tier vol1 start
volume detach-tier start: success
ID: f62bb1b6-ffdd-40ca-87fe-ea49675a825c

[root@rhsqa14-vm1 ~]# gluster v info

Volume Name: test
Type: Distribute
Volume ID: 102100c7-6a81-4ffc-9736-b9aa773b5044
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/t1
Brick2: 10.70.46.236:/rhs/brick1/t1
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.min-free-disk: 10
performance.readdir-ahead: on

Volume Name: vol1
Type: Tier
Volume ID: 37d0a9c0-21c1-46cf-ba95-419f9fbfbab0
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.236:/rhs/brick5/m1
Brick2: 10.70.46.233:/rhs/brick5/m1
Brick3: 10.70.46.236:/rhs/brick3/m1
Brick4: 10.70.46.233:/rhs/brick3/m1
Cold Bricks:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: 10.70.46.233:/rhs/brick1/t2
Brick6: 10.70.46.236:/rhs/brick1/t2
Brick7: 10.70.46.233:/rhs/brick2/t2
Brick8: 10.70.46.236:/rhs/brick2/t2
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.min-free-disk: 10
performance.readdir-ahead: on

[root@rhsqa14-vm1 ~]# gluster v detach-tier vol1 commit
volume detach-tier commit: failed: Staging failed on 4cdeee40-2cb6-463e-ba08-905cedb3d26a. Error: Deleting all the bricks of the volume is not allowed

On mount point:

[root@rhsqa14-vm5 disk1]# ls -la
total 80397
drwxr-xr-x.  6 root root     8408 May 15 04:57 .
dr-xr-xr-x. 30 root root     4096 May 15 04:16 ..
drwx------.  3 root root      224 May 15 04:56 linux-4.0
-rw-r--r--.  1 root root 82313052 May 15 04:54 linux-4.0.tar.xz
drwxr-xr-x.  3 root root       96 May 15 04:55 .trashcan
drwxr-xr-x.  2 root root       12 May 15 04:57 triveni

Log messages (from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log):

[2015-05-14 06:34:47.596813] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-glusterd: Started running glusterd version 3.7.0beta2 (args: glusterd --xlator-option *.upgrade=on -N)
[2015-05-14 06:34:47.605211] I [graph.c:269:gf_add_cmdline_options] 0-management: adding option 'upgrade' for volume 'management' with value 'on'
[2015-05-14 06:34:47.605328] I [glusterd.c:1282:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-05-14 06:34:47.605370] I [glusterd.c:1327:init] 0-management: Using /var/lib/glusterd as working directory
[2015-05-14 06:34:47.630137] E [rpc-transport.c:291:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.7.0beta2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2015-05-14 06:34:47.630198] W [rpc-transport.c:295:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2015-05-14 06:34:47.630218] W [rpcsvc.c:1595:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-05-14 06:34:47.630235] E [glusterd.c:1515:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2015-05-14 06:34:47.649973] I [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2015-05-14 06:34:47.650135] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-05-14 06:34:47.650161] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-05-14 06:34:47.650173] I [glusterd-store.c:2005:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30700
[2015-05-14 06:34:47.650462] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-05-14 06:34:47.650781] I [glusterd.c:184:glusterd_uuid_generate_save] 0-management: generated UUID: 87acbf29-e821-48bf-9aa8-bbda9321e609
[2015-05-14 06:34:47.817571] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2015-05-14 06:34:47.818755] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2015-05-14 06:34:47.819295] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2015-05-14 06:34:47.819873] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2015-05-14 06:34:47.820373] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2015-05-14 06:34:47.820867] I [glusterd-store.c:3371:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2015-05-14 06:34:47.821075] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/options, returned error: (No such file or directory)
Final graph:
+------------------------------------------------------------------------------+
...skipping...
[2015-05-15 08:55:31.542076] I [glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management: Generated task-id b954b5e0-c4fa-4619-92ca-3c5e657269aa for key rebalance-id
[2015-05-15 08:55:36.665309] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-05-15 08:55:36.721645] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/gluster-rebalance-37d0a9c0-21c1-46cf-ba95-419f9fbfbab0.sock failed (No data available)
[2015-05-15 08:55:36.886990] I [MSGID: 106007] [glusterd-rebalance.c:164:__glusterd_defrag_notify] 0-management: Rebalance process for volume vol1 has disconnected.
[2015-05-15 08:55:36.887034] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-05-15 08:55:36.887049] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-05-15 08:55:36.887701] E [glusterd-utils.c:7739:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
[2015-05-15 08:55:36.887776] E [glusterd-utils.c:7739:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
[2015-05-15 08:55:41.837901] E [glusterd-utils.c:7739:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
[2015-05-15 08:56:14.356035] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:56:14.360109] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:56:14.363834] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:57:53.033171] I [glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-05-15 08:57:53.042488] I [glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management: Generated task-id f62bb1b6-ffdd-40ca-87fe-ea49675a825c for key remove-brick-id
[2015-05-15 08:57:58.722604] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-05-15 08:57:58.916716] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/gluster-rebalance-37d0a9c0-21c1-46cf-ba95-419f9fbfbab0.sock failed (No data available)
[2015-05-15 08:57:59.291387] I [MSGID: 106007] [glusterd-rebalance.c:164:__glusterd_defrag_notify] 0-management: Rebalance process for volume vol1 has disconnected.
[2015-05-15 08:57:59.291452] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-05-15 08:57:59.291466] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-05-15 08:58:16.702972] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:58:16.706960] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:58:16.711386] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:58:24.524612] I [glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-05-15 08:58:24.539302] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Staging failed on 4cdeee40-2cb6-463e-ba08-905cedb3d26a. Error: Deleting all the bricks of the volume is not allowed
This will probably need fix 10795. The issue is related to the new rebalance code added to the DHT translator. Will confirm once 10795 is merged.
I tried to reproduce the issue with two volumes, but could not.

vol info:

Volume Name: patchy
Type: Tier
Volume ID: e538e79b-3a18-4f9c-a153-7b15d736effe
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: dhcp43-148:/home/brick4
Brick2: dhcp42-212:/home/brick4
Brick3: dhcp43-148:/home/brick3
Brick4: dhcp42-212:/home/brick3
Cold Bricks:
Cold Tier Type : Distribute
Number of Bricks: 2
Brick5: 10.70.43.148:/home/brick1
Brick6: 10.70.42.212:/home/brick2
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on

Volume Name: patchy1
Type: Tier
Volume ID: f95f44fd-4891-472d-9778-c37685327b3e
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: dhcp43-148:/home/brick114
Brick2: dhcp42-212:/home/brick113
Brick3: dhcp43-148:/home/brick112
Brick4: dhcp42-212:/home/brick111
Cold Bricks:
Cold Tier Type : Distribute
Number of Bricks: 2
Brick5: 10.70.43.148:/home/brick11
Brick6: 10.70.42.212:/home/brick12
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
Based on the RCA given in bug #1222442, this bug looks dependent on the same root cause.
Dan, I see that the patch http://review.gluster.org/#/c/10795/ is already merged. Is this already part of rhgs-3.1.3? If so, can we move this to ON_QA and get agreement from QE to test it? ~Atin
This bug is from 2015 and relates to the DHT multithreaded rebalance changes from last year. It should probably have been closed many months ago. The basic problem of the commit not working is not an issue we see today.