Description of problem:
Detach-tier commit fails on a dist-rep volume, although detach-tier start succeeds.

Version-Release number of selected component (if applicable):

[root@rhsqa14-vm1 ~]# rpm -qa | grep gluster
glusterfs-3.7.0-2.el6rhs.x86_64
glusterfs-cli-3.7.0-2.el6rhs.x86_64
glusterfs-libs-3.7.0-2.el6rhs.x86_64
glusterfs-client-xlators-3.7.0-2.el6rhs.x86_64
glusterfs-api-3.7.0-2.el6rhs.x86_64
glusterfs-server-3.7.0-2.el6rhs.x86_64
glusterfs-fuse-3.7.0-2.el6rhs.x86_64

[root@rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7.0 built on May 15 2015 01:31:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

How reproducible:
Easily.

Steps to Reproduce:
1. Create a dist-rep volume and attach a dist-rep tier.
2. FUSE-mount the volume and add some directories and data.
3. Run detach-tier start (succeeds), then detach-tier commit (fails).

Actual results:
detach-tier commit failed.
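The steps above can be sketched as a CLI sequence. This is illustrative only: it assumes a live two-node Gluster 3.7 cluster, and the hostnames (node1, node2) and brick paths are placeholders modeled on the layout shown in the volume info below, not an exact copy of the original setup.

```shell
# 1. Create and start a 2x2 distributed-replicate volume (becomes the cold tier).
gluster volume create vol1 replica 2 \
    node1:/rhs/brick1/t2 node2:/rhs/brick1/t2 \
    node1:/rhs/brick2/t2 node2:/rhs/brick2/t2
gluster volume start vol1

# Attach a 2x2 distributed-replicate hot tier.
gluster volume attach-tier vol1 replica 2 \
    node2:/rhs/brick5/m1 node1:/rhs/brick5/m1 \
    node2:/rhs/brick3/m1 node1:/rhs/brick3/m1

# 2. FUSE-mount the volume and create some directories and data.
mount -t glusterfs node1:/vol1 /mnt/vol1
mkdir /mnt/vol1/triveni
cp linux-4.0.tar.xz /mnt/vol1/

# 3. Detach the tier: start succeeds, commit fails with the staging error
#    "Deleting all the bricks of the volume is not allowed".
gluster volume detach-tier vol1 start
gluster volume detach-tier vol1 commit
```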
Expected results:
The commit should not fail.

Additional info:

[root@rhsqa14-vm1 ~]# gluster v detach-tier vol1 start
volume detach-tier start: success
ID: f62bb1b6-ffdd-40ca-87fe-ea49675a825c

[root@rhsqa14-vm1 ~]# gluster v info

Volume Name: test
Type: Distribute
Volume ID: 102100c7-6a81-4ffc-9736-b9aa773b5044
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/t1
Brick2: 10.70.46.236:/rhs/brick1/t1
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.min-free-disk: 10
performance.readdir-ahead: on

Volume Name: vol1
Type: Tier
Volume ID: 37d0a9c0-21c1-46cf-ba95-419f9fbfbab0
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.236:/rhs/brick5/m1
Brick2: 10.70.46.233:/rhs/brick5/m1
Brick3: 10.70.46.236:/rhs/brick3/m1
Brick4: 10.70.46.233:/rhs/brick3/m1
Cold Bricks:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: 10.70.46.233:/rhs/brick1/t2
Brick6: 10.70.46.236:/rhs/brick1/t2
Brick7: 10.70.46.233:/rhs/brick2/t2
Brick8: 10.70.46.236:/rhs/brick2/t2
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.min-free-disk: 10
performance.readdir-ahead: on

[root@rhsqa14-vm1 ~]# gluster v detach-tier vol1 commit
volume detach-tier commit: failed: Staging failed on 4cdeee40-2cb6-463e-ba08-905cedb3d26a. Error: Deleting all the bricks of the volume is not allowed

On mount point:

[root@rhsqa14-vm5 disk1]# ls -la
total 80397
drwxr-xr-x.  6 root root     8408 May 15 04:57 .
dr-xr-xr-x. 30 root root     4096 May 15 04:16 ..
drwx------.  3 root root      224 May 15 04:56 linux-4.0
-rw-r--r--.  1 root root 82313052 May 15 04:54 linux-4.0.tar.xz
drwxr-xr-x.  3 root root       96 May 15 04:55 .trashcan
drwxr-xr-x.  2 root root       12 May 15 04:57 triveni

Log messages (from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log):

[2015-05-14 06:34:47.596813] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-glusterd: Started running glusterd version 3.7.0beta2 (args: glusterd --xlator-option *.upgrade=on -N)
[2015-05-14 06:34:47.605211] I [graph.c:269:gf_add_cmdline_options] 0-management: adding option 'upgrade' for volume 'management' with value 'on'
[2015-05-14 06:34:47.605328] I [glusterd.c:1282:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-05-14 06:34:47.605370] I [glusterd.c:1327:init] 0-management: Using /var/lib/glusterd as working directory
[2015-05-14 06:34:47.630137] E [rpc-transport.c:291:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.7.0beta2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2015-05-14 06:34:47.630198] W [rpc-transport.c:295:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2015-05-14 06:34:47.630218] W [rpcsvc.c:1595:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-05-14 06:34:47.630235] E [glusterd.c:1515:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2015-05-14 06:34:47.649973] I [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2015-05-14 06:34:47.650135] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-05-14 06:34:47.650161] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-05-14 06:34:47.650173] I [glusterd-store.c:2005:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30700
[2015-05-14 06:34:47.650462] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-05-14 06:34:47.650781] I [glusterd.c:184:glusterd_uuid_generate_save] 0-management: generated UUID: 87acbf29-e821-48bf-9aa8-bbda9321e609
[2015-05-14 06:34:47.817571] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2015-05-14 06:34:47.818755] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2015-05-14 06:34:47.819295] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2015-05-14 06:34:47.819873] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2015-05-14 06:34:47.820373] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2015-05-14 06:34:47.820867] I [glusterd-store.c:3371:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2015-05-14 06:34:47.821075] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/options, returned error: (No such file or directory)
Final graph:
+------------------------------------------------------------------------------+
...skipping...
[2015-05-15 08:55:31.542076] I [glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management: Generated task-id b954b5e0-c4fa-4619-92ca-3c5e657269aa for key rebalance-id
[2015-05-15 08:55:36.665309] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-05-15 08:55:36.721645] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/gluster-rebalance-37d0a9c0-21c1-46cf-ba95-419f9fbfbab0.sock failed (No data available)
[2015-05-15 08:55:36.886990] I [MSGID: 106007] [glusterd-rebalance.c:164:__glusterd_defrag_notify] 0-management: Rebalance process for volume vol1 has disconnected.
[2015-05-15 08:55:36.887034] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-05-15 08:55:36.887049] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-05-15 08:55:36.887701] E [glusterd-utils.c:7739:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
[2015-05-15 08:55:36.887776] E [glusterd-utils.c:7739:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
[2015-05-15 08:55:41.837901] E [glusterd-utils.c:7739:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
[2015-05-15 08:56:14.356035] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:56:14.360109] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:56:14.363834] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:57:53.033171] I [glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-05-15 08:57:53.042488] I [glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management: Generated task-id f62bb1b6-ffdd-40ca-87fe-ea49675a825c for key remove-brick-id
[2015-05-15 08:57:58.722604] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-05-15 08:57:58.916716] W [socket.c:642:__socket_rwv] 0-management: readv on /var/run/gluster/gluster-rebalance-37d0a9c0-21c1-46cf-ba95-419f9fbfbab0.sock failed (No data available)
[2015-05-15 08:57:59.291387] I [MSGID: 106007] [glusterd-rebalance.c:164:__glusterd_defrag_notify] 0-management: Rebalance process for volume vol1 has disconnected.
[2015-05-15 08:57:59.291452] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-05-15 08:57:59.291466] I [mem-pool.c:604:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-05-15 08:58:16.702972] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:58:16.706960] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:58:16.711386] I [glusterd-handler.c:1402:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-05-15 08:58:24.524612] I [glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-05-15 08:58:24.539302] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Staging failed on 4cdeee40-2cb6-463e-ba08-905cedb3d26a. Error: Deleting all the bricks of the volume is not allowed
This will probably need fix 10795. The issue is related to the new rebalance code added to the DHT translator. Will confirm once 10795 is merged.
I tried to reproduce the issue with two volumes, but could not.

vol info:

Volume Name: patchy
Type: Tier
Volume ID: e538e79b-3a18-4f9c-a153-7b15d736effe
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: dhcp43-148:/home/brick4
Brick2: dhcp42-212:/home/brick4
Brick3: dhcp43-148:/home/brick3
Brick4: dhcp42-212:/home/brick3
Cold Bricks:
Cold Tier Type : Distribute
Number of Bricks: 2
Brick5: 10.70.43.148:/home/brick1
Brick6: 10.70.42.212:/home/brick2
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on

Volume Name: patchy1
Type: Tier
Volume ID: f95f44fd-4891-472d-9778-c37685327b3e
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: dhcp43-148:/home/brick114
Brick2: dhcp42-212:/home/brick113
Brick3: dhcp43-148:/home/brick112
Brick4: dhcp42-212:/home/brick111
Cold Bricks:
Cold Tier Type : Distribute
Number of Bricks: 2
Brick5: 10.70.43.148:/home/brick11
Brick6: 10.70.42.212:/home/brick12
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
Based on the RCA given in bug #1222442, this bug looks dependent on the same root cause.
Dan, I see that the patch http://review.gluster.org/#/c/10795/ is already merged. Is this already part of rhgs-3.1.3? If so, can we move this to ON_QA and get agreement from QE to test it? ~Atin
This bug is from 2015 and relates to the DHT multithreaded rebalance changes from last year. It should probably have been closed many months ago. The basic problem of the commit not working is not an issue we see today.