Bug 1245202 - Glusterfs-afr: Remove brick process ends up with split-brain issue along with failures in rebalance.
Summary: Glusterfs-afr: Remove brick process ends up with split-brain issue along with failures in rebalance.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ravishankar N
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: AFR
Depends On: 1243542
Blocks: 1216951
 
Reported: 2015-07-21 13:15 UTC by Triveni Rao
Modified: 2018-12-17 04:38 UTC
CC: 16 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
When rebalance is run as part of the remove-brick command, some files may be reported as being in split-brain and therefore not migrated, even though the files are not actually in split-brain. Workaround: manually copy the files that did not migrate from the removed bricks into the Gluster volume via the mount point.
Clone Of:
Environment:
Last Closed: 2016-06-15 09:19:44 UTC
Embargoed:



Description Triveni Rao 2015-07-21 13:15:12 UTC
Problem Description :
======================

Glusterfs-afr: Remove brick process ends up with split-brain issue along with failures in rebalance.

Observations:
=============
1. The rebalance process completed with a few failures.
2. Split-brain messages and Input/output errors are seen in the logs.
3. Checked the GFIDs of a few of the files reported as split-brain; they were not actually in that state (see the verification sketch after this list).
4. The data is still available on the backend bricks that were removed.
5. Files whose migration failed are not found on the mount point.
6. Even though the other replica pair is available, migration of these files failed, so this is effectively a data-loss issue.
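A minimal sketch of the verification in observation 3 (the volume name and brick path are from this report; 193.txt is one of the files flagged in the logs below):

# Ask AFR which entries it actually considers to be in split-brain:
gluster volume heal Sun info split-brain

# Inspect the AFR changelog xattrs of a suspect file directly on a brick;
# non-zero trusted.afr.* pending counters on both replicas would indicate a real split-brain:
getfattr -d -m . -e hex /rhs/brick1/s0/193.txt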


Version seen:
==============
[root@casino-vm1 ~]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.3-1.el6rhs.x86_64
glusterfs-api-3.7.1-10.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-10.el6rhs.x86_64
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-libs-3.7.1-10.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-10.el6rhs.x86_64
glusterfs-fuse-3.7.1-10.el6rhs.x86_64
glusterfs-server-3.7.1-10.el6rhs.x86_64
glusterfs-rdma-3.7.1-10.el6rhs.x86_64
vdsm-gluster-4.16.20-1.1.el6rhs.noarch
glusterfs-3.7.1-10.el6rhs.x86_64
glusterfs-cli-3.7.1-10.el6rhs.x86_64
[root@casino-vm1 ~]# 

Procedure to reproduce:
=========================

1. Create a 4x2 distributed-replicate volume, mount it, and add data.
2. Start removing one replica pair: remove-brick n1:b1 n2:b1 start.
3. kill -9 the brick process for n1:b1 while the remove-brick rebalance is in progress (see the sketch after this list for locating the PID).
4. The rebalance completed, but with some failures, and split-brain was observed for a few files.
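A brick is killed by terminating its glusterfsd process, not by path. A minimal sketch for step 3, assuming the volume and brick names from this report:

# Find the PID of the target brick in the volume status output:
gluster volume status Sun | grep '/rhs/brick1/s0'
# Kill the glusterfsd process using the PID from that line:
kill -9 <PID>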


Output:
==========

[root@casino-vm1 ~]# gluster v create Sun replica 2 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 10.70.35.57:/rhs/brick2/s0 10.70.35.136:/rhs/brick2/s0 10.70.35.57:/rhs/brick3/s0 10.70.35.136:/rhs/brick3/s0 10.70.35.57:/rhs/brick4/s0 10.70.35.136:/rhs/brick4/s0
volume create: Sun: success: please start the volume to access data
[root@casino-vm1 ~]# gluster v start Sun
volume start: Sun: success
[root@casino-vm1 ~]# ./options.sh Sun
volume set: success  
volume quota : success
volume set: success  
volume quota : success
volume set: success  
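(options.sh is a local helper script; judging from the Options Reconfigured list in the gluster v info output below, it appears to set quota, inode-quota, USS, and related volume options.)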
[root@casino-vm1 ~]# gluster v info Sun                                                                                                               

Volume Name: Sun
Type: Distributed-Replicate
Volume ID: c574f9fb-0ac6-4211-9a2e-054227810641
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp  
Bricks:
Brick1: 10.70.35.57:/rhs/brick1/s0
Brick2: 10.70.35.136:/rhs/brick1/s0
Brick3: 10.70.35.57:/rhs/brick2/s0
Brick4: 10.70.35.136:/rhs/brick2/s0
Brick5: 10.70.35.57:/rhs/brick3/s0
Brick6: 10.70.35.136:/rhs/brick3/s0
Brick7: 10.70.35.57:/rhs/brick4/s0
Brick8: 10.70.35.136:/rhs/brick4/s0
Options Reconfigured:
features.uss: enable 
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.min-free-disk: 10
performance.readdir-ahead: on
[root@casino-vm1 ~]# gluster v status Sun
Status of volume: Sun
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.57:/rhs/brick1/s0            49180     0          Y       25608
Brick 10.70.35.136:/rhs/brick1/s0           49180     0          Y       29160
Brick 10.70.35.57:/rhs/brick2/s0            49181     0          Y       25626
Brick 10.70.35.136:/rhs/brick2/s0           49181     0          Y       29178
Brick 10.70.35.57:/rhs/brick3/s0            49182     0          Y       25644
Brick 10.70.35.136:/rhs/brick3/s0           49182     0          Y       29196
Brick 10.70.35.57:/rhs/brick4/s0            49183     0          Y       25662
Brick 10.70.35.136:/rhs/brick4/s0           49183     0          Y       29214
Snapshot Daemon on localhost                49184     0          Y       25849
NFS Server on localhost                     2049      0          Y       25857
Self-heal Daemon on localhost               N/A       N/A        Y       25689
Quota Daemon on localhost                   N/A       N/A        Y       25787
Snapshot Daemon on 10.70.35.136             49184     0          Y       29406
NFS Server on 10.70.35.136                  2049      0          Y       29414
Self-heal Daemon on 10.70.35.136            N/A       N/A        Y       29240
Quota Daemon on 10.70.35.136                N/A       N/A        Y       29355

Task Status of Volume Sun
------------------------------------------------------------------------------
There are no active volume tasks

[root@casino-vm1 ~]# gluster v remove-brick Sun 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 start; gluster v remove-brick Sun 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 status
volume remove-brick start: success
ID: b0eb939f-3b6c-47fb-b87d-099815954c9e
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             3             0             0          in progress               0.00
                            10.70.35.136                0        0Bytes             0             0             0          in progress               0.00
[root@casino-vm1 ~]#

[root@casino-vm1 ~]# kill -9 25608  
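(25608 is the PID of brick 10.70.35.57:/rhs/brick1/s0, taken from the gluster v status output above.)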


[root@casino-vm1 ~]# gluster v remove-brick Sun 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost              122       649.5MB           406            89             0            completed              15.00
                            10.70.35.136                0        0Bytes             0             0             0            completed               0.00
[root@casino-vm1 ~]# 

[root@casino-vm1 ~]# gluster v status Sun
Status of volume: Sun
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.57:/rhs/brick1/s0            N/A       N/A        N       N/A
Brick 10.70.35.136:/rhs/brick1/s0           49180     0          Y       29160
Brick 10.70.35.57:/rhs/brick2/s0            49181     0          Y       25626
Brick 10.70.35.136:/rhs/brick2/s0           49181     0          Y       29178
Brick 10.70.35.57:/rhs/brick3/s0            49182     0          Y       25644
Brick 10.70.35.136:/rhs/brick3/s0           49182     0          Y       29196
Brick 10.70.35.57:/rhs/brick4/s0            49183     0          Y       25662
Brick 10.70.35.136:/rhs/brick4/s0           49183     0          Y       29214
Snapshot Daemon on localhost                49184     0          Y       25849
NFS Server on localhost                     2049      0          Y       25857
Self-heal Daemon on localhost               N/A       N/A        Y       25689
Quota Daemon on localhost                   N/A       N/A        Y       25787
Snapshot Daemon on 10.70.35.136             49184     0          Y       29406
NFS Server on 10.70.35.136                  2049      0          Y       29414
Self-heal Daemon on 10.70.35.136            N/A       N/A        Y       29240
Quota Daemon on 10.70.35.136                N/A       N/A        Y       29355

Task Status of Volume Sun
------------------------------------------------------------------------------
Task                 : Remove brick
ID                   : b0eb939f-3b6c-47fb-b87d-099815954c9e
Removed bricks:
10.70.35.57:/rhs/brick1/s0
10.70.35.136:/rhs/brick1/s0
Status               : completed

[root@casino-vm1 ~]# 


LOG messages:
==================

[2015-07-21 04:22:47.466810] I [dht-rebalance.c:1002:dht_migrate_file] 0-Sun-dht: /193.txt: attempting to move from Sun-replicate-0 to Sun-replicate-1
[2015-07-21 04:22:47.473741] E [MSGID: 101046] [afr-inode-write.c:1534:afr_fsetxattr] 0-Sun-replicate-1: setxattr dict is null
[2015-07-21 04:22:47.478679] E [MSGID: 101046] [afr-inode-write.c:1534:afr_fsetxattr] 0-Sun-replicate-1: setxattr dict is null
[2015-07-21 04:22:47.489583] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /192.txt on Sun-replicate-0
[2015-07-21 04:22:47.490548] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-Sun-dht: failed to set xattr on /193.txt in Sun-replicate-0 (Input/output error)
[2015-07-21 04:22:47.490571] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /193.txt on Sun-replicate-0
[2015-07-21 04:22:47.491618] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 0-Sun-replicate-0: Failing SETXATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-07-21 04:22:47.491992] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-Sun-dht: failed to set xattr on /196.txt in Sun-replicate-0 (Input/output error)
[2015-07-21 04:22:47.492022] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /196.txt on Sun-replicate-0
[2015-07-21 04:22:47.494151] I [dht-rebalance.c:1002:dht_migrate_file] 0-Sun-dht: /198.txt: attempting to move from Sun-replicate-0 to Sun-replicate-1
[2015-07-21 04:22:47.495695] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-Sun-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-07-21 04:22:47.495734] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-Sun-replicate-0: Failing GETXATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-07-21 04:22:47.495772] W [MSGID: 109023] [dht-rebalance.c:1076:dht_migrate_file] 0-Sun-dht: Migrate file failed:/198.txt: failed to get xattr from Sun-replicate-0 (Invalid argument)
[2015-07-21 04:22:47.498800] E [MSGID: 101046] [afr-inode-write.c:1534:afr_fsetxattr] 0-Sun-replicate-1: setxattr dict is null
[2015-07-21 04:22:47.499143] W [MSGID: 109023] [dht-rebalance.c:546:__dht_rebalance_create_dst_file] 0-Sun-dht: /198.txt: failed to set xattr on Sun-replicate-1 (Cannot allocate memory)
[2015-07-21 04:22:47.503755] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 0-Sun-replicate-0: Failing SETXATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-07-21 04:22:47.504157] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-Sun-dht: failed to set xattr on /198.txt in Sun-replicate-0 (Input/output error)
[2015-07-21 04:22:47.504181] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /198.txt on Sun-replicate-0
[2015-07-21 04:22:47.505092] I [MSGID: 109028] [dht-rebalance.c:3029:gf_defrag_status_get] 0-Sun-dht: Rebalance is completed. Time taken is 15.00 secs
[2015-07-21 04:22:47.505120] I [MSGID: 109028] [dht-rebalance.c:3033:gf_defrag_status_get] 0-Sun-dht: Files migrated: 122, size: 681014120, lookups: 406, failures: 0, skipped: 89
[2015-07-21 04:22:47.505352] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x374b207a51) [0x7f597c820a51] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f597dbff075] -->/usr/sbin/glusterfs(cleanup_and_exit+0x71) [0x7f597dbfeba1] ) 0-: received signum (15), shutting down



Arequal checksum mismatch on mount point observed:
=====================================================

Before remove-brick:
[root@casino-vm5 ~]# ./arequal /fuse

Entry counts
Regular files   : 404
Directories     : 3
Symbolic links  : 0
Other           : 0
Total           : 407

Metadata checksums
Regular files   : 2cb0
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 92a2b620c302e3ed74f4d32523e5b44b
Directories     : 141e1432434b343a
Symbolic links  : 0
Other           : 0
Total           : f2487137a3ac639c
[root@casino-vm5 ~]# 
[root@casino-vm5 ~]# 


After remove-brick:

[root@casino-vm5 ~]# ./arequal /fuse

Entry counts
Regular files   : 316
Directories     : 3
Symbolic links  : 0
Other           : 0
Total           : 319

Metadata checksums
Regular files   : 2cb0
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 732dae0738e6be8218a86e714468f721
Directories     : 3a2b5a7452771c1c
Symbolic links  : 0
Other           : 0
Total           : 51ae9a022ef955bf
[root@casino-vm5 ~]# 


On Backend:
=============
The regular-file count on the mount dropped from 404 to 316; the removed brick still holds 89 files, matching the 89 failures reported in the remove-brick status above:

[root@casino-vm1 ~]# ls /rhs/brick1/s0/
101.txt  133.txt  156.txt  170.txt  196.txt  34.txt  51.txt  81.txt  97.txt       file123.txt  file155.txt  file176.txt  file82.txt
107.txt  13.txt   159.txt  172.txt  198.txt  35.txt  52.txt  83.txt  file107.txt  file126.txt  file157.txt  file180.txt  file90.txt
112.txt  140.txt  160.txt  173.txt  23.txt   37.txt  58.txt  8.txt   file112.txt  file138.txt  file158.txt  file195.txt  file94.txt
114.txt  145.txt  164.txt  176.txt  26.txt   41.txt  66.txt  90.txt  file116.txt  file139.txt  file160.txt  file197.txt  file97.txt
117.txt  146.txt  165.txt  17.txt   28.txt   42.txt  68.txt  91.txt  file118.txt  file145.txt  file161.txt  file198.txt  file99.txt
122.txt  147.txt  166.txt  192.txt  29.txt   46.txt  76.txt  93.txt  file119.txt  file146.txt  file167.txt  file70.txt
132.txt  152.txt  168.txt  193.txt  33.txt   4.txt   79.txt  94.txt  file121.txt  file150.txt  file171.txt  file77.txt
[root@casino-vm1 ~]# ls /rhs/brick1/s0/ | wc -l
89
[root@casino-vm1 ~]# 

On mount point:
====================

[root@casino-vm5 nfs]# ls -la file121.txt
ls: cannot access file121.txt: No such file or directory
[root@casino-vm5 nfs]# ls -la 152.txt
ls: cannot access 152.txt: No such file or directory
[root@casino-vm5 nfs]# ls -la file77.txt
ls: cannot access file77.txt: No such file or directory
[root@casino-vm5 nfs]#
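Per the Doc Text workaround, the unmigrated files can be restored by copying them from the removed brick back into the volume through the mount. A minimal sketch, assuming a node that can reach both the brick path and a mount of the volume (the file names are from the listings above; /fuse is the mount used for the arequal runs):

# Copy each file that failed migration from the backend brick into the volume via the mount:
for f in file121.txt 152.txt file77.txt; do
    cp /rhs/brick1/s0/"$f" /fuse/
done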

Comment 3 SATHEESARAN 2015-07-21 16:26:20 UTC
I have already seen this issue with RHEV-RHGS integration, where the remove-brick operation led to a split-brain issue, which in turn caused application VMs to go into the PAUSED state. I have filed a bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1243542

But that issue was not accepted as a blocker, as there is no real data loss.

Comment 6 Susant Kumar Palai 2015-09-24 10:18:42 UTC
Moving the component to AFR, as it is reported as a split-brain issue.

Comment 12 Nag Pavan Chilakam 2018-04-05 09:21:57 UTC
Similar to bz#1244197, this seems to be more of a dht-rebalance case. Prasad, can you check whether this problem still happens and update with the latest results?

Comment 14 Anees Patel 2018-12-17 04:38:54 UTC
Re-tried this remove-brick + dht-rebalance scenario on a distributed-replicate (4x3) volume with a recent build:
# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-31.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-31.el7rhgs.x86_64
glusterfs-cli-3.12.2-31.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-libs-3.12.2-31.el7rhgs.x86_64
glusterfs-api-3.12.2-31.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-31.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch


Rebalance completed successfully and no split-brain was observed; the issue is no longer seen.

