Problem Description:
====================
Glusterfs-afr: a remove-brick operation ends up with a split-brain issue along with failures in rebalance.

Observations:
=============
1. The rebalance process completed with a few failures.
2. Split-brain messages and Input/output errors are seen in the log messages.
3. Checked the gfid of a few of the files reported as split-brain; they could not be found in that state.
4. The data is still available on the backend bricks that were removed.
5. The files whose migration failed were not found on the mount point.
6. Even though the other replica pair is available, migration of these files failed, so this is a complete data-loss issue.

Version seen:
=============
[root@casino-vm1 ~]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.3-1.el6rhs.x86_64
glusterfs-api-3.7.1-10.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-10.el6rhs.x86_64
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-libs-3.7.1-10.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-10.el6rhs.x86_64
glusterfs-fuse-3.7.1-10.el6rhs.x86_64
glusterfs-server-3.7.1-10.el6rhs.x86_64
glusterfs-rdma-3.7.1-10.el6rhs.x86_64
vdsm-gluster-4.16.20-1.1.el6rhs.noarch
glusterfs-3.7.1-10.el6rhs.x86_64
glusterfs-cli-3.7.1-10.el6rhs.x86_64
[root@casino-vm1 ~]#

Procedure to reproduce:
=======================
1. Create a 4x2 distributed-replicate volume, mount it, and add data.
2. Run remove-brick on n1:b1 and n2:b1 ("remove-brick ... start").
3. kill -9 the n1:b1 brick process while the remove-brick rebalance is in progress.
4. The rebalance completed, but with some failures, and split-brain was observed for a few files.
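Observation 3 above (checking a file's gfid on each replica brick to confirm whether it really is in gfid split-brain) can be scripted. This is a minimal sketch, not part of the original report: `get_gfid` and `compare_gfids` are hypothetical helper names, and the brick path in the commented example is the one from this setup.

```shell
#!/bin/sh
# Sketch: verify whether a file really is in gfid split-brain by comparing
# its trusted.gfid xattr across the replica bricks.

# Read the trusted.gfid xattr of a file on a brick backend (run as root on
# the brick node).
get_gfid() {
    getfattr -n trusted.gfid -e hex --only-values "$1" 2>/dev/null
}

# Healthy replicas must carry identical gfids; an empty or differing value
# indicates a problem worth inspecting further.
compare_gfids() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
    fi
}

# Example (hypothetical): run get_gfid for the same path on each replica
# brick of Sun-replicate-0, then compare the two values:
#   compare_gfids "$(get_gfid /rhs/brick1/s0/193.txt)" "<gfid from the other node>"
```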
Output:
=======
[root@casino-vm1 ~]# gluster v create Sun replica 2 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 10.70.35.57:/rhs/brick2/s0 10.70.35.136:/rhs/brick2/s0 10.70.35.57:/rhs/brick3/s0 10.70.35.136:/rhs/brick3/s0 10.70.35.57:/rhs/brick4/s0 10.70.35.136:/rhs/brick4/s0
volume create: Sun: success: please start the volume to access data
[root@casino-vm1 ~]# gluster v start Sun
volume start: Sun: success
[root@casino-vm1 ~]# ./options.sh Sun
volume set: success
volume quota : success
volume set: success
volume quota : success
volume set: success
[root@casino-vm1 ~]# gluster v info Sun

Volume Name: Sun
Type: Distributed-Replicate
Volume ID: c574f9fb-0ac6-4211-9a2e-054227810641
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.57:/rhs/brick1/s0
Brick2: 10.70.35.136:/rhs/brick1/s0
Brick3: 10.70.35.57:/rhs/brick2/s0
Brick4: 10.70.35.136:/rhs/brick2/s0
Brick5: 10.70.35.57:/rhs/brick3/s0
Brick6: 10.70.35.136:/rhs/brick3/s0
Brick7: 10.70.35.57:/rhs/brick4/s0
Brick8: 10.70.35.136:/rhs/brick4/s0
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.min-free-disk: 10
performance.readdir-ahead: on
[root@casino-vm1 ~]# gluster v statusSun
unrecognized word: statusSun (position 1)
[root@casino-vm1 ~]# gluster v status Sun
Status of volume: Sun
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.57:/rhs/brick1/s0            49180     0          Y       25608
Brick 10.70.35.136:/rhs/brick1/s0           49180     0          Y       29160
Brick 10.70.35.57:/rhs/brick2/s0            49181     0          Y       25626
Brick 10.70.35.136:/rhs/brick2/s0           49181     0          Y       29178
Brick 10.70.35.57:/rhs/brick3/s0            49182     0          Y       25644
Brick 10.70.35.136:/rhs/brick3/s0           49182     0          Y       29196
Brick 10.70.35.57:/rhs/brick4/s0            49183     0          Y       25662
Brick 10.70.35.136:/rhs/brick4/s0           49183     0          Y       29214
Snapshot Daemon on localhost                49184     0          Y       25849
NFS Server on localhost                     2049      0          Y       25857
Self-heal Daemon on localhost               N/A       N/A        Y       25689
Quota Daemon on localhost                   N/A       N/A        Y       25787
Snapshot Daemon on 10.70.35.136             49184     0          Y       29406
NFS Server on 10.70.35.136                  2049      0          Y       29414
Self-heal Daemon on 10.70.35.136            N/A       N/A        Y       29240
Quota Daemon on 10.70.35.136                N/A       N/A        Y       29355

Task Status of Volume Sun
------------------------------------------------------------------------------
There are no active volume tasks

[root@casino-vm1 ~]# gluster v remove-brick Sun 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 start; gluster v remove-brick Sun 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 status
volume remove-brick start: success
ID: b0eb939f-3b6c-47fb-b87d-099815954c9e
    Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in secs
---------  ----------------  ------  -------  --------  -------  -----------  ----------------
localhost                 0  0Bytes        3         0        0  in progress              0.00
10.70.35.136              0  0Bytes        0         0        0  in progress              0.00
[root@casino-vm1 ~]#
[root@casino-vm1 ~]# kill -9 25608
[root@casino-vm1 ~]# gluster v remove-brick Sun 10.70.35.57:/rhs/brick1/s0 10.70.35.136:/rhs/brick1/s0 status
    Node  Rebalanced-files     size  scanned  failures  skipped     status  run time in secs
---------  ----------------  -------  -------  --------  -------  ---------  ----------------
localhost               122  649.5MB      406        89        0  completed             15.00
10.70.35.136              0   0Bytes        0         0        0  completed              0.00
[root@casino-vm1 ~]#
[root@casino-vm1 ~]# gluster v status Sun
Status of volume: Sun
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.57:/rhs/brick1/s0            N/A       N/A        N       N/A
Brick 10.70.35.136:/rhs/brick1/s0           49180     0          Y       29160
Brick 10.70.35.57:/rhs/brick2/s0            49181     0          Y       25626
Brick 10.70.35.136:/rhs/brick2/s0           49181     0          Y       29178
Brick 10.70.35.57:/rhs/brick3/s0            49182     0          Y       25644
Brick 10.70.35.136:/rhs/brick3/s0           49182     0          Y       29196
Brick 10.70.35.57:/rhs/brick4/s0            49183     0          Y       25662
Brick 10.70.35.136:/rhs/brick4/s0           49183     0          Y       29214
Snapshot Daemon on localhost                49184     0          Y       25849
NFS Server on localhost                     2049      0          Y       25857
Self-heal Daemon on localhost               N/A       N/A        Y       25689
Quota Daemon on localhost                   N/A       N/A        Y       25787
Snapshot Daemon on 10.70.35.136             49184     0          Y       29406
NFS Server on 10.70.35.136                  2049      0          Y       29414
Self-heal Daemon on 10.70.35.136            N/A       N/A        Y       29240
Quota Daemon on 10.70.35.136                N/A       N/A        Y       29355

Task Status of Volume Sun
------------------------------------------------------------------------------
Task                 : Remove brick
ID                   : b0eb939f-3b6c-47fb-b87d-099815954c9e
Removed bricks:
10.70.35.57:/rhs/brick1/s0
10.70.35.136:/rhs/brick1/s0
Status               : completed
[root@casino-vm1 ~]#

LOG messages:
=============
[2015-07-21 04:22:47.466810] I [dht-rebalance.c:1002:dht_migrate_file] 0-Sun-dht: /193.txt: attempting to move from Sun-replicate-0 to Sun-replicate-1
[2015-07-21 04:22:47.473741] E [MSGID: 101046] [afr-inode-write.c:1534:afr_fsetxattr] 0-Sun-replicate-1: setxattr dict is null
[2015-07-21 04:22:47.478679] E [MSGID: 101046] [afr-inode-write.c:1534:afr_fsetxattr] 0-Sun-replicate-1: setxattr dict is null
[2015-07-21 04:22:47.489583] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /192.txt on Sun-replicate-0
[2015-07-21 04:22:47.490548] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-Sun-dht: failed to set xattr on /193.txt in Sun-replicate-0 (Input/output error)
[2015-07-21 04:22:47.490571] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /193.txt on Sun-replicate-0
[2015-07-21 04:22:47.491618] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 0-Sun-replicate-0: Failing SETXATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-07-21 04:22:47.491992] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-Sun-dht: failed to set xattr on /196.txt in Sun-replicate-0 (Input/output error)
[2015-07-21 04:22:47.492022] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /196.txt on Sun-replicate-0
[2015-07-21 04:22:47.494151] I [dht-rebalance.c:1002:dht_migrate_file] 0-Sun-dht: /198.txt: attempting to move from Sun-replicate-0 to Sun-replicate-1
[2015-07-21 04:22:47.495695] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-Sun-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-07-21 04:22:47.495734] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-Sun-replicate-0: Failing GETXATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-07-21 04:22:47.495772] W [MSGID: 109023] [dht-rebalance.c:1076:dht_migrate_file] 0-Sun-dht: Migrate file failed:/198.txt: failed to get xattr from Sun-replicate-0 (Invalid argument)
[2015-07-21 04:22:47.498800] E [MSGID: 101046] [afr-inode-write.c:1534:afr_fsetxattr] 0-Sun-replicate-1: setxattr dict is null
[2015-07-21 04:22:47.499143] W [MSGID: 109023] [dht-rebalance.c:546:__dht_rebalance_create_dst_file] 0-Sun-dht: /198.txt: failed to set xattr on Sun-replicate-1 (Cannot allocate memory)
[2015-07-21 04:22:47.503755] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 0-Sun-replicate-0: Failing SETXATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-07-21 04:22:47.504157] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-Sun-dht: failed to set xattr on /198.txt in Sun-replicate-0 (Input/output error)
[2015-07-21 04:22:47.504181] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-Sun-dht: Migrate file failed: failed to open /198.txt on Sun-replicate-0
[2015-07-21 04:22:47.505092] I [MSGID: 109028] [dht-rebalance.c:3029:gf_defrag_status_get] 0-Sun-dht: Rebalance is completed. Time taken is 15.00 secs
[2015-07-21 04:22:47.505120] I [MSGID: 109028] [dht-rebalance.c:3033:gf_defrag_status_get] 0-Sun-dht: Files migrated: 122, size: 681014120, lookups: 406, failures: 0, skipped: 89
[2015-07-21 04:22:47.505352] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x374b207a51) [0x7f597c820a51] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f597dbff075] -->/usr/sbin/glusterfs(cleanup_and_exit+0x71) [0x7f597dbfeba1] ) 0-: received signum (15), shutting down

Arequal checksum mismatch on mount point observed:
==================================================
Before remove brick:
[root@casino-vm5 ~]# ./arequal /fuse

Entry counts
Regular files   : 404
Directories     : 3
Symbolic links  : 0
Other           : 0
Total           : 407

Metadata checksums
Regular files   : 2cb0
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 92a2b620c302e3ed74f4d32523e5b44b
Directories     : 141e1432434b343a
Symbolic links  : 0
Other           : 0
Total           : f2487137a3ac639c
[root@casino-vm5 ~]#

After remove brick:
[root@casino-vm5 ~]# ./arequal /fuse

Entry counts
Regular files   : 316
Directories     : 3
Symbolic links  : 0
Other           : 0
Total           : 319

Metadata checksums
Regular files   : 2cb0
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 732dae0738e6be8218a86e714468f721
Directories     : 3a2b5a7452771c1c
Symbolic links  : 0
Other           : 0
Total           : 51ae9a022ef955bf
[root@casino-vm5 ~]#

On Backend:
===========
[root@casino-vm1 ~]# ls /rhs/brick1/s0/
101.txt 133.txt 156.txt 170.txt 196.txt 34.txt 51.txt 81.txt 97.txt  file123.txt file155.txt file176.txt file82.txt
107.txt 13.txt  159.txt 172.txt 198.txt 35.txt 52.txt 83.txt file107.txt file126.txt file157.txt file180.txt file90.txt
112.txt 140.txt 160.txt 173.txt 23.txt  37.txt 58.txt 8.txt  file112.txt file138.txt file158.txt file195.txt file94.txt
114.txt 145.txt 164.txt 176.txt 26.txt  41.txt 66.txt 90.txt file116.txt file139.txt file160.txt file197.txt file97.txt
117.txt 146.txt 165.txt 17.txt  28.txt  42.txt 68.txt 91.txt file118.txt file145.txt file161.txt file198.txt file99.txt
122.txt 147.txt 166.txt 192.txt 29.txt  46.txt 76.txt 93.txt file119.txt file146.txt file167.txt file70.txt
132.txt 152.txt 168.txt 193.txt 33.txt  4.txt  79.txt 94.txt file121.txt file150.txt file171.txt file77.txt
[root@casino-vm1 ~]# ls /rhs/brick1/s0/ | wc -l
89
[root@casino-vm1 ~]#

On mount point:
===============
[root@casino-vm5 nfs]# ls -la file121.txt
ls: cannot access file121.txt: No such file or directory
[root@casino-vm5 nfs]# ls -la 152.txt
ls: cannot access 152.txt: No such file or directory
[root@casino-vm5 nfs]# ls -la file77.txt
ls: cannot access file77.txt: No such file or directory
[root@casino-vm5 nfs]#
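The backend-vs-mount check above (files present under /rhs/brick1/s0 but missing on the mount point) can be done in one pass by diffing the two directory listings. This is a minimal sketch, not part of the original report; `missing_on_mount` is a hypothetical helper, and the paths in the usage comment are the ones from this setup.

```shell
#!/bin/sh
# Sketch: list files that exist in a removed brick's backend directory but
# are absent from the volume mount point (i.e. candidates for data loss).
missing_on_mount() {
    brick_dir=$1
    mount_dir=$2
    b=$(mktemp) || return 1
    m=$(mktemp) || return 1
    # Plain ls skips hidden entries such as the brick's .glusterfs directory.
    ls "$brick_dir" | sort > "$b"
    ls "$mount_dir" | sort > "$m"
    # comm -23 prints lines present only in the first (brick) listing.
    comm -23 "$b" "$m"
    rm -f "$b" "$m"
}

# Usage (paths from this report):
#   missing_on_mount /rhs/brick1/s0 /fuse
```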
sosreport uploaded:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1245202/sosreport-casino-vm1.lab.eng.blr.redhat.com.001-20150721022924.tar
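To see which files account for the migration failures, the rebalance log messages quoted above can be summarized per file. This is a minimal sketch, not part of the original report, assuming a grep that supports -o (GNU or BSD); `count_open_failures` is a hypothetical helper, and the log path in the usage comment is an assumption based on the usual /var/log/glusterfs naming.

```shell
#!/bin/sh
# Sketch: count "failed to open" migration errors per file in a rebalance
# log, most frequent first.
count_open_failures() {
    # Extract the file path following "failed to open", then tally.
    grep -o 'failed to open [^ ]*' "$1" \
        | awk '{print $NF}' \
        | sort | uniq -c | sort -rn
}

# Usage (log path is an assumption):
#   count_open_failures /var/log/glusterfs/Sun-rebalance.log
```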
I have already seen this issue with RHEV-RHGS integration, where a remove-brick operation led to a split-brain issue, which in turn caused application VMs to go into the PAUSED state. I filed a bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1243542. However, that issue was not accepted as a blocker, since there was no real data loss.
Moving the component to AFR, as this is reported as a split-brain issue.
Similar to bz#1244197. This seems to be more of a dht-rebalance case. Prasad, can you check whether this problem still happens and update with the latest results?
Re-tried this remove-brick/dht-rebalance scenario on a distributed-replicate volume (4x3) with the recent build:

# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-31.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-31.el7rhgs.x86_64
glusterfs-cli-3.12.2-31.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-libs-3.12.2-31.el7rhgs.x86_64
glusterfs-api-3.12.2-31.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-31.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch

Rebalance completed successfully and no split-brain was observed. The issue is no longer seen.