Description of problem:
While doing systemic testing, hit issues where multiple files are pending heal and a few files are in split-brain.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
glusterfs-fuse-6.0-3.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-6.0-3.el7rhgs.x86_64
glusterfs-devel-6.0-3.el7rhgs.x86_64
glusterfs-cloudsync-plugins-6.0-3.el7rhgs.x86_64
glusterfs-client-xlators-6.0-3.el7rhgs.x86_64
glusterfs-server-6.0-3.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.9.x86_64
glusterfs-debuginfo-6.0-3.el7rhgs.x86_64
glusterfs-api-6.0-3.el7rhgs.x86_64
glusterfs-geo-replication-6.0-3.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
python2-gluster-6.0-3.el7rhgs.x86_64
glusterfs-libs-6.0-3.el7rhgs.x86_64
glusterfs-rdma-6.0-3.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-events-6.0-3.el7rhgs.x86_64
glusterfs-cli-6.0-3.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create 2 1X3 replicate volumes:
   # gluster v list
   emptyvol
   testvol_replicated
2. Write continuous IO (all types of FOPs).
3. Execute a script that does the following:
   1. Gets a list of all bricks for the volume testvol_replicated (3 bricks initially).
   2. Kills 2 bricks (b0, b1) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up.
   3. Kills 2 more bricks (b1, b2) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up using a glusterd restart.
   4. Repeats steps 1, 2 and 3 multiple times.
4. Execute add-brick to convert volume testvol_replicated to 2X3.
5. The script keeps running and now gets a list of all 6 bricks and kills 2 bricks at a time in a loop.
6. Rebalance was executed and heal was triggered.

The test case was executed to catch any dirty xattrs set during brick disconnects, to find any races.

Actual results:
Multiple files pending heal and multiple files in split-brain.
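The brick-kill script from step 3 is only described, not attached; a minimal sketch of what it could look like is below. The implementation details are assumptions: brick PIDs are read from the last column of the Brick lines in `gluster v status`, killed bricks are restarted with `volume start force`, and the alternate restart path uses glusterd.

```shell
#!/bin/bash
# Hypothetical sketch of the brick-kill loop described in step 3.
# Assumption: brick PIDs appear in the last column of 'gluster v status'
# Brick lines (a downed brick shows "N/A" there, so this is best-effort).
VOL=testvol_replicated

brick_pids() {
    gluster v status "$VOL" | awk '/^Brick/ {print $NF}'
}

kill_bricks() {
    # $@ = 1-based positions of the bricks to kill in the brick list
    local pids; pids=$(brick_pids)
    for i in "$@"; do
        kill -9 "$(echo "$pids" | sed -n "${i}p")"
    done
}

while true; do
    kill_bricks 1 2                # b0, b1 (milliseconds apart)
    sleep 3
    gluster v start "$VOL" force   # bring the killed bricks back up
    kill_bricks 2 3                # b1, b2
    sleep 3
    systemctl restart glusterd     # bring bricks back via glusterd restart
done
```

After the add-brick in step 4, the same loop simply sees 6 bricks in the `gluster v status` output and keeps killing 2 at a time.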
Also, on 1 node, 2 shd daemons have spun up.

Expected results:
Heal should complete with no files in split-brain. Each node should have 1 shd daemon.

Additional info:
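The duplicate-shd observation can be checked on each node with a process count (a sketch; glustershd is the self-heal daemon's process name, and the volume name is from the report):

```shell
# Each node should run exactly one self-heal daemon process.
pgrep -fc glustershd

# Cross-check against gluster's own view of the self-heal daemons:
gluster v status testvol_replicated shd
```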
Hi Anees,

Please provide the sos-reports and the volume status output.

Regards,
Karthik
The steps to reproduce are the ones listed in the description (ending with the rebalance/heal step). Since rebalance is performed with continuous IO (crefi) running and, simultaneously, the script from step 3 (which kills 2 bricks from a replica pair) is still running, rebalance fails on multiple nodes. This is expected behaviour, as quorum is not met.
Seeing the following errors in the rebalance logs:

[2019-08-28 07:10:56.327552] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2/dir3/dir4/dir5
[2019-08-28 07:10:56.328646] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2/dir3/dir4
[2019-08-28 07:10:56.366236] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2/dir3
[2019-08-28 07:10:56.367188] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2
[2019-08-28 07:10:56.368057] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1
[2019-08-28 07:10:56.368153] W [MSGID: 114061] [client-common.c:3325:client_pre_readdirp_v2] 0-vol1-client-3: (721fbdd2-abca-4aab-bc58-ab979d19ea0a) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-08-28 07:10:56.383551] I [MSGID: 109081] [dht-common.c:5849:dht_setxattr] 0-vol1-dht: fixing the layout of /dir1
[2019-08-28 07:10:56.388174] E [MSGID: 109119] [dht-lock.c:1084:dht_blocking_inodelk_cbk] 0-vol1-dht: inodelk failed on subvol vol1-replicate-0, gfid:721fbdd2-abca-4aab-bc58-ab979d19ea0a [Transport endpoint is not connected]
[2019-08-28 07:10:56.388286] E [MSGID: 109016] [dht-rebalance.c:3944:gf_defrag_fix_layout] 0-vol1-dht: Setxattr failed for /dir1 [Transport endpoint is not connected]
[2019-08-28 07:10:56.388342] I [dht-rebalance.c:3297:gf_defrag_process_dir] 0-vol1-dht: migrate data called on /dir1
[2019-08-28 07:10:56.409947] W [dht-rebalance.c:3452:gf_defrag_process_dir] 0-vol1-dht: Found error from gf_defrag_get_entry
[2019-08-28 07:10:56.410907] E [MSGID: 109111] [dht-rebalance.c:3971:gf_defrag_fix_layout] 0-vol1-dht: gf_defrag_process_dir failed for directory: /dir1
[2019-08-28 07:10:56.413810] E [MSGID: 101172] [events.c:89:_gf_event] 0-vol1-dht: inet_pton failed with return code 0 [Invalid argument]
[2019-08-28 07:10:56.413952] I [MSGID: 109028] [dht-rebalance.c:5059:gf_defrag_status_get] 0-vol1-dht: Rebalance is failed. Time taken is 58.00 secs

So now, is the script in point 3 supposed to be stopped before rebalance is triggered, or did the reporter fail to mention that the rebalance failures are the expected behaviour?
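The per-node rebalance failures can be confirmed with the standard status command (volume name vol1 as in the logs above):

```shell
# Show per-node rebalance state; nodes where rebalance failed are
# reported with status "failed" along with files scanned/skipped counts.
gluster v rebalance vol1 status
```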
The expected result in the description says "Heal should complete with no files in split-brain". For data & metadata heal to happen we need all 3 bricks to be up, and rebalance will also not succeed while the script keeps disconnecting the bricks. So the script in point 3 should be stopped.
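Once the brick-kill script is stopped and all bricks are back online, heal completion and split-brain status can be verified with the heal CLI (a sketch; the volume name is from the report):

```shell
# With all 3 bricks of each replica up, trigger index heal and verify
# that no entries remain pending and none are in split-brain.
gluster v heal testvol_replicated
gluster v heal testvol_replicated info              # expect 0 entries per brick
gluster v heal testvol_replicated info split-brain  # expect 0 entries per brick
```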
Created attachment 1611021 [details] Outputs required to move the bug to verified
Steps followed to test the scenario:
1. Create 2 1X3 replicate volumes:
   # gluster v list
   emptyvol
   testvol_replicated
2. Write continuous IO (all types of FOPs).
3. Execute a script that does the following:
   1. Gets a list of all bricks for the volume testvol_replicated (3 bricks initially).
   2. Kills 2 bricks (b0, b1) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up.
   3. Kills 2 more bricks (b1, b2) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up using a glusterd restart.
   4. Repeats steps 1, 2 and 3 multiple times.
4. Execute add-brick to convert volume testvol_replicated to 2X3.
5. The script keeps running and now gets a list of all 6 bricks and kills 2 bricks at a time in a loop.
6. Rebalance was executed and heal was triggered.

Heals completed with no files pending and no split-brain issues seen. The output has been attached to the bug, on the basis of which the bug has been moved to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249