Description of problem:
While doing systemic testing, hit issues where multiple files are pending heal and a few files are in split-brain.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
glusterfs-fuse-6.0-3.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-6.0-3.el7rhgs.x86_64
glusterfs-devel-6.0-3.el7rhgs.x86_64
glusterfs-cloudsync-plugins-6.0-3.el7rhgs.x86_64
glusterfs-client-xlators-6.0-3.el7rhgs.x86_64
glusterfs-server-6.0-3.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.9.x86_64
glusterfs-debuginfo-6.0-3.el7rhgs.x86_64
glusterfs-api-6.0-3.el7rhgs.x86_64
glusterfs-geo-replication-6.0-3.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
python2-gluster-6.0-3.el7rhgs.x86_64
glusterfs-libs-6.0-3.el7rhgs.x86_64
glusterfs-rdma-6.0-3.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-events-6.0-3.el7rhgs.x86_64
glusterfs-cli-6.0-3.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create 2 1X3 replicate volumes:
   # gluster v list
   emptyvol
   testvol_replicated
2. Write continuous IO (all types of FOPs).
3. Execute a script that does the following:
   1. Gets a list of all bricks for the volume testvol_replicated (3 bricks initially).
   2. Kills 2 bricks (b0, b1) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up.
   3. Kills 2 more bricks (b1, b2) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up using a glusterd restart.
   4. Repeats steps 1, 2 and 3 multiple times.
4. Execute add-brick to convert volume testvol_replicated to 2X3.
5. The script keeps running and now gets a list of all 6 bricks and kills 2 bricks at a time in a loop.
6. Rebalance was executed and heal was triggered.

The test case was executed to catch any dirty xattrs set during brick disconnects, to find any races.

Actual results:
Multiple files pending heal and multiple files in split-brain.
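The brick-kill script from step 3 is only described, not attached; a minimal sketch of what it could look like is below. The implementation details are assumptions: brick PIDs are read from the last column of the Brick lines in `gluster v status`, killed bricks are restarted with `volume start force`, and the alternate restart path uses glusterd.

```shell
#!/bin/bash
# Hypothetical sketch of the brick-kill loop described in step 3.
# Assumption: brick PIDs appear in the last column of 'gluster v status'
# Brick lines (a downed brick shows "N/A" there, so this is best-effort).
VOL=testvol_replicated

brick_pids() {
    gluster v status "$VOL" | awk '/^Brick/ {print $NF}'
}

kill_bricks() {
    # $@ = 1-based positions of the bricks to kill in the brick list
    local pids; pids=$(brick_pids)
    for i in "$@"; do
        kill -9 "$(echo "$pids" | sed -n "${i}p")"
    done
}

while true; do
    kill_bricks 1 2                # b0, b1 (milliseconds apart)
    sleep 3
    gluster v start "$VOL" force   # bring the killed bricks back up
    kill_bricks 2 3                # b1, b2
    sleep 3
    systemctl restart glusterd     # bring bricks back via glusterd restart
done
```

After the add-brick in step 4, the same loop simply sees 6 bricks in the `gluster v status` output and keeps killing 2 at a time.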
Also, on 1 node, 2 shd daemons have spun up.

Expected results:
Heal should complete with no files in split-brain. Each node should have 1 shd daemon.

Additional info:
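The duplicate-shd observation can be checked on each node with a process count (a sketch; glustershd is the self-heal daemon's process name, and the volume name is from the report):

```shell
# Each node should run exactly one self-heal daemon process.
pgrep -fc glustershd

# Cross-check against gluster's own view of the self-heal daemons:
gluster v status testvol_replicated shd
```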
Hi Anees,

Please provide the sos-reports and the volume status output.

Regards,
Karthik
The steps to reproduce are the ones listed in the description (ending with the rebalance/heal step). Since rebalance is performed with continuous IO (crefi) running and, simultaneously, the script from step 3 (which kills 2 bricks from a replica pair) is still running, rebalance fails on multiple nodes. This is expected behaviour, as quorum is not met.
Seeing the following errors in the rebalance logs:

[2019-08-28 07:10:56.327552] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2/dir3/dir4/dir5
[2019-08-28 07:10:56.328646] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2/dir3/dir4
[2019-08-28 07:10:56.366236] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2/dir3
[2019-08-28 07:10:56.367188] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1/dir2
[2019-08-28 07:10:56.368057] E [MSGID: 109016] [dht-rebalance.c:3910:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir1/dir1
[2019-08-28 07:10:56.368153] W [MSGID: 114061] [client-common.c:3325:client_pre_readdirp_v2] 0-vol1-client-3: (721fbdd2-abca-4aab-bc58-ab979d19ea0a) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-08-28 07:10:56.383551] I [MSGID: 109081] [dht-common.c:5849:dht_setxattr] 0-vol1-dht: fixing the layout of /dir1
[2019-08-28 07:10:56.388174] E [MSGID: 109119] [dht-lock.c:1084:dht_blocking_inodelk_cbk] 0-vol1-dht: inodelk failed on subvol vol1-replicate-0, gfid:721fbdd2-abca-4aab-bc58-ab979d19ea0a [Transport endpoint is not connected]
[2019-08-28 07:10:56.388286] E [MSGID: 109016] [dht-rebalance.c:3944:gf_defrag_fix_layout] 0-vol1-dht: Setxattr failed for /dir1 [Transport endpoint is not connected]
[2019-08-28 07:10:56.388342] I [dht-rebalance.c:3297:gf_defrag_process_dir] 0-vol1-dht: migrate data called on /dir1
[2019-08-28 07:10:56.409947] W [dht-rebalance.c:3452:gf_defrag_process_dir] 0-vol1-dht: Found error from gf_defrag_get_entry
[2019-08-28 07:10:56.410907] E [MSGID: 109111] [dht-rebalance.c:3971:gf_defrag_fix_layout] 0-vol1-dht: gf_defrag_process_dir failed for directory: /dir1
[2019-08-28 07:10:56.413810] E [MSGID: 101172] [events.c:89:_gf_event] 0-vol1-dht: inet_pton failed with return code 0 [Invalid argument]
[2019-08-28 07:10:56.413952] I [MSGID: 109028] [dht-rebalance.c:5059:gf_defrag_status_get] 0-vol1-dht: Rebalance is failed. Time taken is 58.00 secs

So now, is the script in point 3 supposed to be stopped before rebalance is triggered, or did the reporter fail to mention that the rebalance failures are the expected behaviour?
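The per-node rebalance failures can be confirmed with the standard status command (volume name vol1 as in the logs above):

```shell
# Show per-node rebalance state; nodes where rebalance failed are
# reported with status "failed" along with files scanned/skipped counts.
gluster v rebalance vol1 status
```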
The expected result in the description says "Heal should complete with no files in split-brain". For data & metadata heal to happen we need all 3 bricks to be up, and rebalance will also not succeed while the script keeps disconnecting the bricks. So the script in point 3 should be stopped.
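Once the brick-kill script is stopped and all bricks are back online, heal completion and split-brain status can be verified with the heal CLI (a sketch; the volume name is from the report):

```shell
# With all 3 bricks of each replica up, trigger index heal and verify
# that no entries remain pending and none are in split-brain.
gluster v heal testvol_replicated
gluster v heal testvol_replicated info              # expect 0 entries per brick
gluster v heal testvol_replicated info split-brain  # expect 0 entries per brick
```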
Created attachment 1611021 [details] Outputs required to move the bug to verified
Steps followed to test the scenario:
1. Create 2 1X3 replicate volumes:
   # gluster v list
   emptyvol
   testvol_replicated
2. Write continuous IO (all types of FOPs).
3. Execute a script that does the following:
   1. Gets a list of all bricks for the volume testvol_replicated (3 bricks initially).
   2. Kills 2 bricks (b0, b1) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up.
   3. Kills 2 more bricks (b1, b2) one after the other (with millisecond difference), sleeps for 3 seconds, then brings the bricks back up using a glusterd restart.
   4. Repeats steps 1, 2 and 3 multiple times.
4. Execute add-brick to convert volume testvol_replicated to 2X3.
5. The script keeps running and now gets a list of all 6 bricks and kills 2 bricks at a time in a loop.
6. Rebalance was executed and heal was triggered.

Heals completed with no files pending and no split-brain issues seen. The output has been attached to the bug, on the basis of which the bug has been moved to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249