Description of problem:
=========================
When a file requires heal and continuous IO is happening to it, the heal never seems to finish: the file is healed again and again until all IO to it, read or write, is stopped. The problem is more serious when the IO keeps going, e.g. appends.

The impact is as below:
1) the same file requires heal many times, hence taking a very long time: in the ideal case a 1GB heal finishes in about 2 min, but here it can take hours
2) unnecessary CPU cycles are spent healing the same file again and again

Steps to reproduce:

1) started an append to a file on a 2x(4+2) volume from the fuse client as below:
dd if=/dev/urandom bs=1MB count=10000 >>ddfile2

2) getfattr on one of the bricks (all are healthy). Due to eager lock, dirty stays set to 1 because the lock is not released while no other request comes in:
# file: ddfile2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.bit-rot.version=0x020000000000000058b3f6720001a78c
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0x5637add23aba4f7a9c3b9535dd6639a3

3) file size watched from one of the clients before bringing b2 down:
[root@dhcp35-45 ecvol]# for i in {1..100};do du -sh ddfile2 ;echo "##########";ls -lh ddfile2 ;sleep 30;done
1.0G    ddfile2
##########
-rw-r--r--. 2 root root 1012M Feb 27 18:46 ddfile2

4) brought down b2

5) the ec attributes get updated periodically on all healthy bricks:
# file: ddfile2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.bit-rot.version=0x020000000000000058b3f6720001a78c
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x000000000000000c000000000000000c
trusted.ec.size=0x0000000119ded040
trusted.ec.version=0x00000000000093c800000000000093c8
trusted.gfid=0x5637add23aba4f7a9c3b9535dd6639a3

6) heal info as below:
Every 2.0s: gluster v heal ecvol info          Mon Feb 27 18:48:13 2017

Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-130.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-23.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-112.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-138.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

7) brought the brick back up; healthy brick xattrs:
Every 2.0s: getfattr -d -m . -e hex ddfile2    Mon Feb 27 18:50:16 2017

# file: ddfile2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.bit-rot.version=0x020000000000000058b3f6720001a78c
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x000000000000000c0000000000000000
trusted.ec.size=0x000000012a05f200
trusted.ec.version=0x0000000000009c400000000000009c40
trusted.gfid=0x5637add23aba4f7a9c3b9535dd6639a3

Now, with the IO still going on, if we check the xattrs of both a healthy brick and the brick requiring heal, the METADATA heal completes immediately, but the data heal keeps happening again and again until all IO to that file is stopped. Watching the file size with du -sh, the file appears to be getting healed over and over; ls -lh, in contrast, shows the size growing consistently, because the metadata heal has completed.

===== brick requiring heal =====
(note: I reran the case to show it to dev, so here the file is testme, but the problem is seen consistently)

######### brick is down
256M    testme
##########
-rw-r--r--. 2 root root 248M Feb 27 18:56 testme
4.7M    testme
##########
-rw-r--r--. 2 root root 611M Feb 27 18:58 testme
477M    testme
##########
-rw-r--r--. 2 root root 705M Feb 27 18:59 testme
110M    testme
##########
-rw-r--r--. 2 root root 798M Feb 27 18:59 testme
552M    testme
##########
-rw-r--r--. 2 root root 891M Feb 27 19:00 testme
1019M   testme
##########
-rw-r--r--. 2 root root 985M Feb 27 19:00 testme
442M    testme
##########
-rw-r--r--. 2 root root 1.1G Feb 27 19:01 testme
899M    testme
##########
-rw-r--r--. 2 root root 1.2G Feb 27 19:01 testme
1.5G    testme
##########
-rw-r--r--. 2 root root 1.3G Feb 27 19:02 testme
458M    testme
##########
-rw-r--r--. 2 root root 1.4G Feb 27 19:02 testme
935M    testme
##########
-rw-r--r--. 2 root root 1.5G Feb 27 19:03 testme
1.6G    testme
##########
-rw-r--r--. 2 root root 1.6G Feb 27 19:03 testme
124M    testme
##########
-rw-r--r--. 2 root root 1.6G Feb 27 19:04 testme
558M    testme
##########
-rw-r--r--. 2 root root 1.7G Feb 27 19:04 testme
1.1G    testme
##########
-rw-r--r--. 2 root root 1.8G Feb 27 19:05 testme
^C

===== healthy brick b1 =====
######### brick is down
1.0G    testme
##########
-rw-r--r--. 2 root root 516M Feb 27 18:58 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 611M Feb 27 18:58 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 705M Feb 27 18:59 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 798M Feb 27 18:59 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 891M Feb 27 19:00 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 985M Feb 27 19:00 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.1G Feb 27 19:01 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.2G Feb 27 19:01 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.3G Feb 27 19:02 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.4G Feb 27 19:02 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.5G Feb 27 19:03 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.6G Feb 27 19:03 testme
2.0G    testme
##########

Now the append is complete; the xattrs show the size as different as below, even though the data/metadata dirty bits are cleaned up.
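For reference, a minimal sketch of how the xattrs on the two sides can be compared, assuming passwordless ssh between the nodes; the hostnames, brick path and file name are the ones from the rerun above, not a general recipe. The data heal has truly converged only when trusted.ec.version and trusted.ec.size match on every brick of the subvolume holding the file and trusted.ec.dirty is all zeroes:

# compare the ec xattrs of the same file on every brick listed in heal info
for h in dhcp35-45 dhcp35-130 dhcp35-122 dhcp35-23 dhcp35-112 dhcp35-138; do
    echo "== $h =="
    ssh $h.lab.eng.blr.redhat.com \
        "getfattr -d -m . -e hex /rhs/brick3/ecvol/testme 2>/dev/null | grep trusted.ec"
done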
Note that I used "service glusterd restart" to bring the brick back online, in order to avoid the restart of all shds that using start force would cause.
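For anyone re-running this, a quick sketch of the two options (the volume name ecvol is taken from this setup; run the glusterd restart on the node hosting the downed brick):

# restarting glusterd respawns only the dead brick process on this node,
# leaving the already-running self-heal daemons alone
service glusterd restart

# alternative, runnable from any node -- but this also restarts the shds
gluster volume start ecvol force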
With very high probability, this seems to me like a regression introduced between dev builds of 3.2. Also note that this problem can escape notice because of the spurious heal-info entries we have in ec:
https://bugzilla.redhat.com/show_bug.cgi?id=1347257#c9
https://bugzilla.redhat.com/show_bug.cgi?id=1347251#c7
Upstream patch: https://review.gluster.org/#/c/16985/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/108397/
on_qa validation:
1) In a 4+2, when we bring down one brick and bring it back online after some time while an append is happening, I don't see the file getting recursively healed again and again. The file gets healed completely, hence marking this case as PASS.
2) Seeing an issue where the file never completes healing (but no recursive-healing problem), for which I raised bug#1475789.
Moving this bug to verified, as the main problem mentioned in the description (the same as case 1 here) is fixed.
Version: 3.8.4-35
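For completeness, a rough sketch of the check behind case 1 (the mount point /mnt/ecvol is an assumption, not from the report): with the fix, the file should show up in heal info once after the brick comes back and then drop out for good, even while the append continues.

# fuse client: keep an append running across the brick down/up cycle
dd if=/dev/urandom bs=1MB count=10000 >> /mnt/ecvol/testme &

# any server node: the heal-info entry count for the file should fall to 0
# once and stay there, instead of oscillating as in the original report
watch -n 30 "gluster v heal ecvol info | grep -c testme"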
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774