Description of problem:
=========================
When a file requires heal and continuous IO is happening to it, the heal never seems to finish: the file is healed again and again until all IO to it, read or write, is stopped. The problem is more serious when the IO keeps going, e.g. appends.

The impact is as below:
1) the same file requires heal many times, hence taking a very long time: in the ideal case a 1GB heal finishes in about 2 min, but here it can take hours
2) unnecessary CPU cycles are spent healing the same file again and again

Steps to reproduce:

1) started an append to a file on a 2x(4+2) volume from the fuse client as below:
dd if=/dev/urandom bs=1MB count=10000 >>ddfile2

2) getfattr on one of the bricks (all are healthy). Due to eager lock, dirty stays set to 1 because the lock is not released while no other request comes in:
# file: ddfile2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.bit-rot.version=0x020000000000000058b3f6720001a78c
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0x5637add23aba4f7a9c3b9535dd6639a3

3) file size watched from one of the clients before bringing b2 down:
[root@dhcp35-45 ecvol]# for i in {1..100};do du -sh ddfile2 ;echo "##########";ls -lh ddfile2 ;sleep 30;done
1.0G    ddfile2
##########
-rw-r--r--. 2 root root 1012M Feb 27 18:46 ddfile2

4) brought down b2

5) the ec attributes get updated periodically on all healthy bricks:
# file: ddfile2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.bit-rot.version=0x020000000000000058b3f6720001a78c
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x000000000000000c000000000000000c
trusted.ec.size=0x0000000119ded040
trusted.ec.version=0x00000000000093c800000000000093c8
trusted.gfid=0x5637add23aba4f7a9c3b9535dd6639a3

6) heal info as below:
Every 2.0s: gluster v heal ecvol info          Mon Feb 27 18:48:13 2017

Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-130.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-23.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-112.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

Brick dhcp35-138.lab.eng.blr.redhat.com:/rhs/brick3/ecvol
/ddfile2
Status: Connected
Number of entries: 1

7) brought the brick back up; healthy brick xattrs:
Every 2.0s: getfattr -d -m . -e hex ddfile2    Mon Feb 27 18:50:16 2017

# file: ddfile2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.bit-rot.version=0x020000000000000058b3f6720001a78c
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x000000000000000c0000000000000000
trusted.ec.size=0x000000012a05f200
trusted.ec.version=0x0000000000009c400000000000009c40
trusted.gfid=0x5637add23aba4f7a9c3b9535dd6639a3

Now, with the IO still going on, if we check the xattrs of both a healthy brick and the brick requiring heal, the METADATA heal completes immediately, but the data heal keeps happening again and again until all IO to that file is stopped. Watching the file size with du -sh, the file appears to be getting healed over and over; ls -lh, in contrast, shows the size growing consistently, because the metadata heal has completed.

===== brick requiring heal =====
(note: I reran the case to show it to dev, so here the file is testme, but the problem is seen consistently)

######### brick is down
256M    testme
##########
-rw-r--r--. 2 root root 248M Feb 27 18:56 testme
4.7M    testme
##########
-rw-r--r--. 2 root root 611M Feb 27 18:58 testme
477M    testme
##########
-rw-r--r--. 2 root root 705M Feb 27 18:59 testme
110M    testme
##########
-rw-r--r--. 2 root root 798M Feb 27 18:59 testme
552M    testme
##########
-rw-r--r--. 2 root root 891M Feb 27 19:00 testme
1019M   testme
##########
-rw-r--r--. 2 root root 985M Feb 27 19:00 testme
442M    testme
##########
-rw-r--r--. 2 root root 1.1G Feb 27 19:01 testme
899M    testme
##########
-rw-r--r--. 2 root root 1.2G Feb 27 19:01 testme
1.5G    testme
##########
-rw-r--r--. 2 root root 1.3G Feb 27 19:02 testme
458M    testme
##########
-rw-r--r--. 2 root root 1.4G Feb 27 19:02 testme
935M    testme
##########
-rw-r--r--. 2 root root 1.5G Feb 27 19:03 testme
1.6G    testme
##########
-rw-r--r--. 2 root root 1.6G Feb 27 19:03 testme
124M    testme
##########
-rw-r--r--. 2 root root 1.6G Feb 27 19:04 testme
558M    testme
##########
-rw-r--r--. 2 root root 1.7G Feb 27 19:04 testme
1.1G    testme
##########
-rw-r--r--. 2 root root 1.8G Feb 27 19:05 testme
^C

===== healthy brick b1 =====
######### brick is down
1.0G    testme
##########
-rw-r--r--. 2 root root 516M Feb 27 18:58 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 611M Feb 27 18:58 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 705M Feb 27 18:59 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 798M Feb 27 18:59 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 891M Feb 27 19:00 testme
1.0G    testme
##########
-rw-r--r--. 2 root root 985M Feb 27 19:00 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.1G Feb 27 19:01 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.2G Feb 27 19:01 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.3G Feb 27 19:02 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.4G Feb 27 19:02 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.5G Feb 27 19:03 testme
2.0G    testme
##########
-rw-r--r--. 2 root root 1.6G Feb 27 19:03 testme
2.0G    testme
##########

Now the append is complete; the xattrs show the size as different as below, even though the data/metadata dirty bits are cleaned up.
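For reference, a minimal sketch of how the xattrs on the two sides can be compared, assuming passwordless ssh between the nodes; the hostnames, brick path and file name are the ones from the rerun above, not a general recipe. The data heal has truly converged only when trusted.ec.version and trusted.ec.size match on every brick of the subvolume holding the file and trusted.ec.dirty is all zeroes:

# compare the ec xattrs of the same file on every brick listed in heal info
for h in dhcp35-45 dhcp35-130 dhcp35-122 dhcp35-23 dhcp35-112 dhcp35-138; do
    echo "== $h =="
    ssh $h.lab.eng.blr.redhat.com \
        "getfattr -d -m . -e hex /rhs/brick3/ecvol/testme 2>/dev/null | grep trusted.ec"
done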
Note that I used "service glusterd restart" to bring the brick back online, in order to avoid the restart of all shds that using start force would cause.
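For anyone re-running this, a quick sketch of the two options (the volume name ecvol is taken from this setup; run the glusterd restart on the node hosting the downed brick):

# restarting glusterd respawns only the dead brick process on this node,
# leaving the already-running self-heal daemons alone
service glusterd restart

# alternative, runnable from any node -- but this also restarts the shds
gluster volume start ecvol force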
With very high probability, this seems to me like a regression introduced between dev builds of 3.2. Also note that this problem can escape notice because of the spurious heal-info entries we have in ec:
https://bugzilla.redhat.com/show_bug.cgi?id=1347257#c9
https://bugzilla.redhat.com/show_bug.cgi?id=1347251#c7
Upstream patch: https://review.gluster.org/#/c/16985/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/108397/
on_qa validation:
1) In a 4+2, when we bring down one brick and bring it back online after some time while an append is happening, I don't see the file getting recursively healed again and again. The file gets healed completely, hence marking this case as PASS.
2) Seeing an issue where the file never completes healing (but no recursive-healing problem), for which I raised bug#1475789.
Moving this bug to verified, as the main problem mentioned in the description (the same as case 1 here) is fixed.
Version: 3.8.4-35
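For completeness, a rough sketch of the check behind case 1 (the mount point /mnt/ecvol is an assumption, not from the report): with the fix, the file should show up in heal info once after the brick comes back and then drop out for good, even while the append continues.

# fuse client: keep an append running across the brick down/up cycle
dd if=/dev/urandom bs=1MB count=10000 >> /mnt/ecvol/testme &

# any server node: the heal-info entry count for the file should fall to 0
# once and stay there, instead of oscillating as in the original report
watch -n 30 "gluster v heal ecvol info | grep -c testme"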
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774