+++ This bug was initially created as a clone of Bug #1739424 +++
+++ This bug was initially created as a clone of Bug #1727081 +++

Description of problem:

LTP ftestxx tests report data corruption on a 4+2 disperse volume.

<<<test_output>>>
ftest05  1  TFAIL  :  ftest05.c:395: Test[0] bad verify @ 0x3800 for val 2 count 487 xfr 2048 file_max 0xfa000.
ftest05  0  TINFO  :  Test[0]: last_trunc = 0x4d800
ftest05  0  TINFO  :  Stat: size=fa000, ino=120399ba
ftest05  0  TINFO  :  Buf:
ftest05  0  TINFO  :  64*0,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  2,
ftest05  0  TINFO  :  ... more
ftest05  0  TINFO  :  Bits array:
ftest05  0  TINFO  :  0:
ftest05  0  TINFO  :  0:
ftest05  0  TINFO  :  ddx
ftest05  0  TINFO  :  8:
ftest05  0  TINFO  :  ecx

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Worker Ant on 2019-07-05 00:06:33 UTC ---

REVIEW: https://review.gluster.org/22999 (cluster/ec: inherit right mask from top parent) posted (#1) for review on master by Kinglong Mee

--- Additional comment from Worker Ant on 2019-07-08 13:27:01 UTC ---

REVIEW: https://review.gluster.org/23010 (cluster/ec: inherit healing from lock which has info) posted (#1) for review on master by Kinglong Mee

--- Additional comment from Pranith Kumar K on 2019-07-10 10:30:39 UTC ---

(In reply to Kinglong Mee from comment #0)
> Description of problem:
>
> LTP ftestxx tests report data corruption on a 4+2 disperse volume.
>
> <<<test_output>>>
> ftest05  1  TFAIL  :  ftest05.c:395: Test[0] bad verify @ 0x3800
> for val 2 count 487 xfr 2048 file_max 0xfa000.
> [...]

When I try to run this test, it chooses /tmp as the directory where the file is created. How do I change it to the mount directory?

root@localhost - /mnt/ec2
15:11:08 :( ⚡ /opt/ltp/testcases/bin/ftest05
ftest05  1  TPASS  :  Test passed.

--- Additional comment from Kinglong Mee on 2019-07-10 12:47:01 UTC ---

(In reply to Pranith Kumar K from comment #3)
> When I try to run this test, it chooses /tmp as the directory where the
> file is created. How do I change it to the mount directory?

You can run it as:

./runltp -p -l /tmp/result.log -o /tmp/output.log -C /tmp/failed.log -d /mnt/nfs/ -f casefilename-under-runtest

While the test runs on the NFS client, a bash script is also running which reboots one node (a cluster node that ganesha.nfsd is not running on) every 600s.

--- Additional comment from Kinglong Mee on 2019-07-11 10:32:22 UTC ---

valgrind reports a memory leak:

==7925== 300 bytes in 6 blocks are possibly lost in loss record 880 of 1,436
==7925==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==7925==    by 0x71828BF: __gf_default_malloc (mem-pool.h:112)
==7925==    by 0x7183182: __gf_malloc (mem-pool.c:131)
==7925==    by 0x713FB65: gf_strndup (mem-pool.h:189)
==7925==    by 0x713FBD5: gf_strdup (mem-pool.h:206)
==7925==    by 0x7144465: loc_copy (xlator.c:1276)
==7925==    by 0x18EDBF1C: ec_loc_from_loc (ec-helpers.c:760)
==7925==    by 0x18F02FE5: ec_manager_open (ec-inode-read.c:778)
==7925==    by 0x18EE4905: __ec_manager (ec-common.c:3094)
==7925==    by 0x18EE4A0F: ec_manager (ec-common.c:3112)
==7925==    by 0x18F037F3: ec_open (ec-inode-read.c:929)
==7925==    by 0x18ED5E85: ec_gf_open (ec.c:1146)

--- Additional comment from Worker Ant on 2019-07-11 11:05:58 UTC ---

REVIEW: https://review.gluster.org/23029 (cluster/ec: do loc_copy from ctx->loc in fd->lock) posted (#1) for review on master by Kinglong Mee

--- Additional comment from Kinglong Mee on 2019-07-12 00:47:27 UTC ---

ganesha.nfsd crashes when healing a name:

Core was generated by `/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N N'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f0d5ae8c5a9 in ec_heal_name (frame=0x7f0d57c6ca28, ec=0x7f0d5b62d280, parent=0x0,
    name=0x7f0d57537d31 "b", participants=0x7f0d0dfffe30 "\001\001\001") at ec-heal.c:1685
1685        loc.inode = inode_new(parent->table);
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 dbus-libs-1.10.24-12.el7.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 gssproxy-0.7.0-21.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libacl-2.2.51-14.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-59.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 lz4-1.7.5-2.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64

(gdb) bt
#0  0x00007f0d5ae8c5a9 in ec_heal_name (frame=0x7f0d57c6ca28, ec=0x7f0d5b62d280, parent=0x0,
    name=0x7f0d57537d31 "b", participants=0x7f0d0dfffe30 "\001\001\001") at ec-heal.c:1685
#1  0x00007f0d5ae93cae in ec_heal_do (this=0x7f0d5b65ac00, data=0x7f0d24e3c028, loc=0x7f0d24e3c358, partial=0) at ec-heal.c:3050
#2  0x00007f0d5ae94455 in ec_synctask_heal_wrap (opaque=0x7f0d24e3c028) at ec-heal.c:3139
#3  0x00007f0d6d1268c9 in synctask_wrap () at syncop.c:369
#4  0x00007f0d6c6bf010 in ?? () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
(gdb) frame 1
#1  0x00007f0d5ae93cae in ec_heal_do (this=0x7f0d5b65ac00, data=0x7f0d24e3c028, loc=0x7f0d24e3c358, partial=0) at ec-heal.c:3050
3050        ret = ec_heal_name(frame, ec, loc->parent, (char *)loc->name,
(gdb) p loc
$1 = (loc_t *) 0x7f0d24e3c358
(gdb) p *loc
$2 = {path = 0x7f0d57537d00 "/nfsshare/ltp-eZQlnozjnX/ftegVRmbT/ftest05.20436/b",
      name = 0x7f0d57537d31 "b", inode = 0x7f0d24255b28, parent = 0x0,
      gfid = "\263\341\223\031\301\245I\260\234\334\017\to%\305^", pargfid = '\000' <repeats 15 times>}

--- Additional comment from Xavi Hernandez on 2019-07-13 14:09:15 UTC ---

Please don't use the same bug for different issues.

--- Additional comment from Worker Ant on 2019-07-14 13:03:20 UTC ---

REVISION POSTED: https://review.gluster.org/23029 (cluster/ec: do loc_copy from ctx->loc in fd->lock) posted (#2) for review on master by Kinglong Mee

--- Additional comment from Worker Ant on 2019-07-16 17:54:25 UTC ---

REVIEW: https://review.gluster.org/23010 (cluster/ec: inherit healing from lock when it has info) merged (#4) on master by Amar Tumballi

--- Additional comment from Ashish Pandey on 2019-07-17 05:27:44 UTC ---

There are two patches associated with this BZ:

https://review.gluster.org/#/c/glusterfs/+/22999/ - Not merged, still under review
https://review.gluster.org/#/c/glusterfs/+/23010/ - Merged

I would like to keep this bug open until both patches get merged.

--
Ashish

--- Additional comment from Worker Ant on 2019-07-18 07:28:12 UTC ---

REVIEW: https://review.gluster.org/23069 ((WIP)cluster/ec: Always read from good-mask) posted (#1) for review on master by Pranith Kumar Karampuri

--- Additional comment from Worker Ant on 2019-07-23 06:20:08 UTC ---

REVIEW: https://review.gluster.org/23073 (cluster/ec: fix data corruption) posted (#4) for review on master by Pranith Kumar Karampuri

--- Additional comment from Worker Ant on 2019-07-26 07:11:59 UTC ---

REVIEW: https://review.gluster.org/23069 (cluster/ec: Always read from good-mask) merged (#6) on master by Pranith Kumar Karampuri

--- Additional comment from Pranith Kumar K on 2019-08-02 07:35:34 UTC ---

Found one case which needs to be fixed.

--- Additional comment from Worker Ant on 2019-08-02 07:38:12 UTC ---

REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on master by Pranith Kumar Karampuri

--- Additional comment from Worker Ant on 2019-08-07 06:15:15 UTC ---

REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) merged (#2) on master by Pranith Kumar Karampuri

--- Additional comment from Worker Ant on 2019-08-09 10:07:23 UTC ---

REVIEW: https://review.gluster.org/23188 (cluster/ec: inherit healing from lock when it has info) posted (#1) for review on release-7 by Pranith Kumar Karampuri

--- Additional comment from Worker Ant on 2019-08-09 10:09:31 UTC ---

REVIEW: https://review.gluster.org/23190 (cluster/ec: Always read from good-mask) posted (#1) for review on release-7 by Pranith Kumar Karampuri

--- Additional comment from Worker Ant on 2019-08-09 10:11:40 UTC ---

REVIEW: https://review.gluster.org/23192 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on release-7 by Pranith Kumar Karampuri
REVIEW: https://review.gluster.org/23200 (cluster/ec: inherit healing from lock when it has info) posted (#1) for review on release-6 by Pranith Kumar Karampuri
REVIEW: https://review.gluster.org/23201 (cluster/ec: Always read from good-mask) posted (#1) for review on release-6 by Pranith Kumar Karampuri
REVIEW: https://review.gluster.org/23203 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on release-6 by Pranith Kumar Karampuri
REVIEW: https://review.gluster.org/23200 (cluster/ec: inherit healing from lock when it has info) merged (#2) on release-6 by Pranith Kumar Karampuri
REVIEW: https://review.gluster.org/23201 (cluster/ec: Always read from good-mask) merged (#2) on release-6 by Pranith Kumar Karampuri
REVIEW: https://review.gluster.org/23203 (cluster/ec: Update lock->good_mask on parent fop failure) merged (#2) on release-6 by Pranith Kumar Karampuri