Description of problem:
If you can, try the following steps on your test system:
- create a test dataset using the 'uuidgen' command
- enable bitrot on the volume
- change a single file in the dataset that was generated before bitrot=on
- run grep over the whole dataset

These are my settings for bitrot:
# gluster v get memori1 all|grep bitrot
features.bitrot on
# gluster v get memori1 all|grep scrub
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub Active

Scrub is active, but the scrubbed-files count is '0' on all nodes. I started a manual scrub with 'gluster v bitrot memori1 scrub ondemand'. It finished, but the files still don't have bitrot attributes; I'm not sure why some files were skipped (getfattr -d -m . -e hex on the brick data still doesn't show 'trusted.bit-rot*' attributes after the scrub completed).

Version-Release number of selected component (if applicable):
RHGS 3.3

How reproducible:
It's not reproducible for me, but it happens almost every time for the customer.

Steps to Reproduce:
1. for i in $(seq 40); do uuidgen | awk {'print "mkdir "$1"; echo test >> "$1"/"$1".meta"'}; done | sh
2. grep test */*.meta
3. <crashes>

Actual results:
[2017-10-09 14:47:38.652652] W [MSGID: 122033] [ec-common.c:1542:ec_locked] 0-memori1-disperse-0: Failed to complete preop lock [Stale file handle]
[2017-10-09 14:47:38.822968] W [MSGID: 114031] [client-rpc-fops.c:2211:client3_3_seek_cbk] 0-memori1-client-2: remote operation failed [No such device or address]
The message "W [MSGID: 122033] [ec-common.c:1542:ec_locked] 0-memori1-disperse-0: Failed to complete preop lock [Stale file handle]" repeated 21 times between [2017-10-09 14:47:38.652652] and [2017-10-09 14:47:38.813231]
pending frames:
frame : type(1) op(SEEK) frame : type(1) op(SEEK) frame : type(1) op(READ) frame : type(1) op(OPEN) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame
: type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(FSTAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(FSTAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(FSTAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(FSTAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame 
: type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READ) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) 
frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(0) op(0) frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2017-10-09 14:47:38
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f982ca157d2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f982ca1f304]
/lib64/libc.so.6(+0x35270)[0x7f982b07e270]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x25704)[0x7f98253b8704]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb)[0x7f98253a09bb]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98)[0x7f98253a0b98]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdcbf)[0x7f98253a0cbf]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2240b)[0x7f98253b540b]
/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x1ec97)[0x7f982561cc97]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f982c7de840]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1e7)[0x7f982c7deb27]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f982c7da9e3]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x73d6)[0x7f98279043d6]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x997c)[0x7f982790697c]
/lib64/libglusterfs.so.0(+0x851e6)[0x7f982ca701e6]
/lib64/libpthread.so.0(+0x7e25)[0x7f982b874e25]
/lib64/libc.so.6(clone+0x6d)[0x7f982b14134d]
---------

Expected results:
grep should complete without crashing the client.

You shouldn't have to. Did you update the OS as well?
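Putting the description and the steps above together, the full reproduction sequence would look roughly like the following. This is a sketch rather than the customer's exact commands: the mount point and the modified file are placeholders, and the bitrot settings are assumed to be applied through the volume bitrot CLI.

# server side: enable bit-rot detection and match the reported scrub settings
gluster volume bitrot memori1 enable
gluster volume bitrot memori1 scrub-throttle lazy
gluster volume bitrot memori1 scrub-frequency biweekly

# client side, from inside the FUSE mount (placeholder path):
cd /mnt/gluster/memori1
for i in $(seq 40); do uuidgen | awk {'print "mkdir "$1"; echo test >> "$1"/"$1".meta"'}; done | sh
echo changed >> <existing-uuid>/<existing-uuid>.meta    # modify a file created before bitrot=on
grep test */*.meta                                      # the step that crashes the client

# scrub/signature checks referenced in the description:
gluster volume bitrot memori1 scrub status
getfattr -d -m . -e hex /rhgs/b0/memori1/<path-to-file> | grep -i bit-rot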
Let me try the repro again, can you confirm this is correct for me: [root@dell-r730-4 gluster-mount]# rm -rf ./* [root@dell-r730-4 gluster-mount]# for i in $(seq 40); do uuidgen | awk {'print "mkdir "$1"; echo test >> "$1"/"$1".meta"'}; done | sh [root@dell-r730-4 gluster-mount]# grep test */*.meta 01293de4-5c8e-41e7-bed2-dbab695d12ce/01293de4-5c8e-41e7-bed2-dbab695d12ce.meta:test 0723f1c1-c019-4dc9-bc70-7e14eb3a2769/0723f1c1-c019-4dc9-bc70-7e14eb3a2769.meta:test 080907ca-578a-4555-b418-4b2bc58872c0/080907ca-578a-4555-b418-4b2bc58872c0.meta:test 0ce2b70f-7960-4a7f-9286-a402bf1e97e0/0ce2b70f-7960-4a7f-9286-a402bf1e97e0.meta:test 0cea6628-ce0e-4d92-8869-feded3f0faeb/0cea6628-ce0e-4d92-8869-feded3f0faeb.meta:test 1a3cecd7-edbb-4e07-90e6-cc6740314063/1a3cecd7-edbb-4e07-90e6-cc6740314063.meta:test 1a49b76f-d658-4b3f-a10f-fa9044557a68/1a49b76f-d658-4b3f-a10f-fa9044557a68.meta:test 1b2a9b35-2f10-4b13-84cb-bbcd3d6b6ede/1b2a9b35-2f10-4b13-84cb-bbcd3d6b6ede.meta:test 23f5de3d-555e-4cd7-8e43-72e677be9ff1/23f5de3d-555e-4cd7-8e43-72e677be9ff1.meta:test 24ff38f0-cf1f-4f3e-b7fd-de3b731be99a/24ff38f0-cf1f-4f3e-b7fd-de3b731be99a.meta:test 300d4942-a1c6-474f-a574-72883ad75437/300d4942-a1c6-474f-a574-72883ad75437.meta:test 3239a15d-e99c-4261-a181-e8af3f409be4/3239a15d-e99c-4261-a181-e8af3f409be4.meta:test 36f29a83-290b-4b14-848d-7d781f04bdb2/36f29a83-290b-4b14-848d-7d781f04bdb2.meta:test 44ffd4de-8d72-4fcf-94db-029cfc05acc8/44ffd4de-8d72-4fcf-94db-029cfc05acc8.meta:test 459d926e-3292-481a-ba19-09ba5673b15e/459d926e-3292-481a-ba19-09ba5673b15e.meta:test 4b23847a-34b1-443e-a460-976b6b536c65/4b23847a-34b1-443e-a460-976b6b536c65.meta:test 59fbfe5c-515a-47ed-bdfb-cf1bd5f99037/59fbfe5c-515a-47ed-bdfb-cf1bd5f99037.meta:test 5a97c5ed-a738-4886-bfa6-e9f3cc997203/5a97c5ed-a738-4886-bfa6-e9f3cc997203.meta:test 6246be8b-ce30-4d2c-8fbf-7aeab368d8c5/6246be8b-ce30-4d2c-8fbf-7aeab368d8c5.meta:test 626e0c97-ed67-4aee-91e5-11709f3627d8/626e0c97-ed67-4aee-91e5-11709f3627d8.meta:test 7428fce2-07c3-4cf2-8d3f-f53b9a63cba7/7428fce2-07c3-4cf2-8d3f-f53b9a63cba7.meta:test 7540b726-e36b-462a-adb9-e72c6737e466/7540b726-e36b-462a-adb9-e72c6737e466.meta:test 8dfd2a6e-ff67-4558-87ee-727307be9cb7/8dfd2a6e-ff67-4558-87ee-727307be9cb7.meta:test 9330ae14-1906-4617-8d9e-817eb6ac7203/9330ae14-1906-4617-8d9e-817eb6ac7203.meta:test a4145620-0f30-454e-a54c-f30b0ea60f81/a4145620-0f30-454e-a54c-f30b0ea60f81.meta:test aacdc304-650f-43c6-a7b0-6b2077d0c2ba/aacdc304-650f-43c6-a7b0-6b2077d0c2ba.meta:test aacf83ad-b641-4381-9baf-42eaf72ede09/aacf83ad-b641-4381-9baf-42eaf72ede09.meta:test ad3c3114-5d9a-4588-88bb-54e0572918a0/ad3c3114-5d9a-4588-88bb-54e0572918a0.meta:test b7d1626f-00fa-4d22-bb69-9b1598443838/b7d1626f-00fa-4d22-bb69-9b1598443838.meta:test be7062d5-1462-4e38-812b-e4184e31891d/be7062d5-1462-4e38-812b-e4184e31891d.meta:test c2c4b900-d9a2-417e-92bd-b9b55a591de4/c2c4b900-d9a2-417e-92bd-b9b55a591de4.meta:test ccce75e6-1079-49d1-91e6-f184de47578a/ccce75e6-1079-49d1-91e6-f184de47578a.meta:test cd3e9f67-4622-446d-ad8b-cf3928e7fc2b/cd3e9f67-4622-446d-ad8b-cf3928e7fc2b.meta:test cd9e3d9e-6a02-4579-acb6-258bfc244f01/cd9e3d9e-6a02-4579-acb6-258bfc244f01.meta:test ceccd07e-26dc-4275-a605-e5a292192277/ceccd07e-26dc-4275-a605-e5a292192277.meta:test d15332a0-f08f-4548-b43d-441b9823d231/d15332a0-f08f-4548-b43d-441b9823d231.meta:test d8c02b45-987a-4fb4-8ab6-140c1464c180/d8c02b45-987a-4fb4-8ab6-140c1464c180.meta:test de52e9e6-b755-4920-b095-c4ef71c22ae6/de52e9e6-b755-4920-b095-c4ef71c22ae6.meta:test 
e2df5921-f8be-4044-98ae-8455c9bfeaa0/e2df5921-f8be-4044-98ae-8455c9bfeaa0.meta:test ea033d20-f321-4566-bab0-e03e8fed7d81/ea033d20-f321-4566-bab0-e03e8fed7d81.meta:test

Additional info:
Some info, here is the env I tried to repro on: [root@dell-r730-3 ~]# gluster v info ecvol2 Volume Name: ecvol2 Type: Distributed-Disperse Volume ID: fd620567-4381-406b-a3ef-d8fdf824b358 Status: Started Snapshot Count: 0 Number of Bricks: 3 x (8 + 4) = 36 Transport-type: tcp Bricks: Brick1: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-1 Brick2: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-1 Brick3: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-1 Brick4: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-2 Brick5: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-2 Brick6: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-2 Brick7: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-3 Brick8: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-3 Brick9: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-3 Brick10: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-4 Brick11: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-4 Brick12: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-4 Brick13: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-5 Brick14: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-5 Brick15: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-5 Brick16: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-6 Brick17: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-6 Brick18: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-6 Brick19: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-7 Brick20: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-7 Brick21: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-7 Brick22: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-8 Brick23: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-8 Brick24: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-8 Brick25: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-9 Brick26: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-9 Brick27: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-9 Brick28: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-10 Brick29: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-10 Brick30: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-10 Brick31: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-11 Brick32: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-11 Brick33: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-11 Brick34: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-12 Brick35: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-12 Brick36: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-12 Options Reconfigured: server.allow-insecure: on user.cifs: off network.remote-dio: on network.ping-timeout: 30 performance.strict-o-direct: on performance.io-thread-count: 64 performance.cache-size: 256MB performance.read-ahead: off performance.client-io-threads: on performance.write-behind-window-size: 1MB network.inode-lru-limit: 90000 performance.md-cache-timeout: 600 performance.cache-invalidation: on performance.stat-prefetch: on features.cache-invalidation-timeout: 600 features.cache-invalidation: on cluster.locking-scheme: granular cluster.readdir-optimize: on cluster.lookup-optimize: on cluster.server-quorum-type: server features.scrub: pause features.bitrot: on features.quota-deem-statfs: on features.inode-quota: on features.quota: on transport.address-family: inet nfs.disable: on Here is the customer's environment: Volume Name: memori1 Type: Distributed-Disperse Volume ID: b870f631-3a15-45c1-830e-a05849084b6d Status: Started Snapshot Count: 0 Number of Bricks: 19 x (8 + 4) = 228 Transport-type: tcp Bricks: Brick1: MMR01:/rhgs/b0/memori1 Brick2: MMR02:/rhgs/b0/memori1 Brick3: MMR03:/rhgs/b0/memori1 Brick4: MMR04:/rhgs/b0/memori1 Brick5: MMR05:/rhgs/b0/memori1 Brick6: MMR06:/rhgs/b0/memori1 
Brick7: MMR07:/rhgs/b0/memori1 Brick8: MMR08:/rhgs/b0/memori1 Brick9: MMR09:/rhgs/b0/memori1 Brick10: MMR10:/rhgs/b0/memori1 Brick11: MMR11:/rhgs/b0/memori1 Brick12: MMR12:/rhgs/b0/memori1 Brick13: MMR01:/rhgs/b1/memori1 Brick14: MMR02:/rhgs/b1/memori1 Brick15: MMR03:/rhgs/b1/memori1 Brick16: MMR04:/rhgs/b1/memori1 Brick17: MMR05:/rhgs/b1/memori1 Brick18: MMR06:/rhgs/b1/memori1 Brick19: MMR07:/rhgs/b1/memori1 Brick20: MMR08:/rhgs/b1/memori1 Brick21: MMR09:/rhgs/b1/memori1 Brick22: MMR10:/rhgs/b1/memori1 Brick23: MMR11:/rhgs/b1/memori1 Brick24: MMR12:/rhgs/b1/memori1 Brick25: MMR01:/rhgs/b2/memori1 Brick26: MMR02:/rhgs/b2/memori1 Brick27: MMR03:/rhgs/b2/memori1 Brick28: MMR04:/rhgs/b2/memori1 Brick29: MMR05:/rhgs/b2/memori1 Brick30: MMR06:/rhgs/b2/memori1 Brick31: MMR07:/rhgs/b2/memori1 Brick32: MMR08:/rhgs/b2/memori1 Brick33: MMR09:/rhgs/b2/memori1 Brick34: MMR10:/rhgs/b2/memori1 Brick35: MMR11:/rhgs/b2/memori1 Brick36: MMR12:/rhgs/b2/memori1 Brick37: MMR01:/rhgs/b3/memori1 Brick38: MMR02:/rhgs/b3/memori1 Brick39: MMR03:/rhgs/b3/memori1 Brick40: MMR04:/rhgs/b3/memori1 Brick41: MMR05:/rhgs/b3/memori1 Brick42: MMR06:/rhgs/b3/memori1 Brick43: MMR07:/rhgs/b3/memori1 Brick44: MMR08:/rhgs/b3/memori1 Brick45: MMR09:/rhgs/b3/memori1 Brick46: MMR10:/rhgs/b3/memori1 Brick47: MMR11:/rhgs/b3/memori1 Brick48: MMR12:/rhgs/b3/memori1 Brick49: MMR01:/rhgs/b4/memori1 Brick50: MMR02:/rhgs/b4/memori1 Brick51: MMR03:/rhgs/b4/memori1 Brick52: MMR04:/rhgs/b4/memori1 Brick53: MMR05:/rhgs/b4/memori1 Brick54: MMR06:/rhgs/b4/memori1 Brick55: MMR07:/rhgs/b4/memori1 Brick56: MMR08:/rhgs/b4/memori1 Brick57: MMR09:/rhgs/b4/memori1 Brick58: MMR10:/rhgs/b4/memori1 Brick59: MMR11:/rhgs/b4/memori1 Brick60: MMR12:/rhgs/b4/memori1 Brick61: MMR01:/rhgs/b5/memori1 Brick62: MMR02:/rhgs/b5/memori1 Brick63: MMR03:/rhgs/b5/memori1 Brick64: MMR04:/rhgs/b5/memori1 Brick65: MMR05:/rhgs/b5/memori1 Brick66: MMR06:/rhgs/b5/memori1 Brick67: MMR07:/rhgs/b5/memori1 Brick68: MMR08:/rhgs/b5/memori1 Brick69: MMR09:/rhgs/b5/memori1 Brick70: MMR10:/rhgs/b5/memori1 Brick71: MMR11:/rhgs/b5/memori1 Brick72: MMR12:/rhgs/b5/memori1 Brick73: MMR01:/rhgs/b6/memori1 Brick74: MMR02:/rhgs/b6/memori1 Brick75: MMR03:/rhgs/b6/memori1 Brick76: MMR04:/rhgs/b6/memori1 Brick77: MMR05:/rhgs/b6/memori1 Brick78: MMR06:/rhgs/b6/memori1 Brick79: MMR07:/rhgs/b6/memori1 Brick80: MMR08:/rhgs/b6/memori1 Brick81: MMR09:/rhgs/b6/memori1 Brick82: MMR10:/rhgs/b6/memori1 Brick83: MMR11:/rhgs/b6/memori1 Brick84: MMR12:/rhgs/b6/memori1 Brick85: MMR01:/rhgs/b7/memori1 Brick86: MMR02:/rhgs/b7/memori1 Brick87: MMR03:/rhgs/b7/memori1 Brick88: MMR04:/rhgs/b7/memori1 Brick89: MMR05:/rhgs/b7/memori1 Brick90: MMR06:/rhgs/b7/memori1 Brick91: MMR07:/rhgs/b7/memori1 Brick92: MMR08:/rhgs/b7/memori1 Brick93: MMR09:/rhgs/b7/memori1 Brick94: MMR10:/rhgs/b7/memori1 Brick95: MMR11:/rhgs/b7/memori1 Brick96: MMR12:/rhgs/b7/memori1 Brick97: MMR01:/rhgs/b8/memori1 Brick98: MMR02:/rhgs/b8/memori1 Brick99: MMR03:/rhgs/b8/memori1 Brick100: MMR04:/rhgs/b8/memori1 Brick101: MMR05:/rhgs/b8/memori1 Brick102: MMR06:/rhgs/b8/memori1 Brick103: MMR07:/rhgs/b8/memori1 Brick104: MMR08:/rhgs/b8/memori1 Brick105: MMR09:/rhgs/b8/memori1 Brick106: MMR10:/rhgs/b8/memori1 Brick107: MMR11:/rhgs/b8/memori1 Brick108: MMR12:/rhgs/b8/memori1 Brick109: MMR01:/rhgs/b9/memori1 Brick110: MMR02:/rhgs/b9/memori1 Brick111: MMR03:/rhgs/b9/memori1 Brick112: MMR04:/rhgs/b9/memori1 Brick113: MMR05:/rhgs/b9/memori1 Brick114: MMR06:/rhgs/b9/memori1 Brick115: MMR07:/rhgs/b9/memori1 Brick116: MMR08:/rhgs/b9/memori1 Brick117: 
MMR09:/rhgs/b9/memori1 Brick118: MMR10:/rhgs/b9/memori1 Brick119: MMR11:/rhgs/b9/memori1 Brick120: MMR12:/rhgs/b9/memori1 Brick121: MMR01:/rhgs/b10/memori1 Brick122: MMR02:/rhgs/b10/memori1 Brick123: MMR03:/rhgs/b10/memori1 Brick124: MMR04:/rhgs/b10/memori1 Brick125: MMR05:/rhgs/b10/memori1 Brick126: MMR06:/rhgs/b10/memori1 Brick127: MMR07:/rhgs/b10/memori1 Brick128: MMR08:/rhgs/b10/memori1 Brick129: MMR09:/rhgs/b10/memori1 Brick130: MMR10:/rhgs/b10/memori1 Brick131: MMR11:/rhgs/b10/memori1 Brick132: MMR12:/rhgs/b10/memori1 Brick133: MMR01:/rhgs/b11/memori1 Brick134: MMR02:/rhgs/b11/memori1 Brick135: MMR03:/rhgs/b11/memori1 Brick136: MMR04:/rhgs/b11/memori1 Brick137: MMR05:/rhgs/b11/memori1 Brick138: MMR06:/rhgs/b11/memori1 Brick139: MMR07:/rhgs/b11/memori1 Brick140: MMR08:/rhgs/b11/memori1 Brick141: MMR09:/rhgs/b11/memori1 Brick142: MMR10:/rhgs/b11/memori1 Brick143: MMR11:/rhgs/b11/memori1 Brick144: MMR12:/rhgs/b11/memori1 Brick145: MMR01:/rhgs/b12/memori1 Brick146: MMR02:/rhgs/b12/memori1 Brick147: MMR03:/rhgs/b12/memori1 Brick148: MMR04:/rhgs/b12/memori1 Brick149: MMR05:/rhgs/b12/memori1 Brick150: MMR06:/rhgs/b12/memori1 Brick151: MMR07:/rhgs/b12/memori1 Brick152: MMR08:/rhgs/b12/memori1 Brick153: MMR09:/rhgs/b12/memori1 Brick154: MMR10:/rhgs/b12/memori1 Brick155: MMR11:/rhgs/b12/memori1 Brick156: MMR12:/rhgs/b12/memori1 Brick157: MMR01:/rhgs/b13/memori1 Brick158: MMR02:/rhgs/b13/memori1 Brick159: MMR03:/rhgs/b13/memori1 Brick160: MMR04:/rhgs/b13/memori1 Brick161: MMR05:/rhgs/b13/memori1 Brick162: MMR06:/rhgs/b13/memori1 Brick163: MMR07:/rhgs/b13/memori1 Brick164: MMR08:/rhgs/b13/memori1 Brick165: MMR09:/rhgs/b13/memori1 Brick166: MMR10:/rhgs/b13/memori1 Brick167: MMR11:/rhgs/b13/memori1 Brick168: MMR12:/rhgs/b13/memori1 Brick169: MMR01:/rhgs/b14/memori1 Brick170: MMR02:/rhgs/b14/memori1 Brick171: MMR03:/rhgs/b14/memori1 Brick172: MMR04:/rhgs/b14/memori1 Brick173: MMR05:/rhgs/b14/memori1 Brick174: MMR06:/rhgs/b14/memori1 Brick175: MMR07:/rhgs/b14/memori1 Brick176: MMR08:/rhgs/b14/memori1 Brick177: MMR09:/rhgs/b14/memori1 Brick178: MMR10:/rhgs/b14/memori1 Brick179: MMR11:/rhgs/b14/memori1 Brick180: MMR12:/rhgs/b14/memori1 Brick181: MMR01:/rhgs/b15/memori1 Brick182: MMR02:/rhgs/b15/memori1 Brick183: MMR03:/rhgs/b15/memori1 Brick184: MMR04:/rhgs/b15/memori1 Brick185: MMR05:/rhgs/b15/memori1 Brick186: MMR06:/rhgs/b15/memori1 Brick187: MMR07:/rhgs/b15/memori1 Brick188: MMR08:/rhgs/b15/memori1 Brick189: MMR09:/rhgs/b15/memori1 Brick190: MMR10:/rhgs/b15/memori1 Brick191: MMR11:/rhgs/b15/memori1 Brick192: MMR12:/rhgs/b15/memori1 Brick193: MMR01:/rhgs/b16/memori1 Brick194: MMR02:/rhgs/b16/memori1 Brick195: MMR03:/rhgs/b16/memori1 Brick196: MMR04:/rhgs/b16/memori1 Brick197: MMR05:/rhgs/b16/memori1 Brick198: MMR06:/rhgs/b16/memori1 Brick199: MMR07:/rhgs/b16/memori1 Brick200: MMR08:/rhgs/b16/memori1 Brick201: MMR09:/rhgs/b16/memori1 Brick202: MMR10:/rhgs/b16/memori1 Brick203: MMR11:/rhgs/b16/memori1 Brick204: MMR12:/rhgs/b16/memori1 Brick205: MMR01:/rhgs/b17/memori1 Brick206: MMR02:/rhgs/b17/memori1 Brick207: MMR03:/rhgs/b17/memori1 Brick208: MMR04:/rhgs/b17/memori1 Brick209: MMR05:/rhgs/b17/memori1 Brick210: MMR06:/rhgs/b17/memori1 Brick211: MMR07:/rhgs/b17/memori1 Brick212: MMR08:/rhgs/b17/memori1 Brick213: MMR09:/rhgs/b17/memori1 Brick214: MMR10:/rhgs/b17/memori1 Brick215: MMR11:/rhgs/b17/memori1 Brick216: MMR12:/rhgs/b17/memori1 Brick217: MMR01:/rhgs/b18/memori1 Brick218: MMR02:/rhgs/b18/memori1 Brick219: MMR03:/rhgs/b18/memori1 Brick220: MMR04:/rhgs/b18/memori1 Brick221: 
MMR05:/rhgs/b18/memori1 Brick222: MMR06:/rhgs/b18/memori1 Brick223: MMR07:/rhgs/b18/memori1 Brick224: MMR08:/rhgs/b18/memori1 Brick225: MMR09:/rhgs/b18/memori1 Brick226: MMR10:/rhgs/b18/memori1 Brick227: MMR11:/rhgs/b18/memori1 Brick228: MMR12:/rhgs/b18/memori1 Options Reconfigured: cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable performance.low-prio-threads: 32 performance.io-cache: off performance.quick-read: off cluster.tier-demote-frequency: 3600 nfs.disable: on performance.readdir-ahead: enable transport.address-family: inet client.event-threads: 32 server.event-threads: 32 cluster.lookup-optimize: on cluster.readdir-optimize: on cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 cluster.data-self-heal-algorithm: full features.cache-invalidation: on features.cache-invalidation-timeout: 600 performance.stat-prefetch: off performance.cache-invalidation: on performance.md-cache-timeout: 600 network.inode-lru-limit: 90000 performance.write-behind-window-size: 1MB performance.client-io-threads: on performance.read-ahead: off performance.cache-size: 256MB performance.io-thread-count: 64 performance.strict-o-direct: on network.ping-timeout: 30 network.remote-dio: enable user.cifs: off diagnostics.client-log-level: WARNING features.quota: off features.inode-quota: off server.allow-insecure: on cluster.watermark-low: 1 cluster.watermark-hi: 1 diagnostics.brick-sys-log-level: INFO diagnostics.brick-log-level: INFO cluster.server-quorum-ratio: 51%
Some observations from Dmitri: I'm taking a closer look at files w/o bitrot attributes: These files have 'trusted.glusterfs.dht.linkto', the location in the attribute is empty. Is it normal? I'm looking in a wrong place? # getfattr -d -m . -e hex /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta getfattr: Removing leading '/' from absolute path names # file: rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta trusted.ec.config=0x0000080c04000200 trusted.ec.size=0x0000000000000000 trusted.ec.version=0x00000000000000000000000000000000 trusted.gfid=0x38229d9705b442658af0ffbb37051074 trusted.glusterfs.dht.linkto=0x6d656d6f7269312d64697370657273652d3800 (check location in trusted.gfid, looks OK): [root@MMR01 ~]# find /rhgs/b0/memori1/.glusterfs/38/22/ /rhgs/b0/memori1/.glusterfs/38/22/ /rhgs/b0/memori1/.glusterfs/38/22/38229d97-05b4-4265-8af0-ffbb37051074 (check location in 'trusted.glusterfs.dht.linkto', no files in the dir): [root@MMR01 ~]# find /rhgs/b0/memori1/.glusterfs/6d/65 /rhgs/b0/memori1/.glusterfs/6d/65 I'm seeing a pattern here: All 'broken' files seems to be stored twice on nodes. I.e. Client crashes when running: # grep test f33fa16e-7c15-4b91-af70-cc8d3133bdd4/*.meta When I search for the file on bricks, it's stored on each server on two different bricks: $ ansible memori-pool1 -m shell -a 'find /rhgs/*/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4 -name '*.meta'' MMR03.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR01.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR04.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR02.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR05.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR07.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR06.ORC | SUCCESS | rc=0 >> 
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR08.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR09.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR10.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR11.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta MMR12.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta /rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta The following is example for a 'good' file (i.e. 
grep doesn't crash): $ ansible memori-pool1 -m shell -a 'find /rhgs/*/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3 -name '*.meta'' MMR02.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR01.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR04.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR06.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR07.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR03.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR05.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR08.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR10.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR09.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR11.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta MMR12.ORC | SUCCESS | rc=0 >> /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta
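One note that may help with the linkto question above (an interpretation, not part of the original report): trusted.glusterfs.dht.linkto stores the name of the DHT subvolume a link file points to, as hex-encoded ASCII rather than a gfid, so searching .glusterfs/6d/65 for it is expected to find nothing; only trusted.gfid maps to a .glusterfs/<xx>/<yy>/<gfid> path. For example, decoding the value quoted above:

# the linkto xattr decodes to a subvolume name, not a gfid
echo 6d656d6f7269312d64697370657273652d3800 | xxd -r -p
memori1-disperse-8

# the gfid xattr is what maps into .glusterfs/<first-byte>/<second-byte>/<gfid>
ls /rhgs/b0/memori1/.glusterfs/38/22/38229d97-05b4-4265-8af0-ffbb37051074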
Here is the BT:

warning: core file may not match specified executable file.
[New LWP 4755]
[New LWP 4750]
[New LWP 4749]
[New LWP 4748]
[New LWP 4760]
[New LWP 4751]
[New LWP 4757]
[New LWP 4756]
[New LWP 4752]
[New LWP 4753]
[New LWP 4761]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=MMR04 --volfile-id=/memori2 /mnt/gluster/m'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f480747d704 in ec_manager_seek () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
(gdb) bt
#0  0x00007f480747d704 in ec_manager_seek () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#1  0x00007f48074659bb in __ec_manager () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#2  0x00007f4807465b98 in ec_resume () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#3  0x00007f4807465cbf in ec_complete () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#4  0x00007f480747a40b in ec_seek_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#5  0x00007f48076e1c97 in client3_3_seek_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#6  0x00007f4814fd5840 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#7  0x00007f4814fd5b27 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#8  0x00007f4814fd19e3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#9  0x00007f4809bd23d6 in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#10 0x00007f4809bd497c in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#11 0x00007f48152671e6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#12 0x00007f481406be25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f481393834d in clone () from /lib64/libc.so.6
And the second core with all the debuginfos installed:

[root@dell-per730-01 ~]# gdb /usr/sbin/glusterfs core.135829
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
warning: core file may not match specified executable file.
[New LWP 135838]
[New LWP 135843]
[New LWP 135831]
[New LWP 135837]
[New LWP 135834]
[New LWP 135842]
[New LWP 135836]
[New LWP 135835]
[New LWP 135832]
[New LWP 135833]
[New LWP 135829]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=MMR01 --volfile-id=/memori2 /mnt/gluster/m'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f9384847704 in ec_manager_seek (fop=0x7f936c0ccf80, state=<optimized out>) at ec-inode-read.c:1592
1592 if (cbk->op_ret >= 0) {
(gdb) bt
#0  0x00007f9384847704 in ec_manager_seek (fop=0x7f936c0ccf80, state=<optimized out>) at ec-inode-read.c:1592
#1  0x00007f938482f9bb in __ec_manager (fop=0x7f936c0ccf80, error=0) at ec-common.c:2384
#2  0x00007f938482fb98 in ec_resume (fop=0x7f936c0ccf80, error=0) at ec-common.c:334
#3  0x00007f938482fcbf in ec_complete (fop=0x7f936c0ccf80) at ec-common.c:407
#4  0x00007f938484440b in ec_seek_cbk (frame=<optimized out>, cookie=0x2, this=0x7f938019ea60, op_ret=-1, op_errno=6, offset=0, xdata=0x0) at ec-inode-read.c:1549
#5  0x00007f9384aabc97 in client3_3_seek_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f936c0cdfa0) at client-rpc-fops.c:2213
#6  0x00007f9392196840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f938062faf0, pollin=pollin@entry=0x7f93706bf7c0) at rpc-clnt.c:794
#7  0x00007f9392196b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f938062fb20, event=<optimized out>, data=0x7f93706bf7c0) at rpc-clnt.c:987
#8  0x00007f93921929e3 in rpc_transport_notify (this=this@entry=0x7f938062fc90, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f93706bf7c0) at rpc-transport.c:538
#9  0x00007f9386d933d6 in socket_event_poll_in (this=this@entry=0x7f938062fc90, notify_handled=<optimized out>) at socket.c:2306
#10 0x00007f9386d9597c in socket_event_handler (fd=17, idx=5, gen=1, data=0x7f938062fc90, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#11 0x00007f93924281e6 in event_dispatch_epoll_handler (event=0x7f937d0f1e80, event_pool=0x555c30037800) at event-epoll.c:572
#12 event_dispatch_epoll_worker (data=0x7f93802869f0) at event-epoll.c:648
#13 0x00007f939122ce25 in start_thread (arg=0x7f937d0f2700) at pthread_create.c:308
#14 0x00007f9390af934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)
The crash is caused by an incorrect check in the seek callback when an error is detected. Most probably this is the same problem identified in bug #1439068, and there's a patch for it. What I don't know is what is generating those 'seek' requests: the error returned for seek is ENXIO, which can only be returned when seek is used with SEEK_DATA or SEEK_HOLE, and AFAIK bit-rot doesn't use seek with either of them.

There's also a problem causing ESTALE errors on some files. This is not necessarily a bad error if other operations are running on the volume. Maybe it's related to whatever is sending the seek requests.
We should determine whether the cause of the crash is the same one that causes the bit-rot issues. Currently I'm unable to reproduce the problem: those seek requests do not appear in my tests. If there's really nothing else accessing the volume, the only possibility I see is that the grep implementation the customer is using relies on seek to improve performance on sparse files. Could that be verified? An strace of the grep execution could also shed some light on this.
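One way to check that (a sketch, assuming the reproducer is run from the FUSE mount and strace is available on the client):

# Trace the reproducer and look for lseek() calls using SEEK_DATA/SEEK_HOLE;
# their presence would confirm that grep itself issues the seek requests.
rpm -q grep                      # record the exact grep build in use
strace -f -e trace=lseek -o /tmp/grep.strace grep test */*.meta
grep -E 'SEEK_(DATA|HOLE)' /tmp/grep.strace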
Fix for this issue, as per comment 9: https://review.gluster.org/#/c/16998/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607