Bug 1502812 - [GSS] Client segfaults when grepping $UUID.meta files on EC vol.
Summary: [GSS] Client segfaults when grepping $UUID.meta files on EC vol.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Xavi Hernandez
QA Contact: Upasana
URL:
Whiteboard: rebase
Depends On: 1439068
Blocks: 1503135
 
Reported: 2017-10-16 18:35 UTC by Ben Turner
Modified: 2021-09-09 12:43 UTC
CC: 15 users

Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 06:36:52 UTC
Embargoed:




Links:
Red Hat Product Errata RHSA-2018:2607 (last updated 2018-09-04 06:38:34 UTC)

Internal Links: 1579981

Description Ben Turner 2017-10-16 18:35:12 UTC
Description of problem:

If you can, try the following steps on your test system:

- create a test dataset using the 'uuidgen' command
- enable bitrot on the volume
- change a single file in the dataset that was generated before bitrot was enabled
- run grep across the whole dataset

These are my settings for 'bitrot':

# gluster v get memori1 all|grep bitrot
features.bitrot                         on

# gluster v get memori1 all|grep scrub
features.scrub-throttle                 lazy
features.scrub-freq                     biweekly
features.scrub                          Active

Scrub is active, but the scrubbed-files count is 0 on all nodes.
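These counts are the per-node 'Number of Scrubbed files' figures from:

# gluster v bitrot memori1 scrub status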

I started a manual scrub with 'gluster v bitrot memori1 scrub ondemand'.
It finished, but the files still don't have bit-rot attributes.
(I'm not sure why some files were skipped.)
(getfattr -d -m . -e hex on the brick data still doesn't show any 'trusted.bit-rot*' attributes after the scrub completed.)
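
For comparison, a file that the scrubber has signed would normally show bit-rot xattrs on the brick copy roughly like this (illustrative placeholder path and values):

# getfattr -d -m . -e hex <brick-path>/<file>
trusted.bit-rot.signature=0x...
trusted.bit-rot.version=0x...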

Version-Release number of selected component (if applicable):

RHGS 3.3

How reproducible:

It's not reproducible for me, but it happens almost every time for the customer.

Steps to Reproduce:
1.  for i in $(seq 40); do uuidgen | awk {'print "mkdir "$1"; echo test >> "$1"/"$1".meta"'}; done | sh
2.  grep test */*.meta
3.  <crashes>
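
For clarity, step 1 expands to this plain loop (equivalent to the one-liner, run from the volume mount point):

for i in $(seq 40); do
    d=$(uuidgen)
    mkdir "$d"
    echo test >> "$d/$d.meta"
done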

Actual results:

[2017-10-09 14:47:38.652652] W [MSGID: 122033] [ec-common.c:1542:ec_locked] 0-memori1-disperse-0: Failed to complete preop lock [Stale file handle]
[2017-10-09 14:47:38.822968] W [MSGID: 114031] [client-rpc-fops.c:2211:client3_3_seek_cbk] 0-memori1-client-2: remote operation failed [No such device or address]
The message "W [MSGID: 122033] [ec-common.c:1542:ec_locked] 0-memori1-disperse-0: Failed to complete preop lock [Stale file handle]" repeated 21 times between [2017-10-09 14:47:38.652652] and [2017-10-09 14:47:38.813231]
pending frames:
frame : type(1) op(SEEK)
frame : type(1) op(SEEK)
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(FSTAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(FSTAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(FSTAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(FSTAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-10-09 14:47:38
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f982ca157d2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f982ca1f304]
/lib64/libc.so.6(+0x35270)[0x7f982b07e270]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x25704)[0x7f98253b8704]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb)[0x7f98253a09bb]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98)[0x7f98253a0b98]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdcbf)[0x7f98253a0cbf]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2240b)[0x7f98253b540b]
/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x1ec97)[0x7f982561cc97]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f982c7de840]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1e7)[0x7f982c7deb27]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f982c7da9e3]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x73d6)[0x7f98279043d6]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x997c)[0x7f982790697c]
/lib64/libglusterfs.so.0(+0x851e6)[0x7f982ca701e6]
/lib64/libpthread.so.0(+0x7e25)[0x7f982b874e25]
/lib64/libc.so.6(clone+0x6d)[0x7f982b14134d]
---------

Expected results:

You shouldn't have to. Did you update the OS as well? Let me try the repro again; can you confirm this is correct for me:

[root@dell-r730-4 gluster-mount]# rm -rf ./*
[root@dell-r730-4 gluster-mount]# for i in $(seq 40); do uuidgen | awk {'print "mkdir "$1"; echo test >> "$1"/"$1".meta"'}; done | sh
[root@dell-r730-4 gluster-mount]# grep test */*.meta
01293de4-5c8e-41e7-bed2-dbab695d12ce/01293de4-5c8e-41e7-bed2-dbab695d12ce.meta:test
0723f1c1-c019-4dc9-bc70-7e14eb3a2769/0723f1c1-c019-4dc9-bc70-7e14eb3a2769.meta:test
080907ca-578a-4555-b418-4b2bc58872c0/080907ca-578a-4555-b418-4b2bc58872c0.meta:test
0ce2b70f-7960-4a7f-9286-a402bf1e97e0/0ce2b70f-7960-4a7f-9286-a402bf1e97e0.meta:test
0cea6628-ce0e-4d92-8869-feded3f0faeb/0cea6628-ce0e-4d92-8869-feded3f0faeb.meta:test
1a3cecd7-edbb-4e07-90e6-cc6740314063/1a3cecd7-edbb-4e07-90e6-cc6740314063.meta:test
1a49b76f-d658-4b3f-a10f-fa9044557a68/1a49b76f-d658-4b3f-a10f-fa9044557a68.meta:test
1b2a9b35-2f10-4b13-84cb-bbcd3d6b6ede/1b2a9b35-2f10-4b13-84cb-bbcd3d6b6ede.meta:test
23f5de3d-555e-4cd7-8e43-72e677be9ff1/23f5de3d-555e-4cd7-8e43-72e677be9ff1.meta:test
24ff38f0-cf1f-4f3e-b7fd-de3b731be99a/24ff38f0-cf1f-4f3e-b7fd-de3b731be99a.meta:test
300d4942-a1c6-474f-a574-72883ad75437/300d4942-a1c6-474f-a574-72883ad75437.meta:test
3239a15d-e99c-4261-a181-e8af3f409be4/3239a15d-e99c-4261-a181-e8af3f409be4.meta:test
36f29a83-290b-4b14-848d-7d781f04bdb2/36f29a83-290b-4b14-848d-7d781f04bdb2.meta:test
44ffd4de-8d72-4fcf-94db-029cfc05acc8/44ffd4de-8d72-4fcf-94db-029cfc05acc8.meta:test
459d926e-3292-481a-ba19-09ba5673b15e/459d926e-3292-481a-ba19-09ba5673b15e.meta:test
4b23847a-34b1-443e-a460-976b6b536c65/4b23847a-34b1-443e-a460-976b6b536c65.meta:test
59fbfe5c-515a-47ed-bdfb-cf1bd5f99037/59fbfe5c-515a-47ed-bdfb-cf1bd5f99037.meta:test
5a97c5ed-a738-4886-bfa6-e9f3cc997203/5a97c5ed-a738-4886-bfa6-e9f3cc997203.meta:test
6246be8b-ce30-4d2c-8fbf-7aeab368d8c5/6246be8b-ce30-4d2c-8fbf-7aeab368d8c5.meta:test
626e0c97-ed67-4aee-91e5-11709f3627d8/626e0c97-ed67-4aee-91e5-11709f3627d8.meta:test
7428fce2-07c3-4cf2-8d3f-f53b9a63cba7/7428fce2-07c3-4cf2-8d3f-f53b9a63cba7.meta:test
7540b726-e36b-462a-adb9-e72c6737e466/7540b726-e36b-462a-adb9-e72c6737e466.meta:test
8dfd2a6e-ff67-4558-87ee-727307be9cb7/8dfd2a6e-ff67-4558-87ee-727307be9cb7.meta:test
9330ae14-1906-4617-8d9e-817eb6ac7203/9330ae14-1906-4617-8d9e-817eb6ac7203.meta:test
a4145620-0f30-454e-a54c-f30b0ea60f81/a4145620-0f30-454e-a54c-f30b0ea60f81.meta:test
aacdc304-650f-43c6-a7b0-6b2077d0c2ba/aacdc304-650f-43c6-a7b0-6b2077d0c2ba.meta:test
aacf83ad-b641-4381-9baf-42eaf72ede09/aacf83ad-b641-4381-9baf-42eaf72ede09.meta:test
ad3c3114-5d9a-4588-88bb-54e0572918a0/ad3c3114-5d9a-4588-88bb-54e0572918a0.meta:test
b7d1626f-00fa-4d22-bb69-9b1598443838/b7d1626f-00fa-4d22-bb69-9b1598443838.meta:test
be7062d5-1462-4e38-812b-e4184e31891d/be7062d5-1462-4e38-812b-e4184e31891d.meta:test
c2c4b900-d9a2-417e-92bd-b9b55a591de4/c2c4b900-d9a2-417e-92bd-b9b55a591de4.meta:test
ccce75e6-1079-49d1-91e6-f184de47578a/ccce75e6-1079-49d1-91e6-f184de47578a.meta:test
cd3e9f67-4622-446d-ad8b-cf3928e7fc2b/cd3e9f67-4622-446d-ad8b-cf3928e7fc2b.meta:test
cd9e3d9e-6a02-4579-acb6-258bfc244f01/cd9e3d9e-6a02-4579-acb6-258bfc244f01.meta:test
ceccd07e-26dc-4275-a605-e5a292192277/ceccd07e-26dc-4275-a605-e5a292192277.meta:test
d15332a0-f08f-4548-b43d-441b9823d231/d15332a0-f08f-4548-b43d-441b9823d231.meta:test
d8c02b45-987a-4fb4-8ab6-140c1464c180/d8c02b45-987a-4fb4-8ab6-140c1464c180.meta:test
de52e9e6-b755-4920-b095-c4ef71c22ae6/de52e9e6-b755-4920-b095-c4ef71c22ae6.meta:test
e2df5921-f8be-4044-98ae-8455c9bfeaa0/e2df5921-f8be-4044-98ae-8455c9bfeaa0.meta:test
ea033d20-f321-4566-bab0-e03e8fed7d81/ea033d20-f321-4566-bab0-e03e8fed7d81.meta:test

Additional info:

Comment 2 Ben Turner 2017-10-16 18:41:42 UTC
Some info: here is the env I tried to repro on:

[root@dell-r730-3 ~]# gluster v info ecvol2
 
Volume Name: ecvol2
Type: Distributed-Disperse
Volume ID: fd620567-4381-406b-a3ef-d8fdf824b358
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (8 + 4) = 36
Transport-type: tcp
Bricks:
Brick1: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-1
Brick2: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-1
Brick3: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-1
Brick4: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-2
Brick5: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-2
Brick6: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-2
Brick7: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-3
Brick8: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-3
Brick9: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-3
Brick10: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-4
Brick11: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-4
Brick12: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-4
Brick13: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-5
Brick14: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-5
Brick15: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-5
Brick16: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-6
Brick17: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-6
Brick18: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-6
Brick19: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-7
Brick20: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-7
Brick21: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-7
Brick22: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-8
Brick23: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-8
Brick24: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-8
Brick25: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-9
Brick26: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-9
Brick27: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-9
Brick28: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-10
Brick29: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-10
Brick30: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-10
Brick31: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-11
Brick32: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-11
Brick33: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-11
Brick34: dell-r730-1.gsslab.rdu2.redhat.com:/bricks/ec-12
Brick35: dell-r730-2.gsslab.rdu2.redhat.com:/bricks/ec-12
Brick36: dell-r730-3.gsslab.rdu2.redhat.com:/bricks/ec-12
Options Reconfigured:
server.allow-insecure: on
user.cifs: off
network.remote-dio: on
network.ping-timeout: 30
performance.strict-o-direct: on
performance.io-thread-count: 64
performance.cache-size: 256MB
performance.read-ahead: off
performance.client-io-threads: on
performance.write-behind-window-size: 1MB
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.locking-scheme: granular
cluster.readdir-optimize: on
cluster.lookup-optimize: on
cluster.server-quorum-type: server
features.scrub: pause
features.bitrot: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
nfs.disable: on

Here is the customer's environment:

Volume Name: memori1
Type: Distributed-Disperse
Volume ID: b870f631-3a15-45c1-830e-a05849084b6d
Status: Started
Snapshot Count: 0
Number of Bricks: 19 x (8 + 4) = 228
Transport-type: tcp
Bricks:
Brick1: MMR01:/rhgs/b0/memori1
Brick2: MMR02:/rhgs/b0/memori1
Brick3: MMR03:/rhgs/b0/memori1
Brick4: MMR04:/rhgs/b0/memori1
Brick5: MMR05:/rhgs/b0/memori1
Brick6: MMR06:/rhgs/b0/memori1
Brick7: MMR07:/rhgs/b0/memori1
Brick8: MMR08:/rhgs/b0/memori1
Brick9: MMR09:/rhgs/b0/memori1
Brick10: MMR10:/rhgs/b0/memori1
Brick11: MMR11:/rhgs/b0/memori1
Brick12: MMR12:/rhgs/b0/memori1
Brick13: MMR01:/rhgs/b1/memori1
Brick14: MMR02:/rhgs/b1/memori1
Brick15: MMR03:/rhgs/b1/memori1
Brick16: MMR04:/rhgs/b1/memori1
Brick17: MMR05:/rhgs/b1/memori1
Brick18: MMR06:/rhgs/b1/memori1
Brick19: MMR07:/rhgs/b1/memori1
Brick20: MMR08:/rhgs/b1/memori1
Brick21: MMR09:/rhgs/b1/memori1
Brick22: MMR10:/rhgs/b1/memori1
Brick23: MMR11:/rhgs/b1/memori1
Brick24: MMR12:/rhgs/b1/memori1
Brick25: MMR01:/rhgs/b2/memori1
Brick26: MMR02:/rhgs/b2/memori1
Brick27: MMR03:/rhgs/b2/memori1
Brick28: MMR04:/rhgs/b2/memori1
Brick29: MMR05:/rhgs/b2/memori1
Brick30: MMR06:/rhgs/b2/memori1
Brick31: MMR07:/rhgs/b2/memori1
Brick32: MMR08:/rhgs/b2/memori1
Brick33: MMR09:/rhgs/b2/memori1
Brick34: MMR10:/rhgs/b2/memori1
Brick35: MMR11:/rhgs/b2/memori1
Brick36: MMR12:/rhgs/b2/memori1
Brick37: MMR01:/rhgs/b3/memori1
Brick38: MMR02:/rhgs/b3/memori1
Brick39: MMR03:/rhgs/b3/memori1
Brick40: MMR04:/rhgs/b3/memori1
Brick41: MMR05:/rhgs/b3/memori1
Brick42: MMR06:/rhgs/b3/memori1
Brick43: MMR07:/rhgs/b3/memori1
Brick44: MMR08:/rhgs/b3/memori1
Brick45: MMR09:/rhgs/b3/memori1
Brick46: MMR10:/rhgs/b3/memori1
Brick47: MMR11:/rhgs/b3/memori1
Brick48: MMR12:/rhgs/b3/memori1
Brick49: MMR01:/rhgs/b4/memori1
Brick50: MMR02:/rhgs/b4/memori1
Brick51: MMR03:/rhgs/b4/memori1
Brick52: MMR04:/rhgs/b4/memori1
Brick53: MMR05:/rhgs/b4/memori1
Brick54: MMR06:/rhgs/b4/memori1
Brick55: MMR07:/rhgs/b4/memori1
Brick56: MMR08:/rhgs/b4/memori1
Brick57: MMR09:/rhgs/b4/memori1
Brick58: MMR10:/rhgs/b4/memori1
Brick59: MMR11:/rhgs/b4/memori1
Brick60: MMR12:/rhgs/b4/memori1
Brick61: MMR01:/rhgs/b5/memori1
Brick62: MMR02:/rhgs/b5/memori1
Brick63: MMR03:/rhgs/b5/memori1
Brick64: MMR04:/rhgs/b5/memori1
Brick65: MMR05:/rhgs/b5/memori1
Brick66: MMR06:/rhgs/b5/memori1
Brick67: MMR07:/rhgs/b5/memori1
Brick68: MMR08:/rhgs/b5/memori1
Brick69: MMR09:/rhgs/b5/memori1
Brick70: MMR10:/rhgs/b5/memori1
Brick71: MMR11:/rhgs/b5/memori1
Brick72: MMR12:/rhgs/b5/memori1
Brick73: MMR01:/rhgs/b6/memori1
Brick74: MMR02:/rhgs/b6/memori1
Brick75: MMR03:/rhgs/b6/memori1
Brick76: MMR04:/rhgs/b6/memori1
Brick77: MMR05:/rhgs/b6/memori1
Brick78: MMR06:/rhgs/b6/memori1
Brick79: MMR07:/rhgs/b6/memori1
Brick80: MMR08:/rhgs/b6/memori1
Brick81: MMR09:/rhgs/b6/memori1
Brick82: MMR10:/rhgs/b6/memori1
Brick83: MMR11:/rhgs/b6/memori1
Brick84: MMR12:/rhgs/b6/memori1
Brick85: MMR01:/rhgs/b7/memori1
Brick86: MMR02:/rhgs/b7/memori1
Brick87: MMR03:/rhgs/b7/memori1
Brick88: MMR04:/rhgs/b7/memori1
Brick89: MMR05:/rhgs/b7/memori1
Brick90: MMR06:/rhgs/b7/memori1
Brick91: MMR07:/rhgs/b7/memori1
Brick92: MMR08:/rhgs/b7/memori1
Brick93: MMR09:/rhgs/b7/memori1
Brick94: MMR10:/rhgs/b7/memori1
Brick95: MMR11:/rhgs/b7/memori1
Brick96: MMR12:/rhgs/b7/memori1
Brick97: MMR01:/rhgs/b8/memori1
Brick98: MMR02:/rhgs/b8/memori1
Brick99: MMR03:/rhgs/b8/memori1
Brick100: MMR04:/rhgs/b8/memori1
Brick101: MMR05:/rhgs/b8/memori1
Brick102: MMR06:/rhgs/b8/memori1
Brick103: MMR07:/rhgs/b8/memori1
Brick104: MMR08:/rhgs/b8/memori1
Brick105: MMR09:/rhgs/b8/memori1
Brick106: MMR10:/rhgs/b8/memori1
Brick107: MMR11:/rhgs/b8/memori1
Brick108: MMR12:/rhgs/b8/memori1
Brick109: MMR01:/rhgs/b9/memori1
Brick110: MMR02:/rhgs/b9/memori1
Brick111: MMR03:/rhgs/b9/memori1
Brick112: MMR04:/rhgs/b9/memori1
Brick113: MMR05:/rhgs/b9/memori1
Brick114: MMR06:/rhgs/b9/memori1
Brick115: MMR07:/rhgs/b9/memori1
Brick116: MMR08:/rhgs/b9/memori1
Brick117: MMR09:/rhgs/b9/memori1
Brick118: MMR10:/rhgs/b9/memori1
Brick119: MMR11:/rhgs/b9/memori1
Brick120: MMR12:/rhgs/b9/memori1
Brick121: MMR01:/rhgs/b10/memori1
Brick122: MMR02:/rhgs/b10/memori1
Brick123: MMR03:/rhgs/b10/memori1
Brick124: MMR04:/rhgs/b10/memori1
Brick125: MMR05:/rhgs/b10/memori1
Brick126: MMR06:/rhgs/b10/memori1
Brick127: MMR07:/rhgs/b10/memori1
Brick128: MMR08:/rhgs/b10/memori1
Brick129: MMR09:/rhgs/b10/memori1
Brick130: MMR10:/rhgs/b10/memori1
Brick131: MMR11:/rhgs/b10/memori1
Brick132: MMR12:/rhgs/b10/memori1
Brick133: MMR01:/rhgs/b11/memori1
Brick134: MMR02:/rhgs/b11/memori1
Brick135: MMR03:/rhgs/b11/memori1
Brick136: MMR04:/rhgs/b11/memori1
Brick137: MMR05:/rhgs/b11/memori1
Brick138: MMR06:/rhgs/b11/memori1
Brick139: MMR07:/rhgs/b11/memori1
Brick140: MMR08:/rhgs/b11/memori1
Brick141: MMR09:/rhgs/b11/memori1
Brick142: MMR10:/rhgs/b11/memori1
Brick143: MMR11:/rhgs/b11/memori1
Brick144: MMR12:/rhgs/b11/memori1
Brick145: MMR01:/rhgs/b12/memori1
Brick146: MMR02:/rhgs/b12/memori1
Brick147: MMR03:/rhgs/b12/memori1
Brick148: MMR04:/rhgs/b12/memori1
Brick149: MMR05:/rhgs/b12/memori1
Brick150: MMR06:/rhgs/b12/memori1
Brick151: MMR07:/rhgs/b12/memori1
Brick152: MMR08:/rhgs/b12/memori1
Brick153: MMR09:/rhgs/b12/memori1
Brick154: MMR10:/rhgs/b12/memori1
Brick155: MMR11:/rhgs/b12/memori1
Brick156: MMR12:/rhgs/b12/memori1
Brick157: MMR01:/rhgs/b13/memori1
Brick158: MMR02:/rhgs/b13/memori1
Brick159: MMR03:/rhgs/b13/memori1
Brick160: MMR04:/rhgs/b13/memori1
Brick161: MMR05:/rhgs/b13/memori1
Brick162: MMR06:/rhgs/b13/memori1
Brick163: MMR07:/rhgs/b13/memori1
Brick164: MMR08:/rhgs/b13/memori1
Brick165: MMR09:/rhgs/b13/memori1
Brick166: MMR10:/rhgs/b13/memori1
Brick167: MMR11:/rhgs/b13/memori1
Brick168: MMR12:/rhgs/b13/memori1
Brick169: MMR01:/rhgs/b14/memori1
Brick170: MMR02:/rhgs/b14/memori1
Brick171: MMR03:/rhgs/b14/memori1
Brick172: MMR04:/rhgs/b14/memori1
Brick173: MMR05:/rhgs/b14/memori1
Brick174: MMR06:/rhgs/b14/memori1
Brick175: MMR07:/rhgs/b14/memori1
Brick176: MMR08:/rhgs/b14/memori1
Brick177: MMR09:/rhgs/b14/memori1
Brick178: MMR10:/rhgs/b14/memori1
Brick179: MMR11:/rhgs/b14/memori1
Brick180: MMR12:/rhgs/b14/memori1
Brick181: MMR01:/rhgs/b15/memori1
Brick182: MMR02:/rhgs/b15/memori1
Brick183: MMR03:/rhgs/b15/memori1
Brick184: MMR04:/rhgs/b15/memori1
Brick185: MMR05:/rhgs/b15/memori1
Brick186: MMR06:/rhgs/b15/memori1
Brick187: MMR07:/rhgs/b15/memori1
Brick188: MMR08:/rhgs/b15/memori1
Brick189: MMR09:/rhgs/b15/memori1
Brick190: MMR10:/rhgs/b15/memori1
Brick191: MMR11:/rhgs/b15/memori1
Brick192: MMR12:/rhgs/b15/memori1
Brick193: MMR01:/rhgs/b16/memori1
Brick194: MMR02:/rhgs/b16/memori1
Brick195: MMR03:/rhgs/b16/memori1
Brick196: MMR04:/rhgs/b16/memori1
Brick197: MMR05:/rhgs/b16/memori1
Brick198: MMR06:/rhgs/b16/memori1
Brick199: MMR07:/rhgs/b16/memori1
Brick200: MMR08:/rhgs/b16/memori1
Brick201: MMR09:/rhgs/b16/memori1
Brick202: MMR10:/rhgs/b16/memori1
Brick203: MMR11:/rhgs/b16/memori1
Brick204: MMR12:/rhgs/b16/memori1
Brick205: MMR01:/rhgs/b17/memori1
Brick206: MMR02:/rhgs/b17/memori1
Brick207: MMR03:/rhgs/b17/memori1
Brick208: MMR04:/rhgs/b17/memori1
Brick209: MMR05:/rhgs/b17/memori1
Brick210: MMR06:/rhgs/b17/memori1
Brick211: MMR07:/rhgs/b17/memori1
Brick212: MMR08:/rhgs/b17/memori1
Brick213: MMR09:/rhgs/b17/memori1
Brick214: MMR10:/rhgs/b17/memori1
Brick215: MMR11:/rhgs/b17/memori1
Brick216: MMR12:/rhgs/b17/memori1
Brick217: MMR01:/rhgs/b18/memori1
Brick218: MMR02:/rhgs/b18/memori1
Brick219: MMR03:/rhgs/b18/memori1
Brick220: MMR04:/rhgs/b18/memori1
Brick221: MMR05:/rhgs/b18/memori1
Brick222: MMR06:/rhgs/b18/memori1
Brick223: MMR07:/rhgs/b18/memori1
Brick224: MMR08:/rhgs/b18/memori1
Brick225: MMR09:/rhgs/b18/memori1
Brick226: MMR10:/rhgs/b18/memori1
Brick227: MMR11:/rhgs/b18/memori1
Brick228: MMR12:/rhgs/b18/memori1
Options Reconfigured:
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.quick-read: off
cluster.tier-demote-frequency: 3600
nfs.disable: on
performance.readdir-ahead: enable
transport.address-family: inet
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
cluster.readdir-optimize: on
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
cluster.data-self-heal-algorithm: full
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: off
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.write-behind-window-size: 1MB
performance.client-io-threads: on
performance.read-ahead: off
performance.cache-size: 256MB
performance.io-thread-count: 64
performance.strict-o-direct: on
network.ping-timeout: 30
network.remote-dio: enable
user.cifs: off
diagnostics.client-log-level: WARNING
features.quota: off
features.inode-quota: off
server.allow-insecure: on
cluster.watermark-low: 1
cluster.watermark-hi: 1
diagnostics.brick-sys-log-level: INFO
diagnostics.brick-log-level: INFO
cluster.server-quorum-ratio: 51%

Comment 3 Ben Turner 2017-10-16 18:48:18 UTC
Some observations from Dmitri:

I'm taking a closer look at the files without bit-rot attributes:
These files have 'trusted.glusterfs.dht.linkto', but the location the attribute points to is empty.
Is this normal, or am I looking in the wrong place?

# getfattr -d -m . -e hex /rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta
getfattr: Removing leading '/' from absolute path names
# file: rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta
trusted.ec.config=0x0000080c04000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0x38229d9705b442658af0ffbb37051074
trusted.glusterfs.dht.linkto=0x6d656d6f7269312d64697370657273652d3800

(check location in trusted.gfid, looks OK):
[root@MMR01 ~]# find /rhgs/b0/memori1/.glusterfs/38/22/
/rhgs/b0/memori1/.glusterfs/38/22/
/rhgs/b0/memori1/.glusterfs/38/22/38229d97-05b4-4265-8af0-ffbb37051074

(check location in 'trusted.glusterfs.dht.linkto', no files in the dir):
[root@MMR01 ~]# find /rhgs/b0/memori1/.glusterfs/6d/65
/rhgs/b0/memori1/.glusterfs/6d/65
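
Note: 'trusted.glusterfs.dht.linkto' holds the name of the DHT subvolume the link file points to, not a gfid, so a .glusterfs/<xx>/<yy> path derived from its first bytes isn't meaningful. Decoding the hex value from the getfattr output above:

# echo 6d656d6f7269312d64697370657273652d3800 | xxd -r -p
memori1-disperse-8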


I'm seeing a pattern here:

All 'broken' files seem to be stored twice on the nodes.
I.e., the client crashes when running:

# grep test f33fa16e-7c15-4b91-af70-cc8d3133bdd4/*.meta 

When I search for the file on the bricks, it's stored on two different bricks on each server:

$ ansible memori-pool1 -m shell -a 'find /rhgs/*/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4 -name '*.meta''
MMR03.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR01.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR04.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR02.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR05.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR07.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR06.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR08.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR09.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR10.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR11.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

MMR12.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta
/rhgs/b10/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/f33fa16e-7c15-4b91-af70-cc8d3133bdd4/09977103-e2c9-4d3b-8ec2-12b17e55468a.meta

The following is an example of a 'good' file (i.e. one that grep doesn't crash on):

$ ansible memori-pool1 -m shell -a 'find /rhgs/*/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3 -name '*.meta''
MMR02.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR01.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR04.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR06.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR07.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR03.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR05.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR08.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR10.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR09.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR11.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

MMR12.ORC | SUCCESS | rc=0 >>
/rhgs/b0/memori1/35f52f88-e039-4bbf-bbfa-0a8ff71adabd/images/2b201c1c-2b0e-404f-acaf-69ec1d1af2a3/7aee9148-1b70-420f-8347-8c641c3dab5b.meta

Comment 6 Ben Turner 2017-10-17 03:54:39 UTC
Here is the BT:
warning: core file may not match specified executable file.
[New LWP 4755]
[New LWP 4750]
[New LWP 4749]
[New LWP 4748]
[New LWP 4760]
[New LWP 4751]
[New LWP 4757]
[New LWP 4756]
[New LWP 4752]
[New LWP 4753]
[New LWP 4761]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=MMR04 --volfile-id=/memori2 /mnt/gluster/m'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f480747d704 in ec_manager_seek ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
(gdb) bt
#0  0x00007f480747d704 in ec_manager_seek ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#1  0x00007f48074659bb in __ec_manager ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#2  0x00007f4807465b98 in ec_resume ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#3  0x00007f4807465cbf in ec_complete ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#4  0x00007f480747a40b in ec_seek_cbk ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so
#5  0x00007f48076e1c97 in client3_3_seek_cbk ()
   from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#6  0x00007f4814fd5840 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#7  0x00007f4814fd5b27 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#8  0x00007f4814fd19e3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#9  0x00007f4809bd23d6 in socket_event_poll_in ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#10 0x00007f4809bd497c in socket_event_handler ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#11 0x00007f48152671e6 in event_dispatch_epoll_worker ()
   from /lib64/libglusterfs.so.0
#12 0x00007f481406be25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f481393834d in clone () from /lib64/libc.so.6

Comment 7 Ben Turner 2017-10-17 03:58:38 UTC
And the second core with all the debuginfos installed:

[root@dell-per730-01 ~]# gdb /usr/sbin/glusterfs core.135829 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 135838]
[New LWP 135843]
[New LWP 135831]
[New LWP 135837]
[New LWP 135834]
[New LWP 135842]
[New LWP 135836]
[New LWP 135835]
[New LWP 135832]
[New LWP 135833]
[New LWP 135829]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=MMR01 --volfile-id=/memori2 /mnt/gluster/m'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f9384847704 in ec_manager_seek (fop=0x7f936c0ccf80, state=<optimized out>) at ec-inode-read.c:1592
1592	            if (cbk->op_ret >= 0) {
(gdb) bt
#0  0x00007f9384847704 in ec_manager_seek (fop=0x7f936c0ccf80, state=<optimized out>) at ec-inode-read.c:1592
#1  0x00007f938482f9bb in __ec_manager (fop=0x7f936c0ccf80, error=0) at ec-common.c:2384
#2  0x00007f938482fb98 in ec_resume (fop=0x7f936c0ccf80, error=0) at ec-common.c:334
#3  0x00007f938482fcbf in ec_complete (fop=0x7f936c0ccf80) at ec-common.c:407
#4  0x00007f938484440b in ec_seek_cbk (frame=<optimized out>, cookie=0x2, this=0x7f938019ea60, op_ret=-1, op_errno=6, offset=0, xdata=0x0) at ec-inode-read.c:1549
#5  0x00007f9384aabc97 in client3_3_seek_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f936c0cdfa0) at client-rpc-fops.c:2213
#6  0x00007f9392196840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f938062faf0, pollin=pollin@entry=0x7f93706bf7c0) at rpc-clnt.c:794
#7  0x00007f9392196b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f938062fb20, event=<optimized out>, data=0x7f93706bf7c0) at rpc-clnt.c:987
#8  0x00007f93921929e3 in rpc_transport_notify (this=this@entry=0x7f938062fc90, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f93706bf7c0) at rpc-transport.c:538
#9  0x00007f9386d933d6 in socket_event_poll_in (this=this@entry=0x7f938062fc90, notify_handled=<optimized out>) at socket.c:2306
#10 0x00007f9386d9597c in socket_event_handler (fd=17, idx=5, gen=1, data=0x7f938062fc90, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#11 0x00007f93924281e6 in event_dispatch_epoll_handler (event=0x7f937d0f1e80, event_pool=0x555c30037800) at event-epoll.c:572
#12 event_dispatch_epoll_worker (data=0x7f93802869f0) at event-epoll.c:648
#13 0x00007f939122ce25 in start_thread (arg=0x7f937d0f2700) at pthread_create.c:308
#14 0x00007f9390af934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)

Comment 9 Xavi Hernandez 2017-10-17 06:37:05 UTC
The error is caused by an incorrect check in the seek cbk when an error is detected. Most probably this is the same problem detected in bug #1439068, for which a patch already exists.
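
For illustration only (a sketch of the failure mode, not the actual patch; see comment 12 for the real fix): frame #0 of the backtrace in comment 7 points at ec-inode-read.c:1592, where the combined answer is dereferenced without checking that one exists. When every subvolume fails the seek (here with ENXIO, errno 6, matching op_errno=6 in frame #4), there may be no successful answer to combine, so the check needs a guard of roughly this shape:

    cbk = fop->answer;
    /* If all subvolumes failed, there may be no combined answer at all,
     * so guard the pointer before looking at op_ret. */
    if ((cbk != NULL) && (cbk->op_ret >= 0)) {
            /* propagate the successful seek offset */
    } else {
            /* report the error instead of dereferencing a NULL cbk */
    }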

What I don't know is what is generating those 'seek' requests. The returned error for seek is ENXIO, which can only be returned if seek is being used with SEEK_DATA or SEEK_HOLE. AFAIK, bit-rot doesn't use seek with SEEK_DATA or SEEK_HOLE.

There's also a problem causing ESTALE errors on some files. This is not necessarily a bad error if other operations are running on the volume. Maybe it's related to what is sending the seek requests.

Comment 10 Xavi Hernandez 2017-10-18 06:52:45 UTC
We should determine whether the cause of the crash is the same one causing the bit-rot issues. Currently I'm unable to reproduce the problem; those seek requests do not appear in my tests. The only possibility I see, if there's really nothing else accessing the volume, is that the grep implementation the customer is using relies on seek to improve performance on sparse files. Could that be verified?

An strace of the grep execution could also shed light on this.
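
For example (a sketch; adjust to the customer's exact grep invocation), tracing lseek on the client mount would show whether grep issues SEEK_DATA/SEEK_HOLE:

# strace -f -e trace=lseek grep test */*.meta 2>&1 | grep -E 'SEEK_(DATA|HOLE)'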

Comment 12 Sunil Kumar Acharya 2017-11-03 07:42:48 UTC
Fix for this issue, as per comment 9: https://review.gluster.org/#/c/16998/

Comment 27 errata-xmlrpc 2018-09-04 06:36:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

