Created attachment 1490178 [details]
GDB dump log

Description of problem:
- A few days ago I found that my EC (4+2) volume was degraded.
- I am running 3.12.13-1.el7.x86_64.
- One brick was down; the brick log is below.
- I suspect a loc->inode bug in index.c (see the attached picture); a sketch of the suspected failure mode follows the backtrace below.
- In gdb, loc->inode is NULL at the call:
  inode_find (loc->inode->table, loc->gfid);

Version-Release number of selected component (if applicable):
- 3.12.13

How reproducible:
- I don't know.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

glusterfsd brick error log:

[2018-09-29 13:22:36.536532] W [inode.c:942:inode_find] (-->/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xd01c) [0x7f9bd249401c] -->/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xc638) [0x7f9bd2493638] -->/lib64/libglusterfs.so.0(inode_find+0x92) [0x7f9be7090a82] ) 0-gluvol02-05-server: table not found
[2018-09-29 13:22:36.536579] W [inode.c:680:inode_new] (-->/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xd048) [0x7f9bd2494048] -->/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xc14d) [0x7f9bd249314d] -->/lib64/libglusterfs.so.0(inode_new+0x8a) [0x7f9be70900ba] ) 0-gluvol02-05-server: inode not found
[2018-09-29 13:22:36.537568] W [inode.c:2305:inode_is_linked] (-->/usr/lib64/glusterfs/3.12.13/xlator/features/quota.so(+0x4fc6) [0x7f9bd2b1cfc6] -->/usr/lib64/glusterfs/3.12.13/xlator/features/index.so(+0x4bb9) [0x7f9bd2d43bb9] -->/lib64/libglusterfs.so.0(inode_is_linked+0x8a) [0x7f9be70927ea] ) 0-gluvol02-05-index: inode not found

pending frames:
frame : type(0) op(18)
frame : type(0) op(18)
frame : type(0) op(28)
--snip--
frame : type(0) op(28)
frame : type(0) op(28)
frame : type(0) op(18)

patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 2018-09-29 13:22:36
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.13
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f9be70804c0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f9be708a3f4]
/lib64/libc.so.6(+0x362f0)[0x7f9be56e02f0]
/usr/lib64/glusterfs/3.12.13/xlator/features/index.so(+0x4bc4)[0x7f9bd2d43bc4]
/usr/lib64/glusterfs/3.12.13/xlator/features/quota.so(+0x4fc6)[0x7f9bd2b1cfc6]
/usr/lib64/glusterfs/3.12.13/xlator/debug/io-stats.so(+0x4e53)[0x7f9bd28eee53]
/lib64/libglusterfs.so.0(default_lookup+0xbd)[0x7f9be70fddfd]
/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xc342)[0x7f9bd2493342]
/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xd048)[0x7f9bd2494048]
/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xd2c0)[0x7f9bd24942c0]
/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xc89e)[0x7f9bd249389e]
/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0xd354)[0x7f9bd2494354]
/usr/lib64/glusterfs/3.12.13/xlator/protocol/server.so(+0x2f829)[0x7f9bd24b6829]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x96)[0x7f9be6e42246]
/lib64/libpthread.so.0(+0x7e25)[0x7f9be5edfe25]
/lib64/libc.so.6(clone+0x6d)[0x7f9be57a8bad]
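Below is a minimal, self-contained C sketch of the suspected failure mode. It is not the actual GlusterFS source: resolve_gfid_from_loc() and the toy types are hypothetical stand-ins, and the stub inode_find() only mimics the real function's behaviour of warning and returning NULL on a NULL table. The point it illustrates: if loc->inode is NULL when the gfid-based lookup runs, evaluating loc->inode->table faults before inode_find() is ever reached, which matches the SIGSEGV in index.so right after the "inode not found" warnings.

/* Hypothetical reproduction of the suspected NULL dereference.
 * Compile with: gcc -Wall -o loc_null_sketch loc_null_sketch.c */
#include <stdio.h>
#include <string.h>

typedef struct inode_table inode_table_t;   /* opaque, as in libglusterfs */

typedef struct inode {
    inode_table_t *table;                   /* back-pointer to its table */
} inode_t;

typedef unsigned char toy_uuid_t[16];       /* stand-in for a gfid */

typedef struct loc {
    inode_t   *inode;                       /* NULL in the crashing frame */
    toy_uuid_t gfid;
} loc_t;

/* Stub mimicking libglusterfs's inode_find(): the real function logs
 * "table not found" and returns NULL when the table argument is NULL. */
static inode_t *
inode_find(inode_table_t *table, toy_uuid_t gfid)
{
    (void)gfid;
    if (table == NULL) {
        fprintf(stderr, "W [inode_find] table not found\n");
        return NULL;
    }
    return NULL;                            /* real hash lookup elided */
}

/* Hypothetical resolver showing the crash site and the missing guard:
 * without the NULL check, loc->inode->table segfaults when loc->inode
 * is NULL, exactly as observed in gdb. */
static inode_t *
resolve_gfid_from_loc(loc_t *loc)
{
    if (loc == NULL || loc->inode == NULL)  /* guard absent in 3.12.13? */
        return NULL;

    return inode_find(loc->inode->table, loc->gfid);
}

int
main(void)
{
    loc_t loc;
    memset(&loc, 0, sizeof(loc));           /* loc.inode == NULL: the
                                             * precondition seen in gdb */

    inode_t *found = resolve_gfid_from_loc(&loc);
    printf("resolved inode: %p\n", (void *)found);
    return 0;
}

With the guard removed, this program segfaults on loc->inode->table; with it, the lookup just fails cleanly, which is presumably what the "table not found" path in inode.c intends.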
Created attachment 1490994 [details] brick.log
Yesterday, another brick died with the same symptom.

core file: http://ac2repo.gluesys.com/ac2repo/down/core.50570.tgz

Volume configuration:
- We have 10 EC (4+2) volumes across 6 nodes.
- Each brick is 37 TB.
- The information below is for one of the 10 EC volumes.
- All volumes have the same configuration.

Data characteristics of each volume:
- df -i: 8M inodes used
- 95% mp4 files (~10 MB each), plus some txt information

File/Dir layout:
/
└── indexdir (about 1000)
    └── datadir (about 800)
        └── data: about 10 files (txt and mp4)

NOTE:
- An aggressive self-heal job is currently running.
- We deleted all bricks on one node and are monitoring the self-heal status.
- The self-heal daemon's memory usage reaches 75 GB.
- Before the brick segfaulted, there were many gfid fd cleanup calls (a sketch of the interleaving this suggests follows the volume info below).

Volume Name: xxxxxxx-01
Type: Disperse
Volume ID: cac0ab6a-55bd-48ed-ac7a-92f0cb4aca80
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: xxxxxxx-GLUSTER2-1:/gluster/brick1/data
Brick2: xxxxxxx-GLUSTER2-2:/gluster/brick1/data
Brick3: xxxxxxx-GLUSTER2-3:/gluster/brick1/data
Brick4: xxxxxxx-GLUSTER2-4:/gluster/brick1/data
Brick5: xxxxxxx-GLUSTER2-5:/gluster/brick1/data
Brick6: xxxxxxx-GLUSTER2-6:/gluster/brick1/data
Options Reconfigured:
performance.io-thread-count: 64
performance.least-prio-threads: 64
performance.high-prio-threads: 64
performance.normal-prio-threads: 64
performance.low-prio-threads: 64
server.event-threads: 1024
client.event-threads: 32
cluster.lookup-optimize: on
performance.parallel-readdir: on
cluster.use-compound-fops: on
performance.nl-cache: on
performance.nl-cache-positive-entry: on
performance.nl-cache-limit: 1GB
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
disperse.shd-wait-qlength: 32768
disperse.shd-max-threads: 16
disperse.self-heal-window-size: 16
disperse.heal-wait-qlength: 2048
disperse.background-heals: 64
performance.write-behind-window-size: 50MB
performance.cache-size: 4GB
cluster.shd-wait-qlength: 32768
cluster.background-self-heal-count: 64
cluster.self-heal-window-size: 16
transport.address-family: inet
nfs.disable: on
cluster.localtime-logging: enable
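Since the crash follows heavy self-heal traffic, many gfid fd cleanup calls, and a network.inode-lru-limit of 200000, one hypothesis (unconfirmed) is that an inode can be dropped from the table between the moment a lookup obtains it and the moment the index/quota xlators use it, leaving a NULL loc->inode downstream. The toy C model below simulates that interleaving sequentially; toy_lookup()/toy_prune() are invented names for illustration, not GlusterFS APIs.

/* Toy model of the suspected lookup/cleanup interleaving.
 * Sequential simulation only; the real scenario would involve
 * concurrent brick threads. */
#include <stdio.h>
#include <stdlib.h>

typedef struct toy_inode {
    int linked;                      /* 1 while in the table */
} toy_inode_t;

typedef struct toy_table {
    toy_inode_t *slot;               /* one-slot "inode table" */
} toy_table_t;

static toy_inode_t *
toy_lookup(toy_table_t *t)
{
    return t->slot;                  /* may be NULL after a prune */
}

static void
toy_prune(toy_table_t *t)            /* models LRU prune / fd cleanup */
{
    free(t->slot);
    t->slot = NULL;
}

int
main(void)
{
    toy_table_t table;
    table.slot = malloc(sizeof(toy_inode_t));
    table.slot->linked = 1;

    /* 1. A lookup resolves the inode (and should take a reference). */
    toy_inode_t *in = toy_lookup(&table);
    printf("lookup got inode, linked=%d\n", in->linked);

    /* 2. Aggressive cleanup runs before the lookup finishes. */
    toy_prune(&table);

    /* 3. A later stage re-resolves and must handle NULL; a missing
     * check here is the analogue of dereferencing loc->inode. */
    in = toy_lookup(&table);
    if (in == NULL) {
        fprintf(stderr, "inode gone after prune; NULL guard avoided a crash\n");
        return 1;
    }
    return 0;
}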
Release 3.12 has reached end-of-life and this bug was still in the NEW state, so the version is being moved to mainline for triage and appropriate action.
Does it still happen on newer releases?
(In reply to Yaniv Kaul from comment #6)
> Does it still happen on newer releases?

Closing for the time being. Please re-open if you have more information.
This issue does not occur on my current system (glusterfs-6.x).