Description of problem:
gluster-blockd crash was seen on a CNS setup. There are no clear steps yet on what caused this crash; only gluster-block create and delete operations were performed.

sh-4.2# gdb /usr/sbin/gluster-blockd /core.595
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/gluster-blockd...(no debugging symbols found)...done.
[New LWP 2738]
[New LWP 2724]
[New LWP 2730]
[New LWP 2727]
[New LWP 2723]
[New LWP 2726]
[New LWP 2729]
[New LWP 2728]
[New LWP 2725]
[New LWP 595]
[New LWP 676]
[New LWP 675]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/gluster-blockd --glfs-lru-count 15 --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f549eaf3c30 in pthread_mutex_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install gluster-block-0.2.1-10.el7rhgs.x86_64
(gdb) bt
#0  0x00007f549eaf3c30 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007f549e207f15 in fd_anonymous () from /lib64/libglusterfs.so.0
#2  0x00007f54869d1927 in shard_common_inode_write_do () from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#3  0x00007f54869d1c7d in shard_common_inode_write_post_mknod_handler () from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#4  0x00007f54869ca77f in shard_common_mknod_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#5  0x00007f5486c1164b in dht_newfile_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so
#6  0x00007f5486e71ab1 in afr_mknod_unwind () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#7  0x00007f5486e73eeb in __afr_dir_write_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#8  0x00007f5486e7482d in afr_mknod_wind_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#9  0x00007f54870f6168 in client3_3_mknod_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#10 0x00007f549dfac840 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#11 0x00007f549dfacb27 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#12 0x00007f549dfa89e3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#13 0x00007f5490be63d6 in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#14 0x00007f5490be897c in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#15 0x00007f549e23e1e6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#16 0x00007f549eaf1e25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f549d8b034d in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
gluster-block-0.2.1-10.el7rhgs.x86_64

How reproducible:
n/a

Core file shall be attached shortly.
Based on the core file, the only way this can happen is if not all the shards were created.

(gdb) fr 2
#2  0x00007f54869d1927 in shard_common_inode_write_do (frame=0x7f548c0dbbe0, this=0x7f54800120d0) at shard.c:3883
3883            anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p i
$1 = 255
(gdb) p local->inode_list[i]
$2 = (inode_t *) 0x0
(gdb) p lical->inode_list[i-1]
No symbol "lical" in current context.
(gdb) p local->inode_list[i-1]
$3 = (inode_t *) 0x7f5474765440
(gdb) p local->offset
$4 = 0
(gdb) p local->num_blocks
$5 = 256

Based on this data, I went through the code and I see two races:
1) In shard_common_mknod_cbk(), local->eexist_count is incremented without holding frame->lock.
2) In shard_common_lookup_shards_cbk(), local->create_count is incremented without holding frame->lock.

This can lead to the counts being lower than they should be, so mknod is done on just 255 shards instead of 256.
https://review.gluster.org/18203
(In reply to Pranith Kumar K from comment #4)

The race this particular crash hit is 1), but 2) is also there based on code reading.
(In reply to Pranith Kumar K from comment #6)

Sorry, I messed up the numbers. This crash was hit because of 2), not 1).
I am able to recreate the issue on my laptop with the following steps.

Create more distribute legs to get parallelization:

1) gluster --mode=script --wignore volume create r2 replica 2 localhost.localdomain:/home/gfs/r2_0 localhost.localdomain:/home/gfs/r2_1 localhost.localdomain:/home/gfs/r2_2 localhost.localdomain:/home/gfs/r2_3 localhost.localdomain:/home/gfs/r2_4 localhost.localdomain:/home/gfs/r2_5 localhost.localdomain:/home/gfs/r2_6 localhost.localdomain:/home/gfs/r2_7 localhost.localdomain:/home/gfs/r2_8 localhost.localdomain:/home/gfs/r2_9

2) Increase event threads to 16:
gluster v set r2 client.event-threads 16

3)
15:12:05 :) ⚡ gluster v info r2

Volume Name: r2
Type: Distributed-Replicate
Volume ID: 64e7d1be-602f-4704-b010-cec8004d52d0
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: localhost.localdomain:/home/gfs/r2_0
Brick2: localhost.localdomain:/home/gfs/r2_1
Brick3: localhost.localdomain:/home/gfs/r2_2
Brick4: localhost.localdomain:/home/gfs/r2_3
Brick5: localhost.localdomain:/home/gfs/r2_4
Brick6: localhost.localdomain:/home/gfs/r2_5
Brick7: localhost.localdomain:/home/gfs/r2_6
Brick8: localhost.localdomain:/home/gfs/r2_7
Brick9: localhost.localdomain:/home/gfs/r2_8
Brick10: localhost.localdomain:/home/gfs/r2_9
Options Reconfigured:
client.event-threads: 16
features.shard: on
transport.address-family: inet
nfs.disable: on

4) Run the following command:

root@dhcp35-190 - /mnt/r2
15:15:06 :) ⚡ for i in {1..100}; do fallocate -l 1073741824 b && rm -f b; done
fallocate: fallocate failed: Software caused connection abort
fallocate: cannot open b: Transport endpoint is not connected
fallocate: cannot open b: Transport endpoint is not connected

Here is a similar-looking trace:

package-string: glusterfs 4.0dev
/usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb3)[0x7f019abef451]
/usr/local/lib/libglusterfs.so.0(gf_print_trace+0x259)[0x7f019abf5709]
/usr/local/sbin/glusterfs(glusterfsd_print_trace+0x1f)[0x409f23]
/lib64/libc.so.6(+0x366f0)[0x7f01991f96f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f0199a28d90]
/usr/local/lib/libglusterfs.so.0(fd_anonymous+0x43)[0x7f019ac1ef57]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1f954)[0x7f018d65e954]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1fb7b)[0x7f018d65eb7b]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1b655)[0x7f018d65a655]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/distribute.so(+0x617ad)[0x7f018d8cf7ad]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd04a)[0x7f018db3c04a]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xb38e)[0x7f018db3a38e]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd0e2)[0x7f018db3c0e2]
/usr/local/lib/glusterfs/4.0dev/xlator/protocol/client.so(+0x1f60d)[0x7f018ddf660d]

Ran the same with the fix and did not see any crash.
The following logs confirm the theory:

[2017-09-05 10:05:30.485831] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 255
[2017-09-05 10:05:32.484851] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 255
[2017-09-05 10:05:34.333780] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 255
[2017-09-05 10:05:36.081419] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 254 <<----- this should be 255
pending frames:
frame : type(1) op(FALLOCATE)
frame : type(1) op(FALLOCATE)
...
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-09-05 10:05:36
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.0dev
/usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb3)[0x7f1496271451]
/usr/local/lib/libglusterfs.so.0(gf_print_trace+0x259)[0x7f1496277709]
/usr/local/sbin/glusterfs(glusterfsd_print_trace+0x1f)[0x409f23]
/lib64/libc.so.6(+0x366f0)[0x7f149487b6f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f14950aad90]
/usr/local/lib/libglusterfs.so.0(fd_anonymous+0x43)[0x7f14962a0f57]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1f99c)[0x7f1488ce099c]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1fbc3)[0x7f1488ce0bc3]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1b69d)[0x7f1488cdc69d]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/distribute.so(+0x617ad)[0x7f1488f517ad]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd04a)[0x7f14891be04a]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xb38e)[0x7f14891bc38e]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd0e2)[0x7f14891be0e2]
/usr/local/lib/glusterfs/4.0dev/xlator/protocol/client.so(+0x1f60d)[0x7f148947860d]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1b7)[0x7f1496033cce]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2bc)[0x7f149603426c]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x10f)[0x7f14960302a0]
/usr/local/lib/glusterfs/4.0dev/rpc-transport/socket.so(+0xb12f)[0x7f148a55112f]
/usr/local/lib/glusterfs/4.0dev/rpc-transport/socket.so(+0xb754)[0x7f148a551754]
/usr/local/lib/libglusterfs.so.0(+0x9b7cb)[0x7f14962de7cb]
/usr/local/lib/libglusterfs.so.0(+0x9ba9f)[0x7f14962dea9f]
/lib64/libpthread.so.0(+0x736d)[0x7f14950a836d]
/lib64/libc.so.6(clone+0x3f)[0x7f1494955b8f]
upstream master patch: https://review.gluster.org/18203
upstream 3.12 patch: https://review.gluster.org/#/c/18204/
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/117340/
Followed the steps given in comment 9 and did not see any crash with glusterfs build glusterfs-3.8.4-44.el7rhgs.x86_64.
Moving the bug to verified based on comments 14 and 16.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774