(gdb) bt
#0  0x00007f549eaf3c30 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007f549e207f15 in fd_anonymous () from /lib64/libglusterfs.so.0
#2  0x00007f54869d1927 in shard_common_inode_write_do () from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#3  0x00007f54869d1c7d in shard_common_inode_write_post_mknod_handler () from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#4  0x00007f54869ca77f in shard_common_mknod_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#5  0x00007f5486c1164b in dht_newfile_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so
#6  0x00007f5486e71ab1 in afr_mknod_unwind () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#7  0x00007f5486e73eeb in __afr_dir_write_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#8  0x00007f5486e7482d in afr_mknod_wind_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#9  0x00007f54870f6168 in client3_3_mknod_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#10 0x00007f549dfac840 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#11 0x00007f549dfacb27 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#12 0x00007f549dfa89e3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#13 0x00007f5490be63d6 in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#14 0x00007f5490be897c in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#15 0x00007f549e23e1e6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#16 0x00007f549eaf1e25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f549d8b034d in clone () from /lib64/libc.so.6

Based on the core file, the only way this crash can happen is if not all of the shards were created:

(gdb) fr 2
#2  0x00007f54869d1927 in shard_common_inode_write_do (frame=0x7f548c0dbbe0, this=0x7f54800120d0) at shard.c:3883
3883            anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p i
$1 = 255
(gdb) p local->inode_list[i]
$2 = (inode_t *) 0x0
(gdb) p lical->inode_list[i-1]
No symbol "lical" in current context.
(gdb) p local->inode_list[i-1]
$3 = (inode_t *) 0x7f5474765440
(gdb) p local->offset
$4 = 0
(gdb) p local->num_blocks
$5 = 256

Based on this data, I went through the code and found two races:

1) In shard_common_mknod_cbk(), local->eexist_count is incremented without holding frame->lock.
2) In shard_common_lookup_shards_cbk(), local->create_count is incremented without holding frame->lock.

Either race can leave the counts lower than they need to be, so mknod is issued on only 255 shards instead of 256, leaving a NULL entry in local->inode_list[].
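The lost-update pattern behind both races can be reproduced outside GlusterFS. Below is a minimal, self-contained sketch in plain C with pthreads; the names racy_cbk, NUM_CALLBACKS and the standalone create_count variable are made up for the demo and are not the shard translator's code. It shows how unsynchronized increments from concurrent callbacks can come up short, mirroring how a count of 255 instead of 256 arises.

/* Hypothetical demo of the lost-update race; not GlusterFS code.
 * Build with: gcc -pthread race_demo.c -o race_demo */
#include <pthread.h>
#include <stdio.h>

#define NUM_CALLBACKS 256        /* stands in for local->num_blocks */

static int create_count;         /* stands in for local->create_count */

/* Each "callback" bumps the shared counter without any lock, the way
 * shard_common_lookup_shards_cbk() did before the fix. */
static void *racy_cbk(void *arg)
{
        (void)arg;
        create_count++;          /* non-atomic read-modify-write */
        return NULL;
}

int main(void)
{
        pthread_t t[NUM_CALLBACKS];

        for (int i = 0; i < NUM_CALLBACKS; i++)
                pthread_create(&t[i], NULL, racy_cbk, NULL);
        for (int i = 0; i < NUM_CALLBACKS; i++)
                pthread_join(t[i], NULL);

        /* If two increments interleave, the total can end up below 256.
         * In shard, a short count means one shard is never created, its
         * slot in local->inode_list[] stays NULL, and the later
         * fd_anonymous(NULL) call crashes as in the backtrace above. */
        printf("create_count = %d (expected %d)\n",
               create_count, NUM_CALLBACKS);
        return 0;
}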
REVIEW: https://review.gluster.org/18204 (features/shard: Increment counts in locks) posted (#1) for review on release-3.12 by Pranith Kumar Karampuri (pkarampu)
COMMIT: https://review.gluster.org/18204 committed in release-3.12 by jiffin tony Thottan (jthottan)

------

commit f5170d49e44d0327020335de0b0fc2999a455aad
Author: Pranith Kumar K <pkarampu>
Date:   Tue Sep 5 13:30:53 2017 +0530

    features/shard: Increment counts in locks

    Backport of https://review.gluster.org/18203

    Problem:
    Because create_count/eexist_count are incremented without locks,
    all the shards may not be created because call_count will be
    lesser than what it needs to be. This can lead to crash in
    shard_common_inode_write_do() because inode on which we want to
    do fd_anonymous() is NULL.

    Fix:
    Increment the counts in frame->lock.

    >Change-Id: Ibc87dcb1021e9f4ac2929f662da07aa7662ab0d6
    >BUG: 1488354
    >Signed-off-by: Pranith Kumar K <pkarampu>

    Change-Id: Ibc87dcb1021e9f4ac2929f662da07aa7662ab0d6
    BUG: 1488387
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: https://review.gluster.org/18204
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Krutika Dhananjay <kdhananj>
    CentOS-regression: Gluster Build System <jenkins.org>
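Per the commit message, the fix is simply to take frame->lock around the increments. The sketch below shows that pattern in plain C with a pthread mutex standing in for GlusterFS's frame->lock; the struct and function names (demo_local, bump_eexist_count) are illustrative and not the actual shard.c diff.

/* Sketch of the "increment counts in locks" pattern.  A pthread mutex
 * stands in for frame->lock; this is not the real translator code. */
#include <pthread.h>

struct demo_local {
        int eexist_count;        /* mirrors local->eexist_count */
        int create_count;        /* mirrors local->create_count */
};

static pthread_mutex_t frame_lock = PTHREAD_MUTEX_INITIALIZER;

/* Guarding the read-modify-write is the whole fix; create_count in
 * shard_common_lookup_shards_cbk() gets the same treatment. */
static void bump_eexist_count(struct demo_local *local)
{
        pthread_mutex_lock(&frame_lock);
        {
                local->eexist_count++;
        }
        pthread_mutex_unlock(&frame_lock);
}

int main(void)
{
        struct demo_local local = {0, 0};

        /* Called serially here just to exercise the helper; in the
         * translator the callbacks run concurrently from the epoll
         * worker threads seen in the backtrace, which is where the
         * lock matters. */
        bump_eexist_count(&local);
        return 0;
}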
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.1, please open a new bug report.

glusterfs-3.12.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-September/032441.html
[2] https://www.gluster.org/pipermail/gluster-users/