Description of problem:
Heketi tests have started to fail with the Gluster 5 release. It appears that gluster-blockd occasionally hits a segfault and then no longer handles any further requests.

Version-Release number of selected component (if applicable):
glusterfs-5.1-1.el7.x86_64

How reproducible:
Random, but very often.

Steps to Reproduce:
Run the functional tests that are part of heketi:
1. git clone github.com/heketi/heketi
2. cd heketi
3. make test-functional

Actual results:
The tests fail; the logs contain messages indicating that communication with gluster-blockd failed.

Expected results:
The tests should pass.

Additional info:

[root@storage2 ~]# systemctl status gluster-blockd
● gluster-blockd.service - Gluster block storage utility
   Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Fri 2018-12-14 15:42:16 UTC; 7min ago
  Process: 7246 ExecStart=/usr/sbin/gluster-blockd --glfs-lru-count $GB_GLFS_LRU_COUNT --log-level $GB_LOG_LEVEL $GB_EXTRA_ARGS (code=killed, signal=SEGV)
 Main PID: 7246 (code=killed, signal=SEGV)

Dec 14 15:41:40 storage2 systemd[1]: Started Gluster block storage utility.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter logfile is now '/var/log/gluster-block/gluster-block-configshell.log'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter loglevel_file is now 'info'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter auto_enable_tpgt is now 'false'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter auto_add_default_portal is now 'false'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Configuration saved to /etc/target/saveconfig.json
Dec 14 15:42:16 storage2 systemd[1]: gluster-blockd.service: main process exited, code=killed, status=11/SEGV
Dec 14 15:42:16 storage2 systemd[1]: Unit gluster-blockd.service entered failed state.
Dec 14 15:42:16 storage2 systemd[1]: gluster-blockd.service failed.

[root@storage2 ~]# dmesg | grep segf
[  143.199235] glfs_epoll000[7847]: segfault at f0 ip 00007fe5b3ddc9b9 sp 00007fe5beaa6440 error 6 in shard.so[7fe5b3dd3000+2b000]

Core was generated by `/usr/sbin/gluster-blockd --glfs-lru-count 5 --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fbb9cd639b9 in shard_unlink_block_inode (local=local@entry=0x7fbb80000a78, shard_block_num=<optimized out>) at shard.c:2929
2929                base_ictx->fsync_count--;
(gdb) l
2924            if (ctx->fsync_needed) {
2925                unref_base_inode++;
2926                list_del_init(&ctx->to_fsync_list);
2927                if (base_inode)
2928                    __shard_inode_ctx_get(base_inode, this, &base_ictx);
2929                base_ictx->fsync_count--;
2930            }
2931        }
2932        UNLOCK(&inode->lock);
2933        if (base_inode)
(gdb) p *base_ictx
Cannot access memory at address 0x0

The problem was introduced by commit https://github.com/gluster/glusterfs/commit/02a05da6989f and has so far only been fixed on the master branch with https://github.com/gluster/glusterfs/commit/145e1805 . The second commit needs to be backported to the release-5 branch of glusterfs.
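To make the failure mode easier to see without a core dump, below is a small standalone sketch. It is not the upstream code and not the actual patch: inode_t, shard_inode_ctx_t, fake_shard_inode_ctx_get, g_base_ctx, unlink_block_broken and unlink_block_fixed are simplified stand-ins invented for this example. It only illustrates why the decrement at shard.c:2929 dereferences NULL when base_inode is NULL, and the kind of guard that avoids it:

/*
 * Standalone sketch of the crash pattern seen in shard_unlink_block_inode()
 * (shard.c:2929).  NOT the upstream code: the types and helpers below are
 * simplified stand-ins so the example compiles on its own.
 */
#include <stddef.h>
#include <stdio.h>

typedef struct inode {
    int dummy;
} inode_t;

typedef struct shard_inode_ctx {
    int fsync_count;
} shard_inode_ctx_t;

/* Hypothetical helper: pretend every existing base inode has this ctx. */
static shard_inode_ctx_t g_base_ctx = { .fsync_count = 3 };

static void
fake_shard_inode_ctx_get(inode_t *inode, shard_inode_ctx_t **ctx_out)
{
    if (inode)
        *ctx_out = &g_base_ctx;
}

/* Broken flow (mirrors the gdb listing above): when base_inode is NULL,
 * base_ictx is never filled in and the decrement dereferences NULL. */
static void
unlink_block_broken(inode_t *base_inode, int fsync_needed)
{
    shard_inode_ctx_t *base_ictx = NULL;

    if (fsync_needed) {
        if (base_inode)
            fake_shard_inode_ctx_get(base_inode, &base_ictx);
        base_ictx->fsync_count--;   /* SIGSEGV when base_inode == NULL */
    }
}

/* Guarded flow: only touch the base inode's ctx when a base inode exists. */
static void
unlink_block_fixed(inode_t *base_inode, int fsync_needed)
{
    shard_inode_ctx_t *base_ictx = NULL;

    if (fsync_needed && base_inode) {
        fake_shard_inode_ctx_get(base_inode, &base_ictx);
        if (base_ictx)
            base_ictx->fsync_count--;
    }
}

int
main(void)
{
    inode_t base = { 0 };

    /* Normal case: a base inode exists and its fsync counter is decremented. */
    unlink_block_fixed(&base, 1);

    /* base_inode == NULL is the case gluster-blockd keeps hitting; the
     * guarded variant survives it, while unlink_block_broken(NULL, 1)
     * would segfault just like the core dump above. */
    unlink_block_fixed(NULL, 1);
    (void)unlink_block_broken;

    printf("fsync_count after guarded unlink: %d\n", g_base_ctx.fsync_count);
    return 0;
}

Again, the guard above only shows the shape of the problem; the actual change that needs to land on release-5 is the backport of the master commit referenced above.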
REVIEW: https://review.gluster.org/21866 (shard: prevent segfault in shard_unlink_block_inode()) posted (#1) for review on release-5 by Niels de Vos
REVIEW: https://review.gluster.org/21866 (shard: prevent segfault in shard_unlink_block_inode()) posted (#2) for review on release-5 by Shyamsundar Ranganathan
(In reply to Worker Ant from comment #2)
> REVIEW: https://review.gluster.org/21866 (shard: prevent segfault in
> shard_unlink_block_inode()) posted (#2) for review on release-5 by
> Shyamsundar Ranganathan

The above patch uses the "Updates" keyword, but there are no pending patches, so is the tag in the commit message correct? Or are we expecting more patches around this?
(In reply to Shyamsundar from comment #3)
> (In reply to Worker Ant from comment #2)
> > REVIEW: https://review.gluster.org/21866 (shard: prevent segfault in
> > shard_unlink_block_inode()) posted (#2) for review on release-5 by
> > Shyamsundar Ranganathan
>
> The above patch uses the "Updates" keyword, but there are no pending
> patches, so is the tag in the commit message correct? Or are we expecting
> more patches around this?

This is the only patch that I expect is needed. If you prefer Closes: or Fixes: as a tag, feel free to change the commit message :)
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.3, please open a new bug report.

glusterfs-5.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-January/000118.html
[2] https://www.gluster.org/pipermail/gluster-users/