Description of problem:
The gluster brick daemon crashes when the lock-heal feature is on.

# gluster volume info

Volume Name: $volname
Type: Replicate
Volume ID: xxx
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: srv01:/srv/gfs/$volname/brick
Brick2: srv02:/srv/gfs/$volname/brick
Options Reconfigured:
performance.parallel-readdir: on
features.locks-revocation-secs: 1800
nfs.disable: on
transport.address-family: inet
cluster.min-free-disk: 3
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
performance.cache-max-file-size: 10MB
performance.cache-refresh-timeout: 60
performance.readdir-ahead: off
performance.md-cache-timeout: 600
performance.client-io-threads: on
storage.linux-aio: on
features.lock-heal: on
cluster.readdir-optimize: on
diagnostics.client-sys-log-level: CRITICAL
diagnostics.brick-sys-log-level: CRITICAL
performance.cache-size: 256MB
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
network.inode-lru-limit: 90000
cluster.favorite-child-policy: mtime
cluster.enable-shared-storage: enable

Version-Release number of selected component (if applicable):
3.12.0, 3.12.1

How reproducible:
# gluster volume set $volname features.lock-heal on
# /etc/init.d/glusterfs-server stop
# /etc/init.d/glusterfs-server start

Actual results:
Final graph:
+------------------------------------------------------------------------------+
  1: volume $volname-posix
  2:     type storage/posix
  3:     option glusterd-uuid xx
  4:     option directory /srv/gfs/$volname/brick
  5:     option volume-id xx
  6:     option shared-brick-count 1
  7:     option linux-aio on
  8: end-volume
  9:
 10: volume $volname-trash
 11:     type features/trash
 12:     option trash-dir .trashcan
 13:     option brick-path /srv/gfs/$volname/brick
 14:     option trash-internal-op off
 15:     subvolumes $volname-posix
 16: end-volume
 17:
 18: volume $volname-changetimerecorder
 19:     type features/changetimerecorder
 20:     option db-type sqlite3
 21:     option hot-brick off
 22:     option db-name brick.db
 23:     option db-path /srv/gfs/$volname/brick/.glusterfs/
 24:     option record-exit off
 25:     option ctr_link_consistency off
 26:     option ctr_lookupheal_link_timeout 300
 27:     option ctr_lookupheal_inode_timeout 300
 28:     option record-entry on
 29:     option ctr-enabled off
 30:     option record-counters off
 31:     option ctr-record-metadata-heat off
 32:     option sql-db-cachesize 12500
 33:     option sql-db-wal-autocheckpoint 25000
 34:     subvolumes $volname-trash
 35: end-volume
 36:
 37: volume $volname-changelog
 38:     type features/changelog
 39:     option changelog-brick /srv/gfs/$volname/brick
 40:     option changelog-dir /srv/gfs/$volname/brick/.glusterfs/changelogs
 41:     option changelog-barrier-timeout 120
 42:     subvolumes $volname-changetimerecorder
 43: end-volume
 44:
 45: volume $volname-bitrot-stub
 46:     type features/bitrot-stub
 47:     option export /srv/gfs/$volname/brick
 48:     option bitrot disable
 49:     subvolumes $volname-changelog
 50: end-volume
 51:
 52: volume $volname-access-control
 53:     type features/access-control
 54:     subvolumes $volname-bitrot-stub
 55: end-volume
 56:
 57: volume $volname-locks
 58:     type features/locks
 59:     option revocation-secs 1800
 60:     subvolumes $volname-access-control
 61: end-volume
 62:
 63: volume $volname-worm
 64:     type features/worm
 65:     option worm off
 66:     option worm-file-level off
 67:     subvolumes $volname-locks
 68: end-volume
 69:
 70: volume $volname-read-only
 71:     type features/read-only
 72:     option read-only off
 73:     subvolumes $volname-worm
 74: end-volume
 75:
 76: volume $volname-leases
 77:     type features/leases
 78:     option leases off
 79:     subvolumes $volname-read-only
 80: end-volume
 81:
 82: volume $volname-upcall
 83:     type features/upcall
 84:     option cache-invalidation on
 85:     option cache-invalidation-timeout 600
 86:     subvolumes $volname-leases
 87: end-volume
 88:
 89: volume $volname-io-threads
 90:     type performance/io-threads
 91:     subvolumes $volname-upcall
 92: end-volume
 93:
 94: volume $volname-selinux
 95:     type features/selinux
 96:     option selinux on
 97:     subvolumes $volname-io-threads
 98: end-volume
 99:
100: volume $volname-marker
101:     type features/marker
102:     option volume-uuid xx
103:     option timestamp-file /var/lib/glusterd/vols/$volname/marker.tstamp
104:     option quota-version 0
105:     option xtime off
106:     option gsync-force-xtime off
107:     option quota off
108:     option inode-quota off
109:     subvolumes $volname-selinux
110: end-volume
111:
112: volume $volname-barrier
113:     type features/barrier
114:     option barrier disable
115:     option barrier-timeout 120
116:     subvolumes $volname-marker
117: end-volume
118:
119: volume $volname-index
120:     type features/index
121:     option index-base /srv/gfs/$volname/brick/.glusterfs/indices
122:     option xattrop-dirty-watchlist trusted.afr.dirty
123:     option xattrop-pending-watchlist trusted.afr.$volname-
124:     subvolumes $volname-barrier
125: end-volume
126:
127: volume $volname-quota
128:     type features/quota
129:     option volume-uuid $volname
130:     option server-quota off
131:     option deem-statfs off
132:     subvolumes $volname-index
133: end-volume
134:
135: volume $volname-io-stats
136:     type debug/io-stats
137:     option unique-id /srv/gfs/$volname/brick
138:     option log-level WARNING
139:     option sys-log-level CRITICAL
140:     option latency-measurement off
141:     option count-fop-hits off
142:     subvolumes $volname-quota
143: end-volume
144:
145: volume /srv/gfs/$volname/brick
146:     type performance/decompounder
147:     option auth.addr./srv/gfs/$volname/brick.allow xx
148:     option auth-path /srv/gfs/$volname/brick
149:     option auth.login.xx
150:     option auth.login./srv/gfs/$volname/brick.allow xx
151:     subvolumes $volname-io-stats
152: end-volume
153:
154: volume $volname-server
155:     type protocol/server
156:     option transport.socket.listen-port 49153
157:     option rpc-auth.auth-glusterfs on
158:     option rpc-auth.auth-unix on
159:     option rpc-auth.auth-null on
160:     option rpc-auth-allow-insecure on
161:     option transport-type tcp
162:     option transport.address-family inet
163:     option auth.login./srv/gfs/$volname/brick.allow xx
164:     option auth.login.xx
165:     option auth-path /srv/gfs/$volname/brick
166:     option auth.addr./srv/gfs/$volname/brick.allow xx
167:     option inode-lru-limit 90000
168:     option transport.socket.keepalive 1
169:     option lk-heal on
170:     option transport.tcp-user-timeout 0
171:     option transport.socket.keepalive-time 20
172:     option transport.socket.keepalive-interval 2
173:     option transport.socket.keepalive-count 9
174:     option transport.listen-backlog 10
175:     subvolumes /srv/gfs/$volname/brick
176: end-volume
177:
+------------------------------------------------------------------------------+
pending frames:
frame : type(0) op(26)
frame : type(0) op(11)
frame : type(0) op(27)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-09-16 08:25:01
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.1
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xaa)[0x7fe80ffc49ea]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2e7)[0x7fe80ffce6c7]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7fe80f3b74b0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/features/upcall.so(+0xb8af)[0x7fe80825d8af]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_lk_resume+0x1c2)[0x7fe810051262]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x75)[0x7fe80ffe7325]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/performance/io-threads.so(+0x4974)[0x7fe80804b974]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fe80f7536ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe80f4893dd]

Expected results:
The brick daemon should not crash.

Additional info:
At first I was suspicious of cache-invalidation (on), so I also turned that feature off, but the brick daemon still crashed.

# gluster volume set $volname features.cache-invalidation off
# gluster volume set $volname performance.cache-invalidation off
# /etc/init.d/glusterfs-server stop
# /etc/init.d/glusterfs-server start

Final graph:
+------------------------------------------------------------------------------+
  1: volume $volname-posix
  2:     type storage/posix
  3:     option glusterd-uuid xx
  4:     option directory /srv/gfs/$volname/brick
  5:     option volume-id xx
  6:     option shared-brick-count 1
  7:     option linux-aio on
  8: end-volume
  9:
 10: volume $volname-trash
 11:     type features/trash
 12:     option trash-dir .trashcan
 13:     option brick-path /srv/gfs/$volname/brick
 14:     option trash-internal-op off
 15:     subvolumes $volname-posix
 16: end-volume
 17:
 18: volume $volname-changetimerecorder
 19:     type features/changetimerecorder
 20:     option db-type sqlite3
 21:     option hot-brick off
 22:     option db-name brick.db
 23:     option db-path /srv/gfs/$volname/brick/.glusterfs/
 24:     option record-exit off
 25:     option ctr_link_consistency off
 26:     option ctr_lookupheal_link_timeout 300
 27:     option ctr_lookupheal_inode_timeout 300
 28:     option record-entry on
 29:     option ctr-enabled off
 30:     option record-counters off
 31:     option ctr-record-metadata-heat off
 32:     option sql-db-cachesize 12500
 33:     option sql-db-wal-autocheckpoint 25000
 34:     subvolumes $volname-trash
 35: end-volume
 36:
 37: volume $volname-changelog
 38:     type features/changelog
 39:     option changelog-brick /srv/gfs/$volname/brick
 40:     option changelog-dir /srv/gfs/$volname/brick/.glusterfs/changelogs
 41:     option changelog-barrier-timeout 120
 42:     subvolumes $volname-changetimerecorder
 43: end-volume
 44:
 45: volume $volname-bitrot-stub
 46:     type features/bitrot-stub
 47:     option export /srv/gfs/$volname/brick
 48:     option bitrot disable
 49:     subvolumes $volname-changelog
 50: end-volume
 51:
 52: volume $volname-access-control
 53:     type features/access-control
 54:     subvolumes $volname-bitrot-stub
 55: end-volume
 56:
 57: volume $volname-locks
 58:     type features/locks
 59:     option revocation-secs 1800
 60:     subvolumes $volname-access-control
 61: end-volume
 62:
 63: volume $volname-worm
 64:     type features/worm
 65:     option worm off
 66:     option worm-file-level off
 67:     subvolumes $volname-locks
 68: end-volume
 69:
 70: volume $volname-read-only
 71:     type features/read-only
 72:     option read-only off
 73:     subvolumes $volname-worm
 74: end-volume
 75:
 76: volume $volname-leases
 77:     type features/leases
 78:     option leases off
 79:     subvolumes $volname-read-only
 80: end-volume
 81:
 82: volume $volname-upcall
 83:     type features/upcall
 84:     option cache-invalidation off
 85:     option cache-invalidation-timeout 600
 86:     subvolumes $volname-leases
 87: end-volume
 88:
 89: volume $volname-io-threads
 90:     type performance/io-threads
 91:     subvolumes $volname-upcall
 92: end-volume
 93:
 94: volume $volname-selinux
 95:     type features/selinux
 96:     option selinux on
 97:     subvolumes $volname-io-threads
 98: end-volume
 99:
100: volume $volname-marker
101:     type features/marker
102:     option volume-uuid xx
103:     option timestamp-file /var/lib/glusterd/vols/$volname/marker.tstamp
104:     option quota-version 0
105:     option xtime off
106:     option gsync-force-xtime off
107:     option quota off
108:     option inode-quota off
109:     subvolumes $volname-selinux
110: end-volume
111:
112: volume $volname-barrier
113:     type features/barrier
114:     option barrier disable
115:     option barrier-timeout 120
116:     subvolumes $volname-marker
117: end-volume
118:
119: volume $volname-index
120:     type features/index
121:     option index-base /srv/gfs/$volname/brick/.glusterfs/indices
122:     option xattrop-dirty-watchlist trusted.afr.dirty
123:     option xattrop-pending-watchlist trusted.afr.$volname-
124:     subvolumes $volname-barrier
125: end-volume
126:
127: volume $volname-quota
128:     type features/quota
129:     option volume-uuid $volname
130:     option server-quota off
131:     option deem-statfs off
132:     subvolumes $volname-index
133: end-volume
134:
135: volume $volname-io-stats
136:     type debug/io-stats
137:     option unique-id /srv/gfs/$volname/brick
138:     option log-level WARNING
139:     option sys-log-level CRITICAL
140:     option latency-measurement off
141:     option count-fop-hits off
142:     subvolumes $volname-quota
143: end-volume
144:
145: volume /srv/gfs/$volname/brick
146:     type performance/decompounder
147:     option auth.addr./srv/gfs/$volname/brick.allow xx
148:     option auth-path /srv/gfs/$volname/brick
149:     option auth.login.xx
150:     option auth.login./srv/gfs/$volname/brick.allow xx
151:     subvolumes $volname-io-stats
152: end-volume
153:
154: volume $volname-server
155:     type protocol/server
156:     option transport.socket.listen-port 49153
157:     option rpc-auth.auth-glusterfs on
158:     option rpc-auth.auth-unix on
159:     option rpc-auth.auth-null on
160:     option rpc-auth-allow-insecure on
161:     option transport-type tcp
162:     option transport.address-family inet
163:     option auth.login./srv/gfs/$volname/brick.allow xx
164:     option auth.login.xx
165:     option auth-path /srv/gfs/$volname/brick
166:     option auth.addr./srv/gfs/$volname/brick.allow xx
167:     option inode-lru-limit 90000
168:     option transport.socket.keepalive 1
169:     option lk-heal on
170:     option transport.tcp-user-timeout 0
171:     option transport.socket.keepalive-time 20
172:     option transport.socket.keepalive-interval 2
173:     option transport.socket.keepalive-count 9
174:     option transport.listen-backlog 10
175:     subvolumes /srv/gfs/$volname/brick
176: end-volume
177:
+------------------------------------------------------------------------------+
pending frames:
frame : type(0) op(26)
frame : type(0) op(27)
frame : type(0) op(27)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-09-16 09:16:52
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.1
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xaa)[0x7f3d59fe19ea]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2e7)[0x7f3d59feb6c7]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f3d593d44b0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/features/locks.so(+0x18bf1)[0x7f3d4ea69bf1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/features/worm.so(+0x2483)[0x7f3d4e845483]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/features/read-only.so(+0x1f03)[0x7f3d4e63cf03]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/features/leases.so(+0x6377)[0x7f3d4e42c377]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/features/upcall.so(+0xba83)[0x7f3d4e215a83]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_lk_resume+0x1c2)[0x7f3d5a06e262]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x75)[0x7f3d5a004325]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/performance/io-threads.so(+0x4974)[0x7f3d4e003974]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f3d597706ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f3d594a63dd]
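Several frames in the backtraces above (upcall.so, locks.so, io-threads.so and others) carry only an offset and no symbol name, which makes the crash site hard to read. The following is a minimal sketch, not part of the original report, of how such frames could be picked out and turned into addr2line invocations. The names Frame, parse_frame and addr2line_command are hypothetical helpers, and the sketch assumes that matching debug symbols for glusterfs 3.12.1 are installed, since without them addr2line cannot resolve the offsets.

```python
import re
from typing import List, NamedTuple, Optional

# One frame as printed in the crash log: path(symbol+0xoffset)[0xaddress].
# The symbol part is empty when the library exports no name for that location.
FRAME_RE = re.compile(
    r"^(?P<path>\S+?)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


class Frame(NamedTuple):
    path: str               # shared object the frame belongs to
    symbol: Optional[str]   # None when the log shows only "+0x..."
    offset: str             # offset from the symbol, or from the object base
    addr: str               # absolute address in the crashed process


def parse_frame(line: str) -> Optional[Frame]:
    """Return a Frame for a backtrace line, or None for any other log line."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return Frame(m["path"], m["symbol"] or None, m["offset"], m["addr"])


def addr2line_command(frame: Frame) -> Optional[List[str]]:
    """Build an addr2line argv for a frame that has no symbol name.

    Frames that already name a symbol are skipped (None is returned).
    """
    if frame.symbol is not None:
        return None
    # -f prints the function name, -C demangles, -e selects the object file.
    return ["addr2line", "-f", "-C", "-e", frame.path, frame.offset]


if __name__ == "__main__":
    sample = [
        "/usr/lib/x86_64-linux-gnu/glusterfs/3.12.1/xlator/features/upcall.so(+0xb8af)[0x7fe80825d8af]",
        "/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_lk_resume+0x1c2)[0x7fe810051262]",
    ]
    for line in sample:
        frame = parse_frame(line)
        cmd = addr2line_command(frame) if frame else None
        if cmd:
            print(" ".join(cmd))
```

Running the printed commands on the host that produced the crash (same package build) is what would map the offsets to function names; the script itself only prepares the commands and does not execute them.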
The lock-recovery logic was removed after the 3.12 series, so this can no longer be reproduced.