Bug 1805049 - Glusterfsd crashing in ec-inode-write.c, in GF_ASSERT
Summary: Glusterfsd crashing in ec-inode-write.c, in GF_ASSERT
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1706603
Blocks: 1709174 1709660 1806848
 
Reported: 2020-02-20 07:34 UTC by Pranith Kumar K
Modified: 2020-03-03 14:05 UTC
CC List: 2 users

Fixed In Version: glusterfs-5.12
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1706603
Environment:
Last Closed: 2020-02-24 07:54:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Links
Gluster.org Gerrit 24145 (Merged): cluster/ec: Reopen shouldn't happen with O_TRUNC. Last updated 2020-02-24 07:54:47 UTC.

Description Pranith Kumar K 2020-02-20 07:34:30 UTC
+++ This bug was initially created as a clone of Bug #1706603 +++

The mount process crashes in a GF_ASSERT that checks the inode size, in ec_manager_truncate() in ec-inode-write.c.

Program terminated with signal 11, Segmentation fault.
#0  0x00007f5502715dcb in ec_manager_truncate (fop=0x7f53ff654910, state=<optimized out>) at ec-inode-write.c:1475
1475	                GF_ASSERT(ec_get_inode_size(fop, fop->locks[0].lock->loc.inode,

This is the corresponding thread:

Thread 1 (Thread 0x7f54f907a700 (LWP 31806)):
#0  0x00007f5502715dcb in ec_manager_truncate (fop=0x7f53ff654910, state=<optimized out>) at ec-inode-write.c:1475
#1  0x00007f55026f399b in __ec_manager (fop=0x7f53ff654910, error=0) at ec-common.c:2698
#2  0x00007f55026f3b78 in ec_resume (fop=0x7f53ff654910, error=0) at ec-common.c:481
#3  0x00007f55026f3c9f in ec_complete (fop=0x7f53ff654910) at ec-common.c:554
#4  0x00007f5502711d0c in ec_inode_write_cbk (frame=<optimized out>, this=0x7f54fc186380, cookie=0x3, op_ret=0, op_errno=0, prestat=0x7f54f9079920, poststat=0x7f54f9079990, xdata=0x0) at ec-inode-write.c:156
#5  0x00007f550298224c in client3_3_ftruncate_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f5488ba7870) at client-rpc-fops.c:1415
#6  0x00007f5510476960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f54fc4a1330, pollin=pollin@entry=0x7f549b65dc30) at rpc-clnt.c:778
#7  0x00007f5510476d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f54fc4a1360, event=<optimized out>, data=0x7f549b65dc30) at rpc-clnt.c:971
#8  0x00007f5510472a73 in rpc_transport_notify (this=this@entry=0x7f54fc4a1500, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f549b65dc30) at rpc-transport.c:538
#9  0x00007f5505067566 in socket_event_poll_in (this=this@entry=0x7f54fc4a1500, notify_handled=<optimized out>) at socket.c:2315
#10 0x00007f5505069b0c in socket_event_handler (fd=90, idx=99, gen=472, data=0x7f54fc4a1500, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#11 0x00007f551070c7e4 in event_dispatch_epoll_handler (event=0x7f54f9079e80, event_pool=0x5625cf18aa30) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x7f54fc296580) at event-epoll.c:659
#13 0x00007f550f50ddd5 in start_thread (arg=0x7f54f907a700) at pthread_create.c:307
#14 0x00007f550edd5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The backtrace shows the truncate state machine resuming from the ftruncate callback; we're crashing in this part of the code, specifically:

(gdb) l
1470
1471	                /* This shouldn't fail because we have the inode locked. */
1472	                /* Inode size doesn't need to be updated under locks, because
1473	                 * conflicting operations won't be in-flight
1474	                 */
1475	                GF_ASSERT(ec_get_inode_size(fop, fop->locks[0].lock->loc.inode,
1476	                                            &cbk->iatt[0].ia_size));
1477	                cbk->iatt[1].ia_size = fop->user_size;
1478	                /* This shouldn't fail because we have the inode locked. */
1479	                GF_ASSERT(ec_set_inode_size(fop, fop->locks[0].lock->loc.inode,

(gdb) p *cbk
$7 = {list = {next = 0x7f53ff654950, prev = 0x7f53ff654950}, answer_list = {next = 0x7f53ff654960, prev = 0x7f53ff654960}, fop = 0x7f53ff654910, next = 0x0, idx = 3, op_ret = 0, op_errno = 0, count = 1, mask = 8,
  xdata = 0x0, dict = 0x0, int32 = 0, uintptr = {0, 0, 0}, size = 0, version = {0, 0}, inode = 0x0, fd = 0x0, statvfs = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0,
    f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, iatt = {{ia_ino = 12285952560967103824, ia_gfid = "\337\b\247-\b\344Fݪ\200x«\276\265P", ia_dev = 2224, ia_type = IA_IFREG,
      ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, group = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}, other = {read = 1 '\001',
          write = 0 '\000', exec = 1 '\001'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 491520, ia_blksize = 4096, ia_blocks = 3840, ia_atime = 1557032019, ia_atime_nsec = 590833985,
      ia_mtime = 1557032498, ia_mtime_nsec = 824769499, ia_ctime = 1557032498, ia_ctime_nsec = 824769499}, {ia_ino = 12285952560967103824, ia_gfid = "\337\b\247-\b\344Fݪ\200x«\276\265P", ia_dev = 2224, ia_type = IA_IFREG,
      ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, group = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}, other = {read = 1 '\001',
          write = 0 '\000', exec = 1 '\001'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1557032019, ia_atime_nsec = 590833985, ia_mtime = 1557032498,
      ia_mtime_nsec = 824769499, ia_ctime = 1557032498, ia_ctime_nsec = 824769499}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000',
        sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}},
      ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0,
      ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {
          read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0,
      ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000',
        sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000',
          exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}},
  flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, str = 0x0, entries = {{list = {next = 0x7f54429a3188,
        prev = 0x7f54429a3188}, {next = 0x7f54429a3188, prev = 0x7f54429a3188}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL,
      ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000',
          write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0,
      ia_ctime_nsec = 0}, dict = 0x0, inode = 0x0, d_name = 0x7f54429a3230 ""}, offset = 0, what = GF_SEEK_DATA}


(gdb) p *cbk->fop
$8 = {id = 24, refs = 3, state = 4, minimum = 1, expected = 1, winds = 0, jobs = 1, error = 0, parent = 0x7f532c197d80, xl = 0x7f54fc186380, req_frame = 0x7f532c048c60, frame = 0x7f54700662d0, cbk_list = {
    next = 0x7f54429a2a10, prev = 0x7f54429a2a10}, answer_list = {next = 0x7f54429a2a20, prev = 0x7f54429a2a20}, pending_list = {next = 0x7f533007acc0, prev = 0x7f5477976ac0}, answer = 0x7f54429a2a10, lock_count = 0,
  locked = 0, locks = {{lock = 0x0, fop = 0x0, owner_list = {next = 0x7f53ff6549a0, prev = 0x7f53ff6549a0}, wait_list = {next = 0x7f53ff6549b0, prev = 0x7f53ff6549b0}, update = {_gf_false, _gf_false}, dirty = {_gf_false,
        _gf_false}, optimistic_changelog = _gf_false, base = 0x0, size = 0, waiting_flags = 0, fl_start = 0, fl_end = 0}, {lock = 0x0, fop = 0x0, owner_list = {next = 0x7f53ff654a10, prev = 0x7f53ff654a10}, wait_list = {
        next = 0x7f53ff654a20, prev = 0x7f53ff654a20}, update = {_gf_false, _gf_false}, dirty = {_gf_false, _gf_false}, optimistic_changelog = _gf_false, base = 0x0, size = 0, waiting_flags = 0, fl_start = 0, fl_end = 0}},
  first_lock = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size = '\000' <repeats 39 times>, __align = 0}}, flags = 0, first = 0, mask = 8, healing = 0, remaining = 0, received = 8, good = 8, uid = 0, gid = 0, wind = 0x7f5502710ae0 <ec_wind_ftruncate>,
  handler = 0x7f5502715c50 <ec_manager_truncate>, resume = 0x0, cbks = {access = 0x7f550270af50 <ec_open_truncate_cbk>, create = 0x7f550270af50 <ec_open_truncate_cbk>, discard = 0x7f550270af50 <ec_open_truncate_cbk>,
    entrylk = 0x7f550270af50 <ec_open_truncate_cbk>, fentrylk = 0x7f550270af50 <ec_open_truncate_cbk>, fallocate = 0x7f550270af50 <ec_open_truncate_cbk>, flush = 0x7f550270af50 <ec_open_truncate_cbk>,
    fsync = 0x7f550270af50 <ec_open_truncate_cbk>, fsyncdir = 0x7f550270af50 <ec_open_truncate_cbk>, getxattr = 0x7f550270af50 <ec_open_truncate_cbk>, fgetxattr = 0x7f550270af50 <ec_open_truncate_cbk>,
    heal = 0x7f550270af50 <ec_open_truncate_cbk>, fheal = 0x7f550270af50 <ec_open_truncate_cbk>, inodelk = 0x7f550270af50 <ec_open_truncate_cbk>, finodelk = 0x7f550270af50 <ec_open_truncate_cbk>,
    link = 0x7f550270af50 <ec_open_truncate_cbk>, lk = 0x7f550270af50 <ec_open_truncate_cbk>, lookup = 0x7f550270af50 <ec_open_truncate_cbk>, mkdir = 0x7f550270af50 <ec_open_truncate_cbk>,
    mknod = 0x7f550270af50 <ec_open_truncate_cbk>, open = 0x7f550270af50 <ec_open_truncate_cbk>, opendir = 0x7f550270af50 <ec_open_truncate_cbk>, readdir = 0x7f550270af50 <ec_open_truncate_cbk>,
    readdirp = 0x7f550270af50 <ec_open_truncate_cbk>, readlink = 0x7f550270af50 <ec_open_truncate_cbk>, readv = 0x7f550270af50 <ec_open_truncate_cbk>, removexattr = 0x7f550270af50 <ec_open_truncate_cbk>,
    fremovexattr = 0x7f550270af50 <ec_open_truncate_cbk>, rename = 0x7f550270af50 <ec_open_truncate_cbk>, rmdir = 0x7f550270af50 <ec_open_truncate_cbk>, setattr = 0x7f550270af50 <ec_open_truncate_cbk>,
    fsetattr = 0x7f550270af50 <ec_open_truncate_cbk>, setxattr = 0x7f550270af50 <ec_open_truncate_cbk>, fsetxattr = 0x7f550270af50 <ec_open_truncate_cbk>, stat = 0x7f550270af50 <ec_open_truncate_cbk>,
    fstat = 0x7f550270af50 <ec_open_truncate_cbk>, statfs = 0x7f550270af50 <ec_open_truncate_cbk>, symlink = 0x7f550270af50 <ec_open_truncate_cbk>, truncate = 0x7f550270af50 <ec_open_truncate_cbk>,
    ftruncate = 0x7f550270af50 <ec_open_truncate_cbk>, unlink = 0x7f550270af50 <ec_open_truncate_cbk>, writev = 0x7f550270af50 <ec_open_truncate_cbk>, xattrop = 0x7f550270af50 <ec_open_truncate_cbk>,
    fxattrop = 0x7f550270af50 <ec_open_truncate_cbk>, zerofill = 0x7f550270af50 <ec_open_truncate_cbk>, seek = 0x7f550270af50 <ec_open_truncate_cbk>, ipc = 0x7f550270af50 <ec_open_truncate_cbk>}, data = 0x7f5477976a60,
  heal = 0x0, healer = {next = 0x7f53ff654b08, prev = 0x7f53ff654b08}, user_size = 0, head = 0, use_fd = 1, xdata = 0x0, dict = 0x0, int32 = 0, uint32 = 0, size = 0, offset = 0, mode = {0, 0}, entrylk_cmd = ENTRYLK_LOCK,
  entrylk_type = ENTRYLK_RDLCK, xattrop_flags = GF_XATTROP_ADD_ARRAY, dev = 0, inode = 0x0, fd = 0x7f54dfb900a0, iatt = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {
      suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000',
        write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0,
    ia_ctime_nsec = 0}, str = {0x0, 0x0}, loc = {{path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0,
      gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0,
  buffers = 0x0, seek = GF_SEEK_DATA, errstr = 0x0}
  
Checking the lock further shows the real problem: the fop holds no lock at all (note lock_count = 0 and locks[0].lock = 0x0 in the fop dump above), so evaluating the assert's argument dereferences a NULL pointer:

(gdb) p fop->locks[0]
$5 = {lock = 0x0, fop = 0x0, owner_list = {next = 0x7f53ff6549a0, prev = 0x7f53ff6549a0}, wait_list = {next = 0x7f53ff6549b0, prev = 0x7f53ff6549b0}, update = {_gf_false, _gf_false}, dirty = {_gf_false, _gf_false},
  optimistic_changelog = _gf_false, base = 0x0, size = 0, waiting_flags = 0, fl_start = 0, fl_end = 0}

(gdb) p fop->locks[0].lock
$6 = (ec_lock_t *) 0x0

(gdb) p fop->locks[0].lock->loc.inode
Cannot access memory at address 0x90
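
So fop->locks[0].lock is NULL, and the process dies on a plain NULL-pointer dereference while computing the GF_ASSERT argument, before the assert itself can fire. The faulting address 0x90 is simply the offset of loc.inode within ec_lock_t. A minimal C sketch of the failure pattern (the struct layout below is hypothetical, padded only so the field lands at offset 0x90 to match the gdb output):

    #include <stddef.h>
    #include <stdio.h>

    /* Stand-ins for the real types; the padding is an assumption made
     * so that loc.inode lands at offset 0x90, as gdb reported. */
    typedef struct { void *inode; } loc_sketch_t;
    typedef struct {
        char pad[0x90];
        loc_sketch_t loc;
    } ec_lock_sketch_t;

    int main(void)
    {
        ec_lock_sketch_t *lock = NULL;
        /* Prints 0x90: loading lock->loc.inode reads from that raw
         * address when lock is NULL, producing the SIGSEGV above. */
        printf("%#zx\n", offsetof(ec_lock_sketch_t, loc.inode));
        (void)lock;
        return 0;
    }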

--- Additional comment from Worker Ant on 2019-05-06 00:01:57 UTC ---

REVIEW: https://review.gluster.org/22660 (cluster/ec: Reopen shouldn't happen with O_TRUNC) merged (#1) on master by Pranith Kumar Karampuri
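
The patch title points at the root cause: when EC re-opened fds during openfd heal, it replayed the original open flags, so an fd originally opened with O_TRUNC could truncate the file again mid-flight, colliding with the truncate state machine. A minimal sketch of that idea (ec_fix_reopen_flags is a hypothetical name, not the actual change; see the linked review for the real fix):

    #include <fcntl.h>

    /* Illustrates the fix's intent: strip O_TRUNC from the flags used
     * when re-opening an fd during openfd heal, so the replayed open
     * can never truncate the file's contents again. */
    static int ec_fix_reopen_flags(int open_flags)
    {
        return open_flags & ~O_TRUNC;
    }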

--- Additional comment from Worker Ant on 2019-05-07 12:25:28 UTC ---

REVIEW: https://review.gluster.org/22674 (tests: Test openfd heal doesn't truncate files) posted (#1) for review on master by Pranith Kumar Karampuri

--- Additional comment from Worker Ant on 2019-05-24 03:46:39 UTC ---

REVIEW: https://review.gluster.org/22674 (tests: Test openfd heal doesn't truncate files) merged (#8) on master by Pranith Kumar Karampuri

Comment 1 Worker Ant 2020-02-20 08:29:38 UTC
REVIEW: https://review.gluster.org/24145 (cluster/ec: Reopen shouldn't happen with O_TRUNC) posted (#1) for review on release-5 by Pranith Kumar Karampuri

Comment 2 Worker Ant 2020-02-24 07:54:48 UTC
REVIEW: https://review.gluster.org/24145 (cluster/ec: Reopen shouldn't happen with O_TRUNC) merged (#2) on release-5 by hari gowtham

Comment 3 hari gowtham 2020-03-02 08:36:33 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-5.12, please open a new bug report.

glusterfs-5.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/gluster-users/2020-March/037797.html
[2] https://www.gluster.org/pipermail/gluster-users/

