Description of problem:
======================
Observing the below "Assertion failed" messages in the rebalance logs during rebalance:

[2016-11-24 06:51:34.447866] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)

The identical assertion failure was logged repeatedly, e.g. at 2016-11-24 06:51:34.508070, 06:51:38.229553, 06:54:16.408204, 06:54:17.452540, 06:58:45.236521, 07:41:15.196585, 07:41:19.539656 and 07:41:21.841912.

Version-Release number of selected component (if applicable):
3.8.4-5.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a 2 x (4+2) distributed-disperse volume and start it.
2) FUSE-mount the volume on multiple clients.
3) From the mount point, create 100K files and untar the Linux kernel package.
4) Wait till step 3 completes.
5) From client-1, start renaming the files, and from the other clients start continuous lookups.
6) While the renames and lookups are in progress, add a few bricks and start rebalance. Keep monitoring the rebalance logs; at some point the "Assertion failed" error messages appear.

Actual results:
===============
"Assertion failed" error messages in the rebalance logs during rebalance.

Expected results:
================
There should not be any error messages in the rebalance logs during rebalance.
While trying to reproduce this issue, I hit the following crash:

(gdb) bt
#0  0x00007f7503611765 in raise () from /lib64/libc.so.6
#1  0x00007f750361336a in abort () from /lib64/libc.so.6
#2  0x00007f7503609f97 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f750360a042 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f74f77bafd4 in ec_manager_setattr (fop=0x7f74ec0b491c, state=4) at ec-inode-write.c:394
#5  0x00007f74f779c405 in __ec_manager (fop=0x7f74ec0b491c, error=0) at ec-common.c:2283
#6  0x00007f74f7797d90 in ec_resume (fop=0x7f74ec0b491c, error=0) at ec-common.c:289
#7  0x00007f74f7797fba in ec_complete (fop=0x7f74ec0b491c) at ec-common.c:362
#8  0x00007f74f77b979a in ec_inode_write_cbk (frame=0x7f74ec2d82bc, this=0x7f74f002e770, cookie=0x3, op_ret=0, op_errno=0, prestat=0x7f74f848b980, poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:65
#9  0x00007f74f77baacc in ec_setattr_cbk (frame=0x7f74ec2d82bc, cookie=0x3, this=0x7f74f002e770, op_ret=0, op_errno=0, prestat=0x7f74f848b980, poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:349
#10 0x00007f74f7a5563d in client3_3_setattr_cbk (req=0x7f74ec0033ec, iov=0x7f74ec00342c, count=1, myframe=0x7f74ec16fabc) at client-rpc-fops.c:2264
#11 0x00007f7504d8026a in rpc_clnt_handle_reply (clnt=0x7f74f005fe70, pollin=0x7f74f0321c60) at rpc-clnt.c:790
#12 0x00007f7504d807c3 in rpc_clnt_notify (trans=0x7f74f0060300, mydata=0x7f74f005fec8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60) at rpc-clnt.c:970
#13 0x00007f7504d7ca37 in rpc_transport_notify (this=0x7f74f0060300, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60) at rpc-transport.c:537
#14 0x00007f74f9f53ca1 in socket_event_poll_in (this=0x7f74f0060300) at socket.c:2265
#15 0x00007f74f9f541f2 in socket_event_handler (fd=29, idx=19, data=0x7f74f0060300, poll_in=1, poll_out=0, poll_err=0) at socket.c:2395
#16 0x00007f7505024dbf in event_dispatch_epoll_handler (event_pool=0x1268010, event=0x7f74f848bea0) at
event-epoll.c:571
#17 0x00007f750502519e in event_dispatch_epoll_worker (data=0x12b1a60) at event-epoll.c:674
#18 0x00007f7503e065ca in start_thread () from /lib64/libpthread.so.0
#19 0x00007f75036e00ed in clone () from /lib64/libc.so.6

------------------------------
It is crashing in ec_manager_setattr at:

    if (cbk->iatt[0].ia_type == IA_IFREG) {
        ec_iatt_rebuild(fop->xl->private, cbk->iatt, 2, cbk->count);

        /* This shouldn't fail because we have the inode locked. */
        GF_ASSERT(ec_get_inode_size(fop, fop->locks[0].lock->loc.inode,
                                    &cbk->iatt[0].ia_size));

        cbk->iatt[1].ia_size = cbk->iatt[0].ia_size;
    }
--------------------------------
The reason is that a file which was a symlink (IA_IFLNK) is coming back as a regular file (IA_IFREG) in the callback.

(gdb) p *cbk
$6 = {list = {next = 0x7f74ec0b495c, prev = 0x7f74ec0b495c}, answer_list = {next = 0x7f74ec0b496c, prev = 0x7f74f04271ec}, fop = 0x7f74ec0b491c, next = 0x7f74f04271dc, idx = 3, op_ret = 0, op_errno = 0, count = 6, mask = 63, dirty = {0, 0}, xdata = 0x7f74f01c6a3c, dict = 0x0, int32 = 0, uintptr = {0, 0, 0}, size = 0, version = {0, 0}, inode = 0x0, fd = 0x0, statvfs = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, iatt = {{ia_ino = 13705394905313093545, ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFREG, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1481113459, ia_atime_nsec = 489040926, ia_mtime = 1481113459, ia_mtime_nsec = 489040926, ia_ctime = 1481113459, ia_ctime_nsec = 492040918}, {ia_ino = 13705394905313093545, ia_gfid =
"\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFREG, ia_prot = { suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1481113459, ia_atime_nsec = 489040926, ia_mtime = 1481113459, ia_mtime_nsec = 489040926, ia_ctime = 1481113459, ia_ctime_nsec = 494040912}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = { suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, 
ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, str = 0x0, entries = {{list = {next = 0x7f74f01b4884, prev = 0x7f74f01b4884}, {next = 0x7f74f01b4884, prev = 0x7f74f01b4884}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, dict = 0x0, inode = 0x0, d_name = 0x7f74f01b492c ""}, offset = 0, what = GF_SEEK_DATA} (gdb) (gdb) (gdb) p *cbk->fop $7 = {id = 38, refs = 3, state = 4, minimum = 4, expected = 6, winds = 0, jobs = 1, error = 0, parent = 0x0, xl = 0x7f74f002e770, req_frame = 0x7f74ec218d8c, frame = 0x7f74ec2d82bc, cbk_list = {next = 0x7f74f01b40fc, prev = 0x7f74f01b40fc}, answer_list = {next = 0x7f74ec12ef2c, prev = 0x7f74f01b410c}, pending_list = {next = 0x7f74e0017adc, prev = 0x7f74ec154f6c}, answer = 0x7f74f01b40fc, lock_count = 1, locked = 1, locks = {{ lock = 0x7f74ec07b3bc, fop = 0x7f74ec0b491c, owner_list = {next = 0x7f74ec07b3cc, prev = 0x7f74ec07b3cc}, wait_list = {next = 0x7f74ec0b49bc, prev = 0x7f74ec0b49bc}, update = {_gf_false, _gf_true}, base = 0x0, size = 0}, { lock = 0x0, fop = 0x0, owner_list = {next = 0x7f74ec0b49f4, prev = 0x7f74ec0b49f4}, wait_list = {next = 0x7f74ec0b4a04, prev = 0x7f74ec0b4a04}, update = {_gf_false, _gf_false}, 
base = 0x0, size = 0}}, first_lock = 0, lock = { spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, flags = 0, first = 0, mask = 63, healing = 0, remaining = 0, received = 63, good = 63, uid = 0, gid = 0, wind = 0x7f74f77baad2 <ec_wind_setattr>, handler = 0x7f74f77bae65 <ec_manager_setattr>, resume = 0x0, cbks = { access = 0x7f750505f400 <default_setattr_cbk>, create = 0x7f750505f400 <default_setattr_cbk>, discard = 0x7f750505f400 <default_setattr_cbk>, entrylk = 0x7f750505f400 <default_setattr_cbk>, fentrylk = 0x7f750505f400 <default_setattr_cbk>, fallocate = 0x7f750505f400 <default_setattr_cbk>, flush = 0x7f750505f400 <default_setattr_cbk>, fsync = 0x7f750505f400 <default_setattr_cbk>, fsyncdir = 0x7f750505f400 <default_setattr_cbk>, getxattr = 0x7f750505f400 <default_setattr_cbk>, fgetxattr = 0x7f750505f400 <default_setattr_cbk>, heal = 0x7f750505f400 <default_setattr_cbk>, fheal = 0x7f750505f400 <default_setattr_cbk>, inodelk = 0x7f750505f400 <default_setattr_cbk>, finodelk = 0x7f750505f400 <default_setattr_cbk>, link = 0x7f750505f400 <default_setattr_cbk>, lk = 0x7f750505f400 <default_setattr_cbk>, lookup = 0x7f750505f400 <default_setattr_cbk>, mkdir = 0x7f750505f400 <default_setattr_cbk>, mknod = 0x7f750505f400 <default_setattr_cbk>, open = 0x7f750505f400 <default_setattr_cbk>, opendir = 0x7f750505f400 <default_setattr_cbk>, readdir = 0x7f750505f400 <default_setattr_cbk>, readdirp = 0x7f750505f400 <default_setattr_cbk>, readlink = 0x7f750505f400 <default_setattr_cbk>, readv = 0x7f750505f400 <default_setattr_cbk>, removexattr = 0x7f750505f400 <default_setattr_cbk>, fremovexattr = 0x7f750505f400 <default_setattr_cbk>, rename = 0x7f750505f400 <default_setattr_cbk>, rmdir = 0x7f750505f400 <default_setattr_cbk>, setattr = 0x7f750505f400 <default_setattr_cbk>, fsetattr = 0x7f750505f400 
<default_setattr_cbk>, setxattr = 0x7f750505f400 <default_setattr_cbk>, fsetxattr = 0x7f750505f400 <default_setattr_cbk>, stat = 0x7f750505f400 <default_setattr_cbk>, fstat = 0x7f750505f400 <default_setattr_cbk>, statfs = 0x7f750505f400 <default_setattr_cbk>, symlink = 0x7f750505f400 <default_setattr_cbk>, truncate = 0x7f750505f400 <default_setattr_cbk>, ftruncate = 0x7f750505f400 <default_setattr_cbk>, unlink = 0x7f750505f400 <default_setattr_cbk>, writev = 0x7f750505f400 <default_setattr_cbk>, xattrop = 0x7f750505f400 <default_setattr_cbk>, fxattrop = 0x7f750505f400 <default_setattr_cbk>, zerofill = 0x7f750505f400 <default_setattr_cbk>, seek = 0x7f750505f400 <default_setattr_cbk>, ipc = 0x7f750505f400 <default_setattr_cbk>}, data = 0x0, heal = 0x0, healer = {next = 0x7f74ec0b4ac4, prev = 0x7f74ec0b4ac4}, user_size = 0, head = 0, use_fd = 0, xdata = 0x7f74ec1ca3fc, dict = 0x0, int32 = 6, uint32 = 0, size = 0, offset = 0, mode = {0, 0}, entrylk_cmd = ENTRYLK_LOCK, entrylk_type = ENTRYLK_RDLCK, xattrop_flags = GF_XATTROP_ADD_ARRAY, dev = 0, inode = 0x0, fd = 0x0, iatt = {ia_ino = 13705394905313093545, ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFLNK, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, group = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, other = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 23, ia_blksize = 4096, ia_blocks = 4, ia_atime = 1481113215, ia_atime_nsec = 659685788, ia_mtime = 1474704618, ia_mtime_nsec = 0, ia_ctime = 1481111709, ia_ctime_nsec = 633516169}, str = {0x0, 0x0}, loc = {{ path = 0x7f74ec2f0b80 "/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts", name = 0x7f74ec2f0b9f "sun8i-a33-ippo-q8h-v1.2.dts", inode = 0x7f74cc02756c, parent = 0x7f74e40085dc, gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", pargfid = 
"\224ĪŠ\346~UGZ\225\067\003\064\003J", <incomplete sequence \362>}, {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, seek = GF_SEEK_DATA}
(gdb)

The root cause: when we issued the setattr, the file was in fact a symlink. However, by the time the call reached the backend, the file had been migrated to another subvolume, so when posix_pstat() gets called, the stat happens on the DHT linkto ("T") file left behind. This T file is simply a regular file.

[root@apandey glusterfs]# ll /brick/gluster/vol*/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-11/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-12/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-14/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-15/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-16/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-17/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-18/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-7/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-8/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-9/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts

[root@apandey glusterfs]# stat /brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File: '/brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts'
  Size: 0          Blocks: 8          IO Block: 4096   regular empty file
Device: fd03h/64771d    Inode: 4233378     Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:54:19.488040928 +0530
Modify: 2016-12-07 17:54:19.488040928 +0530
Change: 2016-12-07 17:56:05.634760199 +0530
 Birth: -

[root@apandey glusterfs]# stat /brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File: '/brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts' -> 'sun8i-a33-q8-tablet.dts'
  Size: 23         Blocks: 8          IO Block: 4096   symbolic link
Device: fd03h/64771d    Inode: 4208551     Links: 2
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:56:09.681749496 +0530
Modify: 2016-09-24 13:40:18.000000000 +0530
Change: 2016-12-07 17:25:09.628516181 +0530

This is the log I placed in posix:

[2016-12-07 12:24:19.494117] E [MSGID: 113018] [posix.c:506:posix_setattr] 0-vol-posix: Getting REG ofr LINK : /brick/gluster/vol-18/.glusterfs/80/1a/801ad5f8-2ec8-44ae-be33-580d55a2a7a9 statpost.ia_type = 1 statpre.ia_type = 1 LOC_IS_LNK(loc) = 0

------------------------

Now, when the callback reaches EC, the condition

    if (cbk->iatt[0].ia_type == IA_IFREG) {

becomes true, and EC checks for the size, which it does not have. That leads to the assertion failure. I think we have to handle this case in EC. I talked to Prashant; he said that if this had been given to DHT, it would have been handled in the second phase of setattr.
Upstream mainline patch http://review.gluster.org/16103 has been posted for review.
I tried the same scenario with the "--enable-debug" flag OFF, so that no asserts are thrown. I added bricks and started rebalance. The rebalance completed without failures or crashes, and I confirmed that files and links were migrated to the newly added bricks too. I also tried renaming 100 files from the mount point and did not see any core. I think this bug does not cause a functionality break for now.

volume rebalance: vol: success
[root@apandey glusterfs]# gluster v rebalance vol status
     Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in h:m:s
---------  ----------------  ------  -------  --------  -------  ---------  -----------------
localhost               891   3.9KB     3169         0      681  completed             0:0:28
volume rebalance: vol: success
Reproducible on 3.3.0 as well.
From rebalance logs during 3.3.0 scale tests:

[root@gqas007 glusterfs]# grep -R "Assertion failed"
butcher-rebalance.log:[2017-07-19 14:32:00.813871] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
butcher-rebalance.log:[2017-07-19 14:34:50.753306] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
butcher-rebalance.log:[2017-07-19 14:35:10.113387] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[root@gqas007 glusterfs]#

sosreports will be updated.
*** Bug 1566417 has been marked as a duplicate of this bug. ***
Status?
Tested with RHGS 3.5.0 interim build (glusterfs-6.0-15.el7rhgs) and I see the same error messages in large numbers when there is active I/O during rebalance of the distributed-disperse volume:

[2019-10-10 12:42:50.855875] E [ec-inode-write.c:483:ec_manager_setattr] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x12378) [0x7fbdb18e7378] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x1219b) [0x7fbdb18e719b] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x32273) [0x7fbdb1907273] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2019-10-10 12:42:50.868382] E [ec-inode-write.c:483:ec_manager_setattr] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x12378) [0x7fbdb18e7378] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x1219b) [0x7fbdb18e719b] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x32273) [0x7fbdb1907273] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)

[root@ ~]# grep ' E ' /var/log/glusterfs/backupvol-rebalance.log-20191011 | grep -i 'assertion failed' | wc -l
186
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days