Bug 1398196

Summary: Assertion Failed Error messages in rebalance logs during rebalance
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Prasad Desala <tdesala>
Component: distributeAssignee: Susant Kumar Palai <spalai>
Status: CLOSED DEFERRED QA Contact: Sayalee <saraut>
Severity: high Docs Contact:
Priority: medium    
Version: rhgs-3.2CC: amukherj, aspandey, nchilaka, pkarampu, pprakash, rhinduja, rhs-bugs, sabose, saraut, sasundar, spalai, storage-qa-internal
Target Milestone: ---Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1402538 (view as bug list) Environment:
Last Closed: 2020-01-02 05:35:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1402538    

Description Prasad Desala 2016-11-24 09:48:36 UTC
Description of problem:
======================
Observing the below Assertion failed messages in rebalance logs during rebalance

[2016-11-24 06:51:34.447866] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:51:34.508070] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:51:38.229553] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:54:16.408204] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:54:17.452540] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:58:45.236521] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 07:41:15.196585] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 07:41:19.539656] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 07:41:21.841912] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
 

Version-Release number of selected component (if applicable):
3.8.4-5.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a 2 X (4+2) distributed-disperse volume and start it.
2) FUSE mount the volume on multiple clients.
3) From mount point, create 100K files and untar linux kernel package.
4) Wait till step-3 completes.
5) From client-1, start renaming the files and from other clients start continuous lookups.
6) While rename+ lookups are in progress, add few bricks and start rebalance.

Keep monitoring rebalance logs and at some point we are seeing Assertion failure error messages in rebalance logs.

Actual results:
===============
Assertion Failed Error messages in rebalance logs during rebalance.

Expected results:
================
There should not be any error messages during rebalance in rebalance logs.

Comment 6 Ashish Pandey 2016-12-07 13:16:51 UTC
While trying to reproduce this issue, I hit with the crash -

(gdb) bt
#0  0x00007f7503611765 in raise () from /lib64/libc.so.6
#1  0x00007f750361336a in abort () from /lib64/libc.so.6
#2  0x00007f7503609f97 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f750360a042 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f74f77bafd4 in ec_manager_setattr (fop=0x7f74ec0b491c, state=4) at ec-inode-write.c:394
#5  0x00007f74f779c405 in __ec_manager (fop=0x7f74ec0b491c, error=0) at ec-common.c:2283
#6  0x00007f74f7797d90 in ec_resume (fop=0x7f74ec0b491c, error=0) at ec-common.c:289
#7  0x00007f74f7797fba in ec_complete (fop=0x7f74ec0b491c) at ec-common.c:362
#8  0x00007f74f77b979a in ec_inode_write_cbk (frame=0x7f74ec2d82bc, this=0x7f74f002e770, cookie=0x3, op_ret=0, op_errno=0, prestat=0x7f74f848b980, poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:65
#9  0x00007f74f77baacc in ec_setattr_cbk (frame=0x7f74ec2d82bc, cookie=0x3, this=0x7f74f002e770, op_ret=0, op_errno=0, prestat=0x7f74f848b980, poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:349
#10 0x00007f74f7a5563d in client3_3_setattr_cbk (req=0x7f74ec0033ec, iov=0x7f74ec00342c, count=1, myframe=0x7f74ec16fabc) at client-rpc-fops.c:2264
#11 0x00007f7504d8026a in rpc_clnt_handle_reply (clnt=0x7f74f005fe70, pollin=0x7f74f0321c60) at rpc-clnt.c:790
#12 0x00007f7504d807c3 in rpc_clnt_notify (trans=0x7f74f0060300, mydata=0x7f74f005fec8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60) at rpc-clnt.c:970
#13 0x00007f7504d7ca37 in rpc_transport_notify (this=0x7f74f0060300, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60) at rpc-transport.c:537
#14 0x00007f74f9f53ca1 in socket_event_poll_in (this=0x7f74f0060300) at socket.c:2265
#15 0x00007f74f9f541f2 in socket_event_handler (fd=29, idx=19, data=0x7f74f0060300, poll_in=1, poll_out=0, poll_err=0) at socket.c:2395
#16 0x00007f7505024dbf in event_dispatch_epoll_handler (event_pool=0x1268010, event=0x7f74f848bea0) at event-epoll.c:571
#17 0x00007f750502519e in event_dispatch_epoll_worker (data=0x12b1a60) at event-epoll.c:674
#18 0x00007f7503e065ca in start_thread () from /lib64/libpthread.so.0
#19 0x00007f75036e00ed in clone () from /lib64/libc.so.6

------------------------------

It is crashing in ec_manager_setattr at -

 if (cbk->iatt[0].ia_type == IA_IFREG) {
                    ec_iatt_rebuild(fop->xl->private, cbk->iatt, 2,
                                    cbk->count);

                    /* This shouldn't fail because we have the inode locked. */
                    GF_ASSERT(ec_get_inode_size(fop,
                                                fop->locks[0].lock->loc.inode,
                                                &cbk->iatt[0].ia_size));
                    cbk->iatt[1].ia_size = cbk->iatt[0].ia_size;
                }

--------------------------------

Reason behind this is that, a file which is a link "IA_IFLNK" is coming as regular file "IA_IFREG" in callback.

(gdb) p *cbk
$6 = {list = {next = 0x7f74ec0b495c, prev = 0x7f74ec0b495c}, answer_list = {next = 0x7f74ec0b496c, prev = 0x7f74f04271ec}, fop = 0x7f74ec0b491c, next = 0x7f74f04271dc, idx = 3, op_ret = 0, op_errno = 0, count = 6, mask = 63, dirty = {0, 
    0}, xdata = 0x7f74f01c6a3c, dict = 0x0, int32 = 0, uintptr = {0, 0, 0}, size = 0, version = {0, 0}, inode = 0x0, fd = 0x0, statvfs = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, 
    f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, iatt = {{ia_ino = 13705394905313093545, ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFREG, 
      ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1481113459, ia_atime_nsec = 489040926, ia_mtime = 1481113459, 
      ia_mtime_nsec = 489040926, ia_ctime = 1481113459, ia_ctime_nsec = 492040918}, {ia_ino = 13705394905313093545, ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFREG, ia_prot = {
        suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
          exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1481113459, ia_atime_nsec = 489040926, ia_mtime = 1481113459, ia_mtime_nsec = 489040926, 
      ia_ctime = 1481113459, ia_ctime_nsec = 494040912}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
      ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {
        suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
          exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, 
      ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, 
      ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, 
  str = 0x0, entries = {{list = {next = 0x7f74f01b4884, prev = 0x7f74f01b4884}, {next = 0x7f74f01b4884, prev = 0x7f74f01b4884}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, 
      ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {
          read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, 
      ia_ctime = 0, ia_ctime_nsec = 0}, dict = 0x0, inode = 0x0, d_name = 0x7f74f01b492c ""}, offset = 0, what = GF_SEEK_DATA}
(gdb)                 
(gdb) 
(gdb) p *cbk->fop
$7 = {id = 38, refs = 3, state = 4, minimum = 4, expected = 6, winds = 0, jobs = 1, error = 0, parent = 0x0, xl = 0x7f74f002e770, req_frame = 0x7f74ec218d8c, frame = 0x7f74ec2d82bc, cbk_list = {next = 0x7f74f01b40fc, 
    prev = 0x7f74f01b40fc}, answer_list = {next = 0x7f74ec12ef2c, prev = 0x7f74f01b410c}, pending_list = {next = 0x7f74e0017adc, prev = 0x7f74ec154f6c}, answer = 0x7f74f01b40fc, lock_count = 1, locked = 1, locks = {{
      lock = 0x7f74ec07b3bc, fop = 0x7f74ec0b491c, owner_list = {next = 0x7f74ec07b3cc, prev = 0x7f74ec07b3cc}, wait_list = {next = 0x7f74ec0b49bc, prev = 0x7f74ec0b49bc}, update = {_gf_false, _gf_true}, base = 0x0, size = 0}, {
      lock = 0x0, fop = 0x0, owner_list = {next = 0x7f74ec0b49f4, prev = 0x7f74ec0b49f4}, wait_list = {next = 0x7f74ec0b4a04, prev = 0x7f74ec0b4a04}, update = {_gf_false, _gf_false}, base = 0x0, size = 0}}, first_lock = 0, lock = {
    spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, 
  flags = 0, first = 0, mask = 63, healing = 0, remaining = 0, received = 63, good = 63, uid = 0, gid = 0, wind = 0x7f74f77baad2 <ec_wind_setattr>, handler = 0x7f74f77bae65 <ec_manager_setattr>, resume = 0x0, cbks = {
    access = 0x7f750505f400 <default_setattr_cbk>, create = 0x7f750505f400 <default_setattr_cbk>, discard = 0x7f750505f400 <default_setattr_cbk>, entrylk = 0x7f750505f400 <default_setattr_cbk>, 
    fentrylk = 0x7f750505f400 <default_setattr_cbk>, fallocate = 0x7f750505f400 <default_setattr_cbk>, flush = 0x7f750505f400 <default_setattr_cbk>, fsync = 0x7f750505f400 <default_setattr_cbk>, 
    fsyncdir = 0x7f750505f400 <default_setattr_cbk>, getxattr = 0x7f750505f400 <default_setattr_cbk>, fgetxattr = 0x7f750505f400 <default_setattr_cbk>, heal = 0x7f750505f400 <default_setattr_cbk>, 
    fheal = 0x7f750505f400 <default_setattr_cbk>, inodelk = 0x7f750505f400 <default_setattr_cbk>, finodelk = 0x7f750505f400 <default_setattr_cbk>, link = 0x7f750505f400 <default_setattr_cbk>, lk = 0x7f750505f400 <default_setattr_cbk>, 
    lookup = 0x7f750505f400 <default_setattr_cbk>, mkdir = 0x7f750505f400 <default_setattr_cbk>, mknod = 0x7f750505f400 <default_setattr_cbk>, open = 0x7f750505f400 <default_setattr_cbk>, opendir = 0x7f750505f400 <default_setattr_cbk>, 
    readdir = 0x7f750505f400 <default_setattr_cbk>, readdirp = 0x7f750505f400 <default_setattr_cbk>, readlink = 0x7f750505f400 <default_setattr_cbk>, readv = 0x7f750505f400 <default_setattr_cbk>, 
    removexattr = 0x7f750505f400 <default_setattr_cbk>, fremovexattr = 0x7f750505f400 <default_setattr_cbk>, rename = 0x7f750505f400 <default_setattr_cbk>, rmdir = 0x7f750505f400 <default_setattr_cbk>, 
    setattr = 0x7f750505f400 <default_setattr_cbk>, fsetattr = 0x7f750505f400 <default_setattr_cbk>, setxattr = 0x7f750505f400 <default_setattr_cbk>, fsetxattr = 0x7f750505f400 <default_setattr_cbk>, 
    stat = 0x7f750505f400 <default_setattr_cbk>, fstat = 0x7f750505f400 <default_setattr_cbk>, statfs = 0x7f750505f400 <default_setattr_cbk>, symlink = 0x7f750505f400 <default_setattr_cbk>, 
    truncate = 0x7f750505f400 <default_setattr_cbk>, ftruncate = 0x7f750505f400 <default_setattr_cbk>, unlink = 0x7f750505f400 <default_setattr_cbk>, writev = 0x7f750505f400 <default_setattr_cbk>, 
    xattrop = 0x7f750505f400 <default_setattr_cbk>, fxattrop = 0x7f750505f400 <default_setattr_cbk>, zerofill = 0x7f750505f400 <default_setattr_cbk>, seek = 0x7f750505f400 <default_setattr_cbk>, 
    ipc = 0x7f750505f400 <default_setattr_cbk>}, data = 0x0, heal = 0x0, healer = {next = 0x7f74ec0b4ac4, prev = 0x7f74ec0b4ac4}, user_size = 0, head = 0, use_fd = 0, xdata = 0x7f74ec1ca3fc, dict = 0x0, int32 = 6, uint32 = 0, size = 0, 
  offset = 0, mode = {0, 0}, entrylk_cmd = ENTRYLK_LOCK, entrylk_type = ENTRYLK_RDLCK, xattrop_flags = GF_XATTROP_ADD_ARRAY, dev = 0, inode = 0x0, fd = 0x0, iatt = {ia_ino = 13705394905313093545, 
    ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFLNK, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, 
      group = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, other = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 23, ia_blksize = 4096, ia_blocks = 4, 
    ia_atime = 1481113215, ia_atime_nsec = 659685788, ia_mtime = 1474704618, ia_mtime_nsec = 0, ia_ctime = 1481111709, ia_ctime_nsec = 633516169}, str = {0x0, 0x0}, loc = {{
      path = 0x7f74ec2f0b80 "/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts", name = 0x7f74ec2f0b9f "sun8i-a33-ippo-q8h-v1.2.dts", inode = 0x7f74cc02756c, parent = 0x7f74e40085dc, 
      gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", pargfid = "\224ϩ\346~UGZ\225\067\003\064\003J", <incomplete sequence \362>}, {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, 
      pargfid = '\000' <repeats 15 times>}}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, seek = GF_SEEK_DATA}
(gdb) 


The RC for this is that, when we do setattr the link file was actually a link.
However, the moment it reached to backend, it has been migrated to some other subvolume. At this point when posix_pstat get called, it is the "T" file on which stat happens. This T file is a simply regular file.


root@apandey glusterfs]# 
[root@apandey glusterfs]# ll  /brick/gluster/vol*/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-11/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-12/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-14/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-15/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-16/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-17/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-18/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-7/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-8/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-9/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
[root@apandey glusterfs]# 
[root@apandey glusterfs]# 
[root@apandey glusterfs]# 
[root@apandey glusterfs]# stat /brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File: '/brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts'
  Size: 0         	Blocks: 8          IO Block: 4096   regular empty file
Device: fd03h/64771d	Inode: 4233378     Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:54:19.488040928 +0530
Modify: 2016-12-07 17:54:19.488040928 +0530
Change: 2016-12-07 17:56:05.634760199 +0530
 Birth: -
[root@apandey glusterfs]# stat /brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File: '/brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts' -> 'sun8i-a33-q8-tablet.dts'
  Size: 23        	Blocks: 8          IO Block: 4096   symbolic link
Device: fd03h/64771d	Inode: 4208551     Links: 2
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:56:09.681749496 +0530
Modify: 2016-09-24 13:40:18.000000000 +0530
Change: 2016-12-07 17:25:09.628516181 +0530


This is the log I placed in posix - 

[2016-12-07 12:24:19.494117] E [MSGID: 113018] [posix.c:506:posix_setattr] 0-vol-posix: Getting REG ofr LINK : /brick/gluster/vol-18/.glusterfs/80/1a/801ad5f8-2ec8-44ae-be33-580d55a2a7a9 statpost.ia_type = 1statpre.ia_type = 1 LOC_IS_LNK(loc) = 0


------------------------

Now, when the call back reaches to EC, the condition 
 if (cbk->iatt[0].ia_type == IA_IFREG) {

becomes true and it checks for the size, which it will not have.
That leads to assertion failure.

I think we have to handle this case in EC. I talked to Prashant and he was saying that had it been given to dht, it would have handled in second phase of setattr.

Comment 7 Atin Mukherjee 2016-12-12 08:52:29 UTC
upstream mainline patch http://review.gluster.org/16103 posted for review.

Comment 9 Ashish Pandey 2016-12-20 11:35:09 UTC
I tried the same scenario with "--enable-debug" flag OFF to not to throw any assert.

I added bricks and started re-balance. The re-balance "completed" without fail and any crash.
I confirmed that files and links were migrated to newly added bricks too. 

I also tried rename of 100 files from mount point and did not see any core.

I think this bug does not cause a functionality break for now.

volume rebalance: vol: success
[root@apandey glusterfs]# gluster v rebalance vol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost              891         3.9KB          3169             0           681            completed        0:0:28
volume rebalance: vol: success

Comment 11 Ambarish 2017-07-19 14:41:16 UTC
Reproducible on 3.3.0 as well.

Comment 12 Ambarish 2017-07-19 14:45:33 UTC
From rebalance ogs during 3.3.0 Scale tests :


[root@gqas007 glusterfs]# grep -R "Assertion failed"
butcher-rebalance.log:[2017-07-19 14:32:00.813871] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
butcher-rebalance.log:[2017-07-19 14:34:50.753306] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
butcher-rebalance.log:[2017-07-19 14:35:10.113387] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[root@gqas007 glusterfs]# 


sosreports will be updated.

Comment 13 Ashish Pandey 2018-04-13 09:53:04 UTC
*** Bug 1566417 has been marked as a duplicate of this bug. ***

Comment 15 Yaniv Kaul 2019-05-05 12:14:15 UTC
Status?

Comment 21 SATHEESARAN 2019-10-11 07:44:49 UTC
Tested with RHGS 3.5.0 interim build ( glusterfs-6.0-15.el7rhgs ) and I see the same error messages in excess when there is active I/O during rebalance of the distributed-disperse volume

[2019-10-10 12:42:50.855875] E [ec-inode-write.c:483:ec_manager_setattr] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x12378) [0x7fbdb18e7378] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x1219b) [0x7fbdb18e719b] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x32273) [0x7fbdb1907273] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2019-10-10 12:42:50.868382] E [ec-inode-write.c:483:ec_manager_setattr] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x12378) [0x7fbdb18e7378] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x1219b) [0x7fbdb18e719b] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x32273) [0x7fbdb1907273] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)


[root@ ~]# grep ' E ' /var/log/glusterfs/backupvol-rebalance.log-20191011 | grep -i 'assertion failed' | wc -l
186

Comment 26 Red Hat Bugzilla 2023-09-14 03:35:01 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days