Bug 1398196 - Assertion Failed Error messages in rebalance logs during rebalance
Summary: Assertion Failed Error messages in rebalance logs during rebalance
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Susant Kumar Palai
QA Contact: Sayalee
URL:
Whiteboard:
Duplicates: 1566417
Depends On:
Blocks: 1402538
 
Reported: 2016-11-24 09:48 UTC by Prasad Desala
Modified: 2023-09-14 03:35 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1402538
Environment:
Last Closed: 2020-01-02 05:35:04 UTC
Embargoed:



Description Prasad Desala 2016-11-24 09:48:36 UTC
Description of problem:
======================
The following "Assertion failed" error messages are observed in the rebalance logs during rebalance:

[2016-11-24 06:51:34.447866] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:51:34.508070] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:51:38.229553] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:54:16.408204] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:54:17.452540] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 06:58:45.236521] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 07:41:15.196585] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 07:41:19.539656] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-11-24 07:41:21.841912] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd918) [0x7fc7fd5d5918] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd73b) [0x7fc7fd5d573b] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x29893) [0x7fc7fd5f1893] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
 

Version-Release number of selected component (if applicable):
3.8.4-5.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a 2 X (4+2) distributed-disperse volume and start it.
2) FUSE mount the volume on multiple clients.
3) From the mount point, create 100K files and untar a Linux kernel package.
4) Wait till step-3 completes.
5) From client-1, start renaming the files; from the other clients, start continuous lookups.
6) While the renames and lookups are in progress, add a few bricks and start rebalance.

Keep monitoring the rebalance logs; at some point, "Assertion failed" error messages start appearing.

Actual results:
===============
"Assertion failed" error messages appear in the rebalance logs during rebalance.

Expected results:
================
There should not be any error messages in the rebalance logs during rebalance.

Comment 6 Ashish Pandey 2016-12-07 13:16:51 UTC
While trying to reproduce this issue, I hit the following crash:

(gdb) bt
#0  0x00007f7503611765 in raise () from /lib64/libc.so.6
#1  0x00007f750361336a in abort () from /lib64/libc.so.6
#2  0x00007f7503609f97 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f750360a042 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f74f77bafd4 in ec_manager_setattr (fop=0x7f74ec0b491c, state=4) at ec-inode-write.c:394
#5  0x00007f74f779c405 in __ec_manager (fop=0x7f74ec0b491c, error=0) at ec-common.c:2283
#6  0x00007f74f7797d90 in ec_resume (fop=0x7f74ec0b491c, error=0) at ec-common.c:289
#7  0x00007f74f7797fba in ec_complete (fop=0x7f74ec0b491c) at ec-common.c:362
#8  0x00007f74f77b979a in ec_inode_write_cbk (frame=0x7f74ec2d82bc, this=0x7f74f002e770, cookie=0x3, op_ret=0, op_errno=0, prestat=0x7f74f848b980, poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:65
#9  0x00007f74f77baacc in ec_setattr_cbk (frame=0x7f74ec2d82bc, cookie=0x3, this=0x7f74f002e770, op_ret=0, op_errno=0, prestat=0x7f74f848b980, poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:349
#10 0x00007f74f7a5563d in client3_3_setattr_cbk (req=0x7f74ec0033ec, iov=0x7f74ec00342c, count=1, myframe=0x7f74ec16fabc) at client-rpc-fops.c:2264
#11 0x00007f7504d8026a in rpc_clnt_handle_reply (clnt=0x7f74f005fe70, pollin=0x7f74f0321c60) at rpc-clnt.c:790
#12 0x00007f7504d807c3 in rpc_clnt_notify (trans=0x7f74f0060300, mydata=0x7f74f005fec8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60) at rpc-clnt.c:970
#13 0x00007f7504d7ca37 in rpc_transport_notify (this=0x7f74f0060300, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60) at rpc-transport.c:537
#14 0x00007f74f9f53ca1 in socket_event_poll_in (this=0x7f74f0060300) at socket.c:2265
#15 0x00007f74f9f541f2 in socket_event_handler (fd=29, idx=19, data=0x7f74f0060300, poll_in=1, poll_out=0, poll_err=0) at socket.c:2395
#16 0x00007f7505024dbf in event_dispatch_epoll_handler (event_pool=0x1268010, event=0x7f74f848bea0) at event-epoll.c:571
#17 0x00007f750502519e in event_dispatch_epoll_worker (data=0x12b1a60) at event-epoll.c:674
#18 0x00007f7503e065ca in start_thread () from /lib64/libpthread.so.0
#19 0x00007f75036e00ed in clone () from /lib64/libc.so.6

------------------------------

It crashes in ec_manager_setattr at:

 if (cbk->iatt[0].ia_type == IA_IFREG) {
                    ec_iatt_rebuild(fop->xl->private, cbk->iatt, 2,
                                    cbk->count);

                    /* This shouldn't fail because we have the inode locked. */
                    GF_ASSERT(ec_get_inode_size(fop,
                                                fop->locks[0].lock->loc.inode,
                                                &cbk->iatt[0].ia_size));
                    cbk->iatt[1].ia_size = cbk->iatt[0].ia_size;
                }

--------------------------------

The reason is that a file which is a symlink (IA_IFLNK) is coming back as a regular file (IA_IFREG) in the callback.

(gdb) p *cbk
$6 = {list = {next = 0x7f74ec0b495c, prev = 0x7f74ec0b495c}, answer_list = {next = 0x7f74ec0b496c, prev = 0x7f74f04271ec}, fop = 0x7f74ec0b491c, next = 0x7f74f04271dc, idx = 3, op_ret = 0, op_errno = 0, count = 6, mask = 63, dirty = {0, 
    0}, xdata = 0x7f74f01c6a3c, dict = 0x0, int32 = 0, uintptr = {0, 0, 0}, size = 0, version = {0, 0}, inode = 0x0, fd = 0x0, statvfs = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, 
    f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, iatt = {{ia_ino = 13705394905313093545, ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFREG, 
      ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1481113459, ia_atime_nsec = 489040926, ia_mtime = 1481113459, 
      ia_mtime_nsec = 489040926, ia_ctime = 1481113459, ia_ctime_nsec = 492040918}, {ia_ino = 13705394905313093545, ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFREG, ia_prot = {
        suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
          exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1481113459, ia_atime_nsec = 489040926, ia_mtime = 1481113459, ia_mtime_nsec = 489040926, 
      ia_ctime = 1481113459, ia_ctime_nsec = 494040912}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
      ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {
        suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
          exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, 
      ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, 
      ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, 
  str = 0x0, entries = {{list = {next = 0x7f74f01b4884, prev = 0x7f74f01b4884}, {next = 0x7f74f01b4884, prev = 0x7f74f01b4884}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, 
      ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {
          read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, 
      ia_ctime = 0, ia_ctime_nsec = 0}, dict = 0x0, inode = 0x0, d_name = 0x7f74f01b492c ""}, offset = 0, what = GF_SEEK_DATA}
(gdb)                 
(gdb) 
(gdb) p *cbk->fop
$7 = {id = 38, refs = 3, state = 4, minimum = 4, expected = 6, winds = 0, jobs = 1, error = 0, parent = 0x0, xl = 0x7f74f002e770, req_frame = 0x7f74ec218d8c, frame = 0x7f74ec2d82bc, cbk_list = {next = 0x7f74f01b40fc, 
    prev = 0x7f74f01b40fc}, answer_list = {next = 0x7f74ec12ef2c, prev = 0x7f74f01b410c}, pending_list = {next = 0x7f74e0017adc, prev = 0x7f74ec154f6c}, answer = 0x7f74f01b40fc, lock_count = 1, locked = 1, locks = {{
      lock = 0x7f74ec07b3bc, fop = 0x7f74ec0b491c, owner_list = {next = 0x7f74ec07b3cc, prev = 0x7f74ec07b3cc}, wait_list = {next = 0x7f74ec0b49bc, prev = 0x7f74ec0b49bc}, update = {_gf_false, _gf_true}, base = 0x0, size = 0}, {
      lock = 0x0, fop = 0x0, owner_list = {next = 0x7f74ec0b49f4, prev = 0x7f74ec0b49f4}, wait_list = {next = 0x7f74ec0b4a04, prev = 0x7f74ec0b4a04}, update = {_gf_false, _gf_false}, base = 0x0, size = 0}}, first_lock = 0, lock = {
    spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, 
  flags = 0, first = 0, mask = 63, healing = 0, remaining = 0, received = 63, good = 63, uid = 0, gid = 0, wind = 0x7f74f77baad2 <ec_wind_setattr>, handler = 0x7f74f77bae65 <ec_manager_setattr>, resume = 0x0, cbks = {
    access = 0x7f750505f400 <default_setattr_cbk>, create = 0x7f750505f400 <default_setattr_cbk>, discard = 0x7f750505f400 <default_setattr_cbk>, entrylk = 0x7f750505f400 <default_setattr_cbk>, 
    fentrylk = 0x7f750505f400 <default_setattr_cbk>, fallocate = 0x7f750505f400 <default_setattr_cbk>, flush = 0x7f750505f400 <default_setattr_cbk>, fsync = 0x7f750505f400 <default_setattr_cbk>, 
    fsyncdir = 0x7f750505f400 <default_setattr_cbk>, getxattr = 0x7f750505f400 <default_setattr_cbk>, fgetxattr = 0x7f750505f400 <default_setattr_cbk>, heal = 0x7f750505f400 <default_setattr_cbk>, 
    fheal = 0x7f750505f400 <default_setattr_cbk>, inodelk = 0x7f750505f400 <default_setattr_cbk>, finodelk = 0x7f750505f400 <default_setattr_cbk>, link = 0x7f750505f400 <default_setattr_cbk>, lk = 0x7f750505f400 <default_setattr_cbk>, 
    lookup = 0x7f750505f400 <default_setattr_cbk>, mkdir = 0x7f750505f400 <default_setattr_cbk>, mknod = 0x7f750505f400 <default_setattr_cbk>, open = 0x7f750505f400 <default_setattr_cbk>, opendir = 0x7f750505f400 <default_setattr_cbk>, 
    readdir = 0x7f750505f400 <default_setattr_cbk>, readdirp = 0x7f750505f400 <default_setattr_cbk>, readlink = 0x7f750505f400 <default_setattr_cbk>, readv = 0x7f750505f400 <default_setattr_cbk>, 
    removexattr = 0x7f750505f400 <default_setattr_cbk>, fremovexattr = 0x7f750505f400 <default_setattr_cbk>, rename = 0x7f750505f400 <default_setattr_cbk>, rmdir = 0x7f750505f400 <default_setattr_cbk>, 
    setattr = 0x7f750505f400 <default_setattr_cbk>, fsetattr = 0x7f750505f400 <default_setattr_cbk>, setxattr = 0x7f750505f400 <default_setattr_cbk>, fsetxattr = 0x7f750505f400 <default_setattr_cbk>, 
    stat = 0x7f750505f400 <default_setattr_cbk>, fstat = 0x7f750505f400 <default_setattr_cbk>, statfs = 0x7f750505f400 <default_setattr_cbk>, symlink = 0x7f750505f400 <default_setattr_cbk>, 
    truncate = 0x7f750505f400 <default_setattr_cbk>, ftruncate = 0x7f750505f400 <default_setattr_cbk>, unlink = 0x7f750505f400 <default_setattr_cbk>, writev = 0x7f750505f400 <default_setattr_cbk>, 
    xattrop = 0x7f750505f400 <default_setattr_cbk>, fxattrop = 0x7f750505f400 <default_setattr_cbk>, zerofill = 0x7f750505f400 <default_setattr_cbk>, seek = 0x7f750505f400 <default_setattr_cbk>, 
    ipc = 0x7f750505f400 <default_setattr_cbk>}, data = 0x0, heal = 0x0, healer = {next = 0x7f74ec0b4ac4, prev = 0x7f74ec0b4ac4}, user_size = 0, head = 0, use_fd = 0, xdata = 0x7f74ec1ca3fc, dict = 0x0, int32 = 6, uint32 = 0, size = 0, 
  offset = 0, mode = {0, 0}, entrylk_cmd = ENTRYLK_LOCK, entrylk_type = ENTRYLK_RDLCK, xattrop_flags = GF_XATTROP_ADD_ARRAY, dev = 0, inode = 0x0, fd = 0x0, iatt = {ia_ino = 13705394905313093545, 
    ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type = IA_IFLNK, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, 
      group = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, other = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 23, ia_blksize = 4096, ia_blocks = 4, 
    ia_atime = 1481113215, ia_atime_nsec = 659685788, ia_mtime = 1474704618, ia_mtime_nsec = 0, ia_ctime = 1481111709, ia_ctime_nsec = 633516169}, str = {0x0, 0x0}, loc = {{
      path = 0x7f74ec2f0b80 "/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts", name = 0x7f74ec2f0b9f "sun8i-a33-ippo-q8h-v1.2.dts", inode = 0x7f74cc02756c, parent = 0x7f74e40085dc, 
      gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", pargfid = "\224ĪŠ\346~UGZ\225\067\003\064\003J", <incomplete sequence \362>}, {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, 
      pargfid = '\000' <repeats 15 times>}}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, seek = GF_SEEK_DATA}
(gdb) 


The root cause is that when the setattr was issued, the file was in fact a symlink.
However, by the time the call reached the backend, the file had been migrated to another subvolume. At that point, when posix_pstat gets called, the stat runs on the "T" (DHT linkto) file that was left behind, and that linkto file is simply a regular file, as the brick listings and stat output below show (a small sketch for recognizing such files follows them).


[root@apandey glusterfs]# 
[root@apandey glusterfs]# ll  /brick/gluster/vol*/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-11/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-12/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-14/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-15/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-16/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-17/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54 /brick/gluster/vol-18/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-7/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-8/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40 /brick/gluster/vol-9/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts -> sun8i-a33-q8-tablet.dts
[root@apandey glusterfs]# 
[root@apandey glusterfs]# 
[root@apandey glusterfs]# 
[root@apandey glusterfs]# stat /brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File: '/brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts'
  Size: 0         	Blocks: 8          IO Block: 4096   regular empty file
Device: fd03h/64771d	Inode: 4233378     Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:54:19.488040928 +0530
Modify: 2016-12-07 17:54:19.488040928 +0530
Change: 2016-12-07 17:56:05.634760199 +0530
 Birth: -
[root@apandey glusterfs]# stat /brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File: '/brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts' -> 'sun8i-a33-q8-tablet.dts'
  Size: 23        	Blocks: 8          IO Block: 4096   symbolic link
Device: fd03h/64771d	Inode: 4208551     Links: 2
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:56:09.681749496 +0530
Modify: 2016-09-24 13:40:18.000000000 +0530
Change: 2016-12-07 17:25:09.628516181 +0530
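
For reference, these DHT linkto ("T") files can be recognized on a brick by their mode (only the sticky bit set, no permission bits, as in the stat output above) and by the trusted.glusterfs.dht.linkto xattr that names the subvolume holding the real data. A minimal, self-contained sketch (the path is taken from the command line; trusted.* xattrs are readable only as root):

/* Detect a DHT linkto file on a brick: regular file, mode 01000, linkto xattr. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    struct stat st;
    char target[256] = {0};
    ssize_t n;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <path-on-brick>\n", argv[0]);
        return 1;
    }
    if (lstat(argv[1], &st) != 0) {
        perror("lstat");
        return 1;
    }

    /* ---------T : regular file whose permission bits are exactly the sticky bit. */
    int looks_like_linkto = S_ISREG(st.st_mode) &&
                            (st.st_mode & 07777) == S_ISVTX;

    /* The xattr value names the DHT subvolume that holds the actual data. */
    n = getxattr(argv[1], "trusted.glusterfs.dht.linkto", target, sizeof(target) - 1);

    printf("%s: mode=%o is-linkto=%s linkto-xattr=%s\n", argv[1],
           (unsigned)(st.st_mode & 07777), looks_like_linkto ? "yes" : "no",
           n > 0 ? target : "(none)");
    return 0;
}

Run against one of the vol-13..vol-18 paths above, this would report mode 1000 and print the linkto xattr, while the real symlinks on vol-7..vol-12 would show mode 777 and no linkto attribute.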


This is a debug log message I placed in posix:

[2016-12-07 12:24:19.494117] E [MSGID: 113018] [posix.c:506:posix_setattr] 0-vol-posix: Getting REG ofr LINK : /brick/gluster/vol-18/.glusterfs/80/1a/801ad5f8-2ec8-44ae-be33-580d55a2a7a9 statpost.ia_type = 1statpre.ia_type = 1 LOC_IS_LNK(loc) = 0


------------------------

Now, when the callback reaches EC, the condition

 if (cbk->iatt[0].ia_type == IA_IFREG) {

becomes true, and EC tries to fetch the size it caches for regular files, which this inode (still a symlink as far as EC is concerned) does not have.
That leads to the assertion failure.
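
For context on why this shows up as a crash here but only as log noise on regular builds: GF_ASSERT aborts the process only when glusterfs is built with --enable-debug; in a normal build it merely logs the "Assertion failed: ..." message that appears in the rebalance logs. A rough, self-contained approximation of that behavior (the real macro lives in libglusterfs and logs through gf_msg_callingfn; the exact definition varies by release):

/* Standalone approximation of GF_ASSERT semantics, for illustration only. */
#include <assert.h>
#include <stdio.h>

#ifdef DEBUG
#define GF_ASSERT_SKETCH(x) assert(x)        /* debug build: abort() on failure */
#else
#define GF_ASSERT_SKETCH(x)                                              \
    do {                                                                 \
        if (!(x))                                                        \
            fprintf(stderr, "E ... Assertion failed: %s\n", #x);         \
    } while (0)
#endif

int main(void)
{
    GF_ASSERT_SKETCH(1 == 2);   /* release build: logs the message; debug build: aborts */
    return 0;
}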

I think we have to handle this case in EC. I talked to Prashant, and he said that had this been passed up to DHT, it would have been handled in the second phase of the setattr.
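
One purely illustrative way of handling it in EC (a sketch only, written against the block quoted above from ec-inode-write.c; it is not necessarily what the patch later posted for review does) would be to treat a failed ec_get_inode_size() as non-fatal and keep the size the bricks reported:

    if (cbk->iatt[0].ia_type == IA_IFREG) {
        ec_iatt_rebuild(fop->xl->private, cbk->iatt, 2, cbk->count);

        /* The file may have been converted into a DHT linkto file on the
         * bricks while the lock was held, so a missing cached size is a
         * legitimate possibility rather than a programming error. */
        if (!ec_get_inode_size(fop, fop->locks[0].lock->loc.inode,
                               &cbk->iatt[0].ia_size)) {
            gf_msg_debug(fop->xl->name, 0,
                         "no cached inode size; keeping brick-reported size");
        }

        cbk->iatt[1].ia_size = cbk->iatt[0].ia_size;
    }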

Comment 7 Atin Mukherjee 2016-12-12 08:52:29 UTC
Upstream mainline patch http://review.gluster.org/16103 has been posted for review.

Comment 9 Ashish Pandey 2016-12-20 11:35:09 UTC
I tried the same scenario with the "--enable-debug" flag turned off, so that the assert is not triggered.

I added bricks and started rebalance. The rebalance completed without any failure or crash.
I confirmed that files and links were migrated to the newly added bricks as well.

I also tried renaming 100 files from the mount point and did not see any core.

I think this bug does not cause a functionality break for now.

volume rebalance: vol: success
[root@apandey glusterfs]# gluster v rebalance vol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost              891         3.9KB          3169             0           681            completed        0:0:28
volume rebalance: vol: success

Comment 11 Ambarish 2017-07-19 14:41:16 UTC
Reproducible on 3.3.0 as well.

Comment 12 Ambarish 2017-07-19 14:45:33 UTC
From the rebalance logs during the 3.3.0 scale tests:


[root@gqas007 glusterfs]# grep -R "Assertion failed"
butcher-rebalance.log:[2017-07-19 14:32:00.813871] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
butcher-rebalance.log:[2017-07-19 14:34:50.753306] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
butcher-rebalance.log:[2017-07-19 14:35:10.113387] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xdb98) [0x7fc5e97f5b98] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0xd9bb) [0x7fc5e97f59bb] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/disperse.so(+0x2a063) [0x7fc5e9812063] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[root@gqas007 glusterfs]# 


sosreports will be updated.

Comment 13 Ashish Pandey 2018-04-13 09:53:04 UTC
*** Bug 1566417 has been marked as a duplicate of this bug. ***

Comment 15 Yaniv Kaul 2019-05-05 12:14:15 UTC
Status?

Comment 21 SATHEESARAN 2019-10-11 07:44:49 UTC
Tested with an RHGS 3.5.0 interim build (glusterfs-6.0-15.el7rhgs), and I see the same error messages in large numbers when there is active I/O during rebalance of the distributed-disperse volume:

[2019-10-10 12:42:50.855875] E [ec-inode-write.c:483:ec_manager_setattr] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x12378) [0x7fbdb18e7378] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x1219b) [0x7fbdb18e719b] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x32273) [0x7fbdb1907273] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2019-10-10 12:42:50.868382] E [ec-inode-write.c:483:ec_manager_setattr] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x12378) [0x7fbdb18e7378] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x1219b) [0x7fbdb18e719b] -->/usr/lib64/glusterfs/6.0/xlator/cluster/disperse.so(+0x32273) [0x7fbdb1907273] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)


[root@ ~]# grep ' E ' /var/log/glusterfs/backupvol-rebalance.log-20191011 | grep -i 'assertion failed' | wc -l
186

Comment 26 Red Hat Bugzilla 2023-09-14 03:35:01 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

