Bug 804914 - glusterfs crash during rebalance
Summary: glusterfs crash during rebalance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-03-20 07:26 UTC by Shwetha Panduranga
Modified: 2015-12-01 16:45 UTC
CC List: 1 user

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 18:01:46 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shwetha Panduranga 2012-03-20 07:26:43 UTC
Description of problem:
Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id dstore --xlator-option *dht'.
Program terminated with signal 6, Aborted.
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64

(gdb) bt full
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003638634065 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000363862b9fe in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x000000363862bac0 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f148e904e37 in client3_1_open (frame=0x7f1491c18454, this=0xa86ff0, data=0x7f1470556040) at client3_1-fops.c:3293
        local = 0x120255c
        conf = 0x0
        args = 0x7f1470556040
        req = {gfid = '\000' <repeats 15 times>, flags = 0, wbflags = 0, xdata = {xdata_len = 0, xdata_val = 0x0}}
        ret = 0
        op_errno = 116
        __PRETTY_FUNCTION__ = "client3_1_open"
        __FUNCTION__ = "client3_1_open"
#5  0x00007f148e8ecd07 in client_open (frame=0x7f1491c18454, this=0xa86ff0, loc=0x7f14770879c8, flags=2, fd=0x7f1470339a44, wbflags=0) at client.c:864
        ret = -1
        conf = 0xb374d0
        proc = 0x7f148eb1fdd0
        args = {loc = 0x7f14770879c8, fd = 0x7f1470339a44, xattr_req = 0x0, linkname = 0x0, iobref = 0x0, vector = 0x0, xattr = 0x0, stbuf = 0x0, dict = 0x0, 
          oldloc = 0x0, newloc = 0x0, name = 0x0, flock = 0x0, volume = 0x0, basename = 0x0, offset = 0, mask = 0, cmd = 0, size = 0, mode = 0, rdev = 0, flags = 2, 
          wbflags = 0, count = 0, datasync = 0, cmd_entrylk = ENTRYLK_LOCK, type = ENTRYLK_RDLCK, optype = GF_XATTROP_ADD_ARRAY, valid = 0, len = 0}
        __FUNCTION__ = "client_open"
#6  0x00007f148e691231 in afr_sh_data_open (frame=0x7f1491aa8234, this=0xa8c100) at afr-self-heal-data.c:1296
        _new = 0x7f1491c18454
        old_THIS = 0xa8c100
        tmp_cbk = 0x7f148e690c29 <afr_sh_data_open_cbk>
        i = 0
        call_count = 2
        fd = 0x7f1470339a44
        local = 0x7f1477087990
        priv = 0xab00c0
        sh = 0x7f1477089f58
        __FUNCTION__ = "afr_sh_data_open"
#7  0x00007f148e6912ee in afr_self_heal_data (frame=0x7f1491aa8234, this=0xa8c100) at afr-self-heal-data.c:1323
        local = 0x7f1477087990
        sh = 0x7f1477089f58
        priv = 0xab00c0
        __FUNCTION__ = "afr_self_heal_data"
#8  0x00007f148e6989ce in afr_sh_metadata_done (frame=0x7f1491aa8234, this=0xa8c100) at afr-self-heal-metadata.c:79
        local = 0x7f1477087990
        sh = 0x7f1477089f58
        __FUNCTION__ = "afr_sh_metadata_done"
#9  0x00007f148e69ac37 in afr_self_heal_metadata (frame=0x7f1491aa8234, this=0xa8c100) at afr-self-heal-metadata.c:602
        local = 0x7f1477087990
        priv = 0xab00c0
#10 0x00007f148e693b7c in afr_sh_missing_entries_done (frame=0x7f1491aa8234, this=0xa8c100) at afr-self-heal-common.c:924
        local = 0x7f1477087990
        sh = 0x7f1477089f58
        __FUNCTION__ = "afr_sh_missing_entries_done"
#11 0x00007f148e697c03 in afr_self_heal (frame=0x7f1491c180f8, this=0xa8c100, inode=0x7f14761f7378) at afr-self-heal-common.c:2176
        local = 0x7f147708504c
        sh = 0x7f1477089f58
        priv = 0xab00c0
        op_errno = 12
        ret = 0
        orig_sh = 0x7f1477087614
        sh_frame = 0x7f1491aa8234
        sh_local = 0x7f1477087990
        loc = 0x7f14770879c8
        __PRETTY_FUNCTION__ = "afr_self_heal"
        __FUNCTION__ = "afr_self_heal"
#12 0x00007f148e6bab34 in afr_launch_self_heal (frame=0x7f1491c180f8, this=0xa8c100, inode=0x7f14761f7378, background=_gf_true, ia_type=IA_IFREG, 
    reason=0x7f148e6ca096 "subvolume came online", gfid_sh_success_cbk=0, unwind=0) at afr-common.c:1325
        local = 0x7f147708504c
        sh_type_str = " data missing-entry gfid", '\000' <repeats 231 times>
        bg = 0x7f148e6d091f "background"
        __PRETTY_FUNCTION__ = "afr_launch_self_heal"
        __FUNCTION__ = "afr_launch_self_heal"
#13 0x00007f148e67ab21 in afr_trigger_open_fd_self_heal (frame=0x7f1491c180f8, this=0xa8c100) at afr-inode-write.c:339
        local = 0x7f147708504c
        sh = 0x7f1477087614
        inode = 0x7f14761f7378
        reason = 0x7f148e6ca096 "subvolume came online"
#14 0x00007f148e67af38 in afr_open_fd_fix (frame=0x7f1491c180f8, this=0xa8c100, pause_fop=_gf_false) at afr-inode-write.c:426
        ret = 0
        i = 2
        fd_ctx = 0x13c1080
        need_self_heal = _gf_true
        need_open = 0x7f146c000ea0
        need_open_count = 1
        local = 0x7f147708504c
        priv = 0xab00c0
        fop_continue = _gf_true
        __PRETTY_FUNCTION__ = "afr_open_fd_fix"
        __FUNCTION__ = "afr_open_fd_fix"
#15 0x00007f148e678e67 in afr_readv (frame=0x7f1491c180f8, this=0xa8c100, fd=0x7f14703399e0, size=131072, offset=8912896, flags=0) at afr-inode-read.c:1346
        priv = 0xab00c0
        local = 0x7f147708504c
        children = 0xab0290
        call_child = 0
        op_errno = 0
        read_child = 0
        ret = 0
        __FUNCTION__ = "afr_readv"
#16 0x00007f1492e2f642 in syncop_readv (subvol=0xa8c100, fd=0x7f14703399e0, size=131072, off=8912896, flags=0, vector=0x7f1470556a28, count=0x7f1470556a30, 
    iobref=0x7f1470556a20) at syncop.c:989
        _new = 0x7f1491c180f8
        old_THIS = 0xa8e360
        tmp_cbk = 0x7f1492e2f24b <syncop_readv_cbk>
        task = 0x7f14703568b0
        args = {op_ret = 0, op_errno = 0, iatt1 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', 
              sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', 
                exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
            ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, iatt2 = {ia_ino = 0, 
            ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {
                read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
                write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
            ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, xattr = 0x0, entries = {{list = {next = 0x0, prev = 0x0}, {
                next = 0x0, prev = 0x0}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, 
              ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, 
                group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, 
              ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, 
              ia_ctime = 0, ia_ctime_nsec = 0}, dict = 0x0, inode = 0x0, d_name = 0x7f1470556710 ""}, statvfs_buf = {f_bsize = 0, f_frsize = 0, f_blocks = 0, 
            f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, vector = 0x0, 
          count = 0, iobref = 0x0, buffer = 0x0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {
                __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, complete = 0 '\000', cond = {__data = {__lock = 0, __futex = 0, 
              __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, 
            __align = 0}, task = 0x0}
        __FUNCTION__ = "syncop_readv"
#17 0x00007f148e407525 in __dht_rebalance_migrate_data (from=0xa8c100, to=0xa8d790, src=0x7f14703399e0, dst=0x7f147033997c, ia_size=10485760, hole_exists=0)
    at dht-rebalance.c:401
        ret = 131072
        count = 1
        offset = 8912896
        vector = 0x0
        iobref = 0x0
        total = 8912896
        read_size = 131072
#18 0x00007f148e408765 in dht_migrate_file (this=0xa8e360, loc=0xb642b4, from=0xa8c100, to=0xa8d790, flag=0) at dht-rebalance.c:717
        ret = 0
        new_stbuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
            owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
              write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
          ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        stbuf = {ia_ino = 10373069258793640843, ia_gfid = "v\245B\006\251\232H\370\217\364\213\210\060\353c\213", ia_dev = 2065, ia_type = IA_IFREG, ia_prot = {
            suid = 0 '\000', sgid = 1 '\001', sticky = 1 '\001', owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {read = 1 '\001', 
              write = 0 '\000', exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0, 
          ia_size = 10485760, ia_blksize = 4096, ia_blocks = 20488, ia_atime = 1332198184, ia_atime_nsec = 387218553, ia_mtime = 1332198184, 
          ia_mtime_nsec = 576485905, ia_ctime = 1332198991, ia_ctime_nsec = 506370811}
        empty_iatt = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', 
            sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {
              read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, 
          ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        src_ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {read = 1 '\001', 
            write = 0 '\000', exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}
        src_fd = 0x7f14703399e0
        dst_fd = 0x7f147033997c
        dict = 0x90d804
        xattr = 0x0
        xattr_rsp = 0x90d8ac
        file_has_holes = 0
        __FUNCTION__ = "dht_migrate_file"
#19 0x00007f148e409305 in rebalance_task (data=0x7f1491c18b0c) at dht-rebalance.c:887
        ret = -1
        local = 0xb642ac
        frame = 0x7f1491c18b0c
#20 0x00007f1492e29c9a in synctask_wrap (old_task=0x7f14703568b0) at syncop.c:128
        task = 0x7f14703568b0
#21 0x0000003638643610 in ?? () from /lib64/libc.so.6
No symbol table info available.
#22 0x0000000000000000 in ?? ()
No symbol table info available.

Version-Release number of selected component (if applicable):
3.3.0qa30

Additional info:
The crash occurs because afr triggered an open-fd self-heal on an fd whose inode has a null (all-zero) gfid. Since inode->path is not null (it was <gfid:000...>), afr goes ahead and triggers the self-heal as if the loc passed to it had come from a fresh lookup. The open then crashes because the loc does not carry a valid gfid.
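
For illustration only, here is a minimal standalone C sketch of the failure mode (an assumed reconstruction, not GlusterFS source): the client open path in frame #4 effectively asserts that the gfid it is about to encode is non-null, so an inode that was never resolved by lookup aborts the process with SIGABRT, matching frames #0-#4 above.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the 16-byte gfid carried by a GlusterFS inode/loc
 * (hypothetical type name, used only for this sketch). */
typedef uint8_t gfid_sketch_t[16];

static int
gfid_is_null (const gfid_sketch_t gfid)
{
        static const gfid_sketch_t zero = {0};
        return memcmp (gfid, zero, sizeof (zero)) == 0;
}

/* Mirrors the shape of the check that fires in frame #4: the open
 * request cannot be encoded for an inode whose gfid was never set. */
static void
client_open_sketch (const gfid_sketch_t gfid)
{
        assert (!gfid_is_null (gfid));
        printf ("open request would be sent for a resolved inode\n");
}

int
main (void)
{
        gfid_sketch_t unresolved = {0};  /* gfid never filled in by lookup */
        client_open_sketch (unresolved); /* aborts with SIGABRT, as above  */
        return 0;
}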

Comment 1 Anand Avati 2012-03-31 12:16:30 UTC
CHANGE: http://review.gluster.com/3045 (cluster/afr: Handle invalid inode in open_fd_fix) merged in master by Vijay Bellur (vijay)
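
For context, a hedged sketch of the kind of guard the patch title describes (not the actual change from review.gluster.com/3045): skip the open-fd self-heal when the fd's inode has not been resolved to a valid gfid, so no gfid-less loc is handed down to the client translator. All structure and function names below are hypothetical stand-ins.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint8_t gfid_sketch_t[16];

/* Toy stand-ins for inode/fd; the real structures live in libglusterfs. */
struct inode_sketch { gfid_sketch_t gfid; };
struct fd_sketch    { struct inode_sketch *inode; };

static int
inode_is_resolved (const struct inode_sketch *inode)
{
        static const gfid_sketch_t zero = {0};
        return inode != NULL && memcmp (inode->gfid, zero, sizeof (zero)) != 0;
}

/* Returns 0 when it is safe to go ahead with the open/self-heal,
 * -1 when the fd must be skipped until a lookup resolves its inode. */
static int
open_fd_fix_guard (const struct fd_sketch *fd)
{
        if (!inode_is_resolved (fd->inode)) {
                fprintf (stderr,
                         "skipping open-fd self-heal: inode has a null gfid\n");
                return -1;
        }
        return 0;
}

int
main (void)
{
        struct inode_sketch unresolved = { .gfid = {0} };
        struct fd_sketch    fd         = { .inode = &unresolved };

        if (open_fd_fix_guard (&fd) != 0)
                printf ("fop continues without triggering self-heal\n");
        return 0;
}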

Comment 2 Shwetha Panduranga 2012-05-12 13:41:09 UTC
Bug is fixed. Verified on 3.3.0qa41.

