Bug 928656 - nfs process crashed after rebalance during unlock of files.
Summary: nfs process crashed after rebalance during unlock of files.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: santosh pradhan
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-03-28 07:22 UTC by Rajesh
Modified: 2014-09-02 10:17 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-02 10:17:11 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Rajesh 2013-03-28 07:22:23 UTC
Description of problem:
Gluster NFS process crashes when the "Steps to Reproduce" are followed.

Version-Release number of selected component (if applicable):
upstream mainline

How reproducible: always (3/3)


Steps to Reproduce:
1. Create a single-brick volume.
2. Mount it via NFS, hold locks on many files simultaneously on the
   mount point, and sleep for about 200 seconds.
3. Add a brick and start a rebalance.
4. After the rebalance completes, wait until the unlocks are attempted
   (an example command sequence is sketched below).
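
For reference, the above can be driven with a command sequence along these
lines. The volume name "vol" matches the client translator names in the NFS
log below; the host name, brick paths and mount point are only examples, not
taken from the report:

# server: create and start a single-brick volume
gluster volume create vol server1:/bricks/b1
gluster volume start vol

# client: mount over NFSv3 and hold locks on many files with lock.py
# (the script is listed under "Additional info" below)
mount -t nfs -o vers=3 server1:/vol /mnt/vol
cd /mnt/vol
for i in {a..z}; do python lock.py $i & done

# server: while the clients are still sleeping with their locks held
gluster volume add-brick vol server1:/bricks/b2
gluster volume rebalance vol start
gluster volume rebalance vol status   # wait for completion, then wait for the unlocks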
  
Actual results:
Gluster NFS process crashes during unlock with the bt:

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glu'.
Program terminated with signal 11, Segmentation fault.
#0  dht_migration_complete_check_task (data=0x7f890981f02c) at dht-helper.c:743
743	                ret = syncop_open (dst_node, &local->loc,
(gdb) bt
#0  dht_migration_complete_check_task (data=0x7f890981f02c) at dht-helper.c:743
#1  0x00007f890b619383 in synctask_wrap (old_task=0x1124040) at syncop.c:161
#2  0x00007f890abf7650 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x0000000000000000 in ?? ()


Expected results:
rebalance should not cause crashes on unlocking.

Additional info:
script for locking:


lock.py :
--------------------------------------------------------------------------
#!/usr/bin/python
# Acquire an exclusive lock on the file given as argv[1], hold it for
# ~200 seconds, then release it.

import fcntl
import time
import sys

def main():
    file_path = sys.argv[1]
    FH = open(file_path, "w")
    try:
        fcntl.lockf(FH, fcntl.LOCK_EX, 10, 0, 0)
    except IOError:
        print "lock could not be acquired."
        return False
    print "lock acquired"
    time.sleep(200)
    print "out of sleep"
    try:
        fcntl.lockf(FH, fcntl.LOCK_UN)
    except IOError:
        print "unlock failed"
        return False

    print "unlock happened"
    return True

if __name__ == "__main__":
    main()
-----------------------------------------------------------------------

To hold locks on multiple files, say {a..z}:

# for i in {a..z}; do python lock.py $i & done

-----------------------------------------------------------------------
NFS logs when it crashed:


[2013-03-28 07:17:04.712942] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-vol-client-0: remote operation failed: No such file or directory
[2013-03-28 07:17:04.714454] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-vol-client-0: remote operation failed: No such file or directory
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-03-28 07:17:04
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/usr/local/sbin/glusterfs(glusterfsd_print_trace+0x1f)[0x408085]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0)[0x7f890abe54a0]
/usr/local/lib/glusterfs/3git/xlator/cluster/distribute.so(dht_migration_complete_check_task+0x7a3)[0x7f89064b3fd3]
/usr/local/lib/libglusterfs.so.0(synctask_wrap+0x37)[0x7f890b619383]
/lib/x86_64-linux-gnu/libc.so.6(+0x48650)[0x7f890abf7650]

Comment 1 Rajesh 2013-03-28 09:11:31 UTC
More info from the core:

(gdb) p (call_frame_t*) data
$1 = (call_frame_t *) 0x7f890981f02c
(gdb) p *$
$2 = {root = 0x7f89096100a4, parent = 0x7f890981f4e0, next = 0x7f890981f4e0, prev = 0x7f890981f184, local = 0x7f88fed7e7f0, this = 0x1097d80, ret = 0x7f8906292cac <io_stats_stat_cbk>, ref_count = 0, 
  lock = 1, cookie = 0x7f890981f02c, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, 
  wind_from = 0x7f89062a2642 <__FUNCTION__.11339> "io_stats_stat", wind_to = 0x7f89062a1a2c "FIRST_CHILD(this)->fops->stat", unwind_from = 0x0, unwind_to = 0x7f89062a1a4a "io_stats_stat_cbk"}
(gdb) info locals
ret = -1
src_node = 0x1095f80
dst_node = 0x1097210
local = 0x7f88fed7e7f0
dict = 0x0
layout = 0x10c59d0
stbuf = {ia_ino = 9242341100452143058, ia_gfid = "\032\217{dL\255\200CdCIq", <incomplete sequence \322>, ia_dev = 2055, ia_type = IA_IFREG, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
    owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, 
  ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 1364454824, ia_mtime_nsec = 0, ia_ctime = 1364454843, 
  ia_ctime_nsec = 822884540}
this = 0x1097d80
frame = 0x7f890981f02c
tmp_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
path = 0x0
conf = 0x10c4db0
__FUNCTION__ = "dht_migration_complete_check_task"
(gdb) f 1
#1  0x00007f890b619383 in synctask_wrap (old_task=0x1124040) at syncop.c:161
161	        task->ret = task->syncfn (task->opaque);
(gdb) f 0
#0  dht_migration_complete_check_task (data=0x7f890981f02c) at dht-helper.c:743
743	                ret = syncop_open (dst_node, &local->loc,
(gdb) p local->loc
$3 = {path = 0x10919b0 "<gfid:1adfb78f-7b64-4cad-8043-644349715fd2>", name = 0x0, inode = 0x7f8904f430c8, parent = 0x0, gfid = "\032\217{dL\255\200CdCIq", <incomplete sequence \322>, 
  pargfid = '\000' <repeats 15 times>}
(gdb) p dst_node
$4 = (xlator_t *) 0x1097210
(gdb) p *$
$5 = {name = 0x1090bf0 "vol-client-1", type = 0x1097060 "protocol/client", next = 0x1095f80, prev = 0x1097d80, parents = 0x1098e60, children = 0x0, options = 0x7f89091d625c, dlhandle = 0x10969e0, 
  fops = 0x7f89069495a0 <fops>, cbks = 0x7f8906949580 <cbks>, dumpops = 0x7f8906949840 <dumpops>, volume_options = {next = 0x1097bb0, prev = 0x1097cc0}, fini = 0x7f890671c234 <fini>, 
  init = 0x7f890671bfe0 <init>, reconfigure = 0x7f890671bdcf <reconfigure>, mem_acct_init = 0x7f890671b82d <mem_acct_init>, notify = 0x7f890671b43b <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, 
      max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 46 times>}, history = 0x0, ctx = 0x105a010, graph = 0x1091c10, itable = 0x0, init_succeeded = 1 '\001', private = 0x10c5a30, mem_acct = {
    num_types = 0, rec = 0x0}, winds = 0, switched = 0 '\000', local_pool = 0x10c5b10, is_autoloaded = _gf_false}
(gdb) l
738	         */
739	        SYNCTASK_SETID(0, 0);
740	        /* if 'local->fd' (ie, fd based operation), send a 'open()' on
741	           destination if not already done */
742	        if (local->loc.inode) {
743	                ret = syncop_open (dst_node, &local->loc,
744	                                   local->fd->flags, local->fd);
745	        } else {
746	                tmp_loc.inode = local->fd->inode;
747	                inode_path (local->fd->inode, NULL, &path);
(gdb) p local->fd
$6 = (fd_t *) 0x0
(gdb) info args
data = 0x7f890981f02c
(gdb) p (call_frame_t*) data
$7 = (call_frame_t *) 0x7f890981f02c
(gdb) info locals
ret = -1
src_node = 0x1095f80
dst_node = 0x1097210
local = 0x7f88fed7e7f0
dict = 0x0
layout = 0x10c59d0
stbuf = {ia_ino = 9242341100452143058, ia_gfid = "\032\217{dL\255\200CdCIq", <incomplete sequence \322>, ia_dev = 2055, ia_type = IA_IFREG, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
    owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, 
  ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 1364454824, ia_mtime_nsec = 0, ia_ctime = 1364454843, 
  ia_ctime_nsec = 822884540}
this = 0x1097d80
frame = 0x7f890981f02c
tmp_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
path = 0x0
conf = 0x10c4db0
__FUNCTION__ = "dht_migration_complete_check_task"
(gdb) f
#0  dht_migration_complete_check_task (data=0x7f890981f02c) at dht-helper.c:743
743	                ret = syncop_open (dst_node, &local->loc,
(gdb) p *local
$8 = {call_cnt = 1, loc = {path = 0x10919b0 "<gfid:1adfb78f-7b64-4cad-8043-644349715fd2>", name = 0x0, inode = 0x7f8904f430c8, parent = 0x0, gfid = "\032\217{dL\255\200CdCIq", <incomplete sequence \322>, 
    pargfid = '\000' <repeats 15 times>}, loc2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, op_ret = -1, op_errno = 117, 
  layout_mismatch = 0, stbuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', 
        write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, 
    ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, prebuf = {ia_ino = 0, 
    ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, 
      group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
    ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, preoldparent = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, 
    ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', 
        exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
    ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, postoldparent = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {
      suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {
        read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, 
    ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, preparent = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', 
      sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
        exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, 
    ia_ctime_nsec = 0}, postparent = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {
        read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, 
    ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, statvfs = {
    f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, fd = 0x0, 
  inode = 0x0, params = 0x0, xattr = 0x0, xattr_req = 0x0, layout = 0x10c5910, size = 0, ia_ino = 0, src_hashed = 0x0, src_cached = 0x0, dst_hashed = 0x0, dst_cached = 0x0, cached_subvol = 0x1097210, 
  hashed_subvol = 0x0, need_selfheal = 0 '\000', file_count = 0, dir_count = 0, main_frame = 0x0, fop_succeeded = 0, linkfile = {linkfile_cbk = 0x0, stbuf = {ia_ino = 0, 
      ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, 
        group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
      ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, 
      gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, inode = 0x0, xattr = 0x0, srcvol = 0x0}, selfheal = {hole_cnt = 0, overlaps_cnt = 0, down = 0, misc = 0, dir_cbk = 0x0, 
    layout = 0x0}, uid = 0, gid = 0, flags = 0, mode = 0, rdev = 0, umask = 0, xattr_val = 0x0, key = 0x0, xsel = '\000' <repeats 255 times>, alloc_len = 0, newpath = 0x0, gfid = '\000' <repeats 15 times>, 
  marker = {volmark = 0x0, data = 0x0, host_timebuf = {0, 0}, net_timebuf = {0, 0}, call_count = 0, has_xtime = 0, enoent_count = 0, enotconn_count = 0, enodata_count = 0, noxtime_count = 0, esomerr = 0, 
    xl_specf_unwind = 0x0, xl_local = 0x0, vol_uuid = 0x0, retval = 0 '\000'}, return_estale = 0 '\000', need_lookup_everywhere = 0 '\000', fop = GF_FOP_STAT, linked = _gf_false, link_subvol = 0x0, 
  rebalance = {from_subvol = 0x0, target_node = 0x0, offset = 0, size = 0, flags = 0, count = 0, iobref = 0x0, vector = 0x0, stbuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, 
      ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', 
          exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
      ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, target_op_fn = 0x7f89064ed57e <dht_attr2>, xdata = 0x0}}
(gdb)

Comment 3 Shyamsundar 2013-12-10 05:59:53 UTC
Repeated the test as per the Description and found that the crash is not present.

Code base: master (f4488e34553ec0bf68c270034af179ab14fd2b58)

There are quite a few changes in the area where the old crash was observed (dht_migration_complete_check_task); see commit 98e796e50198945adc660e42f3f5ab5b668f7bba.

Reverted to the code before the above-mentioned commit, but crashes were still not visible.

Overall, the test was repeated on both code bases ~5 times without any crash.

There is, however, a functional failure in unlock in the latest code base w.r.t. the NFS code (always repeatable):

[2013-12-08 14:55:45.843630] W [nlm4.c:1688:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2013-12-08 14:55:45.843694] W [nlm4.c:1702:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume

This is due to a graph change initiated by the brick addition that causes NFS to restart, and hence nlm_client_list becomes empty, causing nlm_get_uniq() to return NULL and from there on the unlock to fail.
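
A quick way to confirm that sequence from the logs (assuming the default
gluster NFS log location, which may differ on your installation):

# nfs.log should show the NFS server coming up again on the graph change,
# followed by the NLM warnings quoted above
grep -E "Started running|nlm_get_uniq|unlock_fd_resume" /var/log/glusterfs/nfs.log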

Moving this to NFS. @spradhan, can you confirm the behaviour?

Comment 4 santosh pradhan 2013-12-10 15:52:01 UTC
(In reply to Shyamsundar from comment #3)

> There is a functional failure in unlock in the latest code base w.r.t NFS
> code as follows though (always repeatable),
> 
> [2013-12-08 14:55:45.843630] W [nlm4.c:1688:nlm4_unlock_resume] 0-nfs-NLM:
> nlm_get_uniq() returned NULL
> [2013-12-08 14:55:45.843694] W [nlm4.c:1702:nlm4_unlock_resume] 0-nfs-NLM:
> unable to unlock_fd_resume
> 
> This is due to a graph change initiated by the brick addition that causes
> NFS to restart, and hence nlm_client_list becomes empty, causing
> nlm_get_uniq() to return NULL and from there on the unlock to fail.
> 
> Moving this to NFS, @spradhan can you confirm the behaviour.

This won't be a functional failure if the client does not reclaim the LOCK after the server recovers, i.e.

1. The client takes the LOCK on a file and goes to sleep().
2. An add-brick/rebalance (topology change) on the server causes the NFS server to restart.

3. Once the server is back online, it asks the clients to reclaim their LOCK(s). If the server does not inform the client(s) about its recovery status, then it is a server issue.

4. Clients should reclaim their LOCK again. If the client does not reclaim the LOCK (i.e. does not send the LOCK call again), then it is a client issue.

5. If the client comes out of its sleep by then, the process finishes execution and tries to UNLOCK the file on which it held the LOCK. This would log the behaviour you just noticed.

Could you increase the sleep time and retry? A packet capture would help you monitor the behaviour. If the crash is not seen, then it is just a timing issue with locking/unlocking. It can be tested with a single file.
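
For instance, a capture along these lines would show whether the server
notifies the client(s) after its restart and whether the client re-sends its
LOCK before the UNLOCK. The client address is a placeholder; since NLM uses
dynamic ports registered with the portmapper, filtering by host rather than
by port is safer:

# on the server, capture all traffic to/from the NFS client
tcpdump -i any -s 0 -w nlm_unlock.pcap host 192.0.2.10

# afterwards, inspect only the NLM traffic
tshark -r nlm_unlock.pcap -Y nlm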

Reassigning the issue to you.

Thanks,
Santosh

Comment 7 santosh pradhan 2014-05-30 16:33:08 UTC

As per comment #3, Shyam had followed the steps to reproduce the issue but no SEGV was seen. So, we should try to reproduce the issue with the latest code drop. If it is not reproducible, then we can close the issue.

Let me know if anybody is working on this.

Comment 9 santosh pradhan 2014-09-02 10:17:11 UTC
Not reproducible anymore with the test case. Closing it.

