Log at: /share/tickets/1845/
From Allen: While testing a 4-node dist+mirror setup with rc15 GNFS, we ran into a lockup after the vIP is failed over to the mirrored node with ucarp. The vIPs are 153.90.178.104/24 and 153.90.178.105/24. 104 failed over to 105, then the lockup happens.

There are two problems here:

1. The nfs server is receiving changed inode numbers from the server. The first response by the nfs server should be to revalidate the inode. This is fixed in a patch committed after rc15, so it is not a problem right now.

2. The end of the log shows that the lockup happens because there are no replies from either replicate or the brick for a create fop, e.g.:

[2010-10-06 17:52:46] D [nfs3-helpers.c:2275:nfs3_log_create_call] nfs-nfsv3: XID: 74d03094, CREATE: args: FH: hashcount 0, xlid 0, gen 0, ino 1, name: 11, mode: UNCHECKED
[2010-10-06 17:52:46] T [nfs3.c:2240:nfs3_create] nfs-nfsv3: FH to Volume: glustervol1
[2010-10-06 17:52:46] T [nfs3-helpers.c:3000:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 1, gen: 0, entry: 11, hashidx: 0
[2010-10-06 17:52:46] T [nfs3-helpers.c:3008:nfs3_fh_resolve_entry_hard] nfs-nfsv3: Entry needs lookup: /11
[2010-10-06 17:52:46] T [nfs-fops.c:280:nfs_fop_lookup] nfs: Lookup: /11
[2010-10-06 17:52:46] T [nfs3-helpers.c:2549:nfs3_fh_resolve_entry_lookup_cbk] nfs-nfsv3: Lookup failed: /11: No such file or directory
[2010-10-06 17:52:46] T [nfs.c:600:nfs_user_create] nfs: uid: 0, gid 0, gids: 1
[2010-10-06 17:52:46] T [nfs.c:608:nfs_user_create] nfs: gid: 0
[2010-10-06 17:52:46] T [nfs-fops.c:602:nfs_fop_create] nfs: Create: /11
[2010-10-06 17:52:46] T [nfs-fops.c:130:nfs_create_frame] nfs: uid: 0, gid 0, gids: 1
[2010-10-06 17:52:46] T [nfs-fops.c:132:nfs_create_frame] nfs: gid: 0
[2010-10-06 17:52:46] T [dht-common.c:3050:dht_create] glustervol1: creating /11 on mirror-0
[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: on subvolume 'mirror-1': avail_percent is: 99.00 and avail_space is: 1476552265728
[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: on subvolume 'mirror-0': avail_percent is: 99.00 and avail_space is: 1476552269824

Not sure yet whether this is a dht, afr or server problem.
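For anyone re-checking a log like this, a small script can show the last function each translator logged, which is roughly how the missing create reply was spotted above. This is only a sketch: the regex is inferred from the lines quoted in this report, not from any documented log format, and the helper names (parse_line, last_message_per_xlator) are made up for illustration.

```python
import re

# Line layout inferred from the excerpt in this report (an assumption):
# [timestamp] LEVEL [file:line:function] xlator: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>[^:]+): "
    r"(?P<msg>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def last_message_per_xlator(lines):
    """Map each translator name to the last record it logged."""
    last = {}
    for line in lines:
        rec = parse_line(line)
        if rec:
            last[rec["xlator"]] = rec
    return last


# Three lines copied from the excerpt in this report.
sample = [
    "[2010-10-06 17:52:46] T [nfs-fops.c:602:nfs_fop_create] nfs: Create: /11",
    "[2010-10-06 17:52:46] T [dht-common.c:3050:dht_create] glustervol1: creating /11 on mirror-0",
    "[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: "
    "on subvolume 'mirror-0': avail_percent is: 99.00 and avail_space is: 1476552269824",
]

if __name__ == "__main__":
    for xlator, rec in last_message_per_xlator(sample).items():
        print(xlator, rec["func"], rec["msg"])
```

If the last entry for the nfs translator is still the create call and nothing after dht_create mentions a reply, the request is stuck somewhere below dht, which matches what the log shows here.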
Sac has been doing failover tests for the last week or so. We haven't run into such a hang in 3.1. I am closing this bug.