
Bug 763577 (GLUSTER-1845)

Summary: Create fop hangs beyond distribute
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: nfs
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED WORKSFORME
Severity: medium
Priority: low
Version: nfs-alpha
CC: amarts, gluster-bugs
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs

Description Shehjar Tikoo 2010-10-07 01:43:13 EDT
Log at: /share/tickets/1845/
Comment 1 Shehjar Tikoo 2010-10-07 04:35:25 EDT
From Allen:

While testing a 4-node dist+mirror setup with rc15 GNFS, we ran into a lockup after the vIP was failed over to the mirrored node with ucarp.

vIPs are 153.90.178.104/24 and 153.90.178.105/24. 104 failed over to 105, and then the lockup happened.

There are two problems here:

1. The nfs server is receiving changed inode numbers from the server. The first response by the nfs server should be to revalidate the inode. This was fixed in a patch committed after rc15, so it is not a problem right now.

2. The end of the log shows that the lock-up happens because there are no replies from either replicate or the brick for a create fop, e.g.:

[2010-10-06 17:52:46] D [nfs3-helpers.c:2275:nfs3_log_create_call] nfs-nfsv3: XID: 74d03094, CREATE: args: FH: hashcount 0, xlid 0, gen 0, ino 1, name: 11, mode: UNCHECKED
[2010-10-06 17:52:46] T [nfs3.c:2240:nfs3_create] nfs-nfsv3: FH to Volume: glustervol1
[2010-10-06 17:52:46] T [nfs3-helpers.c:3000:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 1, gen: 0, entry: 11, hashidx: 0
[2010-10-06 17:52:46] T [nfs3-helpers.c:3008:nfs3_fh_resolve_entry_hard] nfs-nfsv3: Entry needs lookup: /11
[2010-10-06 17:52:46] T [nfs-fops.c:280:nfs_fop_lookup] nfs: Lookup: /11
[2010-10-06 17:52:46] T [nfs3-helpers.c:2549:nfs3_fh_resolve_entry_lookup_cbk] nfs-nfsv3: Lookup failed: /11: No such file or directory
[2010-10-06 17:52:46] T [nfs.c:600:nfs_user_create] nfs: uid: 0, gid 0, gids: 1
[2010-10-06 17:52:46] T [nfs.c:608:nfs_user_create] nfs: gid: 0
[2010-10-06 17:52:46] T [nfs-fops.c:602:nfs_fop_create] nfs: Create: /11
[2010-10-06 17:52:46] T [nfs-fops.c:130:nfs_create_frame] nfs: uid: 0, gid 0, gids: 1
[2010-10-06 17:52:46] T [nfs-fops.c:132:nfs_create_frame] nfs: gid: 0
[2010-10-06 17:52:46] T [dht-common.c:3050:dht_create] glustervol1: creating /11 on mirror-0
[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: on subvolume 'mirror-1': avail_percent is: 99.00 and avail_space is: 1476552265728
[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: on subvolume 'mirror-0': avail_percent is: 99.00 and avail_space is: 1476552269824


Not sure yet whether this is a dht, afr or server problem.
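
To narrow down where a fop stops, it can help to pull out the sequence of functions that logged something for a given path. The following is only a rough sketch: it assumes every line follows the "[timestamp] LEVEL [file:line:function] translator: message" layout seen in the excerpt above, and the names LOG_LINE, parse_line and trace_for_path are made up for illustration and are not part of GlusterFS.

```python
import re

# Assumed layout, taken from the excerpt above:
# [timestamp] LEVEL [file:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>[^:]+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def trace_for_path(lines, path):
    """Return (function, translator) pairs for lines whose message mentions path.

    This is a plain substring match, so "/11" would also match "/110".
    """
    trace = []
    for line in lines:
        rec = parse_line(line)
        if rec and path in rec["msg"]:
            trace.append((rec["func"], rec["xlator"]))
    return trace


# Three lines copied from the log excerpt in this comment.
sample = [
    "[2010-10-06 17:52:46] T [nfs-fops.c:602:nfs_fop_create] nfs: Create: /11",
    "[2010-10-06 17:52:46] T [dht-common.c:3050:dht_create] glustervol1: creating /11 on mirror-0",
    "[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: on subvolume 'mirror-1': avail_percent is: 99.00 and avail_space is: 1476552265728",
]

if __name__ == "__main__":
    for func, xlator in trace_for_path(sample, "/11"):
        print(func, xlator)
```

On the sample lines the last entry for /11 is dht_create, which matches the observation that nothing is logged for the create below distribute. It does not say whether the request was lost in dht, afr or the server.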
Comment 2 Shehjar Tikoo 2010-11-08 21:10:52 EST
Sac has been doing failover tests for the last week or so. We haven't run into such a hang in 3.1, so I am closing this bug.