| Summary: | Create fop hangs beyond distribute | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Shehjar Tikoo <shehjart> |
| Component: | nfs | Assignee: | Shehjar Tikoo <shehjart> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | nfs-alpha | CC: | amarts, gluster-bugs |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | RTP | Mount Type: | nfs |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Shehjar Tikoo
2010-10-07 05:43:13 UTC
From Allen: While testing a 4-node dist+mirror setup with rc15 GNFS, we ran into a lockup issue after the vIP is failed over to the mirrored node with ucarp. The vIPs are 153.90.178.104/24 and 153.90.178.105/24; .104 failed over to .105, and then the lockup happens.

There are two problems here:

1. The NFS server is receiving changed inode numbers from the server. The first response by the NFS server should be to revalidate the inode. This is fixed in a patch committed after rc15, so it is not a problem right now.

2. The end of the log shows that the lock-up happens because there are no replies from either replicate or from the brick for a create fop, e.g.:

```
[2010-10-06 17:52:46] D [nfs3-helpers.c:2275:nfs3_log_create_call] nfs-nfsv3: XID: 74d03094, CREATE: args: FH: hashcount 0, xlid 0, gen 0, ino 1, name: 11, mode: UNCHECKED
[2010-10-06 17:52:46] T [nfs3.c:2240:nfs3_create] nfs-nfsv3: FH to Volume: glustervol1
[2010-10-06 17:52:46] T [nfs3-helpers.c:3000:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 1, gen: 0, entry: 11, hashidx: 0
[2010-10-06 17:52:46] T [nfs3-helpers.c:3008:nfs3_fh_resolve_entry_hard] nfs-nfsv3: Entry needs lookup: /11
[2010-10-06 17:52:46] T [nfs-fops.c:280:nfs_fop_lookup] nfs: Lookup: /11
[2010-10-06 17:52:46] T [nfs3-helpers.c:2549:nfs3_fh_resolve_entry_lookup_cbk] nfs-nfsv3: Lookup failed: /11: No such file or directory
[2010-10-06 17:52:46] T [nfs.c:600:nfs_user_create] nfs: uid: 0, gid 0, gids: 1
[2010-10-06 17:52:46] T [nfs.c:608:nfs_user_create] nfs: gid: 0
[2010-10-06 17:52:46] T [nfs-fops.c:602:nfs_fop_create] nfs: Create: /11
[2010-10-06 17:52:46] T [nfs-fops.c:130:nfs_create_frame] nfs: uid: 0, gid 0, gids: 1
[2010-10-06 17:52:46] T [nfs-fops.c:132:nfs_create_frame] nfs: gid: 0
[2010-10-06 17:52:46] T [dht-common.c:3050:dht_create] glustervol1: creating /11 on mirror-0
[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: on subvolume 'mirror-1': avail_percent is: 99.00 and avail_space is: 1476552265728
[2010-10-06 17:52:46] D [dht-diskusage.c:71:dht_du_info_cbk] glustervol1: on subvolume 'mirror-0': avail_percent is: 99.00 and avail_space is: 1476552269824
```

Not sure yet whether this is a dht, afr, or server problem. Sac has been doing failover tests in the last week or so. We haven't run into such a hang in 3.1. I am closing this bug.