Description of problem:
=======================
Configuration: 1x2 replicate volume, nfs mount.

On a pure replicate volume (1x2), when one of the bricks is offline and the ln command is executed from the nfs mount, the command fails with an "Invalid argument" error message.

Version-Release number of selected component (if applicable):
=============================================================
[11/04/12 - 04:07:56 root@darrel ~]# gluster --version
glusterfs 3.3.0.5rhs built on Nov 2 2012 01:29:35

[11/04/12 - 04:11:02 root@darrel ~]# rpm -qa | grep gluster
glusterfs-3.3.0.5rhs-35.el6rhs.x86_64
glusterfs-server-3.3.0.5rhs-35.el6rhs.x86_64

How reproducible:
=================
Often

script1.sh:
===========
mkdir test_hardlink_self_heal
cd test_hardlink_self_heal
for i in `seq 1 5`; do
    mkdir dir.$i
    for j in `seq 1 10`; do
        dd if=/dev/input_file of=dir.$i/file.$j bs=1k count=$j
    done
done
cd ../

script2.sh:
===========
cd test_hardlink_self_heal
for i in `seq 1 5`; do
    for j in `seq 1 10`; do
        ln dir.$i/file.$j dir.$i/link_file.$j
    done
done
cd ../

Steps to Reproduce:
===================
1. Create a pure replicate volume (1x2) and start the volume.
2. Create an nfs mount from the client.
3. Execute "script1.sh" from the nfs mount.
4. After "script1.sh" completes, bring down brick1.
5. Execute "script2.sh" from the nfs mount.

Actual results:
===============
The ln command fails with "Invalid argument".

ln command output:
==================
ln: accessing `dir.1/file.1': Invalid argument
ln: accessing `dir.1/file.2': Invalid argument
ln: accessing `dir.1/file.3': Invalid argument
ln: accessing `dir.1/file.4': Invalid argument
ln: accessing `dir.1/file.5': Invalid argument
ln: accessing `dir.1/file.6': Invalid argument
ln: accessing `dir.1/file.7': Invalid argument
ln: accessing `dir.1/file.8': Invalid argument
ln: accessing `dir.1/file.9': Invalid argument
ln: accessing `dir.1/file.10': Invalid argument

nfs log messages:
=================
[2012-11-04 04:48:38.888919] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-rep_new-client-1: remote operation failed: Invalid argument. Path: /test_hardlink_self_heal/dir.1/file.1 (5b8f7106-5b0f-4613-9aee-21ec44b428e4)
[2012-11-04 04:48:38.888977] W [nfs3.c:707:nfs3svc_getattr_lookup_cbk] 0-nfs: 7aa2b74c: /test_hardlink_self_heal/dir.1/file.1 => -1 (Invalid argument)
[2012-11-04 04:48:38.889008] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7aa2b74c, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)
[2012-11-04 04:48:38.891498] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-rep_new-client-1: remote operation failed: Invalid argument. Path: /test_hardlink_self_heal/dir.1/file.2 (9c1aca42-136f-44a2-abfb-eb543e0446f7)
[2012-11-04 04:48:38.891556] W [nfs3.c:707:nfs3svc_getattr_lookup_cbk] 0-nfs: 7ca2b74c: /test_hardlink_self_heal/dir.1/file.2 => -1 (Invalid argument)
[2012-11-04 04:48:38.891587] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7ca2b74c, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)

Expected results:
=================
ln command execution should be successful.
I have been able to reproduce this once on master, out of >20 attempts.
Use this. Happens every time.

#!/bin/bash -x

glusterd
HOSTNAME=`hostname`

mkdir /gfs
gluster --mode=script volume create r2 replica 2 `hostname`:/gfs/r2_0 `hostname`:/gfs/r2_1
gluster --mode=script volume start r2
sleep 5

mount -t nfs `hostname`:/r2 /mnt/r2 -o vers=3,nolock
cd /mnt/r2

mkdir test_hardlink_self_heal
cd test_hardlink_self_heal
for i in `seq 1 5`; do
    mkdir dir.$i
    for j in `seq 1 10`; do
        dd if=/dev/zero of=dir.$i/file.$j bs=1k count=$j
    done
done
cd ../

kill -15 `cat /var/lib/glusterd/vols/r2/run/$HOSTNAME-gfs-r2_0.pid`
sleep 2

cd test_hardlink_self_heal
for i in `seq 1 5`; do
    for j in `seq 1 10`; do
        ln dir.$i/file.$j dir.$i/link_file.$j
    done
done
cd ../
echo $?

cd
umount /mnt/r2
I'm still not having much luck. Either there's something very timing-dependent here, or we're using different versions. Will try with the 3.3 branch.

Also, the suggestion has been made that this started with http://review.gluster.org/#change,4058. If that's the case, then we could trivially make it go away again by having the changed part of nfs3_getattr_resume check for a null parent GFID and revert to the old behavior if no GFID is present. What worries me is that, without understanding why nfs3_fh_resolve_and_resume is returning such a loc, we might just be covering up a more fundamental problem. That might bring back the problem 4058 was meant to fix, or even introduce new ones.
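To make that suggestion concrete, here is a rough, illustrative-only sketch of the guard. The names and types below (fake_loc, gfid_is_null, getattr_resume_sketch) are simplified stand-ins, not gluster's real loc_t or the actual nfs3_getattr_resume code; they only show the shape of "if the loc has no parent GFID, fall back to the pre-4058 path":

/* Illustrative sketch only -- simplified stand-ins for gluster's loc_t
 * and gfid handling, not the real nfs3_getattr_resume() code path. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint8_t gfid_t[16];

struct fake_loc {                 /* hypothetical stand-in for loc_t */
    gfid_t      gfid;             /* gfid of the file itself         */
    gfid_t      pargfid;          /* gfid of the parent directory    */
    const char *name;             /* basename of the file            */
};

static int gfid_is_null(const gfid_t g)
{
    static const gfid_t zero = {0};
    return memcmp(g, zero, sizeof(gfid_t)) == 0;
}

static void getattr_resume_sketch(struct fake_loc *loc)
{
    if (gfid_is_null(loc->pargfid) || loc->name == NULL) {
        /* No parent gfid in the resolved loc: fall back to whatever
         * getattr did before change 4058, instead of issuing a lookup
         * with {NULL pargfid, basename} that the server rejects. */
        printf("no parent gfid: revert to old behavior\n");
    } else {
        /* Parent gfid present: the {parent gfid, basename} lookup
         * introduced by 4058 can proceed. */
        printf("lookup by {parent gfid, basename=%s}\n", loc->name);
    }
}

int main(void)
{
    struct fake_loc loc = { .name = "file.1" };   /* pargfid left NULL */
    getattr_resume_sketch(&loc);
    return 0;
}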
*** Bug 872924 has been marked as a duplicate of this bug. ***
This bug is to be verified for 2.1. The clone of this bug, https://bugzilla.redhat.com/show_bug.cgi?id=874051, has been verified for update_3. Moving this bug to ON_QA.
This fix is not yet in 2.1; moving the bug to MODIFIED.
Cause: After a recent change, getattr internally performs a lookup. For a lookup, the server needs the parent gfid and the basename of the file, but because the lookup is issued from the getattr operation, the parent gfid is not available (it is NULL).

Consequence: ln fails because the lookup issued internally by getattr fails with EINVAL.

Fix: Populate the parent inode (which contains the gfid) in inode_loc_fill so that servers can perform the lookup based on {parent gfid, basename}.

Result: The problem described does not happen with the fix. Bug 872924 is also solved by this fix.
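A minimal sketch of the idea behind the fix, for illustration. The types and names here (fake_inode, fake_loc, loc_fill_sketch) are placeholders, not gluster's inode_t/loc_t or the actual inode_loc_fill change merged below; the point is only that the parent inode is resolved and its gfid copied into the loc so the server-side {parent gfid, basename} lookup can succeed:

/* Illustrative sketch only: simplified placeholders for gluster's
 * inode_t and loc_t, showing the fix idea -- resolve the parent inode
 * while filling the loc so the lookup carries {pargfid, name}. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint8_t gfid_t[16];

struct fake_inode {                /* placeholder for inode_t          */
    gfid_t             gfid;
    struct fake_inode *parent;     /* what a parent lookup would yield */
};

struct fake_loc {                  /* placeholder for loc_t */
    struct fake_inode *inode;
    struct fake_inode *parent;
    gfid_t             gfid;
    gfid_t             pargfid;
    const char        *name;
};

static void loc_fill_sketch(struct fake_loc *loc,
                            struct fake_inode *inode, const char *name)
{
    memset(loc, 0, sizeof(*loc));
    loc->inode = inode;
    loc->name  = name;
    memcpy(loc->gfid, inode->gfid, sizeof(gfid_t));

    /* The added step: also populate the parent inode and its gfid so
     * a server-side lookup by {parent gfid, basename} has what it
     * needs, instead of sending a NULL pargfid and getting EINVAL. */
    if (inode->parent != NULL) {
        loc->parent = inode->parent;
        memcpy(loc->pargfid, inode->parent->gfid, sizeof(gfid_t));
    }
}

int main(void)
{
    struct fake_inode dir  = { .gfid = {0xaa} };
    struct fake_inode file = { .gfid = {0xbb}, .parent = &dir };
    struct fake_loc   loc;

    loc_fill_sketch(&loc, &file, "file.1");
    printf("parent gfid populated: %s\n", loc.parent ? "yes" : "no");
    return 0;
}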
CHANGE: http://review.gluster.org/4157 (nfs: resolve parent inode during inode_loc_fill) merged in master by Vijay Bellur (vbellur)
Verified the fix on build:

root@king [Jul-10-2013-10:53:48] >rpm -qa | grep glusterfs
glusterfs-fuse-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-devel-3.4.0.12rhs.beta3-1.el6rhs.x86_64

root@king [Jul-10-2013-10:54:01] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul 6 2013 14:35:18

Bug is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html