Bug 987847 - nfs: EIO while untarring linux kernel and creating dirs with 200 depth simultaneously [NEEDINFO]
Status: CLOSED EOL
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nfs
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Niels de Vos
QA Contact: Saurabh
Keywords: ZStream
Depends On:
Blocks:
 
Reported: 2013-07-24 05:51 EDT by M S Vishwanath Bhat
Modified: 2016-01-19 01:14 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-03 12:24:58 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
spradhan: needinfo? (saujain)


Attachments
nfs log from harrier.blr.redhat.com (9.82 MB, text/x-log)
2013-07-24 05:51 EDT, M S Vishwanath Bhat
nfs log from mustang.blr.redhat.com (38.33 KB, text/x-log)
2013-07-24 05:52 EDT, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2013-07-24 05:51:06 EDT
Created attachment 777694
nfs log from harrier.blr.redhat.com

Description of problem:
I was trying to untar the Linux kernel on the NFS mount point. It was taking a *long* time (more than an hour). Then I tried to create directories 200 levels deep (well within the path max). After that I saw EIO while untarring the kernel.

Note: There was a geo-rep session running between the master (where I hit this issue) and a slave node of 2*2 distribute-distribute.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:
Hit once. Not sure if 100% reproducible.

Steps to Reproduce:
1. Create and start a 2*2 distributed-replicated volume.
2. Mount it via NFS and start untarring the Linux kernel.
3. While the untar is still in progress, create directories 200 levels deep from another client:
   mkdir -p `perl -e "print 'foo/' x 200"`
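
The "well within the path max" claim can be checked with a little arithmetic. The following is a minimal sketch, not part of the original report: it assumes the Linux limits PATH_MAX = 4096 and NAME_MAX = 255, and the mount point "/mnt/nfs" is a hypothetical placeholder. The helper names are made up for illustration.

```python
import os

# Assumed Linux limits; other platforms or filesystems may differ.
PATH_MAX = 4096   # maximum bytes in a full path
NAME_MAX = 255    # maximum bytes in a single path component


def nested_path(component="foo", depth=200):
    """Build the same relative path as: perl -e "print 'foo/' x 200"."""
    return (component + "/") * depth


def fits_limits(mount_point, rel_path):
    """Check the joined path against the assumed PATH_MAX and NAME_MAX."""
    full = os.path.join(mount_point, rel_path)
    components = [c for c in full.split("/") if c]
    return (len(full.encode()) < PATH_MAX
            and all(len(c.encode()) <= NAME_MAX for c in components))


rel = nested_path()
print(len(rel))                      # 4 characters per level times 200 levels
print(fits_limits("/mnt/nfs", rel))  # "/mnt/nfs" is a hypothetical mount point
```

With these assumptions the relative path is 800 characters long, so even with the mount point prefix it stays far below 4096.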

Actual results:
Errors seen during the kernel untar:


linux-3.10.1/arch/ia64/include/asm/timex.h
linux-3.10.1/arch/ia64/include/asm/tlb.h
linux-3.10.1/arch/ia64/include/asm/tlbflush.h
linux-3.10.1/arch/ia64/include/asm/topology.h
linux-3.10.1/arch/ia64/include/asm/types.h
tar: linux-3.10.1/arch/ia64/include/asm/types.h: Cannot close: Input/output error
linux-3.10.1/arch/ia64/include/asm/uaccess.h
linux-3.10.1/arch/ia64/include/asm/unaligned.h
tar: linux-3.10.1/arch/ia64/include/asm/unaligned.h: Cannot close: Input/output error


Expected results:
The Linux kernel untar should not error out and should not take a long time.

Additional info:

Messages from the log files

[2013-07-24 07:57:28.714625] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 0-hosa-master-replicate-1: added root inode
[2013-07-24 07:57:28.715705] I [afr-common.c:2181:afr_discovery_cbk] 0-hosa-master-replicate-0: selecting local read_child hosa-master-client-1
[2013-07-24 09:17:30.264006] E [rpc-clnt.c:207:call_bail] 0-hosa-master-client-0: bailing out frame type(GlusterFS 3.3) op(FXATTROP(34)) xid = 0x123674x sent = 2013-07-24 08:47:24.430155. timeout = 1800
[2013-07-24 09:17:30.264055] W [client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-hosa-master-client-0: remote operation failed: Transport endpoint is not connected
[2013-07-24 09:17:30.264076] E [rpc-clnt.c:207:call_bail] 0-hosa-master-client-1: bailing out frame type(GlusterFS 3.3) op(FXATTROP(34)) xid = 0x128217x sent = 2013-07-24 08:47:24.430210. timeout = 1800
[2013-07-24 09:17:30.264084] W [client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-hosa-master-client-1: remote operation failed: Transport endpoint is not connected
[2013-07-24 09:17:31.341269] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: e95cef95: /linux-3.10.1/arch/ia64/include/asm/types.h => -1 (Transport endpoint is not connected)
[2013-07-24 09:17:31.341314] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: e95cef95, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, STABLE,wverf: 1374652648
[root@mustang ~]# tailf /var/log/glusterfs/nfs.log
[2013-07-24 09:33:44.158046] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: e95cef95, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, STABLE,wverf: 1374652648
[2013-07-24 09:33:44.158177] W [client-rpc-fops.c:1579:client3_3_finodelk_cbk] 0-hosa-master-client-1: remote operation failed: Invalid argument
[2013-07-24 09:33:44.158191] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-hosa-master-replicate-0: (null): unlock failed on 1 unlock by 2434990000000000
[2013-07-24 09:33:44.158213] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: a5def95: /linux-3.10.1/arch/ia64/include/asm/unaligned.h => -1 (Transport endpoint is not connected)
[2013-07-24 09:33:44.158221] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: a5def95, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, STABLE,wverf: 1374652648
[2013-07-24 09:33:53.224235] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 0-hosa-master-client-0: changing port to 49152 (from 0)
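
The call_bail entries above carry both the time the frame was sent and the configured timeout, so the actual wait can be computed from a single line. The sketch below is an illustration only: the regular expression is modeled on the two call_bail lines quoted in this report and may not match other GlusterFS versions, and the function name is made up.

```python
import re
from datetime import datetime

# Pattern modeled on the call_bail lines quoted above (an assumption about
# the format, not a specification of it).
BAIL_RE = re.compile(
    r"^\[(?P<logged>[\d\- :.]+)\] E \[rpc-clnt\.c:\d+:call_bail\] "
    r"(?P<client>\S+): .*op\((?P<op>\w+)\(\d+\)\).*"
    r"sent = (?P<sent>[\d\- :.]+)\. timeout = (?P<timeout>\d+)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

SAMPLE = (
    "[2013-07-24 09:17:30.264006] E [rpc-clnt.c:207:call_bail] "
    "0-hosa-master-client-0: bailing out frame type(GlusterFS 3.3) "
    "op(FXATTROP(34)) xid = 0x123674x sent = 2013-07-24 08:47:24.430155. "
    "timeout = 1800"
)


def parse_bail(line):
    """Return client, op, timeout and seconds waited, or None if no match."""
    m = BAIL_RE.match(line)
    if m is None:
        return None
    logged = datetime.strptime(m["logged"], TS_FORMAT)
    sent = datetime.strptime(m["sent"], TS_FORMAT)
    timeout = int(m["timeout"])
    waited = (logged - sent).total_seconds()
    return {
        "client": m["client"],
        "op": m["op"],
        "timeout": timeout,
        "waited": waited,
        "overshoot": waited - timeout,
    }


print(parse_bail(SAMPLE))
```

For the first quoted line this gives a wait of about 1805.8 seconds against a timeout of 1800, which suggests the FXATTROP request never got a reply and was bailed out shortly after the timeout expired.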




[2013-07-24 09:42:34.383365] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6745391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.383383] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6745391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
[2013-07-24 09:42:34.383571] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6945391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.383603] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6945391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
[2013-07-24 09:42:34.383771] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6b45391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.383789] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6b45391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
[2013-07-24 09:42:34.383968] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6d45391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.384006] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6d45391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
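
Each nfs3_log_write_res line pairs the NFS status returned to the client with the POSIX errno seen by the server, which is how ENOTCONN (107) on the brick side surfaces as EIO on the client. A minimal sketch for tallying these pairs follows; the pattern is modeled on the lines quoted in this report and is an assumption about the log format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the nfs3_log_write_res lines quoted above.
WRITE_RES_RE = re.compile(
    r"nfs3_log_write_res\] \S+: XID: (?P<xid>[0-9a-f]+), WRITE: "
    r"NFS: (?P<nfs_code>\d+)\((?P<nfs_text>[^)]*)\), "
    r"POSIX: (?P<errno>\d+)\((?P<errno_text>[^)]*)\), "
    r"count: (?P<count>\d+), (?P<stable>\w+),"
)

SAMPLE_LINES = [
    "[2013-07-24 09:33:44.158221] W [nfs3-helpers.c:3443:nfs3_log_write_res] "
    "0-nfs-nfsv3: XID: a5def95, WRITE: NFS: 5(I/O error), POSIX: "
    "107(Transport endpoint is not connected), count: 0, STABLE,wverf: 1374652648",
    "[2013-07-24 09:42:34.383383] W [nfs3-helpers.c:3443:nfs3_log_write_res] "
    "0-nfs-nfsv3: XID: 6745391f, WRITE: NFS: 5(I/O error), POSIX: "
    "107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647",
]


def tally_write_errors(lines):
    """Count WRITE results by (NFS status, POSIX errno, stability flag)."""
    counts = Counter()
    for line in lines:
        m = WRITE_RES_RE.search(line)
        if m:
            key = (int(m["nfs_code"]), int(m["errno"]), m["stable"])
            counts[key] += 1
    return counts


for key, n in tally_write_errors(SAMPLE_LINES).items():
    print(key, n)
```

Run over the full attached logs, a tally like this would show whether every failed WRITE has the same 107-to-5 mapping or whether other errno values are mixed in.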



I have attached the NFS logs from both NFS servers.
Comment 1 M S Vishwanath Bhat 2013-07-24 05:52:40 EDT
Created attachment 777695
nfs log from mustang.blr.redhat.com
Comment 3 Amar Tumballi 2013-07-25 02:28:10 EDT
Considering the directory depth is ~200 here, I am moving the priority to 'low'. If it were ~20 or less, this would be high priority.
Comment 4 rjoseph 2013-09-16 02:55:35 EDT
From the logs, it seems that one of the bricks in the replica set went down during or before the I/O operation. As a result, NFS is failing all I/O operations on that brick with "I/O Error".

Please check whether all the brick processes are running properly. Also check whether any brick process crashed during the operation. If so, please attach the core file for details.
Comment 5 M S Vishwanath Bhat 2013-09-18 07:32:23 EDT
As far as I remember, I didn't bring down any node and all the bricks were online. There were no crashes.

I don't have that setup now. But there is nothing geo-rep specific to it, so ideally it should be readily reproducible, even though I haven't tried it again.
Comment 6 santosh pradhan 2014-02-20 03:17:43 EST
I tried to reproduce the issue with a single-brick NFS server, but without success. I don't see any issue. It would be great if QE could try with the latest RHS build.
Comment 8 Vivek Agarwal 2015-12-03 12:24:58 EST
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
