Bug 987847

Summary: nfs: EIO while untarring linux kernel and creating dirs with 200 depth simultaneously
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: gluster-nfs
Assignee: Niels de Vos <ndevos>
Status: CLOSED EOL
QA Contact: Saurabh <saujain>
Severity: medium
Docs Contact:
Priority: low
Version: 2.1
CC: mzywusko, poelstra, rhs-bugs, rjoseph, saujain, surs, vagarwal, vbellur, vbhat
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-03 17:24:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                              Flags
nfs log from harrier.blr.redhat.com      none
nfs log from mustang.blr.redhat.com      none

Description M S Vishwanath Bhat 2013-07-24 09:51:06 UTC
Created attachment 777694
nfs log from harrier.blr.redhat.com

Description of problem:
I was trying to untar the Linux kernel on the NFS mount point. It was taking a *long* time (more than an hour). While it was still running, I tried to create directories 200 levels deep (well within the path max). I then saw EIO while untarring the kernel.

Note: A geo-rep session was running between the master (where I hit this issue) and another slave node of 2*2 distribute-distribute.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:
Hit once. Not sure if 100% reproducible.

Steps to Reproduce:
1. Create and start a 2*2 distributed-replicated volume.
2. Mount it via NFS and start untarring the Linux kernel.
3. While the untar is still in progress, create directories 200 levels deep from another client.
   mkdir -p `perl -e "print 'foo/' x 200"`
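
The claim that a 200-level path is well within the path limit can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming the usual Linux PATH_MAX of 4096 bytes; the constant and variable names are illustrative and do not come from the report.

```python
# Rebuild the relative path that the perl one-liner in step 3 prints.
COMPONENT = "foo/"
DEPTH = 200
ASSUMED_PATH_MAX = 4096  # assumption: typical Linux PATH_MAX, in bytes

relative_path = COMPONENT * DEPTH
path_length = len(relative_path.encode())
component_count = relative_path.count("/")

print(f"{component_count} components, {path_length} bytes")
print("within assumed PATH_MAX:", path_length < ASSUMED_PATH_MAX)
```

The mount point prefix adds some bytes on top of this, but the total should still be far below the assumed limit.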

Actual results:
Errors were seen during the kernel untar:


linux-3.10.1/arch/ia64/include/asm/timex.h
linux-3.10.1/arch/ia64/include/asm/tlb.h
linux-3.10.1/arch/ia64/include/asm/tlbflush.h
linux-3.10.1/arch/ia64/include/asm/topology.h
linux-3.10.1/arch/ia64/include/asm/types.h
tar: linux-3.10.1/arch/ia64/include/asm/types.h: Cannot close: Input/output error
linux-3.10.1/arch/ia64/include/asm/uaccess.h
linux-3.10.1/arch/ia64/include/asm/unaligned.h
tar: linux-3.10.1/arch/ia64/include/asm/unaligned.h: Cannot close: Input/output error


Expected results:
The Linux kernel untar should not error out and should not take a long time.

Additional info:

Messages from the log files

[2013-07-24 07:57:28.714625] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 0-hosa-master-replicate-1: added root inode
[2013-07-24 07:57:28.715705] I [afr-common.c:2181:afr_discovery_cbk] 0-hosa-master-replicate-0: selecting local read_child hosa-master-client-1
[2013-07-24 09:17:30.264006] E [rpc-clnt.c:207:call_bail] 0-hosa-master-client-0: bailing out frame type(GlusterFS 3.3) op(FXATTROP(34)) xid = 0x123674x sent = 2013-07-24 08:47:24.430155. timeout = 1800
[2013-07-24 09:17:30.264055] W [client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-hosa-master-client-0: remote operation failed: Transport endpoint is not connected
[2013-07-24 09:17:30.264076] E [rpc-clnt.c:207:call_bail] 0-hosa-master-client-1: bailing out frame type(GlusterFS 3.3) op(FXATTROP(34)) xid = 0x128217x sent = 2013-07-24 08:47:24.430210. timeout = 1800
[2013-07-24 09:17:30.264084] W [client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-hosa-master-client-1: remote operation failed: Transport endpoint is not connected
[2013-07-24 09:17:31.341269] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: e95cef95: /linux-3.10.1/arch/ia64/include/asm/types.h => -1 (Transport endpoint is not connected)
[2013-07-24 09:17:31.341314] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: e95cef95, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, STABLE,wverf: 1374652648
[root@mustang ~]# tailf /var/log/glusterfs/nfs.log
[2013-07-24 09:33:44.158046] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: e95cef95, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, STABLE,wverf: 1374652648
[2013-07-24 09:33:44.158177] W [client-rpc-fops.c:1579:client3_3_finodelk_cbk] 0-hosa-master-client-1: remote operation failed: Invalid argument
[2013-07-24 09:33:44.158191] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-hosa-master-replicate-0: (null): unlock failed on 1 unlock by 2434990000000000
[2013-07-24 09:33:44.158213] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: a5def95: /linux-3.10.1/arch/ia64/include/asm/unaligned.h => -1 (Transport endpoint is not connected)
[2013-07-24 09:33:44.158221] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: a5def95, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, STABLE,wverf: 1374652648
[2013-07-24 09:33:53.224235] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 0-hosa-master-client-0: changing port to 49152 (from 0)
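
The two call_bail lines above each print a "sent" time and "timeout = 1800". As a rough sketch (not part of the original report), the gap between the sent time and the bail-out time for hosa-master-client-0 can be computed in Python to confirm that the FXATTROP frame went unanswered for slightly longer than the printed timeout. Treating the timeout as seconds is an assumption based on the timestamps.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the first call_bail line (hosa-master-client-0).
sent = datetime.strptime("2013-07-24 08:47:24.430155", FMT)
bailed = datetime.strptime("2013-07-24 09:17:30.264006", FMT)
frame_timeout = 1800  # value printed in the log line; assumed to be seconds

elapsed = (bailed - sent).total_seconds()
overshoot = elapsed - frame_timeout

print(f"elapsed: {elapsed:.3f} s, past the timeout by {overshoot:.3f} s")
```

So the request waited about 30 minutes without a reply before the client gave up, which may also account for part of the long untar time.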




[2013-07-24 09:42:34.383365] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6745391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.383383] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6745391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
[2013-07-24 09:42:34.383571] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6945391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.383603] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6945391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
[2013-07-24 09:42:34.383771] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6b45391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.383789] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6b45391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
[2013-07-24 09:42:34.383968] W [nfs3.c:2069:nfs3svc_write_cbk] 0-nfs: 6d45391f: /foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/.linux-3.10.1.tar.gz.oToFw3 => -1 (Transport endpoint is not connected)
[2013-07-24 09:42:34.384006] W [nfs3-helpers.c:3443:nfs3_log_write_res] 0-nfs-nfsv3: XID: 6d45391f, WRITE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), count: 0, UNSTABLE,wverf: 1374652647
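
Every failed WRITE above is logged with an NFS status (5, I/O error) and the underlying POSIX errno (107, Transport endpoint is not connected). A minimal Python sketch for pulling those fields out of an nfs3_log_write_res line is shown below; the regular expression and field names are choices made for this sketch, not GlusterFS identifiers.

```python
import re

# Pattern for the nfs3_log_write_res lines shown above. The group names are
# labels chosen for this sketch.
WRITE_RES = re.compile(
    r"XID: (?P<xid>[0-9a-f]+), WRITE: "
    r"NFS: (?P<nfs_code>\d+)\((?P<nfs_text>[^)]*)\), "
    r"POSIX: (?P<posix_code>\d+)\((?P<posix_text>[^)]*)\), "
    r"count: (?P<count>\d+), (?P<stable>\w+),\s*wverf: (?P<wverf>\d+)"
)

# One line copied from the log excerpt.
sample = (
    "[2013-07-24 09:42:34.383383] W [nfs3-helpers.c:3443:nfs3_log_write_res] "
    "0-nfs-nfsv3: XID: 6745391f, WRITE: NFS: 5(I/O error), "
    "POSIX: 107(Transport endpoint is not connected), count: 0, "
    "UNSTABLE,wverf: 1374652647"
)


def parse_write_result(line):
    """Return the WRITE reply fields as a dict, or None if the line differs."""
    match = WRITE_RES.search(line)
    return match.groupdict() if match else None


fields = parse_write_result(sample)
print(fields["xid"], fields["nfs_code"], fields["posix_code"], fields["stable"])
```

Run over a whole nfs.log, this kind of parser would make it easy to see whether every EIO returned to the client traces back to the same errno.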



I have attached the NFS logs from both NFS servers.

Comment 1 M S Vishwanath Bhat 2013-07-24 09:52:40 UTC
Created attachment 777695
nfs log from mustang.blr.redhat.com

Comment 3 Amar Tumballi 2013-07-25 06:28:10 UTC
Considering that the directory depth is ~200 here, I am moving the priority to 'low'. If it were ~20 or less, this would be high priority.

Comment 4 rjoseph 2013-09-16 06:55:35 UTC
From the logs, it seems that one of the bricks in a replica set went down during or before the I/O operation. As a result, NFS is failing all I/O operations on that brick with "I/O Error".

Please check whether all the brick processes are running properly. Also check whether any brick process crashed during the operation. If so, please attach the core file for details.
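
As a starting point for that check, the "remote operation failed" warnings can be tallied per client translator to see which connections reported errors. The sketch below is in Python and uses three lines copied from the description; the regular expression is an assumption about the log layout, and which brick each client translator maps to depends on the volume configuration, which is not shown in this report.

```python
import re
from collections import Counter

# Three warning lines copied from the description.
log_lines = [
    "[2013-07-24 09:17:30.264055] W "
    "[client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-hosa-master-client-0: "
    "remote operation failed: Transport endpoint is not connected",
    "[2013-07-24 09:17:30.264084] W "
    "[client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-hosa-master-client-1: "
    "remote operation failed: Transport endpoint is not connected",
    "[2013-07-24 09:33:44.158177] W "
    "[client-rpc-fops.c:1579:client3_3_finodelk_cbk] 0-hosa-master-client-1: "
    "remote operation failed: Invalid argument",
]

# Assumed layout: "] <graph-id>-<volume>-client-<n>: remote operation failed: <reason>"
FAILED = re.compile(
    r"\] \d+-(?P<client>[\w-]+-client-\d+): "
    r"remote operation failed: (?P<reason>.+)$"
)

failures = Counter()
for line in log_lines:
    match = FAILED.search(line)
    if match:
        failures[(match.group("client"), match.group("reason"))] += 1

for (client, reason), count in sorted(failures.items()):
    print(f"{client}: {reason} x{count}")
```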

Comment 5 M S Vishwanath Bhat 2013-09-18 11:32:23 UTC
As far as I remember, I didn't bring down any node and all the bricks were online. There were no crashes.

I don't have that setup now. But there is nothing geo-rep specific about it, so ideally it should be readily reproducible, even though I haven't tried it again.

Comment 6 santosh pradhan 2014-02-20 08:17:43 UTC
I tried to reproduce the issue with a single-brick NFS server, but without success. I don't see any issue. It would be great if QE could try with the latest RHS build.

Comment 8 Vivek Agarwal 2015-12-03 17:24:58 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

Comment 9 Red Hat Bugzilla 2023-09-14 01:48:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.