Bug 1278399 - I/O failure on attaching tier on nfs client
Summary: I/O failure on attaching tier on nfs client
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Mohammed Rafi KC
QA Contact: Bhaskarakiran
URL:
Whiteboard:
Duplicates: 1049181
Depends On: 1272949
Blocks: 1049181 1114033 1139193 1146338 1260783 1260923 1276742 1279095 1279830 1286064
Reported: 2015-11-05 11:56 UTC by Nag Pavan Chilakam
Modified: 2018-11-30 05:42 UTC
CC List: 15 users

Fixed In Version: glusterfs-3.7.5-17
Doc Type: Bug Fix
Doc Text:
Clone Of: 1272949
Clones: 1279095
Environment:
Last Closed: 2016-03-01 05:52:37 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0193 | 0 | normal | SHIPPED_LIVE | Red Hat Gluster Storage 3.1 update 2 | 2016-03-01 10:20:36 UTC

Description Nag Pavan Chilakam 2015-11-05 11:56:44 UTC
+++ This bug was initially created as a clone of Bug #1272949 +++

Description of problem:

Ongoing I/O fails when a tier is attached.

Version-Release number of selected component (if applicable):


How reproducible:

100% (always)

Steps to Reproduce:
1. Create a distributed-replicate volume.
2. Mount the volume and start untarring the Linux kernel source on the mount point.
3. Attach a tier (attach-tier) while the untar is running.

Actual results:

I/O failure

Expected results:

I/O should not fail.

Additional info:

--- Additional comment from Vijay Bellur on 2015-10-19 05:46:22 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-19 05:47:10 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#3) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-20 00:51:58 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#4) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-20 00:52:01 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:39 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#5) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:42 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:46 EDT ---

REVIEW: http://review.gluster.org/12414 (dht:heal layout after a nameless lookup) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:44 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#6) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:47 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#5) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:50 EDT ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 07:43:56 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#7) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 07:44:00 EDT ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 07:44:09 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#6) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:17 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#8) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:20 EDT ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:23 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#7) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:43 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#9) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:46 EST ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:49 EST ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#8) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:10 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#10) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:13 EST ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#5) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:16 EST ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#9) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:02 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#11) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:05 EST ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#6) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:07 EST ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#10) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 14:01:40 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#12) for review on master by mohammed rafi  kc (rkavunga)

Comment 3 Mohammed Rafi KC 2015-11-10 10:01:55 UTC
The I/O failed after attaching the tier because fix-layout had not completed for some directories. As a result, the directory structure on the hot tier was incomplete, and any attempt to access such a directory failed.

Fix:

After a nameless lookup, if we get an incomplete layout, we trigger a heal once the full path has been obtained from the server.
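The fix can be illustrated with a minimal C sketch (GlusterFS itself is written in C). This is not the actual DHT code: `struct range`, `layout_is_complete()`, `after_nameless_lookup()` and `MAX_SUBVOLS` are hypothetical names, and the layout is simplified to one inclusive 32-bit hash range per subvolume. In this model a layout is incomplete if any subvolume has no range yet (for example, a directory that fix-layout never reached on a newly attached brick) or if the ranges leave a hole or overlap in 0..2^32-1; in that case the lookup falls back to resolving the full path and healing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SUBVOLS 16 /* hypothetical limit for this sketch */

/* Hypothetical, simplified DHT layout entry: one subvolume owns the
 * inclusive hash range [start, stop]. has_range == 0 means the
 * subvolume has no layout for this directory yet. */
struct range {
    int has_range;
    uint32_t start;
    uint32_t stop;
};

/* qsort comparator: order ranges by their start hash. */
static int cmp_start(const void *a, const void *b)
{
    const struct range *x = a;
    const struct range *y = b;
    if (x->start < y->start)
        return -1;
    return x->start > y->start;
}

/* Returns 1 only if every subvolume has a range and the ranges tile
 * 0..UINT32_MAX exactly (no holes, no overlaps); otherwise 0. */
static int layout_is_complete(const struct range *in, size_t n)
{
    struct range sorted[MAX_SUBVOLS];

    assert(n <= MAX_SUBVOLS);
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (!in[i].has_range || in[i].start > in[i].stop)
            return 0; /* missing or malformed range */
        sorted[i] = in[i];
    }
    qsort(sorted, n, sizeof sorted[0], cmp_start);

    if (sorted[0].start != 0)
        return 0; /* hole at the bottom of the hash space */
    for (size_t i = 1; i < n; i++) {
        /* The next range must begin right after the previous one ends. */
        if (sorted[i - 1].stop == UINT32_MAX ||
            sorted[i].start != sorted[i - 1].stop + 1)
            return 0;
    }
    return sorted[n - 1].stop == UINT32_MAX; /* no hole at the top */
}

/* Hypothetical decision taken after a nameless (gfid-only) lookup. */
enum lookup_action { USE_LAYOUT, HEAL_AFTER_PATH_RESOLVE };

static enum lookup_action after_nameless_lookup(const struct range *layout,
                                                size_t n)
{
    /* An incomplete layout cannot be trusted: resolve the full path
     * from the server first, then heal the directory layout. */
    return layout_is_complete(layout, n) ? USE_LAYOUT
                                         : HEAL_AFTER_PATH_RESOLVE;
}
```

For example, with two subvolumes where the second has no range yet, `after_nameless_lookup()` returns `HEAL_AFTER_PATH_RESOLVE`; once the two ranges together cover the whole hash space it returns `USE_LAYOUT`.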

Comment 4 Bhaskarakiran 2015-11-17 12:08:13 UTC
Checked on the 3.7.5-6 build: there are errors during I/O on the NFS mount while attach-tier is in progress. Once attach-tier completes, I/O resumes normally.

linux-4.1.1/arch/arm/boot/dts/animeo_ip.dts
tar: linux-4.1.1/arch/arm/boot/dts/animeo_ip.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/arm-realview-pb1176.dts
tar: linux-4.1.1/arch/arm/boot/dts/arm-realview-pb1176.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-db.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-db.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-mirabox.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-mirabox.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn102.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn102.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn104.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn104.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-rd.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-rd.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-synology-ds213j.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-synology-ds213j.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-xp.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-xp.dtsi: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-370.dtsi: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-375-db.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-375-db.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-375.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-375.dtsi: Cannot open: Invalid argument

Comment 5 Vivek Agarwal 2015-11-17 12:11:49 UTC
Are these log entries only or are these actual I/O errors?

Comment 6 Mohammed Rafi KC 2015-11-17 14:35:54 UTC
Tried to reproduce with the same machine and configuration, but I could not see any I/O failing. I will keep trying to reproduce this.

Comment 7 Mohammed Rafi KC 2015-11-19 07:30:29 UTC
I reproduced this error on a tiered volume as well as on a non-tiered volume. It happens intermittently under heavy parallel I/O. Sometimes the mount also hangs indefinitely because the server stops responding after attach-tier/add-brick. If it hangs, restarting the NFS server resumes the I/O gracefully.

This is not specific to tiering, as it can be reproduced on a non-tiered volume in the same way as on a tiered volume.

Comment 8 Jiffin 2015-11-20 09:16:08 UTC
Adding a few points to Rafi's comment.

The issue reproduces on a non-tiered distributed-replicate volume while an add-brick is in progress. Most of the time the I/O hangs (roughly 90% reproducible). I also got the "invalidate error" on the mount twice (once on a tiered volume and once on a non-tiered volume).

After a discussion, Niels suggested I run the same test with the number of epoll threads reduced to one. With epoll threads = 1, only one out of four runs hung on the non-tiered distributed-replicate volume (so the issue still exists).

As mentioned in Rafi's comment, when we restart the NFS server, the I/O resumes gracefully.

I suspect the mount hang on the client is an NFS issue, but the "invalidate error" may be related to some other component.
 
One more thing: on my setup the issue (i.e. the hang) reproduces consistently due to the low-end hardware configuration.

Comment 9 Soumya Koduri 2015-11-24 09:16:14 UTC
As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1282771#c2, this operation seems to work under a large workload if throttling is disabled. Requesting QE to disable it and confirm whether the issue still persists.

Comment 10 Susant Kumar Palai 2015-11-27 10:32:50 UTC
*** Bug 1049181 has been marked as a duplicate of this bug. ***

Comment 11 Bhaskarakiran 2015-11-30 09:35:13 UTC
I can still see the "Invalid argument" errors even after setting throttling to 0. The packet trace and setup details have been provided to DEV. The same holds for the other bug, https://bugzilla.redhat.com/show_bug.cgi?id=1282771 (Detach-tier + NFS).

Comment 14 Joseph Elwin Fernandes 2016-01-19 12:24:36 UTC
This will probably be fixed by bug 1296048; it is worth revisiting once bug 1296048 is fixed.

Comment 19 Bhaskarakiran 2016-01-28 15:53:55 UTC
Verification of bug 1296048 and of this bug is the same. Moving the state.

Comment 21 errata-xmlrpc 2016-03-01 05:52:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

