Bug 1278399 - I/O failure on attaching tier on nfs client
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Release: RHGS 3.1.2
Assigned To: Mohammed Rafi KC
QA Contact: Bhaskarakiran
Keywords: ZStream
Duplicates: 1049181
Depends On: 1272949
Blocks: 1146338 1049181 1114033 1139193 1260783 1260923 1276742 1279095 1279830 1286064
Reported: 2015-11-05 06:56 EST by nchilaka
Modified: 2016-11-23 18:11 EST
CC List: 15 users

Fixed In Version: glusterfs-3.7.5-17
Doc Type: Bug Fix
Clone Of: 1272949
: 1279095
Last Closed: 2016-03-01 00:52:37 EST
Type: Bug


Attachments: None
Description nchilaka 2015-11-05 06:56:44 EST
+++ This bug was initially created as a clone of Bug #1272949 +++

Description of problem:

Ongoing I/O fails when attaching a tier.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Create a distributed-replicated volume.
2. Mount it and start untarring a Linux kernel source tarball on the mount point.
3. Attach a tier.

Actual results:

I/O failure.

Expected results:

I/O should not fail.

Additional info:

--- Additional comment from Vijay Bellur on 2015-10-19 05:46:22 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#2) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-19 05:47:10 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#3) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-20 00:51:58 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#4) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-20 00:52:01 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#3) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:39 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#5) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:42 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#4) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:46 EDT ---

REVIEW: http://review.gluster.org/12414 (dht:heal layout after a nameless lookup) posted (#1) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:44 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#6) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:47 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#5) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:50 EDT ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#1) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-30 07:43:56 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#7) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-30 07:44:00 EDT ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#2) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-30 07:44:09 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#6) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:17 EDT ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#8) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:20 EDT ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#3) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:23 EDT ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#7) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:43 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#9) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:46 EST ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#4) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:49 EST ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#8) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:10 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#10) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:13 EST ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#5) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:16 EST ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#9) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:02 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#11) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:05 EST ---

REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#6) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:07 EST ---

REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#10) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Vijay Bellur on 2015-11-03 14:01:40 EST ---

REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#12) for review on master by mohammed rafi  kc (rkavunga@redhat.com)
Comment 3 Mohammed Rafi KC 2015-11-10 05:01:55 EST
I/O failed after attaching the tier because fix-layout had not completed for some directories. As a result, the directory structure on the hot tier was incomplete, and accessing such directories resulted in a failure.

Fix:

After a nameless lookup, if we get an incomplete layout, we trigger a heal after obtaining the full path from the server.
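The fix described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual DHT code (which is written in C). It assumes a directory layout is a list of inclusive ranges over the 32-bit hash space, one per subvolume, with `None` marking a subvolume on which the directory is missing. `resolve_path` and `heal_directory` are hypothetical callbacks standing in for the server-side path lookup and the DHT directory self-heal.

```python
from typing import Callable, Optional, Sequence, Tuple

# DHT hashes names into a 32-bit space; a healthy directory layout assigns
# every value in [0, 0xFFFFFFFF] to exactly one subvolume.
HASH_MAX = 0xFFFFFFFF

# Hypothetical layout entry: an inclusive (start, stop) range for one
# subvolume, or None if the directory is missing on that subvolume.
Range = Optional[Tuple[int, int]]


def layout_is_complete(layout: Sequence[Range]) -> bool:
    """True if no subvolume is missing the directory and the ranges tile the
    hash space with no holes and no overlaps (simplified anomaly check)."""
    if not layout or any(r is None for r in layout):
        return False  # directory missing on at least one subvolume
    expected_start = 0
    for start, stop in sorted(layout):
        # start > expected -> hole; start < expected -> overlap
        if start != expected_start or stop < start:
            return False
        expected_start = stop + 1
    return expected_start == HASH_MAX + 1


def on_nameless_lookup(
    gfid: str,
    layout: Sequence[Range],
    resolve_path: Callable[[str], str],
    heal_directory: Callable[[str], None],
) -> bool:
    """Heal if a gfid-only (nameless) lookup returns an incomplete layout.

    Returns True if a heal was triggered. Both callbacks are hypothetical.
    """
    if layout_is_complete(layout):
        return False
    # A nameless lookup carries only the gfid, so fetch the full path first.
    path = resolve_path(gfid)
    heal_directory(path)
    return True
```

For example, a layout of `[(0, 0xFFFFFFFF), None]` (directory present on the cold tier but not yet created on the hot tier) is reported as incomplete, so the path is resolved and a heal is triggered; a single range covering the whole hash space is left alone.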
Comment 4 Bhaskarakiran 2015-11-17 07:08:13 EST
Checked on the 3.7.5-6 build: there are errors during I/O on the NFS mount while attach-tier is in progress. Once attach-tier completes, I/O resumes normally.

linux-4.1.1/arch/arm/boot/dts/animeo_ip.dts
tar: linux-4.1.1/arch/arm/boot/dts/animeo_ip.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/arm-realview-pb1176.dts
tar: linux-4.1.1/arch/arm/boot/dts/arm-realview-pb1176.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-db.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-db.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-mirabox.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-mirabox.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn102.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn102.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn104.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn104.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-rd.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-rd.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-synology-ds213j.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-synology-ds213j.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-xp.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-xp.dtsi: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-370.dtsi: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-375-db.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-375-db.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-375.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-375.dtsi: Cannot open: Invalid argument
Comment 5 Vivek Agarwal 2015-11-17 07:11:49 EST
Are these log entries only or are these actual I/O errors?
Comment 6 Mohammed Rafi KC 2015-11-17 09:35:54 EST
Tried to reproduce with the same machine and configuration, but I could not see any I/O failing. I will keep trying to reproduce this.
Comment 7 Mohammed Rafi KC 2015-11-19 02:30:29 EST
I reproduced this error on a tiered volume as well as on a non-tiered volume. It happens intermittently under heavy parallel I/O. Sometimes the mount also hangs indefinitely because the server stops responding after attach-tier/add-brick. If the mount hangs, restarting the NFS server resumes I/O gracefully.

This is not a tier-specific problem, as it can be reproduced on a non-tiered volume in the same way as on a tiered volume.
Comment 8 Jiffin 2015-11-20 04:16:08 EST
Adding a few points to Rafi's comment.

The issue reproduces on a non-tiered distributed-replicated volume while an add-brick is in progress. Most of the time I/O hangs (roughly 90% reproducible). I also got an "invalidate error" on the mount twice (once on a tiered volume and once on a non-tiered volume).

After I discussed this with Niels, he suggested running the same test with the number of epoll threads reduced to one. With epoll threads = 1, only one out of four runs hung on the non-tiered distributed-replicated volume (so the issue still exists).

As mentioned in Rafi's comment, when we restart the NFS server, I/O resumes gracefully.

I suspect the mount hang on the client is related to an NFS issue, but the "invalidate error" may be related to some other component.
 
One more thing: the issue (the hang) reproduces consistently in my setup because of its low hardware configuration.
Comment 9 Soumya Koduri 2015-11-24 04:16:14 EST
As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1282771#c2, this operation seems to work under a large workload if throttling is disabled. Requesting QE to disable it and confirm whether the issue still persists.
Comment 10 Susant Kumar Palai 2015-11-27 05:32:50 EST
*** Bug 1049181 has been marked as a duplicate of this bug. ***
Comment 11 Bhaskarakiran 2015-11-30 04:35:13 EST
I can still see the "Invalid argument" errors even after setting throttling to 0. The packet trace and setup details have been provided to DEV. This also applies to the other bug, https://bugzilla.redhat.com/show_bug.cgi?id=1282771 (detach-tier + NFS).
Comment 14 Joseph Elwin Fernandes 2016-01-19 07:24:36 EST
This will probably be fixed by bug 1296048; it is worth revisiting after bug 1296048 is fixed.
Comment 19 Bhaskarakiran 2016-01-28 10:53:55 EST
Verification of bug 1296048 and of this bug is the same. Moving the state.
Comment 21 errata-xmlrpc 2016-03-01 00:52:37 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
