+++ This bug was initially created as a clone of Bug #1272949 +++

Description of problem:
Ongoing I/Os fail when attaching a tier.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Create a distributed-replicated volume.
2. Mount it and start a Linux kernel untar on the mount point.
3. Run attach-tier.

Actual results:
I/O failure.

Expected results:
I/O should not fail.

Additional info:

--- Additional comment from Vijay Bellur on 2015-10-19 05:46:22 EDT ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#2) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-19 05:47:10 EDT ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#3) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-20 00:51:58 EDT ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#4) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-20 00:52:01 EDT ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#3) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:39 EDT ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#5) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:42 EDT ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is nor present) posted (#4) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-22 14:54:46 EDT ---
REVIEW: http://review.gluster.org/12414 (dht:heal layout after a nameless lookup) posted (#1) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:44 EDT ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#6) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:47 EDT ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#5) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-28 15:52:50 EDT ---
REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#1) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 07:43:56 EDT ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#7) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 07:44:00 EDT ---
REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#2) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 07:44:09 EDT ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#6) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:17 EDT ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#8) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:20 EDT ---
REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#3) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-10-30 08:58:23 EDT ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#7) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:43 EST ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#9) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:46 EST ---
REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#4) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 01:28:49 EST ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#8) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:10 EST ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#10) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:13 EST ---
REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#5) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-02 09:14:16 EST ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#9) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:02 EST ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#11) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:05 EST ---
REVIEW: http://review.gluster.org/12449 (dht: update cached subvolume during readdirp cbk) posted (#6) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 01:24:07 EST ---
REVIEW: http://review.gluster.org/12376 (dht: heal directory path if the directory is not present) posted (#10) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2015-11-03 14:01:40 EST ---
REVIEW: http://review.gluster.org/12375 (Revert "fuse: resolve complete path after a graph switch") posted (#12) for review on master by mohammed rafi kc (rkavunga)
I/Os failed after attaching the tier because fix-layout had not completed for some directories. As a result, the directory structure on the hot tier was incomplete, and accessing such directories failed. Fix: if a nameless lookup returns an incomplete layout, we get the full path from the server and then trigger a heal.
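To make "incomplete layout" concrete: assuming, as in DHT, that each subvolume of a directory is assigned a range of a 32-bit hash space, the layout is complete only when those ranges cover the whole space with no holes and no overlaps. The Python sketch below checks that condition and, when it fails, resolves the full path and triggers a heal, mirroring the fix described above. The function names and the `resolve_path` / `heal_directory` callbacks are hypothetical; this is an illustration only, not the GlusterFS implementation (which is written in C).

```python
from typing import Callable, List, Tuple

HASH_MAX = 0xFFFFFFFF  # assumption: DHT-style 32-bit hash space, ranges are inclusive

Range = Tuple[int, int]


def layout_anomalies(ranges: List[Range]) -> Tuple[List[Range], List[Range]]:
    """Return (holes, overlaps) for a directory layout given as (start, stop) ranges."""
    holes: List[Range] = []
    overlaps: List[Range] = []
    expected = 0  # next hash value that should be covered
    for start, stop in sorted(ranges):
        if start > expected:
            # Nothing covers [expected, start - 1].
            holes.append((expected, start - 1))
        elif start < expected:
            # This range re-covers values an earlier range already owns.
            overlaps.append((start, min(stop, expected - 1)))
        expected = max(expected, stop + 1)
    if expected <= HASH_MAX:
        # The tail of the hash space is not covered by any range.
        holes.append((expected, HASH_MAX))
    return holes, overlaps


def after_nameless_lookup(
    gfid: str,
    ranges: List[Range],
    resolve_path: Callable[[str], str],    # hypothetical: ask the server for the full path of gfid
    heal_directory: Callable[[str], None],  # hypothetical: trigger directory/layout heal by path
) -> bool:
    """Trigger a heal if the layout returned by a nameless (gfid-only) lookup is incomplete.

    Returns True if a heal was triggered.
    """
    holes, overlaps = layout_anomalies(ranges)
    if not holes and not overlaps:
        return False
    # A nameless lookup has only the gfid, so resolve the full path before healing.
    heal_directory(resolve_path(gfid))
    return True
```

For example, a layout where only the cold tier has a range, `[(0, 0x7FFFFFFF)]`, reports the hole `(0x80000000, 0xFFFFFFFF)` and `after_nameless_lookup` triggers a heal, while a layout covering the full range does not.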
Checked on the 3.7.5-6 build: attach-tier during I/O on an NFS mount produces errors. Once attach-tier completes, I/O resumes normally.

linux-4.1.1/arch/arm/boot/dts/animeo_ip.dts
tar: linux-4.1.1/arch/arm/boot/dts/animeo_ip.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/arm-realview-pb1176.dts
tar: linux-4.1.1/arch/arm/boot/dts/arm-realview-pb1176.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-db.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-db.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-mirabox.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-mirabox.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn102.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn102.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn104.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-netgear-rn104.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-rd.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-rd.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-synology-ds213j.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-synology-ds213j.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370-xp.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-370-xp.dtsi: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-370.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-370.dtsi: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-375-db.dts
tar: linux-4.1.1/arch/arm/boot/dts/armada-375-db.dts: Cannot open: Invalid argument
linux-4.1.1/arch/arm/boot/dts/armada-375.dtsi
tar: linux-4.1.1/arch/arm/boot/dts/armada-375.dtsi: Cannot open: Invalid argument
Are these only log entries, or are they actual I/O errors?
I tried to reproduce this on the same machine and configuration but did not see any I/O failures. I will keep trying to reproduce it.
I reproduced this error on both a tiered and a non-tiered volume. It happens intermittently under heavy parallel I/O. Sometimes the mount also hangs indefinitely because the server stops responding after attach-tier/add-brick; restarting the NFS server resumes the hung I/O gracefully. This is not a tier-specific problem, since it reproduces the same way on a non-tiered volume as on a tiered one.
Adding a few points to Rafi's comment. The issue reproduces on a non-tiered distributed-replicated volume while add-brick is in progress. Most of the time I/O hangs (roughly 90% reproducible). I also got an "invalidate error on the mount" twice (once on a tiered and once on a non-tiered volume). After discussing with Niels, he suggested running the same test with the number of epoll threads reduced to one. With epoll threads = 1, only one of four runs hung on the non-tiered distributed-replicated volume, so the issue still exists. As Rafi mentioned, restarting the NFS server makes I/O resume gracefully. I suspect the mount hang on the client is an NFS issue, while the "invalidate error" may come from some other component. One more note: the hang reproduces consistently in my setup due to its low hardware configuration.
As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1282771#c2, this operation seems to work under a large workload when throttling is disabled. Requesting QE to disable throttling and confirm whether the issue still persists.
*** Bug 1049181 has been marked as a duplicate of this bug. ***
I still see the "Invalid argument" errors even after setting throttling to 0. The packet trace and setup details have been provided to DEV. The same holds for the other bug, https://bugzilla.redhat.com/show_bug.cgi?id=1282771 (detach-tier + NFS).
This will probably be fixed by bug 1296048; it is worth revisiting once bug 1296048 is fixed.
Verification of bug 1296048 and of this bug is the same. Moving the state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html