Description of problem:
=======================
Created a 2x(4+2) EC volume and NFS-mounted it on the client with quota and uss enabled. Started IO (linux untar, mkdirs, and dd, thousands in parallel). Tried attaching the tier (2x2 dist-rep) and saw invalid argument errors continuously. The same errors were seen during detach-tier, but IO resumed after some time. In this case, IO fails completely with the error messages.

If quota and uss are turned off, the errors below are seen for some time and then the IO resumes.

tar: linux-4.1.1/Documentation/devicetree/bindings/input/touchscreen/zforce_ts.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/input/twl4030-keypad.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/input/twl4030-keypad.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/input/twl4030-pwrbutton.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/input/twl4030-pwrbutton.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/abilis,tb10x-ictl.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/abilis,tb10x-ictl.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun4i-ic.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun4i-ic.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun67i-sc-nmi.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun67i-sc-nmi.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/atmel,aic.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/atmel,aic.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm3380-l2-intc.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm3380-l2-intc.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7038-l1-intc.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7038-l1-intc.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7120-l2-intc.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7120-l2-intc.txt: Cannot open: File exists
linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,l2-intc.txt
tar: linux-4.1.1/Documentation/devicetree/bindings/interrupt-controller/brcm,l2-intc.txt: Cannot open: File exists

Version-Release number of selected component (if applicable):
===============================================================
3.7.5-14

How reproducible:
=================
100%

Steps to Reproduce:
1. Create a disperse volume 2x(4+2)
2. NFS mount on the client.
3. Start IO (linux untar - 2 instances, mkdir (1000 in parallel), dd (1000 in parallel))
4. Attach tier (2x2 dist-rep) volume

Actual results:
===============
Invalid argument errors

Expected results:
=================
No errors should be seen and IO should be smooth.

Additional info:
================
sosreports in rhsqe.
RCA:

After add-brick, the NFS server is restarted to load the new graph, so the NFS server's inode table is fresh after the restart. As part of a fop, the resolver therefore sends a lookup on an entry if its inode has not been looked up before.

During that lookup, if DHT finds that the entry needs healing (the directories are not present on all subvolumes), it initiates a heal to create the directories on all subvolumes. As part of the healing, DHT does a series of named lookups on all the parents, starting from root, if their inodes are not present, and on a successful lookup it also links the inode into the inode table. Because this lookup is initiated from DHT, inode ctx is created only for the xlators beneath DHT. And because the inode is already linked, the resolver will not do a lookup for the next fop, so the xlators above DHT never get an inode ctx. In this case, svc_access was complaining about the missing inode_ctx.

Possible solutions:

1) Move the DHT healing code to the interface layer. If healing is required, DHT should inform the interface layer and give it a path to heal, so that each interface layer (fuse, nfs, gfapi) does the healing.

2) Do not link the inode from any xlator other than the master xlators, i.e. do not link from DHT. This will cause a huge performance degradation in the healing code path, and we might need some hack to heal without a linked inode.

3) Currently, resolving an entry succeeds if there is an inode in the inode table. Add an extra check for whether the inode_ctx is present; if it is not present for a linked inode, the resolver should treat the inode as invalid and do a lookup with the same inode.
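To make the failure and solution 3 easier to follow, here is a toy model in Python. It is only a sketch under assumptions and not GlusterFS code: the stack layout, the class and function names (`Inode`, `InodeTable`, `lookup_from`, `resolve`, `run`) and the rule "require ctx for every xlator in the stack" are all made up for illustration. Only the overall behaviour (a lookup started at DHT creates ctx only from DHT downwards; the resolver skips the lookup for an already linked inode) comes from the RCA above.

```python
import errno

# Hypothetical translator stack, top to bottom; names are illustrative only.
STACK = ["nfs", "snapview-client", "dht", "ec"]


class Inode:
    def __init__(self, path):
        self.path = path
        self.ctx = {}  # per-translator context, keyed by translator name


class InodeTable:
    def __init__(self):
        self.linked = {}

    def link(self, inode):
        self.linked[inode.path] = inode
        return inode

    def find(self, path):
        return self.linked.get(path)


def lookup_from(table, path, start):
    """Run a lookup that enters the stack at `start`.

    Only translators at or below `start` create their inode ctx; the inode
    is linked into the table on success.
    """
    inode = table.find(path) or Inode(path)
    for name in STACK[STACK.index(start):]:
        inode.ctx.setdefault(name, {"looked_up": True})
    return table.link(inode)


def resolve(table, path, check_ctx):
    """Resolver at the top of the stack.

    check_ctx=False: trust any linked inode (behaviour described in the RCA).
    check_ctx=True: also require ctx for every translator (assumed rule for
    solution 3); otherwise redo the lookup from the top with the same inode.
    """
    inode = table.find(path)
    missing_ctx = inode is not None and any(n not in inode.ctx for n in STACK)
    if inode is None or (check_ctx and missing_ctx):
        inode = lookup_from(table, path, STACK[0])
    return inode


def svc_access(inode):
    """Stand-in for the access fop in snapview-client: fails without its ctx."""
    if "snapview-client" not in inode.ctx:
        return -errno.EINVAL
    return 0


def run(check_ctx):
    table = InodeTable()               # fresh table, as after the NFS restart
    lookup_from(table, "/dir", "dht")  # DHT heal links the parent directory
    return svc_access(resolve(table, "/dir", check_ctx))
```

In this model `run(False)` returns `-errno.EINVAL`, matching the invalid argument errors in the report, while `run(True)` returns 0 because the resolver repeats the lookup from the top of the stack and every xlator gets its ctx.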
One more patch is required to fix this problem completely.
Upstream patches:
http://review.gluster.org/#/c/11892/
http://review.gluster.org/#/c/13224/
http://review.gluster.org/#/c/13225/
http://review.gluster.org/#/c/13226/
http://review.gluster.org/#/c/13227/
Verified this on 3.7.5-17 and didn't hit the issue. Marking this as verified.
However, there will be a pause in the IOs for some time (maybe about 4-5 min) while running IOs with attach tier on NFS.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html