Bug 1306917

Summary: With USS enabled, during attach tier, seeing IO error "Cannot open: Stale file handle"

Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: tier
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED WONTFIX
QA Contact: krishnaram Karthick <kramdoss>
Severity: urgent
Priority: unspecified
Version: rhgs-3.1
CC: amukherj, asrivast, kramdoss, mzywusko, nbalacha, nchilaka, rcyriac, rhinduja, rhs-bugs, sankarshan, smohan
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Whiteboard: tier-interops
Doc Type: Known Issue
Doc Text:
When User Serviceable Snapshots are enabled, attaching a tier succeeds, but any I/O operations in progress during the attach tier operation may fail with stale file handle errors. Workaround: Disable User Serviceable Snapshots before performing attach tier. Once attach tier has succeeded, User Serviceable Snapshots can be re-enabled.
Last Closed: 2018-02-06 17:34:40 UTC
Type: Bug
Bug Blocks: 1268895    
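
The workaround from the Doc Text above, expressed as CLI commands; a minimal sketch using the volume name from this report (features.uss is the same volume option visible in the volume info below):

# Disable USS before attaching the tier:
gluster volume set finalvol features.uss disable
# Perform the attach tier (see the sketch under "Steps to reproduce" below).
# Once attach tier has succeeded, re-enable USS:
gluster volume set finalvol features.uss enable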

Description Nag Pavan Chilakam 2016-02-12 07:52:50 UTC
Description of problem:
========================
After enabling USS and quota, I populated data using 3 NFS clients (each client mounted the volume from a different server): one client ran a Linux kernel untar, and the other two created files using the dd command.
I then triggered an attach tier while the I/O was still in progress.
I saw the following errors from the Linux untar:
linux-4.4.1/arch/powerpc/include/uapi/asm/termios.h
tar: linux-4.4.1/arch/powerpc/include/uapi/asm/termios.h: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/include/uapi/asm/tm.h
tar: linux-4.4.1/arch/powerpc/include/uapi/asm/tm.h: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/include/uapi/asm/types.h
tar: linux-4.4.1/arch/powerpc/include/uapi/asm/types.h: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/include/uapi/asm/ucontext.h
tar: linux-4.4.1/arch/powerpc/include/uapi/asm/ucontext.h: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/include/uapi/asm/unistd.h
tar: linux-4.4.1/arch/powerpc/include/uapi/asm/unistd.h: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/kernel/
tar: linux-4.4.1/arch/powerpc/kernel: Cannot mkdir: Stale file handle
linux-4.4.1/arch/powerpc/kernel/.gitignore
tar: linux-4.4.1/arch/powerpc/kernel/.gitignore: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/kernel/Makefile
tar: linux-4.4.1/arch/powerpc/kernel/Makefile: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/kernel/align.c
tar: linux-4.4.1/arch/powerpc/kernel/align.c: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/kernel/asm-offsets.c
tar: linux-4.4.1/arch/powerpc/kernel/asm-offsets.c: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/kernel/audit.c
tar: linux-4.4.1/arch/powerpc/kernel/audit.c: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/kernel/btext.c
tar: linux-4.4.1/arch/powerpc/kernel/btext.c: Cannot open: Stale file handle
linux-4.4.1/arch/powerpc/kernel/cacheinfo.c


The untar eventually failed.



Version-Release number of selected component (if applicable):
=========================
3.7.5-19


How reproducible:
==============
I hit either this bug or bug 1306194 (NFS+attach tier: IOs hang while attach tier is issued).
Out of 5 retries, I hit this bug twice; the remaining 3 times I hit the NFS hang.

Steps to reproduce:
=================

1) client1: created a 300MB file and copied it to new files in a loop
for i in {2..50};do cp hlfile.1 hlfile.$i;done

2) client2: created a 50MB file and continuously copied it to new files
for i in {2..1000};do cp rename.1 rename.$i;done

3) client3: linux untar

4) copied a 3GB file to create new files in a loop
for i in {1..10};do cp File.mkv cheema$i.mkv;done

5) triggered attach tier while the above I/O was in progress (see the sketch below)
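
A minimal sketch of the attach-tier trigger in step 5, using the hot tier bricks from the volume info below; RHGS 3.1 / glusterfs 3.7 syntax (older builds spelled it "gluster volume attach-tier VOLNAME replica COUNT BRICK..."):

gluster volume tier finalvol attach replica 2 \
    10.70.35.239:/bricks/brick7/final_hot 10.70.35.133:/bricks/brick7/final_hot \
    10.70.37.202:/bricks/brick7/final_hot 10.70.37.195:/bricks/brick7/final_hot \
    10.70.37.120:/bricks/brick7/final_hot 10.70.37.60:/bricks/brick7/final_hot \
    10.70.37.69:/bricks/brick7/final_hot 10.70.37.101:/bricks/brick7/final_hot \
    10.70.35.163:/bricks/brick7/final_hot 10.70.35.173:/bricks/brick7/final_hot \
    10.70.35.232:/bricks/brick7/final_hot 10.70.35.176:/bricks/brick7/final_hot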



Volume info (post attach-tier):

Volume Name: finalvol
Type: Tier
Volume ID: 15a9fbaa-7e45-4302-b246-19e48cbdf059
Status: Started
Number of Bricks: 36
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 6 x 2 = 12
Brick1: 10.70.35.239:/bricks/brick7/final_hot
Brick2: 10.70.35.133:/bricks/brick7/final_hot
Brick3: 10.70.37.202:/bricks/brick7/final_hot
Brick4: 10.70.37.195:/bricks/brick7/final_hot
Brick5: 10.70.37.120:/bricks/brick7/final_hot
Brick6: 10.70.37.60:/bricks/brick7/final_hot
Brick7: 10.70.37.69:/bricks/brick7/final_hot
Brick8: 10.70.37.101:/bricks/brick7/final_hot
Brick9: 10.70.35.163:/bricks/brick7/final_hot
Brick10: 10.70.35.173:/bricks/brick7/final_hot
Brick11: 10.70.35.232:/bricks/brick7/final_hot
Brick12: 10.70.35.176:/bricks/brick7/final_hot
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (8 + 4) = 24
Brick13: 10.70.37.202:/bricks/brick1/finalvol
Brick14: 10.70.37.195:/bricks/brick1/finalvol
Brick15: 10.70.35.133:/bricks/brick1/finalvol
Brick16: 10.70.35.239:/bricks/brick1/finalvol
Brick17: 10.70.35.225:/bricks/brick1/finalvol
Brick18: 10.70.35.11:/bricks/brick1/finalvol
Brick19: 10.70.35.10:/bricks/brick1/finalvol
Brick20: 10.70.35.231:/bricks/brick1/finalvol
Brick21: 10.70.35.176:/bricks/brick1/finalvol
Brick22: 10.70.35.232:/bricks/brick1/finalvol
Brick23: 10.70.35.173:/bricks/brick1/finalvol
Brick24: 10.70.35.163:/bricks/brick1/finalvol
Brick25: 10.70.37.101:/bricks/brick1/finalvol
Brick26: 10.70.37.69:/bricks/brick1/finalvol
Brick27: 10.70.37.60:/bricks/brick1/finalvol
Brick28: 10.70.37.120:/bricks/brick1/finalvol
Brick29: 10.70.37.202:/bricks/brick2/finalvol
Brick30: 10.70.37.195:/bricks/brick2/finalvol
Brick31: 10.70.35.133:/bricks/brick2/finalvol
Brick32: 10.70.35.239:/bricks/brick2/finalvol
Brick33: 10.70.35.225:/bricks/brick2/finalvol
Brick34: 10.70.35.11:/bricks/brick2/finalvol
Brick35: 10.70.35.10:/bricks/brick2/finalvol
Brick36: 10.70.35.231:/bricks/brick2/finalvol
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on



NOTE: With USS disabled, I instead hit the NFS hang issue (bug 1306194).

Comment 5 Nag Pavan Chilakam 2016-02-15 05:36:11 UTC
Mount points (server : client --> IO type):

Mount1:
10.70.35.133 : rhs-client4 --> file rename
#for i in {1..2000};do mv -f foolu.$i qwer.$i ;done

Mount2:
10.70.35.225 : rhs-client9.lab.eng.blr.redhat.com --> linux untar as below
#date;date >> untar.log;for i in {1..5};do mkdir dir.$i;echo "created dir.$i" >>untar.log;cp linux-4.4.1.tar.xz dir.$i/;echo "copied kernel tar to dir.$i and will start untarring kernel" >>untar.log ;tar -xvf dir.$i/linux-4.4.1.tar.xz -C dir.$i/;echo "linux untar done in dir.$i" >>untar.log;date >> untar.log;done;date

Mount3:
10.70.37.101 : rhs-client30 --> "for i in {1..1000};do dd if=/dev/urandom of=file.$i bs=1024 count=10000;done"
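
For reference, a sketch of how each of these NFS mounts was likely created; gluster's built-in NFS server exports the volume over NFSv3, and the local mount point /mnt/finalvol is illustrative:

mount -t nfs -o vers=3 10.70.35.133:/finalvol /mnt/finalvol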

Comment 7 Nag Pavan Chilakam 2016-02-17 06:20:31 UTC
@Laura:
This is a separate issue for which Avra has asked that the doc text field be updated.

Comment 11 Nag Pavan Chilakam 2016-03-01 13:23:48 UTC
I am removing the needinfo tag; kindly re-tag if the above discussion does not address the question.

Comment 12 Avra Sengupta 2016-03-03 08:40:05 UTC
We tried following the steps mentioned in the bug: we created a volume, enabled USS on it, and mounted it via NFS on 3 mount points. We started copying the /etc directory in a loop from 2 of the mount points, and untarred a Linux kernel tarball from the third. While this I/O was going on, we attached a tier. The attach tier was successful, and there was neither an I/O hang nor any stale file handle.

We repeated the above 7 times with the same outcome. It would be great if we could get some help reproducing the issue so that we can RCA it.
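
A sketch of the copy loop we ran from the two mount points, assuming a bounded loop with illustrative destination names:

for i in {1..100};do cp -r /etc etc.copy.$i;done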

Comment 14 Atin Mukherjee 2016-03-10 12:33:05 UTC
Nagpavan,

Can you please retest and see if it's reproducible?

~Atin

Comment 16 Nag Pavan Chilakam 2016-05-09 12:03:40 UTC
Changed the needinfo assignee to Karthick as he works on tiering.

Comment 17 krishnaram Karthick 2016-05-25 16:24:41 UTC
I have not seen this issue in recent tiering tests with an NFS mount. I'll update the bug with logs if the issue is seen in the future.

Comment 21 Shyamsundar 2018-02-06 17:34:40 UTC
Thank you for your bug report.

The documentation for the problem reported in this bug has been updated.

Further, we are no longer releasing bug fixes or other updates for Tier. This bug will be set to CLOSED WONTFIX to reflect this.