Bug 763067 (GLUSTER-1335)

Summary: compile bench fails with ESTALE on 4 replica gnfs mount
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: protocol
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 3.1-alpha
CC: gluster-bugs, shehjart
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs

Description Lakshmipathi G 2010-08-11 06:21:08 UTC
Running compilebench on glusterfs+gnfs with a 4 replicate volume fails with ESTALE.
Logs can be found under:
ec2-75-101-204-250.compute-1.amazonaws.com:/mnt

create dir kernel-0 222MB in 4382.49 seconds (0.05 MB/s)
create dir kernel-1 222MB in 1275.13 seconds (0.17 MB/s)
patch dir kernel-1 109MB in 6319.23 seconds (0.02 MB/s)
compile dir kernel-1 691MB in 5160.77 seconds (0.13 MB/s)
compile dir kernel-0 680MB in 4519.97 seconds (0.15 MB/s)
patch dir kernel-0 691MB in 4193.45 seconds (0.16 MB/s)
read dir kernel-1 in 6327.46 0.15 MB/s
read dir kernel-1 in 4573.13 0.20 MB/s
create dir kernel-3116 222MB in 4559.12 seconds (0.05 MB/s)
clean kernel-1 691MB in 69.26 seconds (9.99 MB/s)
read dir kernel-3116 in 430.07 0.52 MB/s
stat dir kernel-3116 in 35.34 seconds
compile dir kernel-3116 680MB in 889.18 seconds (0.77 MB/s)
clean kernel-0 691MB in 99.08 seconds (6.98 MB/s)
clean kernel-3116 680MB in 74.09 seconds (9.19 MB/s)
patch dir kernel-3116 109MB in 367.12 seconds (0.30 MB/s)
stat dir kernel-0 in 71.47 seconds
create dir kernel-6231 222MB in 971.01 seconds (0.23 MB/s)
delete kernel-6231 in 417.51 seconds
compile dir kernel-0 691MB in 834.64 seconds (0.83 MB/s)
Traceback (most recent call last):
  File "./compilebench", line 631, in <module>
    total_runs += func(dset, rnd)
  File "./compilebench", line 451, in create_one_dir
    mbs = run_directory(dset.unpatched, dirname, "create dir")
  File "./compilebench", line 225, in run_directory
    fp = file(fname, 'a+')
IOError: [Errno 116] Stale NFS file handle: '/mnt/cb2/kernel-70151/include/video/sisfb.h'
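
The failing call in the traceback is an append-mode open of a file on the NFS mount. As a rough sketch (written for Python 3, whereas compilebench 0.6 uses the Python 2 `file()` builtin), a small probe could separate ESTALE from other I/O errors when trying to reproduce this. `is_stale` and `touch_append` are hypothetical helper names, not part of compilebench, and the temporary directory stands in for a path on the gnfs mount.

```python
import errno
import os
import tempfile


def is_stale(exc):
    """Return True if exc is an OSError carrying ESTALE (errno 116 on Linux)."""
    return isinstance(exc, OSError) and exc.errno == errno.ESTALE


def touch_append(path):
    """Open path in append mode, as run_directory does, and report the outcome.

    Returns "ok" or "stale"; any other error is re-raised unchanged.
    """
    try:
        with open(path, "a+"):
            return "ok"
    except OSError as exc:
        if is_stale(exc):
            return "stale"
        raise


if __name__ == "__main__":
    # Assumption: replace the temporary directory with a directory on the
    # NFS mount (for example under /mnt/cb2) to probe the real setup.
    with tempfile.TemporaryDirectory() as tmp:
        print(touch_append(os.path.join(tmp, "sisfb.h")))
```

Re-raising everything except ESTALE keeps unrelated failures (permissions, missing parent directory) visible instead of folding them into the same result.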

Comment 1 Lakshmipathi G 2010-08-12 03:16:24 UTC
Running compilebench with a 2 replicate setup completed without any errors.
This bug might be related to http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1234

Comment 2 Lakshmipathi G 2010-08-14 03:07:41 UTC
Started compilebench with nfs-beta-rc10, 4 replicate, two days ago. It is still running and has not reported any errors.

Comment 3 Lakshmipathi G 2010-08-14 04:21:20 UTC
Compile bench with nfs-beta-rc10 completed successfully.

run complete:
==========================================================================
intial create total runs 2 avg 0.17 MB/s (user 0.03s sys 0.07s)
create total runs 17 avg 0.07 MB/s (user 0.03s sys 0.09s)
patch total runs 10 avg 0.21 MB/s (user 0.02s sys 0.18s)
compile total runs 16 avg 0.70 MB/s (user 0.03s sys 0.45s)
clean total runs 9 avg 3.96 MB/s (user 0.00s sys 0.00s)
read tree total runs 7 avg 0.59 MB/s (user 0.05s sys 0.07s)
read compiled tree total runs 6 avg 2.03 MB/s (user 0.06s sys 0.36s)
delete tree total runs 5 avg 1933.63 seconds (user 0.04s sys 0.04s)
delete compiled tree total runs 13 avg 2417.66 seconds (user 0.04s sys 0.03s)
stat tree total runs 10 avg 45.59 seconds (user 0.06s sys 0.04s)
stat compiled tree total runs 7 avg 40.76 seconds (user 0.06s sys 0.04s)
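
To compare runs such as this one against later builds, the summary lines can be turned into records. The following is only a sketch: the pattern is inferred from the lines pasted above, not from the compilebench source, and `SUMMARY_RE` and `parse_summary` are hypothetical names.

```python
import re

# Pattern inferred from the pasted summary, e.g.
#   compile total runs 16 avg 0.70 MB/s (user 0.03s sys 0.45s)
#   delete tree total runs 5 avg 1933.63 seconds (user 0.04s sys 0.04s)
SUMMARY_RE = re.compile(
    r"^(?P<op>.+?) total runs (?P<runs>\d+) avg (?P<avg>\d+(?:\.\d+)?) "
    r"(?P<unit>MB/s|seconds) "
    r"\(user (?P<user>\d+(?:\.\d+)?)s sys (?P<sys>\d+(?:\.\d+)?)s\)$"
)


def parse_summary(text):
    """Return one dict per summary line; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match is None:
            continue
        rows.append({
            "op": match.group("op"),
            "runs": int(match.group("runs")),
            "avg": float(match.group("avg")),
            "unit": match.group("unit"),  # "MB/s" or "seconds"
            "user_s": float(match.group("user")),
            "sys_s": float(match.group("sys")),
        })
    return rows
```

The unit is kept alongside the average because the delete and stat rows report seconds rather than MB/s, so the two kinds of rows should not be compared directly.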

Comment 4 Lakshmipathi G 2010-08-16 02:26:11 UTC
With 3.1.0qa5, compilebench fails with a 4 replicate setup.
The log file can be found under /share/tickets/1335

[root@ip-10-245-210-193 compilebench-0.6]# ./compilebench -D /mnt/gg -i 2
using working directory /mnt/gg, 2 intial dirs 100 runs
Traceback (most recent call last):
  File "./compilebench", line 567, in <module>
    dset = dataset(options.sources, rnd)
  File "./compilebench", line 319, in __init__
    self.unpatched = native_order(self.unpatched, "unpatched")
  File "./compilebench", line 97, in native_order
    run_directory(tmplist, dirname, "native %s" % tag)
  File "./compilebench", line 225, in run_directory
    fp = file(fname, 'a+')
IOError: [Errno 116] Stale NFS file handle: '/mnt/gg/native-0/arch/mips/momentum/ocelot_g/reset.c'

Comment 5 Shehjar Tikoo 2010-08-16 09:27:26 UTC
[2010-08-14 06:41:55.833456] D [rpcsvc.c:1285:nfs_rpcsvc_program_actor] nfsrpc: Actor found: NFS3 - LOOKUP
[2010-08-14 06:41:55.833474] D [nfs3-helpers.c:2239:nfs3_log_fh_entry_call] nfs-nfsv3: XID: d02c0566, LOOKUP: args: FH: hashcount 6, xlid 0, gen 5505183148939416227, ino 194054708, name: reset.c
[2010-08-14 06:41:55.833492] T [nfs3.c:1040:nfs3_lookup] nfs-nfsv3: FH to Volume: mirror
[2010-08-14 06:41:55.833507] T [nfs3-helpers.c:2940:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 194054708, gen: 5505183148939416227, entry: reset.c, hashidx: 0
[2010-08-14 06:41:55.833529] T [nfs3-helpers.c:2948:nfs3_fh_resolve_entry_hard] nfs-nfsv3: Entry needs lookup: /gg/native-0/arch/mips/momentum/ocelot_g/reset.c

##########################################################
## NFS is able to resolve the path using the inode table##
##########################################################
[2010-08-14 06:41:55.833544] T [nfs-fops.c:279:nfs_fop_lookup] nfs: Lookup: /gg/native-0/arch/mips/momentum/ocelot_g/reset.c
[2010-08-14 06:41:55.833597] T [rpc-clnt.c:1184:rpc_clnt_record] : Auth Info: pid: 2891215424, uid: 0, gid: 0, owner: 46912524022608
[2010-08-14 06:41:55.833616] T [rpc-clnt.c:1076:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 608, payload: 480, rpc hdr: 128
[2010-08-14 06:41:55.833659] T [rpc-clnt.c:1184:rpc_clnt_record] : Auth Info: pid: 2891215424, uid: 0, gid: 0, owner: 46912524022608
[2010-08-14 06:41:55.833676] T [rpc-clnt.c:1076:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 608, payload: 480, rpc hdr: 128
[2010-08-14 06:41:55.833715] T [rpc-clnt.c:1184:rpc_clnt_record] : Auth Info: pid: 2891215424, uid: 0, gid: 0, owner: 46912524022608
[2010-08-14 06:41:55.833741] T [rpc-clnt.c:1076:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 608, payload: 480, rpc hdr: 128

###############################################################
## The immediate reason for the ESTALE is the client not     ##
## finding the parent's remote inode number in the inode ctx.##
###############################################################
[2010-08-14 06:41:55.833774] T [client3_1-fops.c:2512:client3_1_lookup] ec2-72-44-60-35.compute-1.amazonaws.com-1: LOOKUP 194054708/reset.c (/gg/native-0/arch/mips/momentum/ocelot_g/reset.c): failed to get remote inode number for parent
[2010-08-14 06:41:55.834164] T [rpc-clnt.c:616:rpc_clnt_reply_init] rpc-clnt: RPC XID: 35330 Program: GlusterFS 3.1, ProgVers: 310, Proc: 27
[2010-08-14 06:41:55.834354] T [rpc-clnt.c:616:rpc_clnt_reply_init] rpc-clnt: RPC XID: 25628 Program: GlusterFS 3.1, ProgVers: 310, Proc: 27
[2010-08-14 06:41:55.834392] T [rpc-clnt.c:616:rpc_clnt_reply_init] rpc-clnt: RPC XID: 25664 Program: GlusterFS 3.1, ProgVers: 310, Proc: 27
[2010-08-14 06:41:55.834437] T [nfs3-helpers.c:2552:nfs3_fh_resolve_entry_lookup_cbk] nfs-nfsv3: Lookup failed: /gg/native-0/arch/mips/momentum/ocelot_g/reset.c: Stale NFS file handle
[2010-08-14 06:41:55.834456] D [nfs3-helpers.c:2359:nfs3_log_common_res] nfs-nfsv3: XID: d02c0566, LOOKUP: NFS: 70(Invalid file handle), POSIX: 14(Bad address)
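
The last record of the trace carries the status returned to the NFS client. A sketch of pulling the XID, NFSv3 status and POSIX errno out of such a record follows, assuming the wrapped log record has first been joined into a single line. `RESULT_RE` and `parse_result` are hypothetical names, the pattern is inferred only from the record above, and the status table is a partial excerpt of the nfsstat3 values in RFC 1813 (where 70 is NFS3ERR_STALE).

```python
import errno
import re

# Partial excerpt of nfsstat3 values from RFC 1813; not a complete table.
NFS3_STATUS = {
    0: "NFS3_OK",
    2: "NFS3ERR_NOENT",
    5: "NFS3ERR_IO",
    13: "NFS3ERR_ACCES",
    70: "NFS3ERR_STALE",
    10001: "NFS3ERR_BADHANDLE",
}

# Pattern inferred from the nfs3_log_common_res record in this report.
RESULT_RE = re.compile(
    r"XID: (?P<xid>[0-9a-f]+), (?P<op>[A-Z]+): "
    r"NFS: (?P<nfs>\d+)\([^)]*\), POSIX: (?P<posix>\d+)\([^)]*\)"
)


def parse_result(line):
    """Return XID, operation and status codes, or None for non-result lines."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    nfs = int(match.group("nfs"))
    posix = int(match.group("posix"))
    return {
        "xid": match.group("xid"),
        "op": match.group("op"),
        "nfs": nfs,
        "nfs_name": NFS3_STATUS.get(nfs, "unknown"),
        "posix": posix,
        # Symbolic errno name as known to the local platform.
        "posix_name": errno.errorcode.get(posix, "unknown"),
    }
```

Matching on the XID makes it possible to pair a result record with the earlier call record for the same request (d02c0566 in the trace above).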

Comment 6 Lakshmipathi G 2010-08-18 02:50:09 UTC
Running compilebench with 3.1.0qa6 passed with 4x3 dht+afr setup.

Comment 7 Amar Tumballi 2010-08-29 08:42:04 UTC
Hi Lakshmi,

Should I assume this bug is no longer valid, since it ran fine for you on 3.1.0qa6? If it's not valid, please resolve the bug.

-Amar

Comment 8 Lakshmipathi G 2010-08-30 03:09:13 UTC
(In reply to comment #7)
> Hi Lakshmi,
> 
> Should I assume this bug is not valid anymore as it ran fine for you on
> 3.1.0qa6 ?? If its not valid, please resolve the bug.
> 
> -Amar

Hi Amar,
I'll check this issue with the upcoming gnfs-glfs version with dvm integration and then update the status or resolve the bug.

Comment 9 Amar Tumballi 2010-09-07 01:46:37 UTC
Reducing the severity, as 4 subvolumes for replicate is not really a supported configuration.

Comment 10 Amar Tumballi 2010-10-04 02:10:14 UTC
4 replica is not supported in version 3.1.0, hence moving this to post 3.1.0.

Comment 11 Shehjar Tikoo 2010-10-05 09:17:55 UTC
Re-assigning to myself to test and update the status of this and other replicate problems with nfs.

Comment 12 Shehjar Tikoo 2010-11-09 08:56:16 UTC
4 replica compilebench works on mainline. Closing.