Bug 762669 (GLUSTER-937)

Summary: compilebench fails on NFS+dist+repl+all perf xlators
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: replicate
Assignee: Pavan Vilas Sondur <pavan>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: high
Version: nfs-alpha
CC: gluster-bugs
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs

Description Shehjar Tikoo 2010-05-21 03:18:34 UTC
In the third run, dmesg output also shows:
NFS: nfs_lookup_revalidate(net/phy) is invalid
NFS: dentry_delete(net/phy, 18)
NFS: lookup(net/phy)
NFS: dentry_delete(net/phy, 10)


This confirms that the problem occurs because the lookup on net/phy failed.

Comment 1 Shehjar Tikoo 2010-05-21 06:12:59 UTC
compilebench fails on one NFS client when two NFS clients are each running a cbench load in their own directory on the same export.


#################FIRST RUN##########################
The client that fails has the following error message.
[root@domU-12-31-39-10-66-21 compilebench-0.6]# ./compilebench -D /mnt/shehjart/mount/bench2/
using working directory /mnt/shehjart/mount/bench2/, 30 intial dirs 100 runs
Traceback (most recent call last):
  File "./compilebench", line 567, in <module>
    dset = dataset(options.sources, rnd)
  File "./compilebench", line 319, in __init__
    self.unpatched = native_order(self.unpatched, "unpatched")
  File "./compilebench", line 97, in native_order
    run_directory(tmplist, dirname, "native %s" % tag)
  File "./compilebench", line 225, in run_directory
    fp = file(fname, 'a+')
IOError: [Errno 526] Unknown error 526: '/mnt/shehjart/mount/bench2/native-0/drivers/pnp/pnpacpi/Kconfig'


A Wireshark trace shows an NFS LOOKUP reply with NFS3ERR_SERVERFAULT.


#################SECOND RUN##########################

[root@domU-12-31-39-10-66-21 compilebench-0.6]# ./compilebench -D /mnt/shehjart/mount/c2/
using working directory /mnt/shehjart/mount/c2/, 30 intial dirs 100 runs
Traceback (most recent call last):
  File "./compilebench", line 567, in <module>
    dset = dataset(options.sources, rnd)
  File "./compilebench", line 319, in __init__
    self.unpatched = native_order(self.unpatched, "unpatched")
  File "./compilebench", line 97, in native_order
    run_directory(tmplist, dirname, "native %s" % tag)
  File "./compilebench", line 225, in run_directory
    fp = file(fname, 'a+')
IOError: [Errno 526] Unknown error 526: '/mnt/shehjart/mount/c2/native-0/include/video/cyblafb.h'


Note: there is no error message in the NFS server log for cyblafb.h.


#################THIRD RUN##########################
[root@domU-12-31-39-10-66-21 compilebench-0.6]# ./compilebench -D /mnt/shehjart/mount/c2/
using working directory /mnt/shehjart/mount/c2/, 30 intial dirs 100 runs
Traceback (most recent call last):
  File "./compilebench", line 567, in <module>
    dset = dataset(options.sources, rnd)
  File "./compilebench", line 319, in __init__
    self.unpatched = native_order(self.unpatched, "unpatched")
  File "./compilebench", line 97, in native_order
    run_directory(tmplist, dirname, "native %s" % tag)
  File "./compilebench", line 225, in run_directory
    fp = file(fname, 'a+')
IOError: [Errno 526] Unknown error 526: '/mnt/shehjart/mount/c2/native-0/drivers/net/phy/broadcom.c'


NFS server log contains:

[2010-05-21 01:40:36] D [nfs3-helpers.c:2069:nfs3_log_fh_entry_call] nfs-nfsv3: XID: 3d12dccd, LOOKUP: args: FH: hashcount 4, xlid 1, gen 5473591126151662475, ino 15402732, name: phy


###############################################################
[2010-05-21 01:40:36] D [dht-layout.c:653:dht_layout_dir_mismatch] dr: /c2/native-0/drivers/net/phy - disk layout missing
[2010-05-21 01:40:36] D [dht-common.c:278:dht_revalidate_cbk] dr: mismatching layouts for /c2/native-0/drivers/net/phy
################################################################

[2010-05-21 01:40:36] D [nfs3-helpers.c:2242:nfs3_log_newfh_res] nfs-nfsv3: XID: 3d12dccd, LOOKUP: NFS: 10006(Error occurred on the server or IO Error), POSIX: 116(Stale NFS file handle), FH: hashcount 0, xlid 0, gen 0, ino 0

[2010-05-21 01:40:36] D [rpcsvc.c:1817:rpcsvc_request_create] rpc-service: RPC XID: 3e12dccd, Ver: 2, Program: 100003, ProgVers: 3, Proc: 3

[2010-05-21 01:40:36] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - LOOKUP

[2010-05-21 01:40:36] D [nfs3-helpers.c:2069:nfs3_log_fh_entry_call] nfs-nfsv3: XID: 3e12dccd, LOOKUP: args: FH: hashcount 4, xlid 1, gen 5473591126151662475, ino 15402732, name: phy

###############################################################
[2010-05-21 01:40:36] D [dht-layout.c:653:dht_layout_dir_mismatch] dr: /c2/native-0/drivers/net/phy - disk layout missing

[2010-05-21 01:40:36] D [dht-common.c:278:dht_revalidate_cbk] dr: mismatching layouts for /c2/native-0/drivers/net/phy
#################################################################

[2010-05-21 01:40:36] D [nfs3-helpers.c:2242:nfs3_log_newfh_res] nfs-nfsv3: XID: 3e12dccd, LOOKUP: NFS: 10006(Error occurred on the server or IO Error), POSIX: 116(Stale NFS file handle), FH: hashcount 0, xlid 0, gen 0, ino 0


The NFS3ERR_SERVERFAULT is returned because of an error reported by the distribute translator.
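
For context: the POSIX error that distribute hands back (ESTALE) is translated by the NFS xlator into an NFSv3 status before it goes on the wire, which is why the log shows "NFS: 10006(Error occurred on the server or IO Error), POSIX: 116(Stale NFS file handle)". Below is a minimal sketch of that kind of errno-to-status mapping. The status values are the standard RFC 1813 ones, but which bucket each errno falls into is an assumption on my part, not the server's actual table; the trace above shows this build surfacing ESTALE as NFS3ERR_SERVERFAULT rather than NFS3ERR_STALE.

/* Illustrative sketch only: folding a POSIX errno from the cluster
 * translators into an NFSv3 status code.  Not GlusterFS code. */
#include <errno.h>
#include <stdio.h>

enum nfsstat3 {
        NFS3_OK             = 0,
        NFS3ERR_NOENT       = 2,
        NFS3ERR_IO          = 5,
        NFS3ERR_ACCES       = 13,
        NFS3ERR_STALE       = 70,
        NFS3ERR_SERVERFAULT = 10006,
};

static enum nfsstat3
errno_to_nfsstat3 (int err)
{
        switch (err) {
        case 0:      return NFS3_OK;
        case ENOENT: return NFS3ERR_NOENT;
        case EACCES: return NFS3ERR_ACCES;
        case EIO:    return NFS3ERR_IO;
        case ESTALE: return NFS3ERR_SERVERFAULT; /* what the trace above shows */
        default:     return NFS3ERR_SERVERFAULT; /* catch-all server fault */
        }
}

int main (void)
{
        /* ESTALE (POSIX 116 on Linux) comes out as 10006 on the wire. */
        printf ("ESTALE -> %d\n", errno_to_nfsstat3 (ESTALE));
        return 0;
}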

Comment 2 Shehjar Tikoo 2010-05-22 03:46:56 UTC
This also happens with a single client, though not on every run. Whenever it does happen, the cause is the same dht layout mismatch errors.

Comment 3 Shehjar Tikoo 2010-06-25 04:31:08 UTC
Latest news on this.

I have not been able to reproduce this with:
- distribute with no perf translators
- distribute with all perf translators

It occurs consistently with:
- dist-repl with all performance translators, and only when running two clients, each with a compilebench load.

The log suggests the following:
#########################################
##### MKDIR is received for lib-64 ######
#########################################
[2010-06-25 02:02:08] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - MKDIR
[2010-06-25 02:02:08] D [nfs3-helpers.c:2183:nfs3_log_fh_entry_call] nfs-nfsv3: XID: 46e2ee2a, MKDIR: args: FH: hashcount 4, xlid 0, gen 5486586052216884168, ino 15402804, name: lib-
64
[2010-06-25 02:02:08] T [nfs3.c:2427:nfs3_mkdir] nfs-nfsv3: FH to Volume: dr
[2010-06-25 02:02:08] T [nfs3-helpers.c:2884:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 15402804, gen: 5486586052216884168, entry: lib-64, hashidx: 0
[2010-06-25 02:02:08] T [nfs3-helpers.c:2892:nfs3_fh_resolve_entry_hard] nfs-nfsv3: Entry needs lookup: /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] T [nfs-fops.c:279:nfs_fop_lookup] nfs: Lookup: /c1/native-0/arch/mips/lib-64


#########################################
##### The lookup above fails for lib-64 #
#########################################
[2010-06-25 02:02:08] T [server-protocol.c:2341:server_lookup_cbk] localsrv: 55350: LOOKUP /c1/native-0/arch/mips/lib-64 (0) ==> -1 (No such file or directory)
[2010-06-25 02:02:08] T [access-control.c:205:ac_test_access] access-control: Root has access
[2010-06-25 02:02:08] T [nfs3-helpers.c:2496:nfs3_fh_resolve_entry_lookup_cbk] nfs-nfsv3: Lookup failed: /c1/native-0/arch/mips/lib-64: No such file or directory
[2010-06-25 02:02:08] T [nfs.c:407:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-06-25 02:02:08] T [nfs-fops.c:697:nfs_fop_mkdir] nfs: Mkdir: /c1/native-0/arch/mips/lib-64


#################################################
##### DHT sets up the layout for directory creation ##
#################################################
[2010-06-25 02:02:08] T [dht-layout.c:352:dht_layout_merge] dr: missing disk layout on repl2. err = -1
.....
.....
.....
[2010-06-25 02:02:08] T [nfs.c:407:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-06-25 02:02:08] T [dht-layout.c:352:dht_layout_merge] dr: missing disk layout on repl1. err = -1
[2010-06-25 02:02:08] T [dht-selfheal.c:354:dht_selfheal_layout_new_directory] dr: gave fix: 0 - 2147483646 on repl1 for /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] T [dht-selfheal.c:354:dht_selfheal_layout_new_directory] dr: gave fix: 2147483647 - 4294967293 on repl2 for /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] T [dht-selfheal.c:174:dht_selfheal_dir_xattr] dr: 2 subvolumes missing xattr for /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] T [dht-selfheal.c:124:dht_selfheal_dir_xattr_persubvol] dr: setting hash range 0 - 2147483646 (type 0) on subvolume repl1 for /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] T [dht-selfheal.c:124:dht_selfheal_dir_xattr_persubvol] dr: setting hash range 2147483647 - 4294967295 (type 0) on subvolume repl2 for /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] D [nfs3-helpers.c:2356:nfs3_log_newfh_res] nfs-nfsv3: XID: 

#################################################
##### NFS mkdir finishes                   ######
#################################################
46e2ee2a, MKDIR: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), FH: hashcount 5, xlid 0, gen 5486586052216894264, ino 15536128


#################################################
##### NFS mkdir is soon followed by a lookup ####
##### on the created directory               ####
#################################################
[2010-06-25 02:02:08] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - LOOKUP
[2010-06-25 02:02:08] D [nfs3-helpers.c:2183:nfs3_log_fh_entry_call] nfs-nfsv3: XID: 49e2ee2a, LOOKUP: args: FH: hashcount 4, xlid 0, gen 5486586052216884168, ino 15402804, name: lib-64
[2010-06-25 02:02:08] T [nfs3.c:1038:nfs3_lookup] nfs-nfsv3: FH to Volume: dr
[2010-06-25 02:02:08] T [nfs3-helpers.c:2884:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 15402804, gen: 5486586052216884168, entry: lib-64, hashidx: 0
[2010-06-25 02:02:08] T [nfs.c:407:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-06-25 02:02:08] T [nfs-fops.c:279:nfs_fop_lookup] nfs: Lookup: /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] D [dht-layout.c:653:dht_layout_dir_mismatch] dr: /c1/native-0/arch/mips/lib-64 - disk layout missing
[2010-06-25 02:02:08] D [dht-common.c:278:dht_revalidate_cbk] dr: mismatching layouts for /c1/native-0/arch/mips/lib-64
[2010-06-25 02:02:08] D [nfs3-helpers.c:2356:nfs3_log_newfh_res] nfs-nfsv3: XID: 49e2ee2a, LOOKUP: NFS: 10006(Error occurred on the server or IO Error), POSIX: 116(Stale NFS file handle), FH: hashcount 0, xlid 0, gen 0, ino 0

Comment 4 Shehjar Tikoo 2010-06-25 04:59:35 UTC
Going back to the start-up phase, when the working directories (c1 and c2, one per client) are created, distribute reports the following while creating the directory.





###################################################
##### MKdir for the c1 directory ##################
###################################################
[2010-06-25 01:56:47] D [nfs3-helpers.c:2183:nfs3_log_fh_entry_call] nfs-nfsv3: XID: 7c4aee2a, MKDIR: args: FH: hashcount 0, xlid 0, gen 0, ino 1, name: c1
[2010-06-25 01:56:47] T [nfs3.c:2427:nfs3_mkdir] nfs-nfsv3: FH to Volume: dr
[2010-06-25 01:56:47] T [nfs3-helpers.c:2884:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 1, gen: 0, entry: c1, hashidx: 0
[2010-06-25 01:56:47] T [nfs3-helpers.c:2892:nfs3_fh_resolve_entry_hard] nfs-nfsv3: Entry needs lookup: /c1
[2010-06-25 01:56:47] T [nfs-fops.c:279:nfs_fop_lookup] nfs: Lookup: /c1
[2010-06-25 01:56:47] T [server-protocol.c:2341:server_lookup_cbk] localsrv: 16: LOOKUP /c1 (0) ==> -1 (No such file or directory)
[2010-06-25 01:56:47] T [nfs3-helpers.c:2496:nfs3_fh_resolve_entry_lookup_cbk] nfs-nfsv3: Lookup failed: /c1: No such file or directory
[2010-06-25 01:56:47] T [nfs.c:407:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-06-25 01:56:47] T [nfs-fops.c:697:nfs_fop_mkdir] nfs: Mkdir: /c1
[2010-06-25 01:56:47] D [dht-diskusage.c:71:dht_du_info_cbk] dr: on subvolume 'repl2': avail_percent is: 51.00 and avail_space is: 228606398464
[2010-06-25 01:56:47] D [dht-diskusage.c:71:dht_du_info_cbk] dr: on subvolume 'repl1': avail_percent is: 99.00 and avail_space is: 443556425728
[2010-06-25 01:56:47] T [access-control.c:205:ac_test_access] access-control: Root has access
[2010-06-25 01:56:47] T [dht-layout.c:352:dht_layout_merge] dr: missing disk layout on repl2. err = -1
[2010-06-25 01:56:47] T [dht-layout.c:352:dht_layout_merge] dr: missing disk layout on repl1. err = -1
[2010-06-25 01:56:47] T [dht-selfheal.c:354:dht_selfheal_layout_new_directory] dr: gave fix: 0 - 2147483646 on repl1 for /c1
[2010-06-25 01:56:47] T [dht-selfheal.c:354:dht_selfheal_layout_new_directory] dr: gave fix: 2147483647 - 4294967293 on repl2 for /c1


##########################################################
##### MKdir for the c1 directory reports missing #########
##### xattrs even after giving fixes             #########
##########################################################
[2010-06-25 01:56:47] T [dht-selfheal.c:174:dht_selfheal_dir_xattr] dr: 2 subvolumes missing xattr for /c1
[2010-06-25 01:56:47] T [dht-selfheal.c:124:dht_selfheal_dir_xattr_persubvol] dr: setting hash range 0 - 2147483646 (type 0) on subvolume repl1 for /c1
[2010-06-25 01:56:47] T [dht-selfheal.c:124:dht_selfheal_dir_xattr_persubvol] dr: setting hash range 2147483647 - 4294967295 (type 0) on subvolume repl2 for /c1

###################################################
##### MKdir for the c1 directory succeeds #########
###################################################

[2010-06-25 01:56:47] D [nfs3-helpers.c:2356:nfs3_log_newfh_res] nfs-nfsv3: XID: 7c4aee2a, MKDIR: NFS: 0(Call completed successfully.), POSIX: 117(Structure needs cleaning), FH: hashcount 1, xlid 0, gen 5486586052216881154, ino 15335436
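
As an aside, the hash ranges handed out during the mkdir self-heal follow from an even split of the 32-bit hash ring across the two replicate subvolumes. The sketch below is only the arithmetic behind the log lines, not the actual dht_selfheal_layout_new_directory code; it also suggests why the "gave fix" line ends at 4294967293 (start + chunk - 1) while the xattr that is finally set ends at 4294967295, the last range presumably being extended to the end of the ring.

/* Back-of-the-envelope sketch of the hash ranges seen in the log:
 * split the 32-bit hash ring evenly across the subvolumes.
 * Not GlusterFS code, just the arithmetic. */
#include <stdint.h>
#include <stdio.h>

int main (void)
{
        const uint32_t ring_max = 0xffffffff;   /* 4294967295 */
        const int      subvols  = 2;            /* repl1, repl2 */
        uint32_t       chunk    = ring_max / subvols;
        uint32_t       start    = 0;
        int            i;

        for (i = 0; i < subvols; i++) {
                uint32_t stop = start + chunk - 1;

                /* The "gave fix" lines show start + chunk - 1 (... - 4294967293);
                 * the xattr that is finally set extends the last range to the
                 * end of the ring (... - 4294967295). */
                if (i == subvols - 1)
                        stop = ring_max;

                printf ("repl%d: %u - %u\n", i + 1, start, stop);
                start += chunk;
        }
        return 0;
}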

Comment 5 Shehjar Tikoo 2010-07-04 06:10:38 UTC
The following command is the smallest compilebench load that triggers the bug:

compilebench -D <working-dir> -r 100 -i 5

Comment 6 Shehjar Tikoo 2010-07-21 10:58:47 UTC
Further investigation reveals the code path where it fails:


###############################################################
The point where ESTALE is handed to NFS
###############################################################

dht-common.c:313:
                if (local->layout_mismatch) {
                        local->op_ret = -1;
                        local->op_errno = ESTALE;
                }


###############################################################
DHT itself decides to return ESTALE by setting layout_mismatch = 1
###############################################################

dht-common.c:271:
                if (is_dir) {
                        ret = dht_layout_dir_mismatch (this, layout,
                                                       prev->this, &local->loc,
                                                       xattr);
                        if (ret != 0) {
                                gf_log (this->name, GF_LOG_DEBUG,
                                        "mismatching layouts for %s",
                                        local->loc.path);

                                local->layout_mismatch = 1;

                                goto unlock;
                        }
                }


###############################################################
That, in turn, happens because DHT cannot find the required xattr (trusted.glusterfs.dht)
in the dictionary returned by lookup.
###############################################################

In dht_layout_dir_mismatch:
        dict_ret = dict_get_ptr (xattr, "trusted.glusterfs.dht",
                                 &disk_layout_raw);

        if (dict_ret < 0) {
                if (err == 0) {
                        gf_log (this->name, GF_LOG_DEBUG,
                                "%s - disk layout missing", loc->path);
                        ret = -1;
                }
                goto out;
        }

Comment 7 Shehjar Tikoo 2010-07-21 11:03:04 UTC

Setting up gdb to intercept the glusterfs process at the point where it will return ESTALE:




Breakpoint 1, dht_revalidate_cbk (frame=0x10b5d70, cookie=0x10b5b00, this=0x6147b0, op_ret=<value optimized out>, op_errno=0, inode=<value optimized out>, stbuf=0x10b7ed0,
    xattr=0xfe19d0, postparent=0x10b7fa0) at dht-common.c:280
280                                     local->layout_mismatch = 1;
(gdb) p op_ret
$1 = <value optimized out>
(gdb) p xattr
$2 = (dict_t *) 0xfe19d0
(gdb) p *xattr
$3 = {is_static = 0 '\0', hash_size = 1, count = 3, refcount = 3, members = 0x10bb2e0, members_list = 0x10ba1b0, extra_free = 0x10ba130 "", lock = 1}
(gdb) p *xattr->members
$4 = (data_pair_t *) 0x10ba1b0

#############################################################
The required xattr is not present in the dict
#############################################################


(gdb) p **xattr->members
$5 = {hash_next = 0x10bb2b0, prev = 0x0, next = 0x10bb2b0, value = 0x10bb330, key = 0x10bb3a0 "glusterfs.open-fd-count"}
(gdb) p *xattr->members_list
$6 = {hash_next = 0x10bb2b0, prev = 0x0, next = 0x10bb2b0, value = 0x10bb330, key = 0x10bb3a0 "glusterfs.open-fd-count"}
(gdb) p *xattr->members_list->next
$7 = {hash_next = 0x10baf60, prev = 0x10ba1b0, next = 0x10baf60, value = 0x10b99d0, key = 0x10bb360 "glusterfs.entrylk-count"}
(gdb) p *xattr->members_list->next->next
$8 = {hash_next = 0x0, prev = 0x10bb2b0, next = 0x0, value = 0x10baf30, key = 0x10b99b0 "glusterfs.inodelk-count"}
(gdb) p *xattr->members_list->next->next->next
Cannot access memory at address 0x0
(gdb) p *xattr->members_list->prev
Cannot access memory at address 0x0
(gdb) bt
#0  dht_revalidate_cbk (frame=0x10b5d70, cookie=0x10b5b00, this=0x6147b0, op_ret=<value optimized out>, op_errno=0, inode=<value optimized out>, stbuf=0x10b7ed0, xattr=0xfe19d0,
    postparent=0x10b7fa0) at dht-common.c:280
#1  0x00002aaaac12badb in afr_lookup_done (frame=0x10b5b00, this=<value optimized out>, lookup_buf=<value optimized out>) at afr.c:660
#2  0x00002aaaac12c77c in afr_revalidate_lookup_cbk (frame=0x10b5b00, cookie=0x0, this=0x612ba0, op_ret=0, op_errno=17517104, inode=0x10b4b90, buf=0x7fff9383f060, xattr=0xfe19d0,
    postparent=0x7fff9383eff0) at afr.c:936
#3  0x00002aaaabf11c00 in client_lookup_cbk (frame=0xfde3a0, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4757
#4  0x00002aaaabefd22a in protocol_client_pollin (this=0x60f080, trans=0x6217b0) at client-protocol.c:6435
#5  0x00002aaaabf0bba2 in notify (this=0x0, event=2, data=0x6217b0) at client-protocol.c:6554
#6  0x00002aaaaacdad13 in xlator_notify (xl=0x60f080, event=2, data=0x6217b0) at xlator.c:919
#7  0x00002aaaad44f0e3 in socket_event_handler (fd=<value optimized out>, idx=2, data=0x6217b0, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#8  0x00002aaaaacf5dc5 in event_dispatch_epoll (event_pool=0x609360) at event.c:804
#9  0x0000000000404397 in main (argc=8, argv=0x7fff9383ffb8) at glusterfsd.c:1494
(gdb) fr 2
#2  0x00002aaaac12c77c in afr_revalidate_lookup_cbk (frame=0x10b5b00, cookie=0x0, this=0x612ba0, op_ret=0, op_errno=17517104, inode=0x10b4b90, buf=0x7fff9383f060, xattr=0xfe19d0,
    postparent=0x7fff9383eff0) at afr.c:936
936                     afr_lookup_done (frame, this, lookup_buf);

#############################################################
Not available at AFR level either.
#############################################################

(gdb) p *xattr
$9 = {is_static = 0 '\0', hash_size = 1, count = 3, refcount = 3, members = 0x10bb2e0, members_list = 0x10ba1b0, extra_free = 0x10ba130 "", lock = 1}
(gdb) p *xattr->members_list
$10 = {hash_next = 0x10bb2b0, prev = 0x0, next = 0x10bb2b0, value = 0x10bb330, key = 0x10bb3a0 "glusterfs.open-fd-count"}
(gdb) p *xattr->members_list->nexyt
There is no member named nexyt.
(gdb) p *xattr->members_list->next
$11 = {hash_next = 0x10baf60, prev = 0x10ba1b0, next = 0x10baf60, value = 0x10b99d0, key = 0x10bb360 "glusterfs.entrylk-count"}
(gdb) p *xattr->members_list->next->next
$12 = {hash_next = 0x0, prev = 0x10bb2b0, next = 0x0, value = 0x10baf30, key = 0x10b99b0 "glusterfs.inodelk-count"}
(gdb) p *xattr->members_list->next->next->next
Cannot access memory at address 0x0
(gdb) bt
#0  dht_revalidate_cbk (frame=0x10b5d70, cookie=0x10b5b00, this=0x6147b0, op_ret=<value optimized out>, op_errno=0, inode=<value optimized out>, stbuf=0x10b7ed0, xattr=0xfe19d0,
    postparent=0x10b7fa0) at dht-common.c:280
#1  0x00002aaaac12badb in afr_lookup_done (frame=0x10b5b00, this=<value optimized out>, lookup_buf=<value optimized out>) at afr.c:660
#2  0x00002aaaac12c77c in afr_revalidate_lookup_cbk (frame=0x10b5b00, cookie=0x0, this=0x612ba0, op_ret=0, op_errno=17517104, inode=0x10b4b90, buf=0x7fff9383f060, xattr=0xfe19d0,
    postparent=0x7fff9383eff0) at afr.c:936
#3  0x00002aaaabf11c00 in client_lookup_cbk (frame=0xfde3a0, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4757
#4  0x00002aaaabefd22a in protocol_client_pollin (this=0x60f080, trans=0x6217b0) at client-protocol.c:6435
#5  0x00002aaaabf0bba2 in notify (this=0x0, event=2, data=0x6217b0) at client-protocol.c:6554
#6  0x00002aaaaacdad13 in xlator_notify (xl=0x60f080, event=2, data=0x6217b0) at xlator.c:919
#7  0x00002aaaad44f0e3 in socket_event_handler (fd=<value optimized out>, idx=2, data=0x6217b0, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#8  0x00002aaaaacf5dc5 in event_dispatch_epoll (event_pool=0x609360) at event.c:804
#9  0x0000000000404397 in main (argc=8, argv=0x7fff9383ffb8) at glusterfsd.c:1494
(gdb) fr 3
#3  0x00002aaaabf11c00 in client_lookup_cbk (frame=0xfde3a0, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4757
4757            STACK_UNWIND (frame, op_ret, op_errno, inode, &stbuf, xattr,
(gdb) p *xattr
$13 = {is_static = 0 '\0', hash_size = 1, count = 3, refcount = 3, members = 0x10bb2e0, members_list = 0x10ba1b0, extra_free = 0x10ba130 "", lock = 1}

#############################################################
Not available at client either.
Definitely not being sent by storage/posix.
#############################################################

(gdb) p *xattr->members_list
$14 = {hash_next = 0x10bb2b0, prev = 0x0, next = 0x10bb2b0, value = 0x10bb330, key = 0x10bb3a0 "glusterfs.open-fd-count"}
(gdb) p *xattr->members_list->next
$15 = {hash_next = 0x10baf60, prev = 0x10ba1b0, next = 0x10baf60, value = 0x10b99d0, key = 0x10bb360 "glusterfs.entrylk-count"}
(gdb) p *xattr->members_list->next->next
$16 = {hash_next = 0x0, prev = 0x10bb2b0, next = 0x0, value = 0x10baf30, key = 0x10b99b0 "glusterfs.inodelk-count"}
(gdb) p *xattr->members_list->next->next->next
Cannot access memory at address 0x0
(gdb) bt
#0  dht_revalidate_cbk (frame=0x10b5d70, cookie=0x10b5b00, this=0x6147b0, op_ret=<value optimized out>, op_errno=0, inode=<value optimized out>, stbuf=0x10b7ed0, xattr=0xfe19d0,
    postparent=0x10b7fa0) at dht-common.c:280
#1  0x00002aaaac12badb in afr_lookup_done (frame=0x10b5b00, this=<value optimized out>, lookup_buf=<value optimized out>) at afr.c:660
#2  0x00002aaaac12c77c in afr_revalidate_lookup_cbk (frame=0x10b5b00, cookie=0x0, this=0x612ba0, op_ret=0, op_errno=17517104, inode=0x10b4b90, buf=0x7fff9383f060, xattr=0xfe19d0,
    postparent=0x7fff9383eff0) at afr.c:936
#3  0x00002aaaabf11c00 in client_lookup_cbk (frame=0xfde3a0, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4757
#4  0x00002aaaabefd22a in protocol_client_pollin (this=0x60f080, trans=0x6217b0) at client-protocol.c:6435
#5  0x00002aaaabf0bba2 in notify (this=0x0, event=2, data=0x6217b0) at client-protocol.c:6554
#6  0x00002aaaaacdad13 in xlator_notify (xl=0x60f080, event=2, data=0x6217b0) at xlator.c:919
#7  0x00002aaaad44f0e3 in socket_event_handler (fd=<value optimized out>, idx=2, data=0x6217b0, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#8  0x00002aaaaacf5dc5 in event_dispatch_epoll (event_pool=0x609360) at event.c:804
#9  0x0000000000404397 in main (argc=8, argv=0x7fff9383ffb8) at glusterfsd.c:1494


#############################################################
Get the path for the directory being looked up.
We'll next check for the xattr on the bricks to confirm.
#############################################################

(gdb) p local->loc
$17 = {path = 0xa8c450 "/c1/native-0/drivers/scsi/ibmvscsi", name = 0xa8c46a "ibmvscsi", ino = 15348260, inode = 0x10b4b90, parent = 0xdaa2e0}
(gdb) p local->loc2
$18 = {path = 0x0, name = 0x0, ino = 0, inode = 0x0, parent = 0x0}
(gdb) p this->name
$19 = 0x60f060 "client1"


#############################################################
Brick1 has it
#############################################################
[root@domU-12-31-39-0E-8E-31 testdir1]# getfattr -d -m ".*" c1/native-0/drivers/scsi/ibmvscsi
# file: c1/native-0/drivers/scsi/ibmvscsi
trusted.glusterfs.dht=0sAAAAAQAAAAB//////////w==
trusted.posix1.gen=0sTEbLiwAADIg=



#############################################################
So does brick2
#############################################################
[root@domU-12-31-39-0B-E0-82 testdir1]# getfattr -d -m ".*" c1/native-0/drivers/scsi/ibmvscsi
# file: c1/native-0/drivers/scsi/ibmvscsi
trusted.glusterfs.dht=0sAAAAAQAAAAB//////////w==
trusted.posix1.gen=0sTEbLjwAADIg=

#############################################################
So does brick3
#############################################################

[root@domU-12-31-39-0E-85-62 testdir1]# getfattr -d -m ".*" c1/native-0/drivers/scsi/ibmvscsi
# file: c1/native-0/drivers/scsi/ibmvscsi
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAf////g==
trusted.posix1.gen=0sTEbLkgAADIg=



#############################################################
So does brick4
#############################################################
[root@domU-12-31-39-10-6D-E2 testdir1]# getfattr -d -m ".*" c1/native-0/drivers/scsi/ibmvscsi
# file: c1/native-0/drivers/scsi/ibmvscsi
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAf////g==
trusted.posix1.gen=0sTEbmcQAADIg=
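
For completeness, the trusted.glusterfs.dht values above can be decoded by hand. Assuming the on-disk format is four 32-bit fields in network byte order (count, type, start, stop) -- an assumption about the layout, though it matches the "(type 0)" and hash-range values in the self-heal log -- the value on bricks 1 and 2 decodes to the range 2147483647-4294967295 and the one on bricks 3 and 4 to 0-2147483646, i.e. the layout xattr really is present and sane on all four bricks. A small sketch:

/* Sketch: decode a trusted.glusterfs.dht value, assuming it is four
 * 32-bit big-endian fields: count, type, start, stop.  The byte arrays
 * are the base64 payloads from the getfattr output above. */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void decode_dht_layout (const unsigned char buf[16], const char *who)
{
        uint32_t w[4];
        int i;

        for (i = 0; i < 4; i++) {
                memcpy (&w[i], buf + 4 * i, 4);
                w[i] = ntohl (w[i]);
        }
        printf ("%s: count=%u type=%u range=%u-%u\n", who, w[0], w[1], w[2], w[3]);
}

int main (void)
{
        /* 0sAAAAAQAAAAB//////////w==  (bricks 1 and 2) */
        const unsigned char b12[16] = {
                0x00,0x00,0x00,0x01, 0x00,0x00,0x00,0x00,
                0x7f,0xff,0xff,0xff, 0xff,0xff,0xff,0xff };
        /* 0sAAAAAQAAAAAAAAAAf////g==  (bricks 3 and 4) */
        const unsigned char b34[16] = {
                0x00,0x00,0x00,0x01, 0x00,0x00,0x00,0x00,
                0x00,0x00,0x00,0x00, 0x7f,0xff,0xff,0xfe };

        decode_dht_layout (b12, "bricks 1+2");  /* count=1 type=0 range=2147483647-4294967295 */
        decode_dht_layout (b34, "bricks 3+4");  /* count=1 type=0 range=0-2147483646 */
        return 0;
}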

Comment 8 Shehjar Tikoo 2010-07-22 04:19:47 UTC
Avati's solution is to change wait-count in replicate to child-count so that every transaction waits for all children to reply before replicate unwinds to NFS.

i.e.

Change:
xlators/cluster/afr/src/afr.c:2967
        }

        priv->wait_count = 1;

To:
        priv->wait_count = child_count;

This change does prevent the ESTALE seen above with compilebench.

Comment 9 Shehjar Tikoo 2010-07-22 04:21:53 UTC
(In reply to comment #8)
> Avati's solution is to change wait-count in replicate to child-count so that
> every transaction waits for all children to reply before replicate unwinds to
> NFS.
> 
> i.e.
> 
> Change:
> xlators/cluster/afr/src/afr.c:2967
>         }
> 
>         priv->wait_count = 1;
> 
> To:
>         priv->wait_count = child_count;
> 
> This change does prevent the ESTALE seen above with compilebench.

The change prevents a race condition: NFS sends a lookup on the newly created directory before replicate has received all of the setxattr callbacks from its children, so the lookup can be answered from a child that does not yet carry the layout xattr.
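
To make the race concrete, here is a toy model in plain pthreads (not GlusterFS code): two "children" apply the layout xattr after different delays, and the parent "unwinds" either after the first reply (wait_count = 1, the old behaviour) or only after all replies (wait_count = child count, the fix). With wait_count = 1 the follow-up lookup regularly finds the slower child still missing the xattr, which is exactly the "disk layout missing" condition seen in comment 3.

/* Toy model of the race and of the fix (priv->wait_count = child_count).
 * Two "children" apply a layout xattr after different delays; the parent
 * unwinds after wait_count replies and an immediate "lookup" then checks
 * the slower child.  Purely illustrative -- none of this is GlusterFS code. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define CHILD_COUNT 2

static int xattr_present[CHILD_COUNT];  /* per-child "trusted.glusterfs.dht" */
static int call_count;                  /* outstanding setxattr callbacks */
static int wait_count;                  /* 1 = old behaviour, CHILD_COUNT = fix */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *child_setxattr (void *arg)
{
        int i = *(int *) arg;

        usleep ((i + 1) * 100000);      /* children reply at different times */
        pthread_mutex_lock (&lock);
        xattr_present[i] = 1;
        call_count--;
        pthread_cond_broadcast (&cond);
        pthread_mutex_unlock (&lock);
        return NULL;
}

int main (void)
{
        pthread_t tid[CHILD_COUNT];
        int idx[CHILD_COUNT] = { 0, 1 };
        int i;

        wait_count = 1;                 /* change to CHILD_COUNT to model the fix */
        call_count = CHILD_COUNT;

        for (i = 0; i < CHILD_COUNT; i++)
                pthread_create (&tid[i], NULL, child_setxattr, &idx[i]);

        /* "Unwind" to NFS once wait_count children have replied. */
        pthread_mutex_lock (&lock);
        while (CHILD_COUNT - call_count < wait_count)
                pthread_cond_wait (&cond, &lock);
        pthread_mutex_unlock (&lock);

        /* NFS immediately revalidates; with wait_count == 1 the slower
         * child (index 1) is usually still missing the xattr. */
        printf ("lookup on child 1: layout %s\n",
                xattr_present[1] ? "present" : "missing -> ESTALE");

        for (i = 0; i < CHILD_COUNT; i++)
                pthread_join (tid[i], NULL);
        return 0;
}

Changing wait_count to CHILD_COUNT in the sketch makes the lookup always see the xattr, which mirrors why the one-line afr.c change above is sufficient.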

Comment 10 Shehjar Tikoo 2010-07-27 09:59:13 UTC
Regression test at:
http://test.gluster.com/show_bug.cgi?id=89

Comment 11 Anand Avati 2010-07-28 06:24:52 UTC
PATCH: http://patches.gluster.com/patch/3922 in master (replicate: Set setxattr wait-count to child-count to avoid race)