Bug 1473150 - features/shard:Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available]
features/shard:Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe...
Product: GlusterFS
Classification: Community
Component: sharding (Show other bugs)
x86_64 Linux
unspecified Severity urgent
: ---
: ---
Assigned To: Krutika Dhananjay
: Triaged
Depends On:
  Show dependency treegraph
Reported: 2017-07-20 02:44 EDT by Jianwei Zhang
Modified: 2018-01-18 00:27 EST (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description Jianwei Zhang 2017-07-20 02:44:41 EDT
Description of problem:
  Using postmark test the write-behind+shard+dht+ec(2*(2+1)) cluster storage, I found more ERROR messages in the /var/log/mnt-test-.log,
such as "[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available]", the [No data available] errno=61.
  This errno(61) is not handled in shard_writev(), it lead to shard_writev() failed and callback up layer xlator failed (op_ret=-1, op_errno=61).
  Usually, shard_writev() STACK_UNWIND(op_ret=-1, op_errno=61) failed, the file should be ERROR when md5sum, but it is right!! I see the codes wb_fulfill_cbk(), it add the wb_request to TODO list and retry writev again, So the file md5sum is right, not error.
  ===>Close the write-behind, I can't reproduce.<====

  The errno(61) generates by the [storage/posix]posix_lookup(), because the buf.ia_gfid is NULL and set errno=ENODATA.
        op_ret = -1;
        if (gf_uuid_is_null (loc->pargfid) || (loc->name == NULL)) { // not run
                /* nameless lookup */
                MAKE_INODE_HANDLE (real_path, this, loc, &buf);
        } else {  // run here
                MAKE_ENTRY_HANDLE (real_path, par_path, this, loc, &buf); //<1> get buf failed, op_ret=-1

                if (gf_uuid_is_null (loc->inode->gfid)) { // local->inode->gfid is NULL
                        posix_gfid_heal (this, real_path, loc, xdata); //<2> run here 
                              if (!xattr_req)
                                    goto out;

                              if (sys_lstat (path, &stat) != 0)
                                    goto out; //<3> here return ret=-1 and errno=2(ENOENT), so GOTO OUT, it indicate the path file is not exists

                              ret = sys_lgetxattr (path, GFID_XATTR_KEY, uuid_curr, 16);
                              if (ret != 16) {
                                        if (is_fresh_file (&stat)) {
                                                 ret = -1;
                                                 errno = ENOENT;
                                                 goto out;

                             ret = posix_gfid_set (this, path, loc, xattr_req);
                               return ret;

                        MAKE_ENTRY_HANDLE (real_path, par_path, this, //<4> BUT run here, get buf success, op_ret=0, but it's buf->ia_gfid is NULL, it indicate the file is exist but not set GFID.
                                           loc, &buf);

        if (!op_ret && !gfidless && gf_uuid_is_null (buf.ia_gfid)) { // <5> run here, op_ret=0, gfidless==0, buf.ia_gfid==NULL
                gf_msg (this->name, GF_LOG_ERROR, ENODATA, P_MSG_NULL_GFID,
                        "buf->ia_gfid is null for "
                        "%s", (real_path) ? real_path: "");
                op_ret = -1;
                op_errno = ENODATA; //<6> run here, op_errno=ENODATA


Version-Release number of selected component (if applicable):
  git url: https://review.gluster.org/glusterfs
  branch: master (commit 11eb8ba870457337c6067284dde4277e09764c0a)

GlusterFS configure summary
FUSE client          : yes
Infiniband verbs     : yes
epoll IO multiplex   : yes
argp-standalone      : no
fusermount           : yes
readline             : yes
georeplication       : yes
Linux-AIO            : yes
Enable Debug         : yes
Block Device xlator  : yes
glupy                : yes
Use syslog           : yes
XML output           : yes
Encryption xlator    : yes
Unit Tests           : no
Track priv ports     : yes
POSIX ACLs           : yes
Data Classification  : yes
SELinux features     : yes
firewalld-config     : no
Experimental xlators : yes
Events               : yes
EC dynamic support   : x64 sse avx
Use memory pools     : yes
Nanosecond m/atimes  : yes
Legacy gNFS server   : no

How reproducible:
write-behind + shard + dht + ec
Three CentOS7.3 nodes and 2 disks on every node.

Steps to Reproduce:
1.create a cluster storage service write-behind + shard + dht + ec (2 * (2+1)) 
Volume Name: test
Type: Distributed-Disperse
Volume ID: a110d762-6ce0-4309-b9b2-29c4b9d3dd84
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Brick1: node-1.test:/mnt/disk1
Brick2: node-2.test:/mnt/disk3
Brick3: node-3.test:/mnt/disk5
Brick4: node-1.test:/mnt/disk2
Brick5: node-2.test:/mnt/disk4
Brick6: node-3.test:/mnt/disk6
Options Reconfigured:
performance.nfs.write-behind: disable
network.ping-timeout: 20
performance.least-prio-threads: 64
performance.high-prio-threads: 64
cluster.readdir-optimize: on
performance.io-thread-count: 64
features.lock-heal: on
nfs.rpc-auth-reject: *.*.*.*
server.allow-insecure: on
cluster.heal-timeout: 60
performance.stat-prefetch: off
performance.low-prio-threads: 64
features.shard: on
transport.address-family: inet
nfs.disable: on

mount.gluster /mnt/test

2. Terminal-1# ./postmark conf
set size 104800000 104800000
set number 66
set location /mnt/test
set read 65536
set write 65536
set bias create -1
set bias read 0
set transactions 1
run result.txt
3.Terminal-2# cd /var/log/glusterfs after postmark over.
grep ' E ' /var/log/glusterfs/mnt-test-.log
[2017-07-20 05:53:06.944290] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available]
[2017-07-20 05:53:25.095401] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 7 failed. Base file gfid = 1cbf5af2-b8a8-4327-a450-7335b7f84c52 [No data available]
[2017-07-20 05:53:38.661017] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 4 failed. Base file gfid = 7cf1712c-0576-428f-a5ce-052eb5b6eddb [No data available]
[2017-07-20 05:53:43.018596] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 20 failed. Base file gfid = acf750ff-c737-4806-a5e6-d2538579b984 [No data available]
[2017-07-20 05:54:21.841236] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 21 failed. Base file gfid = 67e415ea-ba27-4a2c-89d0-c5330a624d9d [No data available]
[2017-07-20 05:54:32.529585] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 4 failed. Base file gfid = f437eda1-ddd0-409b-b6d1-32201f478cb0 [No data available]
[2017-07-20 05:54:33.727440] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 22 failed. Base file gfid = f437eda1-ddd0-409b-b6d1-32201f478cb0 [No data available]
[2017-07-20 05:54:43.243084] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 15 failed. Base file gfid = 4f56cfac-9062-4c37-8c30-e660b807b6ad [No data available]

Actual results:
shard_common_lookup_shards_cbk() recieve the op_ret=-1, op_errno=ENODATA;

Expected results:
shard_common_lookup_shards_cbk() should handle the op_errno=ENODATA or posix_lookup() should handle the ENODATA. After all ENODATA not indicate file not exist and shard can be to writev().

Additional info:
Comment 1 Jianwei Zhang 2017-07-20 02:56:48 EDT
# cd <glusterfs-DIR>
./autogen.sh && 
./configure --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-systemtap=no --enable-bd-xlator --enable-debug && make -j8 && make install
Comment 2 Krutika Dhananjay 2017-07-23 04:09:06 EDT

Thanks again for the bug report. It is interesting to note that in your case you are seeing this with distributed-dispersed-sharded volume.
I'm able to recreate the issue on my colleague's setup with plain distribute volume. I am analysing the tcpdump output to know what went wrong. I will get back to you in a couple of days with the RC.

Comment 3 Krutika Dhananjay 2017-07-24 05:15:08 EDT

I've sent the patch https://review.gluster.org/#/c/17821 to fix this issue.
I even tested it on plain distribute-sharded volume (where we were seeing this issue before) and it worked fine.

Do you mind testing it out and letting us know if it fixes your problem, since yours is ec+dist+shard?

Comment 4 Krutika Dhananjay 2018-01-18 00:27:34 EST

Note You need to log in before you can comment on or make changes to this bug.