Bug 1473150

Summary: features/shard: Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available]
Product: [Community] GlusterFS
Reporter: Jianwei Zhang <jianwei1216>
Component: sharding
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED CURRENTRELEASE
QA Contact: bugs <bugs>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, kdhananj
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-4.1.3 (or higher)
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-29 03:18:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jianwei Zhang 2017-07-20 06:44:41 UTC
Description of problem:
  While running postmark against a write-behind + shard + dht + ec (2 x (2+1)) cluster volume, I found many ERROR messages in /var/log/glusterfs/mnt-test-.log,
such as "[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available]". "[No data available]" is errno 61 (ENODATA).
  This errno (61) is not handled in shard_writev(), so shard_writev() fails and unwinds the failure to the xlator above it (op_ret=-1, op_errno=61).
  Normally, after shard_writev() STACK_UNWINDs with op_ret=-1/op_errno=61, the file's md5sum should be wrong, but it is correct! Looking at wb_fulfill_cbk(), write-behind puts the failed request back on its todo list and retries the writev, which is why the md5sum comes out right rather than wrong.
  ===> With write-behind disabled, I cannot reproduce the problem. <===

  The errno (61) is generated by posix_lookup() in storage/posix, because buf.ia_gfid is null and it sets op_errno=ENODATA.
 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 posix_lookup()
 {
        op_ret = -1;
        if (gf_uuid_is_null (loc->pargfid) || (loc->name == NULL)) { // not run
                /* nameless lookup */
                MAKE_INODE_HANDLE (real_path, this, loc, &buf);
        } else {  // run here
                MAKE_ENTRY_HANDLE (real_path, par_path, this, loc, &buf); //<1> get buf failed, op_ret=-1

                if (gf_uuid_is_null (loc->inode->gfid)) { // loc->inode->gfid is null
                        posix_gfid_heal (this, real_path, loc, xdata); //<2> run here 
                        {
                              if (!xattr_req)
                                    goto out;

                              if (sys_lstat (path, &stat) != 0)
                                    goto out; //<3> returns ret=-1 with errno=ENOENT(2) and goes to out, i.e. the path does not exist

                              ret = sys_lgetxattr (path, GFID_XATTR_KEY, uuid_curr, 16);
                              if (ret != 16) {
                                        if (is_fresh_file (&stat)) {
                                                 ret = -1;
                                                 errno = ENOENT;
                                                 goto out;
                                        }
                              }

                             ret = posix_gfid_set (this, path, loc, xattr_req);
                        out:
                               return ret;
                        }

                        MAKE_ENTRY_HANDLE (real_path, par_path, this, //<4> BUT this call runs and succeeds: op_ret=0, yet buf.ia_gfid is null, i.e. the file exists but its gfid is not set
                                           loc, &buf);
                }
        }

  out:
        if (!op_ret && !gfidless && gf_uuid_is_null (buf.ia_gfid)) { // <5> run here, op_ret=0, gfidless==0, buf.ia_gfid==NULL
                gf_msg (this->name, GF_LOG_ERROR, ENODATA, P_MSG_NULL_GFID,
                        "buf->ia_gfid is null for "
                        "%s", (real_path) ? real_path: "");
                op_ret = -1;
                op_errno = ENODATA; //<6> run here, op_errno=ENODATA
        }
  }
 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

  

Version-Release number of selected component (if applicable):
  git url: https://review.gluster.org/glusterfs
  branch: master (commit 11eb8ba870457337c6067284dde4277e09764c0a)


GlusterFS configure summary
===========================
FUSE client          : yes
Infiniband verbs     : yes
epoll IO multiplex   : yes
argp-standalone      : no
fusermount           : yes
readline             : yes
georeplication       : yes
Linux-AIO            : yes
Enable Debug         : yes
Block Device xlator  : yes
glupy                : yes
Use syslog           : yes
XML output           : yes
Encryption xlator    : yes
Unit Tests           : no
Track priv ports     : yes
POSIX ACLs           : yes
Data Classification  : yes
SELinux features     : yes
firewalld-config     : no
Experimental xlators : yes
Events               : yes
EC dynamic support   : x64 sse avx
Use memory pools     : yes
Nanosecond m/atimes  : yes
Legacy gNFS server   : no

How reproducible:
postmark on a write-behind + shard + dht + ec volume,
three CentOS 7.3 nodes with 2 disks per node, x86_64.

Steps to Reproduce:
1. Create a cluster storage service with write-behind + shard + dht + ec (2 x (2+1)):
Volume Name: test
Type: Distributed-Disperse
Volume ID: a110d762-6ce0-4309-b9b2-29c4b9d3dd84
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: node-1.test:/mnt/disk1
Brick2: node-2.test:/mnt/disk3
Brick3: node-3.test:/mnt/disk5
Brick4: node-1.test:/mnt/disk2
Brick5: node-2.test:/mnt/disk4
Brick6: node-3.test:/mnt/disk6
Options Reconfigured:
performance.nfs.write-behind: disable
network.ping-timeout: 20
performance.least-prio-threads: 64
performance.high-prio-threads: 64
cluster.readdir-optimize: on
performance.io-thread-count: 64
features.lock-heal: on
nfs.rpc-auth-reject: *.*.*.*
server.allow-insecure: on
cluster.heal-timeout: 60
performance.stat-prefetch: off
performance.low-prio-threads: 64
features.shard: on
transport.address-family: inet
nfs.disable: on

mount.glusterfs 127.0.0.1:/test /mnt/test

2. Terminal-1# ./postmark conf
[conf]
set size 104800000 104800000
set number 66
set location /mnt/test
set read 65536
set write 65536
set bias create -1
set bias read 0
set transactions 1
run result.txt
show
quit
3. After postmark finishes, on Terminal-2:
# grep ' E ' /var/log/glusterfs/mnt-test-.log
[2017-07-20 05:53:06.944290] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available]
[2017-07-20 05:53:25.095401] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 7 failed. Base file gfid = 1cbf5af2-b8a8-4327-a450-7335b7f84c52 [No data available]
[2017-07-20 05:53:38.661017] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 4 failed. Base file gfid = 7cf1712c-0576-428f-a5ce-052eb5b6eddb [No data available]
[2017-07-20 05:53:43.018596] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 20 failed. Base file gfid = acf750ff-c737-4806-a5e6-d2538579b984 [No data available]
[2017-07-20 05:54:21.841236] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 21 failed. Base file gfid = 67e415ea-ba27-4a2c-89d0-c5330a624d9d [No data available]
[2017-07-20 05:54:32.529585] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 4 failed. Base file gfid = f437eda1-ddd0-409b-b6d1-32201f478cb0 [No data available]
[2017-07-20 05:54:33.727440] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 22 failed. Base file gfid = f437eda1-ddd0-409b-b6d1-32201f478cb0 [No data available]
[2017-07-20 05:54:43.243084] E [MSGID: 133010] [shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 15 failed. Base file gfid = 4f56cfac-9062-4c37-8c30-e660b807b6ad [No data available]

Actual results:
shard_common_lookup_shards_cbk() receives op_ret=-1, op_errno=ENODATA.

Expected results:
shard_common_lookup_shards_cbk() should handle op_errno=ENODATA, or posix_lookup() should handle the ENODATA case itself. After all, ENODATA does not mean the file is missing, so shard should still be able to proceed with writev().

Additional info:

Comment 1 Jianwei Zhang 2017-07-20 06:56:48 UTC
# cd <glusterfs-DIR>
./autogen.sh && 
./configure --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-systemtap=no --enable-bd-xlator --enable-debug && make -j8 && make install

Comment 2 Krutika Dhananjay 2017-07-23 08:09:06 UTC
Hi,

Thanks again for the bug report. It is interesting to note that in your case you are seeing this with distributed-dispersed-sharded volume.
I'm able to recreate the issue on my colleague's setup with a plain distribute volume. I am analysing the tcpdump output to find out what went wrong, and I will get back to you in a couple of days with the root cause.

-Krutika

Comment 3 Krutika Dhananjay 2017-07-24 09:15:08 UTC
Hi,

I've sent the patch https://review.gluster.org/#/c/17821 to fix this issue.
I even tested it on plain distribute-sharded volume (where we were seeing this issue before) and it worked fine.

Do you mind testing it out and letting us know if it fixes your problem, since yours is ec+dist+shard?

-Krutika

Comment 4 Krutika Dhananjay 2018-01-18 05:27:34 UTC
https://review.gluster.org/#/c/17821