Bug 1476205
| Summary: | [EC]: md5sum mismatches every time for a file from the fuse client on EC volume | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Ashish Pandey <aspandey> |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | amukherj, bugs, nchilaka, pkarampu, rcyriac, rhinduja, rhs-bugs, sheggodu, storage-qa-internal |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.13.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1472764 | | |
| : | 1476868 1476870 (view as bug list) | Environment: | |
| Last Closed: | 2017-08-10 11:13:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1472764 | | |
| Bug Blocks: | 1476868, 1476870 | | |
Comment 1
Ashish Pandey
2017-07-28 09:42:15 UTC
RCA by Pranith -
Some notes, didn't find the RCA yet.
Only the last fragment of the file is corrupted. On closer inspection, the fragment on the 5th brick turns out to be the bad one. It is as if the write did not happen on the 5th brick but only on the rest of the bricks. The next step is to find out why that write was skipped.
Rahul,
For more debugging we need /etc/services file from the machine where you ran crefi. Could you attach it to the bz?
Pranith
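As a side note, because disperse encoding is deterministic, writing the same source file to a healthy volume with the same geometry produces identical per-brick fragments, so the suspect brick can be found by comparing checksums brick by brick. A minimal sketch, assuming a healthy test volume mounted at /mnt/ec2 with bricks under /home/gfs/ec_0..ec_5 (paths taken from the transcripts below) and the suspect fragments copied to /tmp/suspect/ec_N.frag (a hypothetical staging path):

cp /etc/services /mnt/ec2/services        # reference copy through the fuse mount
for i in 0 1 2 3 4 5; do
    # compare each healthy fragment with the corresponding suspect fragment;
    # the brick whose fragment differs is the one whose write went missing
    good=$(md5sum /home/gfs/ec_$i/services | awk '{print $1}')
    bad=$(md5sum /tmp/suspect/ec_$i.frag | awk '{print $1}')
    [ "$good" != "$bad" ] && echo "fragment on brick ec_$i differs"
done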
--- Additional comment from Pranith Kumar K on 2017-07-27 21:51:51 EDT ---
(In reply to Pranith Kumar K from comment #10)
> Some notes, didn't find the RCA yet.
>
> Only the last fragment of the file is corrupted. On closer inspection, it is
> found that the file on 5th brick is the bad-one. It is as if the write
> didn't happen on the 5th brick but only happened on the rest of the bricks.
> So the next step is to find why was the write got skipped.
>
> Rahul,
> For more debugging we need /etc/services file from the machine where you
> ran crefi. Could you attach it to the bz?
>
> Pranith
I found it on one of the RHGS machines and its md5sum matches the one we have in the bz. I will use that.
--- Additional comment from Pranith Kumar K on 2017-07-27 22:12:20 EDT ---
Some notes, still no RCA.
07:24:01 :) ⚡ md5sum /home/gfs/ec_?/5*
ecb9f6f29ebbb72e3e612b0aec775b36 /home/gfs/ec_0/596c62e1%%HPJLBVY86T
6fe5b21f27a5c9dc72ae3422102327b2 /home/gfs/ec_1/596c62e1%%HPJLBVY86T
6f82c16ed99b901929737ddf11dd920b /home/gfs/ec_2/596c62e1%%HPJLBVY86T
bcac001cf1cac8860e4ae233a35cd7ef /home/gfs/ec_3/596c62e1%%HPJLBVY86T
6921a33bd9c1083aaec582cde9973b61 /home/gfs/ec_4/596c62e1%%HPJLBVY86T <<<---- this is the bad fragment
b9d70081b5b41c9f980e20bf40e1fd13 /home/gfs/ec_5/596c62e1%%HPJLBVY86T
I did the following to find which file size produces a fragment with the same md5sum as the bad fragment in the file we got.
07:24:53 :) ⚡ for i in {12288..14336}; do Crefi/crefi.py --multi -n 1 -b 1 -d 1 --size $i -t text --fop=create /mnt/ec2; done
root@dhcp35-190 - ~
07:26:49 :) ⚡ md5sum /home/gfs/ec_4/level00/* | grep 6921a33bd9c1083aaec582cde9973b61
6921a33bd9c1083aaec582cde9973b61 /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK
root@dhcp35-190 - ~
07:27:07 :) ⚡ ls -l /mnt/ec2/level00/597a99d1%%4SKACGWCDK
-rw-r--r--. 1 root root 14336 Jul 28 07:26 /mnt/ec2/level00/597a99d1%%4SKACGWCDK
07:39:48 :) ⚡ truncate -s 13526 /mnt/ec2/level00/597a99d1%%4SKACGWCDK
root@dhcp35-190 - ~
07:40:10 :) ⚡ md5sum /home/gfs/ec_?/level00/597a99d1%%4SKACGWCDK
ecb9f6f29ebbb72e3e612b0aec775b36 /home/gfs/ec_0/level00/597a99d1%%4SKACGWCDK
6fe5b21f27a5c9dc72ae3422102327b2 /home/gfs/ec_1/level00/597a99d1%%4SKACGWCDK
6f82c16ed99b901929737ddf11dd920b /home/gfs/ec_2/level00/597a99d1%%4SKACGWCDK
bcac001cf1cac8860e4ae233a35cd7ef /home/gfs/ec_3/level00/597a99d1%%4SKACGWCDK
f039f8c213c0b31b2d16da87de8da6b7 /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK <------ This fragment is correct.
b9d70081b5b41c9f980e20bf40e1fd13 /home/gfs/ec_5/level00/597a99d1%%4SKACGWCDK
So this means that when the bug happened, the write was either never wound to that brick or it failed silently.
That is the next thing to look into.
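For reference, the sweep above can be condensed into a single hedged script (crefi arguments and paths copied from this run; the target md5sum is the bad ec_4 checksum seen earlier; note that crefi generates random file names, so the match is found by grepping afterwards):

bad=6921a33bd9c1083aaec582cde9973b61
for size in $(seq 12288 14336); do
    # create one text file of each candidate size through the fuse mount
    Crefi/crefi.py --multi -n 1 -b 1 -d 1 --size $size -t text --fop=create /mnt/ec2
done
# find which generated file's ec_4 fragment carries the bad checksum
md5sum /home/gfs/ec_4/level00/* | grep "$bad"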
--- Additional comment from Pranith Kumar K on 2017-07-28 01:27:26 EDT ---
Ashish,
I didn't find any evidence of how it failed, but there is a bug in EC: when the writev issued as part of truncate fails, the brick is not marked as bad. I verified it in gdb. We may need more help from Rahul to find out how exactly the write failed. Whether the quota errors are contributing to any failures is also something we may need to check.
I attached gdb to one of the brick processes and made it return an error without performing the write:
Thread 7 "glusterfsd" hit Breakpoint 3, server3_3_writev (req=0x7fbf100ca140) at server-rpc-fops.c:4100
4100 server_state_t *state = NULL;
(gdb) n
...
(gdb)
4119 frame = get_frame_from_request (req);
(gdb)
4120 if (!frame) {
(gdb) p frame=0
$3 = (call_frame_t *) 0x0
(gdb) n
4122 SERVER_REQ_SET_ERROR (req, ret);
(gdb)
4123 goto out;
(gdb)
4172 free (args.xdata.xdata_val);
(gdb)
4174 if (op_errno)
(gdb) c
I got the same scenario mentioned in this bug:
root@dhcp35-190 - ~
10:34:11 :) ⚡ getfattr -d -m. -e hex /home/gfs/ec_?/services
getfattr: Removing leading '/' from absolute path names
# file: home/gfs/ec_0/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312
# file: home/gfs/ec_1/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312
# file: home/gfs/ec_2/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312
# file: home/gfs/ec_3/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312
# file: home/gfs/ec_4/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312
# file: home/gfs/ec_5/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312
root@dhcp35-190 - ~
10:34:13 :) ⚡ md5sum /home/gfs/ec_?/services
ecb9f6f29ebbb72e3e612b0aec775b36 /home/gfs/ec_0/services
6fe5b21f27a5c9dc72ae3422102327b2 /home/gfs/ec_1/services
6f82c16ed99b901929737ddf11dd920b /home/gfs/ec_2/services
bcac001cf1cac8860e4ae233a35cd7ef /home/gfs/ec_3/services
6921a33bd9c1083aaec582cde9973b61 /home/gfs/ec_4/services <<<<------ same md5sum as the bad file in Rahul's setup and all xattrs match as if everything is fine when it is not.
b9d70081b5b41c9f980e20bf40e1fd13 /home/gfs/ec_5/services
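As a side note, the trusted.ec.size values above can be decoded to confirm that the size xattr still claims the full file size even though the fragment on ec_4 was never written. A minimal sketch, assuming the brick paths from the getfattr output:

for b in /home/gfs/ec_?/services; do
    # extract the hex-encoded size xattr and print it in decimal
    hex=$(getfattr -n trusted.ec.size -e hex "$b" 2>/dev/null | awk -F= '/trusted.ec.size/ {print $2}')
    printf '%s %d bytes\n' "$b" "$((hex))"
done
# 0x34d6 is 13526 bytes, the same size the file was truncated to in the
# earlier reproduction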
Ashish,
Can you send out a patch? In both ec_truncate_open_cbk and the cbk of ec_truncate_writev, we need to update lock->good_mask correctly. I don't see any other way for the file to end up in the state it did on Rahul's setup; we just need to confirm this is indeed the case. The only other pathological case would be the brick returning success without performing the write.
Rahul,
Would it be possible to get a tcpdump of a run where this happens? That would confirm how this bug happened. The bug we found based on the information from your setup, which I explained above, is a day-1 bug.
Pranith
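For reference, the interactive fault injection above can also be driven non-interactively with a gdb batch script. A rough sketch, assuming the same source line numbers as the session above (they are specific to that build) and that gdb is attached to the brick process serving ec_4:

# breakpoint just after get_frame_from_request(); forcing frame to NULL makes
# server3_3_writev take the error path without winding the write
cat > /tmp/fail-writev.gdb <<'EOF'
break server-rpc-fops.c:4120
commands
  silent
  set variable frame = 0
  continue
end
continue
EOF
gdb -p <pid-of-brick-glusterfsd> -x /tmp/fail-writev.gdb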
REVIEW: https://review.gluster.org/17906 (ec/cluster: Update failure of fop on a brick properly) posted (#1) for review on master by Ashish Pandey (aspandey)

REVIEW: https://review.gluster.org/17906 (ec/cluster: Update failure of fop on a brick properly) posted (#2) for review on master by Ashish Pandey (aspandey)

COMMIT: https://review.gluster.org/17906 committed in master by Pranith Kumar Karampuri (pkarampu)

------

commit 12906467f0f8df2edc07ea8412c0c6be6fae11b3
Author: Ashish Pandey <aspandey>
Date: Fri Jul 28 15:18:29 2017 +0530

ec/cluster: Update failure of fop on a brick properly

Problem: In case of truncate, if writev or open fails on a brick, in some cases it does not mark the failure on lock->good_mask. This causes the update of size and version on all the bricks even if it has failed on one of the bricks. That ultimately causes data corruption.

Solution: In the callback of such writev and open calls, mark fop->good for the parent too.

Thanks Pranith Kumar K <pkarampu> for finding the root cause.

Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
BUG: 1476205
Signed-off-by: Ashish Pandey <aspandey>
Reviewed-on: https://review.gluster.org/17906
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/
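A hedged sketch of how the fix could be sanity-checked on a test setup: repeat the injected-failure truncate and confirm that the failed brick is flagged rather than silently carrying stale data (the volume name ec2, mount point, and brick paths are assumptions carried over from the transcripts above):

# checksum as seen by the client should stay stable across repeated reads
md5sum /mnt/ec2/services
# the brick that missed the write should no longer carry the same
# trusted.ec.version as the good bricks
getfattr -d -m. -e hex /home/gfs/ec_?/services | grep -E '^# file:|trusted.ec.version|trusted.ec.dirty'
# a pending heal entry indicates the failure was recorded on that brick
gluster volume heal ec2 info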