Bug 1472764
Summary: [EC]: md5sum mismatches every time for a file from the fuse client on EC volume

Field | Value
---|---
Product | [Red Hat Storage] Red Hat Gluster Storage
Reporter | Rahul Hinduja <rhinduja>
Component | disperse
Assignee | Ashish Pandey <aspandey>
Status | CLOSED ERRATA
QA Contact | Rahul Hinduja <rhinduja>
Severity | urgent
Priority | unspecified
Version | rhgs-3.2
CC | amukherj, nchilaka, pkarampu, rcyriac, rhinduja, rhs-bugs, sheggodu, storage-qa-internal
Target Milestone | ---
Keywords | Regression
Target Release | RHGS 3.3.0
Hardware | x86_64
OS | Linux
Fixed In Version | glusterfs-3.8.4-37
Doc Type | If docs needed, set a value
Story Points | ---
Clones | 1476205 (view as bug list)
Last Closed | 2017-09-21 05:04:21 UTC
Type | Bug
Regression | ---
Mount Type | ---
Documentation | ---
Category | ---
oVirt Team | ---
Cloudforms Team | ---
Bug Blocks | 1417151, 1476205
Description
Rahul Hinduja
2017-07-19 11:32:47 UTC
(In reply to Rahul Hinduja from comment #0)

> Description of problem:
> -----------------------
>
> Some files are giving a different md5sum when it is calculated from
> different mounts of the same volume.
>
> Example:
>
> 10.70.43.4:/master on /mnt/glusterfs type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> 10.70.43.4:/master on /mnt/master type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> 10.70.43.27:/slave on /mnt/slave type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
> [root@fan ~]# md5sum /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> 9c0de9e06e88ddba88248e9c8e502f82  /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> [root@fan ~]# ls -l /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> -rwxrwxr--. 3 42023 13883 13526 Jul 18 05:29 /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> [root@fan ~]# ls -l /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> -rwxrwxr--. 3 42023 13883 13526 Jul 18 05:29 /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> [root@fan ~]# md5sum /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> 16f92bf8f96094a214b1d0293c1accda  /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> Further analysis by Ashish shows the following:
>
> It looks like the files are corrupted at the backend. To confirm, mounted
> the volume on 4 mount points and got different md5sums. This could be
> because the mount points are reading the file from different sets of 4
> bricks. To confirm this, killed 2 bricks from each subvolume and calculated
> the md5sum again. This time all the md5sums were the same.
>
> [root@fan tmp]# md5sum /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> 0bc19eac0693355a2e954ba3d66d4ef1  /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> [root@fan tmp]# md5sum /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> 0bc19eac0693355a2e954ba3d66d4ef1  /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> [root@fan tmp]# md5sum /mnt/temp_master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> 0bc19eac0693355a2e954ba3d66d4ef1  /mnt/temp_master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> [root@fan tmp]# md5sum /mnt/temp-2/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
> 0bc19eac0693355a2e954ba3d66d4ef1  /mnt/temp-2/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
>
> Version-Release number of selected component (if applicable):
> ==============================================================
>
> glusterfs-3.8.4-18.5.el7rhgs.x86_64
>
> How reproducible:
> =================
>
> Will retry.
>
> Steps carried:
> ==============
>
> The steps carried out were in a geo-rep setup, however the checksum
> mismatches on the master. Providing only the steps that are needed, to
> avoid confusion.
>
> 1. Create a 6-node master cluster
> 2. Create an EC volume on the master (2x(4+2))
> 3. Mount the volume (FUSE)
> 4. Using Crefi, create the data set. The fops carried out, in order, were:
>    create, chmod, chown, chgrp, hardlink, symlink, truncate
> 5. After every fop, calculate the checksum of the mount (at least twice)
> 6. After truncate, the checksum never matched

Just to clarify:

- The fops order does not mention when you (the script) wrote data to the file.
- Is it possible that you truncated the file while I/O was still going on?
- Can we anyhow find out what truncate command actually truncated the file in
  question? I mean the "truncate -s <??>".
- Also, what was the original size of the file, if the truncate happened after
  the write completed?

> Actual results:
> ===============
>
> Checksum mismatches for a file (might be data corruption) every time it is
> calculated from a different mount of the same volume.
>
> Expected results:
> =================
>
> Checksum should match all the time from all the different mounts.

> Just to clarify -
>
> - Fops order does not mention when did you (script) write data on the file
> - Is it possible that while IO is going on, you truncated the file?

The next fop starts only once the earlier fop has completed.

> - Can we anyhow find out what truncate command actually truncated the file
> in question? I mean the "truncate -s <??>"

AFAIK, Crefi does not have a way to tell; the output does not help here. I
will keep a watch when I try next time.

> - Also, what was the original size of the file if it happened after write
> completed?

It is a random workload; each file has a different size between min=1K and
max=5K. After the truncate, I cannot provide the actual size at the moment.
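The per-mount consistency check in step 5 can be scripted. Below is a minimal sketch, not the exact Crefi-driven procedure from the report; `check_file` is a hypothetical helper and the mount paths passed to it are placeholders:

```shell
#!/bin/sh
# Compare the md5sum of one file as seen through several mount points
# of the same volume. Prints OK if every mount returns the same sum,
# MISMATCH otherwise.
check_file() {
    relpath=$1; shift
    # Collect the unique checksums seen across all given mount points.
    sums=$(for m in "$@"; do md5sum "$m/$relpath" | cut -d' ' -f1; done | sort -u)
    if [ "$(printf '%s\n' "$sums" | wc -l)" -eq 1 ]; then
        echo "OK: $relpath"
    else
        echo "MISMATCH: $relpath"
        return 1
    fi
}
```

For the setup in the description this would be invoked as, e.g., `check_file thread0/level01/.../596c62e1%%HPJLBVY86T /mnt/glusterfs /mnt/master`.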
Some notes, still no RC

07:24:01 :) ⚡ md5sum /home/gfs/ec_?/5*
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/596c62e1%%HPJLBVY86T
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/596c62e1%%HPJLBVY86T
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/596c62e1%%HPJLBVY86T
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/596c62e1%%HPJLBVY86T
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/596c62e1%%HPJLBVY86T <<<---- this is the bad fragment
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/596c62e1%%HPJLBVY86T

I did the following to see what fragment corresponds to the bad md5sum that is present in the file we got:

07:24:53 :) ⚡ for i in {12288..14336}; do Crefi/crefi.py --multi -n 1 -b 1 -d 1 --size $i -t text --fop=create /mnt/ec2; done

root@dhcp35-190 - ~
07:26:49 :) ⚡ md5sum /home/gfs/ec_4/level00/* | grep 6921a33bd9c1083aaec582cde9973b61
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK

root@dhcp35-190 - ~
07:27:07 :) ⚡ ls -l /mnt/ec2/level00/597a99d1%%4SKACGWCDK
-rw-r--r--. 1 root root 14336 Jul 28 07:26 /mnt/ec2/level00/597a99d1%%4SKACGWCDK

07:39:48 :) ⚡ truncate -s 13526 /mnt/ec2/level00/597a99d1%%4SKACGWCDK

root@dhcp35-190 - ~
07:40:10 :) ⚡ md5sum /home/gfs/ec_?/level00/597a99d1%%4SKACGWCDK
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/level00/597a99d1%%4SKACGWCDK
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/level00/597a99d1%%4SKACGWCDK
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/level00/597a99d1%%4SKACGWCDK
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/level00/597a99d1%%4SKACGWCDK
f039f8c213c0b31b2d16da87de8da6b7  /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK <------ This fragment is correct.
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/level00/597a99d1%%4SKACGWCDK

So this means either the write was never wound on the file when this bug happened, or it failed silently. This will be the next thing to look into.

Ashish,

Didn't find any evidence for how it failed, but there is a bug in EC.
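The brute-force search above (recreate candidate files over a size range until one reproduces the known stale-fragment checksum) can be generalized as a small helper. This is a hedged sketch: `find_size` is a hypothetical function, it writes probe files into a single directory rather than onto a real brick, and it uses `/dev/zero` as a deterministic data generator where the original run used Crefi text files:

```shell
#!/bin/sh
# Brute-force a file size: recreate candidate files with a deterministic
# generator and report any size whose checksum equals the known target.
# Only works if the generator produces content determined solely by size.
find_size() {
    bad_sum=$1 dir=$2 lo=$3 hi=$4
    for size in $(seq "$lo" "$hi"); do
        head -c "$size" /dev/zero > "$dir/probe_$size"
        sum=$(md5sum "$dir/probe_$size" | cut -d' ' -f1)
        [ "$sum" = "$bad_sum" ] && echo "$size"
    done
}
```

In the debugging session above, the matching probe on the brick pinned the pre-truncate size of the corrupted file to 14336 bytes.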
When writev fails during truncate, EC does not mark that brick as bad. Verified it in gdb. We may need more help from Rahul to find out how exactly the write failed. Whether the quota errors are contributing to any failures is also something we may need to check.

I put one of the brick processes in gdb and made it return an error without performing any write:

Thread 7 "glusterfsd" hit Breakpoint 3, server3_3_writev (req=0x7fbf100ca140) at server-rpc-fops.c:4100
4100    server_state_t *state = NULL;
(gdb) n
...
(gdb)
4119    frame = get_frame_from_request (req);
(gdb)
4120    if (!frame) {
(gdb) p frame=0
$3 = (call_frame_t *) 0x0
(gdb) n
4122    SERVER_REQ_SET_ERROR (req, ret);
(gdb)
4123    goto out;
(gdb)
4172    free (args.xdata.xdata_val);
(gdb)
4174    if (op_errno)
(gdb) c

I got the same scenario mentioned in this bug:

root@dhcp35-190 - ~
10:34:11 :) ⚡ getfattr -d -m. -e hex /home/gfs/ec_?/services
getfattr: Removing leading '/' from absolute path names
# file: home/gfs/ec_0/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_1/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_2/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_3/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_4/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_5/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

root@dhcp35-190 - ~
10:34:13 :) ⚡ md5sum /home/gfs/ec_?/services
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/services
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/services
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/services
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/services
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/services <<<<------ same md5sum as the bad file in Rahul's setup, and all xattrs match as if everything is fine when it is not.
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/services

Ashish,

Can you send out a patch? In both ec_truncate_open_cbk and ec_truncate_writev's cbk, we need to update lock->good_mask correctly. I don't find any other way for the file to end up in the state it did on Rahul's setup. We just need to confirm this is indeed the case. The only other psychotic case would be if the brick sends a success without performing the write.
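A key observation above is that the `trusted.ec.*` xattrs agree across all fragments even though one fragment's data is stale, so an xattr comparison alone cannot detect this corruption. Still, checking xattr agreement is a useful first triage step. Below is a hedged sketch; `check_ec_xattrs` is a hypothetical helper that reads `getfattr -d -m. -e hex` output on stdin:

```shell
#!/bin/sh
# Report whether trusted.ec.size and trusted.ec.version carry the same
# value on every fragment, given getfattr hex output on stdin. As this
# bug shows, consistent xattrs do NOT guarantee consistent data.
check_ec_xattrs() {
    awk -F= '
        /^trusted\.ec\.(size|version)/ { vals[$1] = vals[$1] " " $2 }
        END {
            ok = 1
            for (k in vals) {
                n = split(vals[k], a, " ")
                for (i = 2; i <= n; i++) if (a[i] != a[1]) ok = 0
            }
            print (ok ? "xattrs consistent" : "xattrs differ")
        }'
}
```

Usage would be, e.g., `getfattr -d -m. -e hex /home/gfs/ec_?/services | check_ec_xattrs`; on the output above it would report "xattrs consistent" despite the bad fragment.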
Rahul,

Will it be possible to get a tcpdump of a run where this happens? That would confirm how this bug happened. The bug we found based on the information on your setup, explained above, is a day-1 bug.

Pranith

release-3.12 patch: https://review.gluster.org/17932
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/113931/

Have tried the use case mentioned in the description about 7 times and haven't seen this issue with builds glusterfs-3.8.4-37 and glusterfs-3.8.4-38. It was an intermittent issue to begin with. Have used truncate extensively in the testing. Moving this bug to the verified state for now; if we hit this issue again, we will reopen.

I have tried truncates of files, and the md5sum is consistent across brick-down scenarios and all-bricks-up scenarios when compared with the md5sum of the same files on the client's local filesystem:

1) Created a file of 1.5 GB on the local FS and copied the same to the mount ===> checksums consistent
2) Truncated the file to 1 GB on both the local FS and the GlusterFS mount ===> again, checksums match
3) Unmounted the client, remounted, and checked the checksum ===> still consistent
4) Brought down bricks in cyclic order ===> checksums still consistent

Test build: 3.8.4-38 on a 1x(4+2) EC volume.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
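The verification pattern used above (apply the same truncate to a reference copy and to the copy under test, then assert the checksums still agree) can be sketched as a small helper. This is an illustrative sketch only; `truncate_and_compare` is a hypothetical function, and the "test" path would be a file on the gluster FUSE mount in a real run:

```shell
#!/bin/sh
# Truncate a reference file and a file under test to the same size,
# then compare their md5sums. Prints "consistent" or "MISMATCH".
truncate_and_compare() {
    ref=$1 test=$2 newsize=$3
    truncate -s "$newsize" "$ref"
    truncate -s "$newsize" "$test"
    a=$(md5sum "$ref"  | cut -d' ' -f1)
    b=$(md5sum "$test" | cut -d' ' -f1)
    [ "$a" = "$b" ] && echo "consistent" || echo "MISMATCH"
}
```

In the verification above, the same comparison was repeated after remounting the client and after bringing bricks down in cyclic order, so that every fragment set is exercised.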