+++ This bug was initially created as a clone of Bug #1476205 +++
+++ This bug was initially created as a clone of Bug #1472764 +++

Description of problem:
-----------------------

Some files give a different md5sum depending on which mount of the same volume it is calculated from.

Example:

10.70.43.4:/master on /mnt/glusterfs type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.43.4:/master on /mnt/master type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.43.27:/slave on /mnt/slave type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@fan ~]# md5sum /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
9c0de9e06e88ddba88248e9c8e502f82  /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
[root@fan ~]# ls -l /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
-rwxrwxr--. 3 42023 13883 13526 Jul 18 05:29 /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
[root@fan ~]# ls -l /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
-rwxrwxr--. 3 42023 13883 13526 Jul 18 05:29 /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
[root@fan ~]# md5sum /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
16f92bf8f96094a214b1d0293c1accda  /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T

Further analysis by Ashish shows the following:

The files look corrupted at the backend. To confirm, the volume was mounted on 4 mount points and each gave a different md5sum.
This could be because each mount point reads the file from a different set of 4 bricks. To confirm this, 2 bricks from each subvolume were killed and the md5sum calculated again. This time all md5sums were the same.

[root@fan tmp]# md5sum /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
0bc19eac0693355a2e954ba3d66d4ef1  /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
[root@fan tmp]# md5sum /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
0bc19eac0693355a2e954ba3d66d4ef1  /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
[root@fan tmp]# md5sum /mnt/temp_master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
0bc19eac0693355a2e954ba3d66d4ef1  /mnt/temp_master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
[root@fan tmp]# md5sum /mnt/temp-2/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T
0bc19eac0693355a2e954ba3d66d4ef1  /mnt/temp-2/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T

Version-Release number of selected component (if applicable):
==============================================================

glusterfs-3.8.4-18.5.el7rhgs.x86_64

How reproducible:
=================

Will retry

Steps carried:
==============

The steps were carried out in a geo-rep setup, but the checksum mismatches on the master. Only the steps needed to reproduce are listed, to avoid confusion.

1. Create a 6-node master cluster
2. Create an EC volume on the master (2x(4+2))
3. Mount the volume (FUSE)
4. Using Crefi, create the data set. The fops carried out, in order, were:
   create, chmod, chown, chgrp, hardlink, symlink, truncate
5. After every fop, calculate the checksum of the mount (at least twice)
6.
After truncate, checksum never matched

Actual results:
===============

Checksum mismatches for a file (might be data corruption) every time it is calculated from a different mount of the same volume.

Expected results:
=================

The checksum should match at all times from all the different mounts.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-07-19 07:32:52 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Rahul Hinduja on 2017-07-19 08:53:05 EDT ---

sosreports at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1472764/

--- Additional comment from Ashish Pandey on 2017-07-19 12:45:52 EDT ---

(In reply to Rahul Hinduja from comment #0)
> [...]
> 6. After truncate, checksum never matched

Just to clarify -

- The fops order does not mention when you (the script) wrote data to the file.
- Is it possible that you truncated the file while IO was still going on?
- Can we anyhow find out what truncate command actually truncated the file in question? I mean the "truncate -s <??>".
- Also, what was the original size of the file, if the truncate happened after the write completed?

--- Additional comment from Rahul Hinduja on 2017-07-20 02:25:03 EDT ---

> - The fops order does not mention when you (the script) wrote data to the file.
> - Is it possible that you truncated the file while IO was still going on?

The next fop starts only once the earlier fop has completed.

> - Can we anyhow find out what truncate command actually truncated the file
>   in question? I mean the "truncate -s <??>".

AFAIK, Crefi does not provide a way, and its output does not help here. I will keep a watch when I try next time.

> - Also, what was the original size of the file, if the truncate happened
>   after the write completed?

It is a random workload: each file has a different size between min=1K and max=5K. After the truncate, I cannot provide the actual size at the moment.

--- Additional comment from Ashish Pandey on 2017-07-20 07:10:33 EDT ---

I tried to reproduce this on my laptop but did not succeed. I tried different sets of options, like write and chown with truncate, but nothing helped. I went through the brick logs and did not find any significant error message, except a quota inode_link issue. I tried on the latest 3.2 as well as the latest upstream.

Rahul, could you please give us the exact Crefi command you used, so that I can run it for one file and see what happens?
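As noted above, Crefi does not record the size it passes to truncate, which is exactly the detail Ashish is asking for. A small wrapper around the truncate step could capture it for future runs; a minimal sketch (the helper and log format are hypothetical, not part of Crefi):

```python
import os

def truncate_logged(path, new_size, log_path):
    """Truncate `path` to `new_size` and append old/new sizes to a log,
    so the exact "truncate -s" value is recoverable after a failed run."""
    old_size = os.path.getsize(path)
    os.truncate(path, new_size)
    with open(log_path, "a") as log:
        # one tab-separated record per truncate: path, size before, size after
        log.write("%s\t%d\t%d\n" % (path, old_size, new_size))
    return old_size
```

Running the truncate pass through a wrapper like this would answer both of the open questions here: the original file size and the size it was truncated to.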
--- Additional comment from Rahul Hinduja on 2017-07-22 13:20:42 EDT ---

> Rahul, could you please give us the exact Crefi command you used, so that I
> can run it for one file and see what happens?

How to set up and run Crefi:
----------------------------

Setup:

1. Clone the repository: git clone https://github.com/vijaykumar-koppad/Crefi.git
2. Ensure you have the following packages installed. If not, please subscribe to the base channel and install them:
   a. gcc
   b. python-setuptools
   c. pyxattr
   pyxattr can be downloaded directly from: https://brewweb.engineering.redhat.com/brew/
3. cd Crefi
4. python setup.py install

Run:

[root@fan ~]# crefi
usage: crefi [-h] [-V] [-n FILES] [--size SIZE] [--random] [--max MAX]
             [--min MIN] [--single] [--multi] [-b BRDTH] [-d DEPTH] [-l FLEN]
             [-t {text,sparse,binary,tar}] [-I INTER]
             [--fop {create,rename,chmod,chown,chgrp,symlink,hardlink,truncate,setxattr}]
             [-R] [-T THREADS]
             mntpnt
crefi: error: too few arguments
[root@fan ~]#

Example:

crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=create <mnt>

--- Additional comment from Rahul Hinduja on 2017-07-22 13:24:55 EDT ---

I have hit this issue again with the same steps as mentioned in the bug. I have carried out these steps in earlier releases more than 10 times but never hit this. I am marking this a regression (via use case) and proposing it as a blocker now, due to data corruption.
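Step 5 of the reproduction ("calculate the checksum of mount") can be approximated by folding every regular file under the mount into one digest. A minimal sketch, assuming a tree-wide hash is acceptable (the actual tool used in the test runs is not named in this bug):

```python
import hashlib
import os

def mount_checksum(root):
    """One digest for a whole mount: hash each regular file's relative
    path and contents, in sorted order so the result is deterministic."""
    total = hashlib.md5()
    for dirpath, dirnames, filenames in sorted(os.walk(root), key=lambda t: t[0]):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # hash only regular file data
            total.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(1 << 20), b""):
                    total.update(block)
    return total.hexdigest()
```

Running this twice against the same mount (step 5) should give identical digests; a difference between two mounts of the same volume is the symptom reported here.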
[root@dj ~]# df | grep master
10.70.37.189:/master 104857600 1623040 103234560   2% /mnt/glusterfs
10.70.37.189:/master 104857600 1623040 103234560   2% /mnt/master
10.70.37.189:/master 104857600 1623040 103234560   2% /mnt/new
10.70.37.189:/master 104857600 1623040 103234560   2% /mnt/new2
10.70.37.189:/master 104857600 1623040 103234560   2% /mnt/latestgluster
[root@dj ~]#
[root@dj ~]# md5sum /mnt/master/thread0/level05/level15/59729275%%OTYJP4U0L5
d061eb37da89f099e22b8e7a267ba4e0  /mnt/master/thread0/level05/level15/59729275%%OTYJP4U0L5
[root@dj ~]# mkdir /mnt/new
[root@dj ~]# md5sum /mnt/glusterfs/thread0/level05/level15/59729275%%OTYJP4U0L5
b8e8eaabbbea12a160477f56bcb4485d  /mnt/glusterfs/thread0/level05/level15/59729275%%OTYJP4U0L5
[root@dj ~]# mount -t glusterfs 10.70.37.189:/master /mnt/new/
[root@dj ~]# md5sum /mnt/new/thread0/level05/level15/59729275%%OTYJP4U0L5
d061eb37da89f099e22b8e7a267ba4e0  /mnt/new/thread0/level05/level15/59729275%%OTYJP4U0L5
[root@dj ~]#
[root@dj ~]# md5sum /mnt/new2/thread0/level05/level15/59729275%%OTYJP4U0L5
d061eb37da89f099e22b8e7a267ba4e0  /mnt/new2/thread0/level05/level15/59729275%%OTYJP4U0L5
[root@dj ~]# md5sum /mnt/glusterfs/thread0/level05/level15/59729275%%OTYJP4U0L5
b8e8eaabbbea12a160477f56bcb4485d  /mnt/glusterfs/thread0/level05/level15/59729275%%OTYJP4U0L5
[root@dj ~]# mkdir /mnt/latestgluster
[root@dj ~]# mount -t glusterfs 10.70.37.189:/master /mnt/latestgluster/
[root@dj ~]# md5sum /mnt/latestgluster/thread0/level05/level15/59729275%%OTYJP4U0L5
330c99082ff298c3b75766be571cb4f7  /mnt/latestgluster/thread0/level05/level15/59729275%%OTYJP4U0L5
[root@dj ~]#

This time I hit the issue during the rename fop, carried out after the truncate fop.
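The manual cross-mount comparison above can be automated: hash the same relative path through every mount and group the mounts by digest. A minimal sketch (mount paths here are placeholders for mounts of the same volume):

```python
import hashlib
from collections import defaultdict

def md5_of(path):
    """md5 hex digest of one file, read in 1 MiB blocks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def compare_across_mounts(mounts, rel_path):
    """Return {digest: [mount, ...]}; more than one key means the mounts
    disagree about the file's contents, as in the transcript above."""
    groups = defaultdict(list)
    for m in mounts:
        groups[md5_of(m.rstrip("/") + "/" + rel_path)].append(m)
    return dict(groups)
```

For example, compare_across_mounts(["/mnt/glusterfs", "/mnt/master", "/mnt/new", "/mnt/new2", "/mnt/latestgluster"], "thread0/level05/level15/59729275%%OTYJP4U0L5") would have returned three groups in the run above.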
Exact Crefi commands are as follows:

[geo_rahul@skywalker ~]$ grep -ri "crefi" /home/geo_rahul/regression/3.3.0-RHEL7.4-reg/FUSE-RSYNC-EC.log | grep -i "RETCODE is 0"
2017-07-22 05:02:13,986 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=create /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 05:12:32,798 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chmod /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 05:18:09,531 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chown /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 05:23:24,575 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chgrp /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 05:29:21,515 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=symlink /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 05:42:05,076 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=hardlink /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 05:57:25,711 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=truncate /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 06:15:38,047 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=rename /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 06:51:36,294 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=create /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 07:02:11,336 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chmod /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 07:12:21,204 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chown /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 07:22:41,819 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chgrp /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 07:33:42,973 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=symlink /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 07:47:40,119 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=hardlink /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 08:03:19,522 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=truncate /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 08:22:11,782 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=create /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 08:38:03,876 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chmod /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 08:58:24,953 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chown /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 09:19:06,560 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=chgrp /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 09:40:13,765 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=symlink /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 09:58:55,441 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=hardlink /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 10:20:12,530 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=truncate /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
2017-07-22 18:03:04,933 INFO run "crefi --multi -n 5 -b 10 -d 10 --max=200K --min=10K --random -T 10 -t text --fop=rename /mnt/glusterfs 1>/dev/null 2>&1" on dj.lab.eng.blr.redhat.com: RETCODE is 0
[geo_rahul@skywalker ~]$

Where /mnt/glusterfs is the FUSE mount.

--- Additional comment from Rejy M Cyriac on 2017-07-24 08:55:17 EDT ---

At the 'RHGS 3.3.0 - Release Blocker Bug Triage and Status Check' meeting on 24 July, it was decided to ACCEPT this BZ for a fix at the RHGS 3.3.0 release.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-07-24 08:55:23 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal Whiteboard entry of '3.3.0', the Target Release is being automatically set to 'RHGS 3.3.0'.

--- Additional comment from Pranith Kumar K on 2017-07-27 19:49:12 EDT ---

Some notes; didn't find the RCA yet.

Only the last fragment of the file is corrupted. On closer inspection, it was found that the file on the 5th brick is the bad one. It is as if the write did not happen on the 5th brick but only on the rest of the bricks. So the next step is to find out why the write got skipped.

Rahul,
For more debugging we need the /etc/services file from the machine where you ran Crefi. Could you attach it to the bz?
Pranith

--- Additional comment from Pranith Kumar K on 2017-07-27 21:51:51 EDT ---

(In reply to Pranith Kumar K from comment #10)
> [...]
> Rahul,
> For more debugging we need the /etc/services file from the machine where you
> ran Crefi. Could you attach it to the bz?

I found it on one of the RHGS machines, and its md5sum matches what we have in the bz. I will use that.

--- Additional comment from Pranith Kumar K on 2017-07-27 22:12:20 EDT ---

Some notes, still no RC:

07:24:01 :) ⚡ md5sum /home/gfs/ec_?/5*
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/596c62e1%%HPJLBVY86T
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/596c62e1%%HPJLBVY86T
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/596c62e1%%HPJLBVY86T
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/596c62e1%%HPJLBVY86T
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/596c62e1%%HPJLBVY86T  <<<---- this is the bad fragment
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/596c62e1%%HPJLBVY86T

I did the following to see what file size produces the fragment with the bad md5sum we got:

07:24:53 :) ⚡ for i in {12288..14336}; do Crefi/crefi.py --multi -n 1 -b 1 -d 1 --size $i -t text --fop=create /mnt/ec2; done
root@dhcp35-190 - ~
07:26:49 :) ⚡ md5sum /home/gfs/ec_4/level00/* | grep 6921a33bd9c1083aaec582cde9973b61
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK
root@dhcp35-190 - ~
07:27:07 :) ⚡ ls -l /mnt/ec2/level00/597a99d1%%4SKACGWCDK
-rw-r--r--.
1 root root 14336 Jul 28 07:26 /mnt/ec2/level00/597a99d1%%4SKACGWCDK

07:39:48 :) ⚡ truncate -s 13526 /mnt/ec2/level00/597a99d1%%4SKACGWCDK
root@dhcp35-190 - ~
07:40:10 :) ⚡ md5sum /home/gfs/ec_?/level00/597a99d1%%4SKACGWCDK
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/level00/597a99d1%%4SKACGWCDK
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/level00/597a99d1%%4SKACGWCDK
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/level00/597a99d1%%4SKACGWCDK
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/level00/597a99d1%%4SKACGWCDK
f039f8c213c0b31b2d16da87de8da6b7  /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK  <------ This fragment is correct.
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/level00/597a99d1%%4SKACGWCDK

So this means that when this bug happened, either the write was never wound to the brick, or it failed silently. This is the next thing to look into.

--- Additional comment from Pranith Kumar K on 2017-07-28 01:27:26 EDT ---

Ashish,

I didn't find any evidence for how the write failed, but there is a bug in EC: when the writev issued as part of truncate fails, EC does not mark that brick as bad. Verified it in gdb. We may need more help from Rahul to find out how the write actually failed. Whether the quota errors contribute to any failures is also something we may need to check.

I put one of the brick processes in gdb and made it return an error without performing the write:

Thread 7 "glusterfsd" hit Breakpoint 3, server3_3_writev (req=0x7fbf100ca140) at server-rpc-fops.c:4100
4100            server_state_t *state = NULL;
(gdb) n
...
(gdb) 4119      frame = get_frame_from_request (req);
(gdb) 4120      if (!frame) {
(gdb) p frame=0
$3 = (call_frame_t *) 0x0
(gdb) n
4122            SERVER_REQ_SET_ERROR (req, ret);
(gdb) 4123      goto out;
(gdb) 4172      free (args.xdata.xdata_val);
(gdb) 4174      if (op_errno)
(gdb) c

I got the same scenario mentioned in this bug:

root@dhcp35-190 - ~
10:34:11 :) ⚡ getfattr -d -m.
-e hex /home/gfs/ec_?/services
getfattr: Removing leading '/' from absolute path names
# file: home/gfs/ec_0/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_1/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_2/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_3/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_4/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_5/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

root@dhcp35-190 - ~
10:34:13 :) ⚡ md5sum /home/gfs/ec_?/services
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/services
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/services
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/services
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/services
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/services  <<<<------ same md5sum as the bad file on Rahul's setup, and all xattrs match, as if everything were fine when it is not.
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/services

Ashish,
Can you send out a patch? In both ec_truncate_open_cbk and ec_truncate_writev's cbk, we need to update lock->good_mask correctly. I don't find any other way for the file to end up in the state it did on Rahul's setup. We just need to confirm this is indeed the case. The only other psychotic case would be if the brick sent a success without performing the write.

Rahul,
Would it be possible to get a tcpdump of a run where this happens? That would confirm how this bug happened. The bug we found based on the information on your setup, explained above, is a day-1 bug.

Pranith

--- Additional comment from Ashish Pandey on 2017-07-28 05:42:15 EDT ---

Description of problem:
-----------------------

some files are giving different md5sum when executed from different mounts on the same volume.
Example: :/master on /mnt/glusterfs type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) :/master on /mnt/master type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) :/slave on /mnt/slave type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) [root@fan ~]# md5sum /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T 9c0de9e06e88ddba88248e9c8e502f82 /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T [root@fan ~]# ls -l /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T -rwxrwxr--. 3 42023 13883 13526 Jul 18 05:29 /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T [root@fan ~]# ls -l /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T -rwxrwxr--. 3 42023 13883 13526 Jul 18 05:29 /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T [root@fan ~]# md5sum /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T 16f92bf8f96094a214b1d0293c1accda /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T Further Analysis by Ashish shows the following: Looks like files are corrupted at backend, to confirm mounted the volume on 4 mount points and got the different md5sum. This could be because all mount points are reading file from different sets of 4 bricks. To confirm this, killed 2 bricks from each subvolume and again calculated md5sum. This time all md5sums are same. 
[root@fan tmp]# md5sum /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T 0bc19eac0693355a2e954ba3d66d4ef1 /mnt/glusterfs/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T [root@fan tmp]# md5sum /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T 0bc19eac0693355a2e954ba3d66d4ef1 /mnt/master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T [root@fan tmp]# md5sum /mnt/temp_master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T 0bc19eac0693355a2e954ba3d66d4ef1 /mnt/temp_master/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T [root@fan tmp]# md5sum /mnt/temp-2/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T 0bc19eac0693355a2e954ba3d66d4ef1 /mnt/temp-2/thread0/level01/level11/level21/level31/level41/level51/level61/level71/596c62e1%%HPJLBVY86T Steps carried: ============== The steps carried were in geo-rep setup, however the checksum missmatches on master. Providing only the steps that is needed to avoid confusion. 1. Create 6 node Master Cluster 2. Create EC volume on Master (2x(4+2)) 3. Mount the volume (Fuse) 4. Using crefi, create the data set. fops carried inorder were: create, chmod, chown, chgrp, hardlink, symlink, truncate 5. After every fop, calculate the checksum of mount (At least twice) 6. After truncate, checksum never matched Actual results: =============== Checksum mismatches for a file (Might be data corruption) every time calculated from different mount for the same volume. Expected results: ================= Checksum should match all the time from all the different mounts. --- Additional comment from Ashish Pandey on 2017-07-28 05:43:54 EDT --- RCA by Pranith - Some notes, didn't find the RCA yet. 
Only the last fragment of the file is corrupted. On closer inspection, it is found that the file on 5th brick is the bad-one. It is as if the write didn't happen on the 5th brick but only happened on the rest of the bricks. So the next step is to find why was the write got skipped. Rahul, For more debugging we need /etc/services file from the machine where you ran crefi. Could you attach it to the bz? Pranith --- Additional comment from Pranith Kumar K on 2017-07-27 21:51:51 EDT --- (In reply to Pranith Kumar K from comment #10) > Some notes, didn't find the RCA yet. > > Only the last fragment of the file is corrupted. On closer inspection, it is > found that the file on 5th brick is the bad-one. It is as if the write > didn't happen on the 5th brick but only happened on the rest of the bricks. > So the next step is to find why was the write got skipped. > > Rahul, > For more debugging we need /etc/services file from the machine where you > ran crefi. Could you attach it to the bz? > > Pranith I found it in one of the rhgs machines and the md5sum is matching with what we have in the bz. I will use that --- Additional comment from Pranith Kumar K on 2017-07-27 22:12:20 EDT --- Some notes, still no RC 07:24:01 :) ⚡ md5sum /home/gfs/ec_?/5* ecb9f6f29ebbb72e3e612b0aec775b36 /home/gfs/ec_0/596c62e1%%HPJLBVY86T 6fe5b21f27a5c9dc72ae3422102327b2 /home/gfs/ec_1/596c62e1%%HPJLBVY86T 6f82c16ed99b901929737ddf11dd920b /home/gfs/ec_2/596c62e1%%HPJLBVY86T bcac001cf1cac8860e4ae233a35cd7ef /home/gfs/ec_3/596c62e1%%HPJLBVY86T 6921a33bd9c1083aaec582cde9973b61 /home/gfs/ec_4/596c62e1%%HPJLBVY86T <<<---- this is the bad fragment b9d70081b5b41c9f980e20bf40e1fd13 /home/gfs/ec_5/596c62e1%%HPJLBVY86T I did the following to see what fragment corresponds to the bad md5sum that is present in the file we got. 
07:24:53 :) ⚡ for i in {12288..14336}; do Crefi/crefi.py --multi -n 1 -b 1 -d 1 --size $i -t text --fop=create /mnt/ec2; done

root@dhcp35-190 - ~
07:26:49 :) ⚡ md5sum /home/gfs/ec_4/level00/* | grep 6921a33bd9c1083aaec582cde9973b61
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK

root@dhcp35-190 - ~
07:27:07 :) ⚡ ls -l /mnt/ec2/level00/597a99d1%%4SKACGWCDK
-rw-r--r--. 1 root root 14336 Jul 28 07:26 /mnt/ec2/level00/597a99d1%%4SKACGWCDK

07:39:48 :) ⚡ truncate -s 13526 /mnt/ec2/level00/597a99d1%%4SKACGWCDK

root@dhcp35-190 - ~
07:40:10 :) ⚡ md5sum /home/gfs/ec_?/level00/597a99d1%%4SKACGWCDK
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/level00/597a99d1%%4SKACGWCDK
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/level00/597a99d1%%4SKACGWCDK
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/level00/597a99d1%%4SKACGWCDK
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/level00/597a99d1%%4SKACGWCDK
f039f8c213c0b31b2d16da87de8da6b7  /home/gfs/ec_4/level00/597a99d1%%4SKACGWCDK  <------ This fragment is correct.
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/level00/597a99d1%%4SKACGWCDK

So this means that either the write was never wound to that brick when this bug happened, or it failed silently. That is the next thing to look into.

--- Additional comment from Pranith Kumar K on 2017-07-28 01:27:26 EDT ---

Ashish,
I didn't find any evidence of how the write failed, but there is a bug in EC: when the writev issued as part of a truncate fails, the brick is not marked as bad. Verified it in gdb. We may need more help from Rahul to find out how the write actually failed. Whether the quota errors are contributing to any failures is also something we may need to check.

I put one of the brick processes in gdb and made it return an error without performing any write:

Thread 7 "glusterfsd" hit Breakpoint 3, server3_3_writev (req=0x7fbf100ca140) at server-rpc-fops.c:4100
4100        server_state_t *state = NULL;
(gdb) n
...
(gdb)
4119        frame = get_frame_from_request (req);
(gdb)
4120        if (!frame) {
(gdb) p frame=0
$3 = (call_frame_t *) 0x0
(gdb) n
4122            SERVER_REQ_SET_ERROR (req, ret);
(gdb)
4123            goto out;
(gdb)
4172        free (args.xdata.xdata_val);
(gdb)
4174        if (op_errno)
(gdb) c

I got the same scenario mentioned in this bug:

root@dhcp35-190 - ~
10:34:11 :) ⚡ getfattr -d -m. -e hex /home/gfs/ec_?/services
getfattr: Removing leading '/' from absolute path names
# file: home/gfs/ec_0/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_1/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_2/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_3/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_4/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

# file: home/gfs/ec_5/services
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000000034d6
trusted.ec.version=0x00000000000000270000000000000027
trusted.gfid=0xeacb572971f5457a9a5706020255f312

root@dhcp35-190 - ~
10:34:13 :) ⚡ md5sum /home/gfs/ec_?/services
ecb9f6f29ebbb72e3e612b0aec775b36  /home/gfs/ec_0/services
6fe5b21f27a5c9dc72ae3422102327b2  /home/gfs/ec_1/services
6f82c16ed99b901929737ddf11dd920b  /home/gfs/ec_2/services
bcac001cf1cac8860e4ae233a35cd7ef  /home/gfs/ec_3/services
6921a33bd9c1083aaec582cde9973b61  /home/gfs/ec_4/services  <<<<------ same md5sum as the bad file on Rahul's setup, and all xattrs match as if everything is fine when it is not.
b9d70081b5b41c9f980e20bf40e1fd13  /home/gfs/ec_5/services

Ashish,
Can you send out a patch? In both ec_truncate_open_cbk and ec_truncate_writev's cbk, we need to update lock->good_mask correctly. I don't see any other way for the file to end up in the state it did on Rahul's setup; we just need to confirm that this is indeed the case. The only other pathological case would be the brick sending back success without performing the write.

Rahul,
Would it be possible to get a tcpdump of a run where this happens? That would confirm how this bug happened. The bug we found based on the information on your setup, explained above, is a day-1 bug.
Pranith

--- Additional comment from Worker Ant on 2017-07-28 06:08:03 EDT ---

REVIEW: https://review.gluster.org/17906 (ec/cluster: Update failure of fop on a brick properly) posted (#1) for review on master by Ashish Pandey (aspandey)

--- Additional comment from Worker Ant on 2017-07-31 02:25:15 EDT ---

REVIEW: https://review.gluster.org/17906 (ec/cluster: Update failure of fop on a brick properly) posted (#2) for review on master by Ashish Pandey (aspandey)

--- Additional comment from Worker Ant on 2017-07-31 07:55:31 EDT ---

COMMIT: https://review.gluster.org/17906 committed in master by Pranith Kumar Karampuri (pkarampu)

------

commit 12906467f0f8df2edc07ea8412c0c6be6fae11b3
Author: Ashish Pandey <aspandey>
Date:   Fri Jul 28 15:18:29 2017 +0530

    ec/cluster: Update failure of fop on a brick properly

    Problem: In the case of truncate, if a writev or open fails on a
    brick, in some cases the failure is not marked on lock->good_mask.
    This causes the size and version to be updated on all the bricks
    even though the fop failed on one of them, which ultimately causes
    data corruption.

    Solution: In the callbacks of such writev and open calls, mark
    fop->good for the parent too.

    Thanks to Pranith Kumar K <pkarampu> for finding the root cause.

    Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
    BUG: 1476205
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: https://review.gluster.org/17906
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
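The mechanics behind the fix can be illustrated with a small sketch. The names below (good_mask, the callback shape) follow the commit's description, but this is a simplified model, not Gluster's actual code: the bug was that a failed writev during truncate left the failed brick's bit set in the mask of good bricks, so the subsequent size/version xattr update treated all bricks as healthy.

```python
# Simplified model of the good_mask bookkeeping described in the commit
# message. Brick i is considered "good" when bit i of good_mask is set;
# the size/version xattr update must only go to bricks still marked good.
# This mirrors the described fix, not Gluster's real data structures.

BRICKS = 6  # one (4+2) subvolume has 6 bricks

def writev_cbk(good_mask: int, brick: int, op_ret: int) -> int:
    """Callback for the writev issued as part of truncate."""
    if op_ret < 0:
        # The fix: propagate the failure so this brick is excluded from
        # the size/version update instead of silently staying "good".
        good_mask &= ~(1 << brick)
    return good_mask

def bricks_to_update(good_mask: int):
    """Bricks that will receive the trusted.ec.size/version update."""
    return [i for i in range(BRICKS) if good_mask & (1 << i)]

mask = (1 << BRICKS) - 1        # all 6 bricks healthy: 0b111111
mask = writev_cbk(mask, 4, -1)  # write fails on the 5th brick (index 4)

print(bricks_to_update(mask))   # [0, 1, 2, 3, 5] -- brick 4 excluded
```

Without the fix, the failing callback returned the mask unchanged, brick 4 stayed in the update list, and its stale fragment carried matching xattrs — exactly the state seen in the getfattr dump above.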
REVIEW: https://review.gluster.org/17932 (ec/cluster: Update failure of fop on a brick properly) posted (#1) for review on release-3.12 by Ashish Pandey (aspandey)
COMMIT: https://review.gluster.org/17932 committed in release-3.12 by Shyamsundar Ranganathan (srangana)

------

commit ece2717e036089e6f727d59652c52ca7dd7c3179
Author: Ashish Pandey <aspandey>
Date:   Fri Jul 28 15:18:29 2017 +0530

    ec/cluster: Update failure of fop on a brick properly

    Problem: In the case of truncate, if a writev or open fails on a
    brick, in some cases the failure is not marked on lock->good_mask.
    This causes the size and version to be updated on all the bricks
    even though the fop failed on one of them, which ultimately causes
    data corruption.

    Solution: In the callbacks of such writev and open calls, mark
    fop->good for the parent too.

    Thanks to Pranith Kumar K <pkarampu> for finding the root cause.

    >Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
    >BUG: 1476205
    >Signed-off-by: Ashish Pandey <aspandey>
    >Reviewed-on: https://review.gluster.org/17906
    >Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    >Smoke: Gluster Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Signed-off-by: Ashish Pandey <aspandey>

    Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
    BUG: 1476868
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: https://review.gluster.org/17932
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
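One detail worth highlighting from the RCA: the corruption was invisible to heal because the hex-encoded trusted.ec.size and trusted.ec.version xattrs agreed on all six bricks. Decoding them (a quick sketch; the interpretation of trusted.ec.version as two 64-bit big-endian counters is the usual cluster/ec convention) shows every brick claiming the expected post-truncate size of 13526 bytes even though ec_4 held stale data:

```python
# Decode the hex-encoded EC xattrs from the getfattr dumps above.
# trusted.ec.size is the file size as a 64-bit integer;
# trusted.ec.version holds two 64-bit big-endian counters
# (data version and metadata version).
ec_size = "0x00000000000034d6"
ec_version = "0x00000000000000270000000000000027"

size = int(ec_size, 16)
raw = bytes.fromhex(ec_version[2:])
data_ver = int.from_bytes(raw[:8], "big")
meta_ver = int.from_bytes(raw[8:], "big")

print(size)                # 13526 -- matches the truncate -s 13526 above
print(data_ver, meta_ver)  # 39 39 -- identical counters on every brick
```

Since self-heal decides which fragments are stale by comparing exactly these xattrs, identical values on all bricks meant nothing was ever queued for repair.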
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/