Bug 1416321
Summary: read error: Input/output error, but not in split-brain state, while the file on the bricks has different sizes

Product: [Community] GlusterFS
Component: replicate
Version: 3.8
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: high
Priority: unspecified
Reporter: likunbyl
Assignee: bugs <bugs>
CC: bugs, hchiramm, khiremat, likunbyl, pkarampu, ravishankar
Keywords: Triaged
Type: Bug
Last Closed: 2017-11-07 10:42:53 UTC
Hi, I think you already mailed gluster-users, but:
1. Can you also attach the mount log from the time you got EIO?
2. Are the getfattr outputs from before you ran heal full?
3. When you ran heal full, did you get messages similar to "Completed data self-heal on 0xa61d7b2fd53f4eb9be7a656b0915dbea" etc. in the mount log or glustershd.log?

1. The mount log from the time I got EIO is in my attached file.
2. Yes, the getfattr outputs were taken before I ran heal full.
3. On ac08 I got the following log:

[2017-01-03 02:30:55.147054] I [MSGID: 108026] [afr-self-heal-common.c:1076:afr_log_selfheal] 0-gvol0-replicate-17: Completed data selfheal on a61d7b2f-d53f-4eb9-be7a-656b0915dbea. sources=[2] sinks=0 1

a61d7b2f-d53f-4eb9-be7a-656b0915dbea is the same inode as the error file. There is no such log on the other two servers. 2017-01-03 02:30:55 is the time I ran the heal full command.

Today one of the glusterfs servers rebooted. From glusterfsd.log:

06:20:32  gluster self-heal began
07:18:21  gluster self-heal finished
08:15     my java application reported EIO
08:19:08  started gluster volume heal gvol0 full
08:25:18  heal full finished

Between 08:19:08 and 08:25:18 I got 46 "Completed data self-heal" logs. Does that mean there were 46 files that were not healed by the self-heal daemon?

(In reply to likunbyl from comment #2)
> 1. The mount log from the time I got EIO is in my attached file.

I only see brick logs in https://bugzilla.redhat.com/attachment.cgi?id=1244198, no mount (client) logs. Can you provide the mount log? I'm trying to see which translator returned EIO. The brick logs you provided also do not contain EIO-related messages.

Some other comments: the afr extended attributes you provided are all zeroes, meaning there was no pending heal. If you still got a "Completed data selfheal" message, some FOP must have failed after you obtained the getfattr output, causing non-zero afr xattrs and thus a heal.

(In reply to Ravishankar N from comment #4)
> Can you provide the mount log?

Where is the mount log located? I am running kubernetes to manage all my apps, and mount the glusterfs volume into pods with the following configuration:

{
  name: "glusterfs",
  glusterfs: {
    endpoints: "glusterfs-cluster",
    path: "gvol0",
  },
},

It is normally at /var/log/glusterfs/ on the machine where you have mounted the volume. Humble, would you know if the log location is any different for container setups?

There are no logs about the glusterfs mount in the container or on the servers that run the pod. Is this problem related to the client side? Before upgrading to glusterfs 3.8.5, I ran glusterfs 3.7 for several months without the EIO error at all. The differences between the two stages are as follows:
1. Different glusterfs versions.
2. Before, we mounted the volume through fuse; later, we mount the volume through k8s.
3. Before, the servers were stable; later, several servers rebooted abnormally.
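For reference, each trusted.afr.* changelog value mentioned above is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry). A small illustrative sketch (not gluster code) decoding such a value:

```python
import struct

def decode_afr_xattr(hex_value):
    """Decode a trusted.afr.* changelog value: 12 bytes holding three
    big-endian 32-bit pending counters (data, metadata, entry)."""
    hexstr = hex_value[2:] if hex_value.startswith("0x") else hex_value
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(hexstr))
    return {"data": data, "metadata": metadata, "entry": entry}

# The values in this report are all zeroes, i.e. nothing pending:
print(decode_afr_xattr("0x000000000000000000000000"))
# prints {'data': 0, 'metadata': 0, 'entry': 0}
```

A non-zero first counter (e.g. 0x00000002…) would mean pending data heals against the brick named in the xattr; all-zero values are what makes the EIO in this bug surprising.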
>
> core@ab09 /export/gluster/sdb3/vol/gps/data/2017-01/02 $ ls -l 4.txt
> -rw-r--r--. 2 root root 7490 Jan 2 00:00 4.txt
>
> core@ac08 /export/gluster/sdb3/vol/gps/data/2017-01/02 $ ls -l 4.txt
> -rw-r--r--. 2 root root 9046 Jan 2 00:00 4.txt
>
> core@ad08 /export/gluster/sdb3/vol/gps/data/2017-01/02 $ ls -l 4.txt
> -rw-r--r--. 2 root root 9046 Jan 2 00:00 4.txt
>
> And the getfattr output:
>
> core@ab09 /export/gluster/sdb3/vol/gps/data/2017-01/02 # getfattr -m . -d -e hex 4.txt
> # file: 4.txt
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.gvol0-client-51=0x000000000000000000000000
> trusted.afr.gvol0-client-52=0x000000000000000000000000
> trusted.afr.gvol0-client-53=0x000000000000000000000000
> trusted.bit-rot.version=0x02000000000000005850b03d00072c12
> trusted.gfid=0xa61d7b2fd53f4eb9be7a656b0915dbea
> trusted.glusterfs.quota.363d3c2d-5ccb-497e-9ecc-6eaf5f435026.contri.1=0x0000000000001e000000000000000001
> trusted.pgfid.363d3c2d-5ccb-497e-9ecc-6eaf5f435026=0x00000001
>
> core@ac08 /export/gluster/sdb3/vol/gps/data/2017-01/02 # getfattr -m . -d -e hex 4.txt
> # file: 4.txt
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.gvol0-client-51=0x000000000000000000000000
> trusted.afr.gvol0-client-52=0x000000000000000000000000
> trusted.afr.gvol0-client-53=0x000000000000000000000000
> trusted.bit-rot.version=0x02000000000000005850b03d0007711e
> trusted.gfid=0xa61d7b2fd53f4eb9be7a656b0915dbea
> trusted.glusterfs.quota.363d3c2d-5ccb-497e-9ecc-6eaf5f435026.contri.1=0x00000000000024000000000000000001
> trusted.pgfid.363d3c2d-5ccb-497e-9ecc-6eaf5f435026=0x00000001
>
> core@ad08 /export/gluster/sdb3/vol/gps/data/2017-01/02 # getfattr -m . -d -e hex 4.txt
> # file: 4.txt
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.gvol0-client-51=0x000000000000000000000000
> trusted.afr.gvol0-client-52=0x000000000000000000000000
> trusted.afr.gvol0-client-53=0x000000000000000000000000
> trusted.bit-rot.version=0x02000000000000005861dcf900054117
> trusted.gfid=0xa61d7b2fd53f4eb9be7a656b0915dbea
> trusted.glusterfs.quota.363d3c2d-5ccb-497e-9ecc-6eaf5f435026.contri.1=0x00000000000024000000000000000001
> trusted.pgfid.363d3c2d-5ccb-497e-9ecc-6eaf5f435026=0x00000001
>
I see that the bit-rot versions are different on all three bricks. Did anyone get a chance to find out why? Ideally they should be the same for the same content, right?
Kotresh,
Do you suspect there to be any issue?
So, since I run glusterfs in kubernetes, no client mount log can be provided. Is there anything else I can do?

(In reply to likunbyl from comment #9)
> So, since I run glusterfs in kubernetes, no client mount log can be provided. Is there anything else I can do?

Is this a kubernetes setup or an openshift setup? For example, in an Openshift setup the gluster logs are kept under a path like this:

/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/gluster-pv-local/glusterfs.log

where "/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/" is common to all the gluster logs. This common path is appended with the volume name and the pod name, as shown below:

--snip--
p := path.Join(b.glusterfs.plugin.host.GetPluginDir(glusterfsPluginName), b.glusterfs.volName)
....
log := path.Join(p, b.pod.Name+"-glusterfs.log")
--/snip--
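The path construction in the snippet above can be mirrored in a few lines. A sketch, where the base directory and pod name below are assumptions for illustration, not values from this setup:

```python
import os

def glusterfs_client_log_path(plugin_dir, vol_name, pod_name):
    # <plugin_dir>/<volName>/<podName>-glusterfs.log, mirroring the
    # kubernetes glusterfs plugin snippet quoted above
    return os.path.join(plugin_dir, vol_name, pod_name + "-glusterfs.log")

# Hypothetical example values (the real base dir depends on the kubelet setup):
print(glusterfs_client_log_path(
    "/var/lib/kubelet/plugins/kubernetes.io/glusterfs",
    "gvol0",
    "mypod"))
```

So to locate the client mount log, find the plugin directory for your kubelet, then look for the subdirectory named after the volume and the file named after the pod.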
>
> Kotresh,
> Do you suspect there to be any issue?
Bitrot versions can differ across replica bricks. This can happen if one of the bricks goes down while I/O continues: the bit-rot version increases on the bricks that are up, and with multiple writes the version keeps incrementing there. When the brick that was down comes back up, AFR heals it in one shot, incrementing its version only once. So even though the data is consistent, the versions can differ. The signature, however, should be consistent across replicas.
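The divergence described above can be sketched as a toy model (not gluster code; the brick names are from this report, the write counts are made up):

```python
# Toy model of how bit-rot versions can diverge across replicas
# while the data ends up identical after a heal.
versions = {"ab09": 1, "ac08": 1, "ad08": 1}

def write(up_bricks):
    """Each write bumps the bit-rot version on every brick that is up."""
    for brick in up_bricks:
        versions[brick] += 1

# ad08 is down; five writes land only on the other two replicas.
for _ in range(5):
    write(["ab09", "ac08"])

# ad08 comes back; AFR heals the file in one shot, bumping its version once.
versions["ad08"] += 1

print(versions)  # the versions differ even though all copies now match
```

This is why differing trusted.bit-rot.version values alone do not indicate corruption; only a signature mismatch would.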
(In reply to Humble Chirammal from comment #10)
> Is this a kubernetes setup or an openshift setup?

Thank you very much. I have found the glusterfs client mount log; it is located under the /var/lib/kubelet/plugins/kubernetes.io/glusterfs/glusterfs/ directory. I have excerpted the logs about the EIO at that time into an attached file. Several servers access these files at almost the same time.

Created attachment 1255924 [details]
glusterfs client mount log
What version of gluster are you using for the clients?

[2017-01-01 16:00:02.990799] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-gvol0-replicate-17: Unable to self-heal contents of '/gps/up_data/2017-01/02/4.txt' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 0 0 ] [ 0 0 0 ] [ 0 0 0 ] ]
[2017-01-01 16:00:02.991044] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-gvol0-replicate-17: backgroung data self heal failed, on /gps/up_data/2017-01/02/4.txt
[2017-01-01 16:00:02.991239] W [afr-open.c:209:afr_open] 0-gvol0-replicate-17: failed to open as split brain seen, returning EIO
[2017-01-01 16:00:02.991265] W [fuse-bridge.c:2209:fuse_writev_cbk] 0-glusterfs-fuse: 340493338: WRITE => -1 (Input/output error)

From the client logs, AFR seems to have returned the EIO despite a pending matrix of all zeroes. What is noteworthy is that these messages correspond to an older version of afr (i.e. afr-v1 in glusterfs-3.5 or older).

1. What version of gluster are you using for the clients?
2. It is recommended to update all the clients to the same version (3.8.x) as the servers.

(In reply to Ravishankar N from comment #14)
> 1. What version of gluster are you using for the clients?
> 2. It is recommended to update all the clients to the same version (3.8.x) as the servers.

The clients' glusterfs version is 3.5.2. We are now using coreos kubernetes 1.5.1 to manage our containers, and the glusterfs client software is packaged into hyperkube, so the glusterfs client we can use depends on the update status of hyperkube.

So, is the bug related to the glusterfs client version? Could keeping the client version the same as the server's avoid this problem?

(In reply to likunbyl from comment #15)
> So, is the bug related to the glusterfs client version? Could keeping the client version the same as the server's avoid this problem?

It could likely be, since the EIO is originating from AFR. gluster-3.5 is end of life[1] now (as is 3.6, and soon 3.7 too, once 3.10 is out), and even if it were a bug in it, there won't be any fix available. I would urge you to have all gluster bits at 3.8 and see if the issue is still hit.

[1] https://www.gluster.org/community/release-schedule/

(In reply to Ravishankar N from comment #16)
> I would urge you to have all gluster bits at 3.8 and see if the issue is still hit.

So, back to the original problem: what was the root cause of the inconsistency of file 4.txt between the three replicas, client side or server side?

(In reply to likunbyl from comment #17)
> So, back to the original problem: what was the root cause of the inconsistency of file 4.txt between the three replicas, client side or server side?

I thought I already answered that. AFR == client side.

(In reply to Ravishankar N from comment #18)
> I thought I already answered that. AFR == client side.

Do I need to close this case?

(In reply to likunbyl from comment #17)
> So, back to the original problem: what was the root cause of the inconsistency of file 4.txt between the three replicas, client side or server side?

Ravi, I am also curious to know what led to the inconsistency to begin with. Do we know what might have happened that led to different file sizes but no xattrs to indicate it?

Pranith

(In reply to likunbyl from comment #19)
> Do I need to close this case?

I think we don't yet have any theory as to why this might have happened, so while upgrading to a supported version is recommended, it doesn't give us any confidence that the problem you experienced in the existing setup is already solved in the newer version.

We used to see this kind of problem when one of the disks crashes: the sizes would mismatch but there wouldn't be any extended attributes to tell that this happened.

(In reply to Pranith Kumar K from comment #21)
> We used to see this kind of problem when one of the disks crashes: the sizes would mismatch but there wouldn't be any extended attributes to tell that this happened.

Just to clarify: if a file ends up in the state you mentioned in this bz, it will be healed automatically in versions > 3.6.x, so if you upgrade to 3.8.x you won't get EIO when a file ends up in this state. So the upgrade is recommended. What I am curious to know is how the file ended up in this state. We used to see this kind of issue only when there is a disk/filesystem crash or an ungraceful shutdown of the machine. Did any of the above happen on the brick servers where the first brick is present?

(In reply to Pranith Kumar K from comment #22)
> Did any of the above happen on the brick servers where the first brick is present?

Yes, one of the servers, ad08, rebooted abnormally on Dec 27, 2016.

(In reply to likunbyl from comment #23)
> Yes, one of the servers, ad08, rebooted abnormally on Dec 27, 2016.

Okay, that probably explains the bit-rot checksum mismatch even where the file sizes were the same. It still doesn't explain the smaller size on ab09. Did anything happen on that machine?

(In reply to Pranith Kumar K from comment #24)
> It still doesn't explain the smaller size on ab09. Did anything happen on that machine?

Nope.

This bug is being closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.
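The contradiction discussed in this thread (replicas of different sizes while the pending matrix is all zeroes) can be illustrated with a toy source-selection model. This is a sketch of the afr-v1 behavior as described in the comments above, not the actual implementation:

```python
def pick_source(pending_matrix, sizes):
    """Toy model: with an all-zero pending matrix nothing 'accuses'
    anything, so when the copies nevertheless differ there is no way
    to pick a heal source."""
    n = len(sizes)
    # A brick is 'accused' if any row holds a non-zero count against it.
    accused = {j for row in pending_matrix for j, c in enumerate(row) if c > 0}
    sources = [i for i in range(n) if i not in accused]
    if len(set(sizes)) > 1 and len(sources) == n:
        return None  # all copies look innocent but differ: EIO in afr-v1
    return sources

matrix = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(pick_source(matrix, [7490, 9046, 9046]))  # prints None
```

The sizes 7490 and 9046 are the ones observed on ab09 vs ac08/ad08 in this bug; newer AFR versions resolve this case automatically instead of returning EIO.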
Created attachment 1244198 [details]
volume info and volume status/heal info, brick logs at that time

Description of problem:

One of our glusterfs consumers reported an error: java.io.FileNotFoundException: /mnt/glusterfs/gps/data/2017-01/02/4.txt (Input/output error). Then I tried to access this file:

# cd /mnt/glusterfs/gps/data/2017-01/02
# more 4.txt
# cat 4.txt
cat: read error: Input/output error

Then I logged into the bricks that hold this file; the size of the file 4.txt differed:

core@ab09 /export/gluster/sdb3/vol/gps/data/2017-01/02 $ ls -l 4.txt
-rw-r--r--. 2 root root 7490 Jan 2 00:00 4.txt

core@ac08 /export/gluster/sdb3/vol/gps/data/2017-01/02 $ ls -l 4.txt
-rw-r--r--. 2 root root 9046 Jan 2 00:00 4.txt

core@ad08 /export/gluster/sdb3/vol/gps/data/2017-01/02 $ ls -l 4.txt
-rw-r--r--. 2 root root 9046 Jan 2 00:00 4.txt

And the getfattr output:

core@ab09 /export/gluster/sdb3/vol/gps/data/2017-01/02 # getfattr -m . -d -e hex 4.txt
# file: 4.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.gvol0-client-51=0x000000000000000000000000
trusted.afr.gvol0-client-52=0x000000000000000000000000
trusted.afr.gvol0-client-53=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005850b03d00072c12
trusted.gfid=0xa61d7b2fd53f4eb9be7a656b0915dbea
trusted.glusterfs.quota.363d3c2d-5ccb-497e-9ecc-6eaf5f435026.contri.1=0x0000000000001e000000000000000001
trusted.pgfid.363d3c2d-5ccb-497e-9ecc-6eaf5f435026=0x00000001

core@ac08 /export/gluster/sdb3/vol/gps/data/2017-01/02 # getfattr -m . -d -e hex 4.txt
# file: 4.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.gvol0-client-51=0x000000000000000000000000
trusted.afr.gvol0-client-52=0x000000000000000000000000
trusted.afr.gvol0-client-53=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005850b03d0007711e
trusted.gfid=0xa61d7b2fd53f4eb9be7a656b0915dbea
trusted.glusterfs.quota.363d3c2d-5ccb-497e-9ecc-6eaf5f435026.contri.1=0x00000000000024000000000000000001
trusted.pgfid.363d3c2d-5ccb-497e-9ecc-6eaf5f435026=0x00000001

core@ad08 /export/gluster/sdb3/vol/gps/data/2017-01/02 # getfattr -m . -d -e hex 4.txt
# file: 4.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.gvol0-client-51=0x000000000000000000000000
trusted.afr.gvol0-client-52=0x000000000000000000000000
trusted.afr.gvol0-client-53=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005861dcf900054117
trusted.gfid=0xa61d7b2fd53f4eb9be7a656b0915dbea
trusted.glusterfs.quota.363d3c2d-5ccb-497e-9ecc-6eaf5f435026.contri.1=0x00000000000024000000000000000001
trusted.pgfid.363d3c2d-5ccb-497e-9ecc-6eaf5f435026=0x00000001

gluster volume heal info didn't report anything. Although a "gluster volume heal gvol0 full" command handled this and synced the file, the impact on the glusterfs consumer was real, and I suppose it will happen again in the near future.

Setup:
OS: coreos 1185.5.0
kubernetes: v1.5.1
Image: official gluster-centos:gluster3u8_centos7
Gluster: 3.8.5
20-node cluster running 132 bricks, dist-rep 44x3 volume

Version-Release number of selected component (if applicable): 3.8.5

How reproducible: Not always

Additional info: will attach the volume info, volume status/heal info, and brick logs from that time.
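The by-hand size comparison done across bricks in this report can be scripted as a quick operational check. A sketch: the demo uses throwaway local files (with the 7490 vs 9046 byte sizes observed here) in place of the real brick paths, which would have to be checked on each server:

```python
import os
import tempfile

def replica_sizes(paths):
    """Map each brick-side path of the same file to its size in bytes."""
    return {p: os.path.getsize(p) for p in paths}

def consistent(sizes):
    """True when every replica reports the same size."""
    return len(set(sizes.values())) == 1

# Demo: throwaway files stand in for the three bricks' copies of 4.txt.
tmp = tempfile.mkdtemp()
for host, size in [("ab09", 7490), ("ac08", 9046), ("ad08", 9046)]:
    with open(os.path.join(tmp, host + "-4.txt"), "wb") as f:
        f.write(b"\0" * size)
paths = [os.path.join(tmp, h + "-4.txt") for h in ("ab09", "ac08", "ad08")]
print(consistent(replica_sizes(paths)))  # prints False
```

A size mismatch with all-zero afr xattrs, as in this bug, is exactly the state that heal info misses, so a periodic cross-brick size check could catch it before a client hits EIO.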