Bug 1240657
Summary: Deceiving log messages like "Failing STAT on gfid : split-brain observed. [Input/output error]" reported

| Field | Value |
|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | replicate |
| Version | rhgs-3.1 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | unspecified |
| Reporter | Saurabh <saujain> |
| Assignee | Krutika Dhananjay <kdhananj> |
| QA Contact | Shruti Sampat <ssampat> |
| CC | annair, asrivast, byarlaga, divya, jthottan, kdhananj, kkeithle, mzywusko, ndevos, nlevinki, pkarampu, rhs-bugs, saujain, skoduri, ssampat, storage-qa-internal, vagarwal |
| Keywords | ZStream |
| Target Release | RHGS 3.1.1 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | glusterfs-3.7.1-12 |
| Doc Type | Bug Fix |
| Clones | 1246052 (view as bug list) |
| Bug Blocks | 1216951, 1246052, 1246987, 1251815 |
| Last Closed | 2015-10-05 07:18:43 UTC |
| Type | Bug |

Doc Text: Previously, AFR logged messages about files and directories going into split-brain even for failures that were unrelated to split-brain. As a consequence, for every stat on a file or directory that failed, AFR would wrongly report split-brain. With this fix, AFR logs split-brain messages only in the case of a true split-brain.
Description (Saurabh, 2015-07-07 12:47:53 UTC)

Created attachment 1049276 [details]: nfs11 ganesha-gfapi.log
rm -rf /mount-point/dir-name or rmdir /mount-point/dir-name

Please provide the tests you have been running before you hit the issue, whether it is consistently reproducible, and the volume setup details (are any other features enabled, or any bricks unavailable?).

It is pretty straightforward, hence I just wrote the description:

1. Create a volume of type 6x2 and start it.
2. Mount the volume with vers=4, after configuring nfs-ganesha.
3. mkdir /mount-point/<dirname>
4. rmdir /mount-point/<dirname>

Thanks Saurabh. Have changed the bug summary to reflect that.

These messages are related to AFR, changing the component.

When a directory (or file) over NFS gets removed, a stat() on the filehandle gets done afterwards. This is needed for updating the inode-cache that could still be valid for hardlinks. It is not clear to me why a stat() on a GFID would return ENXIO instead of ENOENT.

Doc text is edited. Please sign off to be included in Known Issues.

Here are simpler steps to recreate the issue (ones that do not require setting up NFS-Ganesha):

1. Create a 1x2 replicate volume and start it.
2. Disable md-cache on the volume:
   # gluster volume set <VOL> performance.stat-prefetch off
3. Create two FUSE mounts with {entry,attribute} timeout set to 0 and readdirp disabled:
   # glusterfs --volfile-server=kritika --volfile-id=rep --attribute-timeout=0 --entry-timeout=0 --use-readdirp=no /mnt/glusterfs/0
   # glusterfs --volfile-server=kritika --volfile-id=rep --attribute-timeout=0 --entry-timeout=0 --use-readdirp=no /mnt/glusterfs/1
   (This prevents STAT from being served from the cache and in turn forces calls to afr_stat().)
4. From the first mount point, create a directory:
   # mkdir /mnt/glusterfs/0/dir
5. From the second mount point, remove this directory:
   # rmdir /mnt/glusterfs/1/dir
6. Issue stat on this directory's path from the first mount point:
   # stat /mnt/glusterfs/0/dir
7. stat will fail with ENOENT (expected).
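As a local illustration of the expected (non-split-brain) failure mode, and assuming nothing beyond a plain POSIX filesystem (no Gluster volume involved; the directory path is made up): stat on a removed directory should fail with ENOENT ("No such file or directory"), not with EIO, which is what the misleading AFR log message implied.

```shell
# Sketch on a plain local filesystem: after rmdir, stat fails with ENOENT.
# The path is arbitrary; this only demonstrates the expected errno, not Gluster behavior.
dir=$(mktemp -d)/demo_dir
mkdir "$dir"
rmdir "$dir"
stat "$dir" 2>&1 | grep -o 'No such file or directory'
```

The point of the bug is that AFR turned exactly this kind of benign ENOENT-style failure into a "split-brain observed. [Input/output error]" log entry.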
Check the log file of the first mount process and you'll find a message indicating split-brain of "dir" (not expected).

Patch merged.

Verified as fixed in glusterfs-3.7.1-14.el7rhgs.x86_64. Unable to reproduce the issue in 3.1.0 (without nfs-ganesha) using the test case described in comment #12. Tried on a volume exported via nfs-ganesha with the steps described in comment #6. Found to be fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html
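When triaging client logs for this symptom, counting occurrences of the misleading message is a quick first check. The log line below is fabricated for illustration (the gfid, source file reference, and translator name are hypothetical); on builds with the fix, such messages should accompany only genuine split-brain, so matches against known non-split-brain stat failures point at the pre-fix behavior.

```shell
# Fabricated example log line (hypothetical gfid and translator name).
msg='[afr-common.c] 0-vol-replicate-0: Failing STAT on gfid 9d3c5e2a-0000-0000-0000-000000000000: split-brain observed. [Input/output error]'
# Count occurrences of the misleading message, as one would against a real
# mount log, e.g.: grep -c 'split-brain observed' /var/log/glusterfs/<mount>.log
printf '%s\n' "$msg" | grep -c 'split-brain observed'
```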