Bug 996987
Summary: | AFR: processes on mount point fail when one of the disks crashes | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Sachidananda Urs <surs> |
Component: | glusterfs | Assignee: | Ravishankar N <ravishankar> |
Status: | CLOSED ERRATA | QA Contact: | senaik |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 2.1 | CC: | amarts, pkarampu, rhs-bugs, surs, vbellur |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.4.0.22rhs-1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-09-23 22:36:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 996089 |
Description
Sachidananda Urs 2013-08-14 12:21:42 UTC
Sachi, Bug 892730 was causing EIO errors to the client, whereas this issue causes ENOTCONN. I just verified that the test case attached to the commit (http://review.gluster.org/#/c/4376/2/tests/bugs/bug-892730.t) succeeds on downstream, so this is not a regression of that bug but a new issue. This bug looks a bit similar to https://bugzilla.redhat.com/show_bug.cgi?id=996089. We are in the process of figuring out the root cause. Pranith.

Sac, are you able to hit the issue consistently? I was not able to reproduce it on the RHS-2.1-20130814 ISO. The test was the same, i.e. a kernel untar while bringing down one of the replicas with xfstest-godown. Could you please upload the SOS report if you are able to hit it?

Since the sosreports are ~30M, I've uploaded them to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/996987/. I've shown the steps to reproduce to Ravi, and the setup is provided for investigation as well.

More details on reproducing the issue (a consolidated sketch of these steps follows at the end of this report):
=====================================================================
1. Create a 2x2 distributed-replicate volume.

2. Fuse mount the volume and create some files on the mount point:

   for i in {100..1000}; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3. While file creation is in progress, bring down one of the bricks in the replica pair:

   [root@boost b1]# gluster v status Vol3
   Status of volume: Vol3
   Gluster process                                  Port    Online  Pid
   ------------------------------------------------------------------------------
   Brick 10.70.34.85:/rhs/brick1/c1                 N/A     N       10003
   Brick 10.70.34.86:/rhs/brick1/c2                 49281   Y       2536
   Brick 10.70.34.87:/rhs/brick1/c3                 49256   Y       20002
   Brick 10.70.34.88:/rhs/brick1/c4                 49201   Y       3810
   NFS Server on localhost                          2049    Y       10015
   Self-heal Daemon on localhost                    N/A     Y       10022
   NFS Server on 10.70.34.86                        2049    Y       2550
   Self-heal Daemon on 10.70.34.86                  N/A     Y       2558
   NFS Server on 10.70.34.87                        2049    Y       20014
   Self-heal Daemon on 10.70.34.87                  N/A     Y       20021
   NFS Server on 10.70.34.88                        2049    Y       3822
   Self-heal Daemon on 10.70.34.88                  N/A     Y       3831

   There are no active volume tasks

4. After file creation has completed, calculate the arequal checksum on the mount point:

   [root@RHEL6 Vol3]# /opt/qa/tools/arequal-checksum /mnt/Vol3/
   md5sum: /mnt/Vol3/f100: Transport endpoint is not connected
   /mnt/Vol3/f100: short read
   ftw (/mnt/Vol3/) returned -1 (Success), terminating

Verified in version glusterfs-3.4.0.22rhs-1. Followed the same steps as mentioned in comment 5; unable to reproduce.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
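
For convenience, the reproduction steps above can be consolidated into a single script. This is a minimal sketch and not part of the original report: the volume name, brick hosts/paths, and mount point are taken from the status output above, but the brick-kill step (killing the brick process to simulate a disk crash), the sleep, and the PID extraction are assumptions; the original runs brought the replica down with xfstest-godown. Adjust hostnames and paths to your environment.

```bash
#!/bin/bash
# Rough reproduction sketch (hedged: the brick-kill step and PID lookup are
# assumptions; hostnames and paths are taken from the status output above).

VOL=Vol3
MNT=/mnt/Vol3

# 1. Create and start a 2x2 distributed-replicate volume
#    (run on one of the servers in the trusted pool).
gluster volume create $VOL replica 2 \
    10.70.34.85:/rhs/brick1/c1 10.70.34.86:/rhs/brick1/c2 \
    10.70.34.87:/rhs/brick1/c3 10.70.34.88:/rhs/brick1/c4
gluster volume start $VOL

# 2. Fuse mount the volume on the client and start creating files
#    in the background.
mount -t glusterfs 10.70.34.85:/$VOL $MNT
cd $MNT
for i in {100..1000}; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done &

# 3. While file creation is in progress, bring down one brick of a replica
#    pair. Killing the brick process (on the server hosting that brick) is
#    one way to simulate the brick going down.
sleep 30
kill -9 "$(gluster volume status $VOL 10.70.34.85:/rhs/brick1/c1 \
           | awk '/brick1\/c1/ {print $NF}')"

# 4. After file creation completes, run the checksum tool on the mount point
#    and check for "Transport endpoint is not connected" errors.
wait
/opt/qa/tools/arequal-checksum $MNT/
```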