Bug 1558016

Summary: test ./tests/bugs/ec/bug-1236065.t is generating crash on build
Product: [Community] GlusterFS
Reporter: Mohit Agrawal <moagrawa>
Component: disperse
Assignee: Xavi Hernandez <jahernan>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs, jahernan
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-v4.1.0
Last Closed: 2018-06-20 18:02:26 UTC
Type: Bug
Bug Blocks: 1559079

Description Mohit Agrawal 2018-03-19 12:42:51 UTC
Description of problem:
Test ./tests/bugs/ec/bug-1236065.t has crashed in multiple builds:
https://build.gluster.org/job/centos7-regression/379/console
https://build.gluster.org/job/centos7-regression/380/console

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run the command below to reproduce the crash. The crash occurred on the 3rd or 4th
run of the test case.
for i in {1..10}; do prove -vf ./tests/bugs/ec/bug-1236065.t; done

Actual results:


Expected results:


Additional info:

Comment 2 Xavi Hernandez 2018-03-19 19:29:38 UTC
A first analysis shows that a readdirp request is returning directory entries with empty iatt information. I'll check why this happens. In any case, a heal shouldn't be attempted with a NULL gfid, so I'll add a check for this.

Comment 3 Worker Ant 2018-03-20 10:07:44 UTC
REVIEW: https://review.gluster.org/19746 (cluster/ec: fix SHD crash for null gfid's) posted (#1) for review on master by Xavi Hernandez

Comment 4 Worker Ant 2018-03-21 16:28:04 UTC
COMMIT: https://review.gluster.org/19746 committed in master by "Xavi Hernandez" <xhernandez> with a commit message- cluster/ec: fix SHD crash for null gfid's

When the self-heal daemon is doing a full sweep it uses readdirp to
get extra stat information from each file. This information is
obtained in two steps by the posix xlator: first the directory is
read to get the entries and then each entry is stated to get additional
info. Between these two steps, it's possible that the file is removed
by the user, so we'll get an error, leaving stat info empty.

EC's heal daemon was using the gfid blindly, causing an assert failure
when protocol/client was trying to encode the gfid.

To fix the problem a check has been added. If we detect a null gfid, we
simply ignore it and continue healing.

Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228
BUG: 1558016
Signed-off-by: Xavi Hernandez <xhernandez>
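
The check described in the commit message above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual patch: struct dir_entry, gfid_is_null, encode_gfid and heal_entries are hypothetical names made up for this sketch, and the gfid is assumed to be a 16-byte identifier that is all zeros when the stat step failed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GFID_LEN 16 /* assumption: a gfid is a 16-byte identifier */

/* Hypothetical stand-in for one directory entry returned by readdirp. */
struct dir_entry {
    const char *name;
    unsigned char gfid[GFID_LEN]; /* all zeros if the stat step failed */
};

/* Returns 1 when every byte of the gfid is zero, 0 otherwise. */
int gfid_is_null(const unsigned char gfid[GFID_LEN])
{
    static const unsigned char zero[GFID_LEN];
    return memcmp(gfid, zero, GFID_LEN) == 0;
}

/* Hypothetical encoding step: it asserts on a null gfid, which is the
 * kind of failure the report describes. */
void encode_gfid(const unsigned char gfid[GFID_LEN],
                 unsigned char out[GFID_LEN])
{
    assert(!gfid_is_null(gfid));
    memcpy(out, gfid, GFID_LEN);
}

/* Walks the entries and skips those with a null gfid instead of passing
 * them on. Returns the number of entries for which a heal was attempted. */
size_t heal_entries(const struct dir_entry *entries, size_t count)
{
    size_t attempted = 0;
    unsigned char wire[GFID_LEN];

    for (size_t i = 0; i < count; i++) {
        if (gfid_is_null(entries[i].gfid))
            continue; /* entry vanished between readdir and stat: ignore it */
        encode_gfid(entries[i].gfid, wire);
        attempted++;
    }
    return attempted;
}
```

Without the gfid_is_null test in the loop, the first entry removed between the two readdirp steps would reach encode_gfid and trip its assert. With it, the sweep simply moves on to the next entry.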

Comment 5 Shyamsundar 2018-06-20 18:02:26 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/