Bug 1553129

Summary: Memory corruption is causing crashes, hangs and invalid answers
Product: [Community] GlusterFS
Reporter: Xavi Hernandez <jahernan>
Component: protocol
Assignee: Xavi Hernandez <jahernan>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, jeff
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1554235
Environment:
Last Closed: 2018-06-20 18:01:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1554235

Description Xavi Hernandez 2018-03-08 11:27:34 UTC
Description of problem:

I've only seen this problem when running some regression tests in a loop. I haven't seen it on a normally running system.

I'm not absolutely sure yet about the root cause of the memory corruption, but some clues indicate that it happens at the protocol/client layer. Still investigating.

Version-Release number of selected component (if applicable): mainline


How reproducible:

very rare

Comment 1 Jeff Darcy 2018-03-08 13:28:11 UTC
One useful trick, from when I had to debug one of these in server code a while ago, is to use gdb's "find" command to search for the mem-pool header/footer around the pointer you're looking at. If it's a use-after-free situation, which is the most common cause of memory corruption, that and a little luck can conclusively identify a culprit.
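
To make the idea concrete, here is a minimal C sketch of what such a search does: scan backwards from a suspect pointer for a known header marker. The marker value, the arena bounds and the function name are all hypothetical placeholders chosen for illustration, not GlusterFS definitions; in a real session gdb's "find" does the scanning for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder marker for the demo; NOT taken from the GlusterFS sources. */
#define DEMO_HEADER_MAGIC 0xCAFEBABEu

/*
 * Scan backwards from `ptr` towards `start`, looking for the 4-byte
 * marker `magic`. Returns the address of the closest marker located
 * entirely before `ptr`, or NULL if none is found.
 * Assumes start <= ptr and that [start, ptr) is readable memory.
 */
const unsigned char *demo_find_header(const unsigned char *start,
                                      const unsigned char *ptr,
                                      uint32_t magic)
{
    if ((size_t)(ptr - start) < sizeof magic)
        return NULL; /* not enough room for a marker before ptr */

    for (const unsigned char *p = ptr - sizeof magic;; p--) {
        uint32_t value;

        /* memcpy avoids unaligned reads at odd addresses. */
        memcpy(&value, p, sizeof value);
        if (value == magic)
            return p;
        if (p == start)
            break;
    }
    return NULL;
}
```

Once the header is located, the fields that follow it (allocation type, size and so on, depending on the allocator) usually tell you what kind of object used to live at that address, which narrows down who freed it.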

Comment 2 Worker Ant 2018-03-09 22:32:33 UTC
REVIEW: https://review.gluster.org/19691 (protocol/client: fix memory corruption) posted (#1) for review on master by Xavi Hernandez

Comment 3 Worker Ant 2018-03-10 18:00:57 UTC
COMMIT: https://review.gluster.org/19691 committed in master by "Xavi Hernandez" <xhernandez> with a commit message- protocol/client: fix memory corruption

There was an issue where some accesses to the saved_fds list were
protected by the wrong mutex (lock instead of fd_lock).

Additionally, the retrieval of fdctx from the fd's context and any
checks done on it are now also protected by fd_lock, to prevent
fdctx from becoming outdated just after it is retrieved.

Change-Id: If2910508bcb7d1ff23debb30291391f00903a6fe
BUG: 1553129
Signed-off-by: Xavi Hernandez <xhernandez>
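
The locking rule described in the commit message can be illustrated with a small, self-contained C sketch. This is not the actual patch: the demo_* structures and functions are hypothetical, simplified stand-ins, and only the names lock, fd_lock, saved_fds and fdctx come from the commit message. The point is that every access to saved_fds, and every fetch-and-check of fdctx, happens under fd_lock, and that the needed value is copied out before the lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-fd context kept on the saved_fds list. */
struct demo_fdctx {
    int remote_fd;            /* fd number on the remote side */
    struct demo_fdctx *next;  /* link in saved_fds */
};

/* Hypothetical fd object holding a pointer to its context. */
struct demo_fd {
    struct demo_fdctx *ctx;
};

struct demo_conf {
    pthread_mutex_t lock;     /* guards other state; never saved_fds */
    pthread_mutex_t fd_lock;  /* the only lock guarding saved_fds and fdctx */
    struct demo_fdctx *saved_fds;
};

void demo_conf_init(struct demo_conf *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_mutex_init(&c->fd_lock, NULL);
    c->saved_fds = NULL;
}

/* Attach ctx to fd and link it into saved_fds, all under fd_lock. */
void demo_save_fd(struct demo_conf *c, struct demo_fd *fd,
                  struct demo_fdctx *ctx)
{
    pthread_mutex_lock(&c->fd_lock);
    ctx->next = c->saved_fds;
    c->saved_fds = ctx;
    fd->ctx = ctx;
    pthread_mutex_unlock(&c->fd_lock);
}

/* Detach and unlink the context under the same lock. */
void demo_release_fd(struct demo_conf *c, struct demo_fd *fd)
{
    pthread_mutex_lock(&c->fd_lock);
    struct demo_fdctx *ctx = fd->ctx;
    if (ctx != NULL) {
        for (struct demo_fdctx **pp = &c->saved_fds; *pp; pp = &(*pp)->next) {
            if (*pp == ctx) {
                *pp = ctx->next;
                break;
            }
        }
        ctx->next = NULL;
        fd->ctx = NULL;
    }
    pthread_mutex_unlock(&c->fd_lock);
}

/*
 * Fetch the context, validate it and copy out remote_fd while still
 * holding fd_lock, so the context cannot become stale between the
 * lookup and the check. Returns 0 on success, -1 if there is no context.
 */
int demo_get_remote_fd(struct demo_conf *c, struct demo_fd *fd, int *remote_fd)
{
    int ret = -1;

    pthread_mutex_lock(&c->fd_lock);
    if (fd->ctx != NULL) {
        *remote_fd = fd->ctx->remote_fd;
        ret = 0;
    }
    pthread_mutex_unlock(&c->fd_lock);
    return ret;
}
```

If demo_get_remote_fd fetched fd->ctx first and only then took the lock (or took lock instead of fd_lock), a concurrent demo_release_fd could unlink the context in between, which is the kind of window the commit message describes closing.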

Comment 4 Shyamsundar 2018-06-20 18:01:56 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/