Bug 1554235

Summary: Memory corruption is causing crashes, hangs and invalid answers
Product: [Community] GlusterFS Reporter: Xavi Hernandez <jahernan>
Component: protocol Assignee: Xavi Hernandez <jahernan>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.0 CC: bugs, jeff
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-4.0.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1553129 Environment:
Last Closed: 2018-03-26 12:32:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1553129    
Bug Blocks:    

Description Xavi Hernandez 2018-03-12 07:44:03 UTC
+++ This bug was initially created as a clone of Bug #1553129 +++

Description of problem:

I've only detected this problem by running some regression tests in a loop; I haven't seen it on a normally running system.

I'm not yet sure about the root cause of the memory corruption, but some clues indicate that it happens at the protocol/client layer. Still investigating.

Version-Release number of selected component (if applicable): mainline


How reproducible:

very rare

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Jeff Darcy on 2018-03-08 14:28:11 CET ---

One useful trick, from when I had to debug one of these in server code a while ago, is to use gdb's "find" function to search for the mem-pool header/footer around the pointer you're looking at. If it's a use-after-free situation, which is the most common cause of memory corruption, that and a little luck can conclusively identify a culprit.
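
For readers unfamiliar with the idea, gdb's "find" command searches a memory range for a given value. The following is only a minimal sketch in C of what such a search does conceptually: scan backwards from a suspect pointer for a header marker and forwards for a trailer marker. The marker values, the buffer layout, and all function names are hypothetical and chosen for illustration; they are not taken from the GlusterFS mem-pool code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker values, for illustration only. */
#define DEMO_HEADER_MAGIC  0xCAFEBABEu
#define DEMO_TRAILER_MAGIC 0xBAADF00Du

/* Scan forwards over buf[0..len) for a 32-bit value at any byte offset.
 * Returns the offset of the first match, or -1 if there is none. */
static long demo_scan_forward(const unsigned char *buf, size_t len,
                              uint32_t magic)
{
    uint32_t v;
    size_t i;

    for (i = 0; i + sizeof(v) <= len; i++) {
        memcpy(&v, buf + i, sizeof(v));
        if (v == magic)
            return (long)i;
    }
    return -1;
}

/* Scan backwards for a 32-bit value that ends at or before offset 'from'.
 * Returns the offset of the nearest match, or -1 if there is none. */
static long demo_scan_backward(const unsigned char *buf, size_t from,
                               uint32_t magic)
{
    uint32_t v;
    size_t i;

    if (from < sizeof(v))
        return -1;
    for (i = from - sizeof(v) + 1; i-- > 0;) {
        memcpy(&v, buf + i, sizeof(v));
        if (v == magic)
            return (long)i;
    }
    return -1;
}

/* Build a fake allocation (header, 16-byte payload, trailer) inside a
 * buffer and locate both markers starting from a pointer in the payload.
 * Returns 0 when both markers are found where they were placed. */
int demo_find_markers(void)
{
    unsigned char buf[64];
    const uint32_t hdr = DEMO_HEADER_MAGIC;
    const uint32_t trl = DEMO_TRAILER_MAGIC;
    const size_t suspect = 20; /* offset of the pointer being examined */
    long h, t;

    memset(buf, 0x11, sizeof(buf));        /* unrelated memory */
    memcpy(buf + 8, &hdr, sizeof(hdr));    /* header at offset 8 */
    memset(buf + 12, 0x22, 16);            /* payload at offsets 12..27 */
    memcpy(buf + 28, &trl, sizeof(trl));   /* trailer at offset 28 */

    h = demo_scan_backward(buf, suspect, DEMO_HEADER_MAGIC);
    t = demo_scan_forward(buf + suspect, sizeof(buf) - suspect,
                          DEMO_TRAILER_MAGIC);
    if (h != 8 || t < 0)
        return -1;
    return ((size_t)t + suspect == 28) ? 0 : -1;
}
```

In a real debugging session the equivalent search would be done with gdb against the live process or a core dump, using the marker values defined by the build being debugged.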

Comment 1 Worker Ant 2018-03-12 09:16:41 UTC
REVIEW: https://review.gluster.org/19699 (protocol/client: fix memory corruption) posted (#1) for review on release-4.0 by Xavi Hernandez

Comment 2 Worker Ant 2018-03-20 11:00:24 UTC
COMMIT: https://review.gluster.org/19699 committed in release-4.0 by "Shyamsundar Ranganathan" <srangana> with a commit message- protocol/client: fix memory corruption

Some accesses to the saved_fds list were protected by the wrong
mutex (lock instead of fd_lock).

Additionally, the retrieval of fdctx from the fd's context and any
checks done on it are now also protected by fd_lock, to prevent
fdctx from becoming outdated just after it is retrieved.

Backport of:
> BUG: 1553129

Change-Id: If2910508bcb7d1ff23debb30291391f00903a6fe
BUG: 1554235
Signed-off-by: Xavi Hernandez <xhernandez>
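
As an illustration of the pattern the commit message describes, here is a minimal sketch, not the actual patch. It assumes a simplified, hypothetical client state with two mutexes, where every access to the saved_fds list, and every lookup-plus-check of an fdctx, happens under fd_lock. All type and function names (demo_conf_t, demo_fdctx_t, demo_save_fd and so on) are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-fd context. */
typedef struct demo_fdctx {
    struct demo_fdctx *next;
    int remote_fd; /* negative means "not usable" */
} demo_fdctx_t;

/* Hypothetical, simplified stand-in for the client's private state. */
typedef struct {
    pthread_mutex_t lock;     /* guards unrelated state, NOT saved_fds */
    pthread_mutex_t fd_lock;  /* guards saved_fds and fdctx lookups */
    demo_fdctx_t *saved_fds;
} demo_conf_t;

/* Insert under fd_lock, the mutex that owns the list. */
static void demo_save_fd(demo_conf_t *conf, demo_fdctx_t *ctx)
{
    pthread_mutex_lock(&conf->fd_lock);
    ctx->next = conf->saved_fds;
    conf->saved_fds = ctx;
    pthread_mutex_unlock(&conf->fd_lock);
}

/* Remove under the same mutex, so no reader sees a half-updated list. */
static void demo_forget_fd(demo_conf_t *conf, demo_fdctx_t *ctx)
{
    demo_fdctx_t **pp;

    pthread_mutex_lock(&conf->fd_lock);
    for (pp = &conf->saved_fds; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == ctx) {
            *pp = ctx->next;
            ctx->next = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&conf->fd_lock);
}

/* Look up the context AND check it while still holding fd_lock, so the
 * result cannot become outdated between the lookup and the check. */
static int demo_get_remote_fd(demo_conf_t *conf, const demo_fdctx_t *wanted,
                              int *remote_fd)
{
    demo_fdctx_t *it;
    int ret = -1;

    pthread_mutex_lock(&conf->fd_lock);
    for (it = conf->saved_fds; it != NULL; it = it->next) {
        if (it == wanted && it->remote_fd >= 0) {
            *remote_fd = it->remote_fd;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&conf->fd_lock);
    return ret;
}

/* Single-threaded walk-through of the helpers; returns 0 on success. */
int demo_fd_lock_usage(void)
{
    demo_conf_t conf = { PTHREAD_MUTEX_INITIALIZER,
                         PTHREAD_MUTEX_INITIALIZER, NULL };
    demo_fdctx_t a = { NULL, 3 };
    demo_fdctx_t b = { NULL, 4 };
    int fd = -1;

    demo_save_fd(&conf, &a);
    demo_save_fd(&conf, &b);
    if (demo_get_remote_fd(&conf, &a, &fd) != 0 || fd != 3)
        return -1;

    demo_forget_fd(&conf, &a);
    if (demo_get_remote_fd(&conf, &a, &fd) == 0)
        return -1; /* a was removed, so the lookup must fail */

    return (demo_get_remote_fd(&conf, &b, &fd) == 0 && fd == 4) ? 0 : -1;
}
```

The point of the sketch is that the list has exactly one owning mutex: taking lock in one helper and fd_lock in another would let two threads modify saved_fds at the same time, which is the kind of race the commit message describes.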

Comment 3 Shyamsundar 2018-03-26 12:32:11 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.1, please open a new bug report.

glusterfs-4.0.1 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000093.html
[2] https://www.gluster.org/pipermail/gluster-users/