Bug 1655352

Summary: [GSS] Gluster client process is crashing / getting killed by OOM killer.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ben Turner <bturner>
Component: glusterfs
Assignee: Sunny Kumar <sunkumar>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Bala Konda Reddy M <bmekala>
Severity: urgent
Priority: urgent
Docs Contact:
Version: rhgs-3.2
CC: bkunal, bturner, nbalacha, nbarry, rgowdapp, rhs-bugs, skoduri, vbellur
Target Milestone: ---
Target Release: ---
Keywords: ZStream
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-06 05:58:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1647277
Bug Blocks:

Description Ben Turner 2018-12-03 00:38:33 UTC
Description of problem:

Roughly every two weeks, the gluster client process crashes or is killed by the OOM killer while running Commvault backups.

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-18.el6rhs.x86_64

How reproducible:

Intermittent; it happens about once every two weeks while running daily backups in which Commvault writes to a gluster FUSE mount.

Steps to Reproduce:
1.  Run the Commvault backup software.
2.  Back up to a gluster FUSE mount (a typical mount command is sketched below).
3.  Observe the crash / OOM kill that takes down the FUSE mount; the backup fails.
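For reference, a typical FUSE mount of the kind involved here; the server name is taken from the command line in comment 17, while the volume name and mount point are placeholders, not details from the case:

$ mount -t glusterfs aclrhgs.noblehosted.com:/<volname> /mnt/backup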

Actual results:

Crash / OOM kill takes down the gluster mount.
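Since the mount can die either from a segfault in the client or from the kernel OOM killer, it is worth confirming which happened on a given occurrence. A minimal check, assuming default RHEL 6 log locations:

$ dmesg | grep -i 'out of memory'
$ grep -iE 'out of memory|killed process' /var/log/messages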

Expected results:

Normal operation.

Additional info:

We have two application core dumps from the gluster client mount process that need analysis.
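One way to open such a core for analysis (the core path is a placeholder; the debuginfo package must match the installed glusterfs-3.8.4-18.el6rhs build):

$ debuginfo-install glusterfs-3.8.4-18.el6rhs
$ gdb /usr/sbin/glusterfs /path/to/core.<pid>
(gdb) bt
(gdb) thread apply all bt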

Comment 9 Amar Tumballi 2018-12-04 08:52:48 UTC
A few more details are required:

Size of RAM on the client.

Output of the following, run against a statedump of the client process (one way to generate a statedump is sketched after the commands):

$ grep 'lru' <statedump-file>
$ grep 'active' <statedump-file>
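A sketch of generating the statedump, assuming a single glusterfs client process on the box and the default dump directory /var/run/gluster:

# glusterfs writes its state to /var/run/gluster/ on receiving SIGUSR1
$ pid=$(pgrep -f 'glusterfs --volfile-server')
$ kill -USR1 "$pid"
$ grep 'lru' /var/run/gluster/glusterdump.$pid.dump.*
$ grep 'active' /var/run/gluster/glusterdump.$pid.dump.*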

Comment 11 Nathan Barry 2018-12-04 21:11:39 UTC
RAM at the time of the core dump was 16 GB; the client has since been upgraded to 32 GB of RAM.
Statedumps will be uploaded to the case.

Comment 12 Ben Turner 2018-12-10 15:28:27 UTC
It has been since 12/3 with no updates added and no owner assigned. What is the status of this bug?

Comment 17 Ben Turner 2018-12-10 16:44:30 UTC
Core was generated by `/usr/sbin/glusterfs --volfile-server=aclrhgs.noblehosted.com --volfile-server=a'.
Program terminated with signal 11, Segmentation fault.
#0  mem_get (mem_pool=0x7fcd5800f4e0) at mem-pool.c:523
523	        *pool_ptr = (struct mem_pool *)mem_pool;

(gdb) f 0
#0  mem_get (mem_pool=0x7fcd5800f4e0) at mem-pool.c:523
523	        *pool_ptr = (struct mem_pool *)mem_pool;

(gdb) p ptr
$1 = (void *) 0x0

(gdb) p pool_ptr
$2 = (struct mem_pool **) 0x10
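
The backtrace is consistent with an allocation failure under memory pressure: ptr is NULL, and pool_ptr, which mem_get computes at a small offset into the newly allocated block, comes out as 0x10. The store at mem-pool.c:523 therefore writes through a near-NULL pointer, and the process dies with SIGSEGV instead of failing the allocation gracefully. A minimal C sketch of that failure pattern (an illustration with hypothetical names, not the actual mem-pool.c source):

#include <stdlib.h>

struct mem_pool;                        /* opaque, as in mem-pool.c */

/* Hypothetical per-object header with a back-pointer to the owning
 * pool; the pointer sits at offset 0x10 into the block on LP64. */
struct obj_header {
        unsigned long magic;
        unsigned long padding;
        struct mem_pool *pool_ptr;
};

/* Buggy variant: if malloc() returns NULL under memory pressure,
 * pool_ptr becomes (struct mem_pool **)0x10 and the store below
 * faults with SIGSEGV -- matching frame #0 in the core. */
void *mem_get_buggy(struct mem_pool *mem_pool, size_t payload)
{
        void *ptr = malloc(sizeof(struct obj_header) + payload);
        struct mem_pool **pool_ptr = &((struct obj_header *)ptr)->pool_ptr;
        *pool_ptr = mem_pool;           /* crashes when ptr == NULL */
        return (struct obj_header *)ptr + 1;
}

/* Checked variant: verify the allocation before touching the header. */
void *mem_get_checked(struct mem_pool *mem_pool, size_t payload)
{
        struct obj_header *hdr = malloc(sizeof(*hdr) + payload);
        if (hdr == NULL)
                return NULL;            /* propagate ENOMEM to the caller */
        hdr->pool_ptr = mem_pool;
        return hdr + 1;
}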