Bug 1655352 - [GSS] Gluster client process is crashing / getting killed by OOM killer.
Summary: [GSS] Gluster client process is crashing / getting killed by OOM killer.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Sunny Kumar
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On: RHGS34MemoryLeak
Blocks:
 
Reported: 2018-12-03 00:38 UTC by Ben Turner
Modified: 2022-03-13 16:18 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-06 05:58:45 UTC
Embargoed:


Attachments:

Description Ben Turner 2018-12-03 00:38:33 UTC
Description of problem:

Every two weeks or so the gluster client process crashes or is killed by the OOM killer while running Commvault backups.

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-18.el6rhs.x86_64

How reproducible:

Intermittent; happens about once every two weeks when running daily backups in which Commvault writes to the Gluster volume.

Steps to Reproduce:
1.  Run commvault backup SW
2.  Backup to a gluster FUSE mount
3.  Crash / OOM killer kills the FUSE mount and the backup fails.

Actual results:

Crash / OOM kill takes down the gluster mount.

Expected results:

Normal operation.

Additional info:

We have 2 app cores from the gluster client mount process that need analysis.

Comment 9 Amar Tumballi 2018-12-04 08:52:48 UTC
A few more details are required:

Size of RAM, and the output of:

$ grep 'lru' <statedump-file>
$ grep 'active' <statedump-file>

Comment 11 Nathan Barry 2018-12-04 21:11:39 UTC
RAM at the time of the coredump was 16GB; the client has since been upgraded to 32GB RAM.
Statedumps will be uploaded to the case.

Comment 12 Ben Turner 2018-12-10 15:28:27 UTC
It's been since 12/3 with no updates added and no owner assigned. What is the status of this bug?

Comment 17 Ben Turner 2018-12-10 16:44:30 UTC
Core was generated by `/usr/sbin/glusterfs --volfile-server=aclrhgs.noblehosted.com --volfile-server=a'.
Program terminated with signal 11, Segmentation fault.
#0  mem_get (mem_pool=0x7fcd5800f4e0) at mem-pool.c:523
523	        *pool_ptr = (struct mem_pool *)mem_pool;

(gdb) f 0
#0  mem_get (mem_pool=0x7fcd5800f4e0) at mem-pool.c:523
523	        *pool_ptr = (struct mem_pool *)mem_pool;

(gdb) p ptr
$1 = (void *) 0x0

(gdb) p pool_ptr
$2 = (struct mem_pool **) 0x10

