Bug 1655352

Summary: [GSS] Gluster client process is crashing / getting killed by OOM killer.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ben Turner <bturner>
Component: glusterfs
Assignee: Sunny Kumar <sunkumar>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Bala Konda Reddy M <bmekala>
Severity: urgent
Priority: urgent
Docs Contact:
Version: rhgs-3.2
CC: bkunal, bturner, nbalacha, nbarry, rgowdapp, rhs-bugs, skoduri, vbellur
Target Milestone: ---
Target Release: ---
Keywords: ZStream
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-06 05:58:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1647277
Bug Blocks:

Description Ben Turner 2018-12-03 00:38:33 UTC
Description of problem:

Roughly every two weeks, the gluster client process crashes or is killed by the OOM killer while running Commvault backups.

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-18.el6rhs.x86_64

How reproducible:

Intermittent; it happens about once every two weeks while running daily backups in which Commvault writes to a gluster FUSE mount.

Steps to Reproduce:
1.  Run the Commvault backup software.
2.  Back up to a gluster FUSE mount (a typical mount command is sketched below).
3.  Observe the crash / OOM kill that takes down the FUSE mount; the backup fails.
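For reference, a typical FUSE mount of the kind involved here; the server name is taken from the command line in comment 17, while the volume name and mount point are placeholders, not details from the case:

$ mount -t glusterfs aclrhgs.noblehosted.com:/<volname> /mnt/backup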

Actual results:

Crash / OOM kill takes down the gluster mount.
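Since the mount can die either from a segfault in the client or from the kernel OOM killer, it is worth confirming which happened on a given occurrence. A minimal check, assuming default RHEL 6 log locations:

$ dmesg | grep -i 'out of memory'
$ grep -iE 'out of memory|killed process' /var/log/messages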

Expected results:

Normal operation.

Additional info:

We have two application core dumps from the gluster client mount process that need analysis.
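One way to open such a core for analysis (the core path is a placeholder; the debuginfo package must match the installed glusterfs-3.8.4-18.el6rhs build):

$ debuginfo-install glusterfs-3.8.4-18.el6rhs
$ gdb /usr/sbin/glusterfs /path/to/core.<pid>
(gdb) bt
(gdb) thread apply all bt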

Comment 9 Amar Tumballi 2018-12-04 08:52:48 UTC
A few more details are required:

Size of RAM on the client.

Output of the following, run against a statedump of the client process (one way to generate a statedump is sketched after the commands):

$ grep 'lru' <statedump-file>
$ grep 'active' <statedump-file>
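A sketch of generating the statedump, assuming a single glusterfs client process on the box and the default dump directory /var/run/gluster:

# glusterfs writes its state to /var/run/gluster/ on receiving SIGUSR1
$ pid=$(pgrep -f 'glusterfs --volfile-server')
$ kill -USR1 "$pid"
$ grep 'lru' /var/run/gluster/glusterdump.$pid.dump.*
$ grep 'active' /var/run/gluster/glusterdump.$pid.dump.*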

Comment 11 Nathan Barry 2018-12-04 21:11:39 UTC
RAM at the time of the core dump was 16 GB; the client has since been upgraded to 32 GB of RAM.
Statedumps will be uploaded to the case.

Comment 12 Ben Turner 2018-12-10 15:28:27 UTC
It has been since 12/3 with no updates added and no owner assigned. What is the status of this bug?

Comment 17 Ben Turner 2018-12-10 16:44:30 UTC
Core was generated by `/usr/sbin/glusterfs --volfile-server=aclrhgs.noblehosted.com --volfile-server=a'.
Program terminated with signal 11, Segmentation fault.
#0  mem_get (mem_pool=0x7fcd5800f4e0) at mem-pool.c:523
523	        *pool_ptr = (struct mem_pool *)mem_pool;

(gdb) f 0
#0  mem_get (mem_pool=0x7fcd5800f4e0) at mem-pool.c:523
523	        *pool_ptr = (struct mem_pool *)mem_pool;

(gdb) p ptr
$1 = (void *) 0x0

(gdb) p pool_ptr
$2 = (struct mem_pool **) 0x10
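
The backtrace is consistent with an allocation failure under memory pressure: ptr is NULL, and pool_ptr, which mem_get computes at a small offset into the newly allocated block, comes out as 0x10. The store at mem-pool.c:523 therefore writes through a near-NULL pointer, and the process dies with SIGSEGV instead of failing the allocation gracefully. A minimal C sketch of that failure pattern (an illustration with hypothetical names, not the actual mem-pool.c source):

#include <stdlib.h>

struct mem_pool;                        /* opaque, as in mem-pool.c */

/* Hypothetical per-object header with a back-pointer to the owning
 * pool; the pointer sits at offset 0x10 into the block on LP64. */
struct obj_header {
        unsigned long magic;
        unsigned long padding;
        struct mem_pool *pool_ptr;
};

/* Buggy variant: if malloc() returns NULL under memory pressure,
 * pool_ptr becomes (struct mem_pool **)0x10 and the store below
 * faults with SIGSEGV -- matching frame #0 in the core. */
void *mem_get_buggy(struct mem_pool *mem_pool, size_t payload)
{
        void *ptr = malloc(sizeof(struct obj_header) + payload);
        struct mem_pool **pool_ptr = &((struct obj_header *)ptr)->pool_ptr;
        *pool_ptr = mem_pool;           /* crashes when ptr == NULL */
        return (struct obj_header *)ptr + 1;
}

/* Checked variant: verify the allocation before touching the header. */
void *mem_get_checked(struct mem_pool *mem_pool, size_t payload)
{
        struct obj_header *hdr = malloc(sizeof(*hdr) + payload);
        if (hdr == NULL)
                return NULL;            /* propagate ENOMEM to the caller */
        hdr->pool_ptr = mem_pool;
        return hdr + 1;
}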