| Summary: | glusterfs process *replicate* consumed 75 GB of 96 GB, forcing node into OOM Kill | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Peter Portante <pportant> |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.0 | CC: | amukherj, aspandey, perfbz, pportant, rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-06-15 09:00:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | First statedump (1145815), Second statedump (1145816) | | |
Description

Peter Portante 2016-04-07 01:15:09 UTC
This is for RHGS 3.0.4, on RHEL 6.6. This does NOT appear to be related to https://bugzilla.redhat.com/show_bug.cgi?id=1247221, as no find operations were being performed on the local disks.

Changing the component to AFR, as it is the self-heal daemon that is consuming this amount of memory.

After restarting gluster on that host, its memory use has resumed growing and is now at 16 GB. How do I safely restart gluster at this time to avoid the memory growth causing a problem?

Created attachment 1145815 [details]
First statedump
Attaching statedump #1
Created attachment 1145816 [details]
Second Statedump
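For reference, GlusterFS processes write a statedump when they receive SIGUSR1, so dumps like the two attached here can be captured from the running self-heal daemon without restarting it. A minimal sketch, assuming glustershd can be located with pgrep and that dumps land in the usual statedump directory (commonly /var/run/gluster); the pattern and path are assumptions about this host:

```python
#!/usr/bin/env python
# Sketch: ask the GlusterFS self-heal daemon (glustershd) for a statedump
# by sending it SIGUSR1. The pgrep pattern and the dump directory are
# assumptions about this host, not verified against it.
import os
import signal
import subprocess

def glustershd_pid():
    # First PID whose command line mentions glustershd.
    out = subprocess.check_output(["pgrep", "-f", "glustershd"])
    return int(out.decode().split()[0])

if __name__ == "__main__":
    pid = glustershd_pid()
    os.kill(pid, signal.SIGUSR1)  # SIGUSR1 triggers a statedump
    print("SIGUSR1 sent to glustershd (pid %d); look under /var/run/gluster" % pid)
```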
Note that the memory leaks seem to stem from gf_strdup:

[cluster/replicate.pbench-replicate-0 - usage-type 40 memusage]
type=gf_common_mt_strdup
size=2918102943
num_allocs=32222745
max_size=2918102943

[cluster/replicate.pbench-replicate-4 - usage-type 40 memusage]
type=gf_common_mt_strdup
size=2917406121
num_allocs=32216871
max_size=2917406121

[cluster/replicate.pbench-replicate-8 - usage-type 40 memusage]
type=gf_common_mt_strdup
size=2134711491
num_allocs=9109248
max_size=2134711786
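To rank allocators like this across an entire statedump, here is a minimal parsing sketch, assuming the [section] / key=value layout shown in the excerpts above; the input filename is a placeholder:

```python
#!/usr/bin/env python
# Sketch: sum bytes and allocation counts per memory type across the
# memusage records of a GlusterFS statedump, then print the largest.
# Assumes each record carries type=, size= and num_allocs= lines, as in
# the excerpts above.
from collections import defaultdict

def top_allocators(path, limit=10):
    totals = defaultdict(lambda: [0, 0])  # type -> [bytes, allocs]
    current = None
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("type="):
                current = line.split("=", 1)[1]
            elif current and line.startswith("size="):
                totals[current][0] += int(line.split("=", 1)[1])
            elif current and line.startswith("num_allocs="):
                totals[current][1] += int(line.split("=", 1)[1])
    return sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)[:limit]

if __name__ == "__main__":
    # Placeholder filename; point this at a real statedump.
    for mtype, (size, allocs) in top_allocators("glusterdump.sample"):
        print("%-40s %15d bytes %12d allocs" % (mtype, size, allocs))
```

Run against the excerpts above, gf_common_mt_strdup would top the list at roughly 2.9 GB on each of the two largest replicate subvolumes.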
Peter,

In 3.0, afr-v1 was present, whereas from 3.1 onwards afr-v2 is present in the self-heal daemon, so the code is completely different. Since we are not going to make any more releases on 3.0.x, I am closing this bug for now. Please feel free to re-open this bug, or open a new one, if you face the same issue on 3.1.
Pranith