Bug 1774711

Summary:

vm paused by io-error, crashed on resume, gf_thread_vcreate errors in logs

Product:

[Community] GlusterFS

Reporter:

Darrell <budic>

Component:

io-threads

Assignee:

Mohit Agrawal <moagrawa>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

CC:

bugs, moagrawa, pasik

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Last Closed:

2020-02-20 04:28:53 UTC

Type:

Bug

Attachments:

Description	Flags
libvirtd log for affected VM	none
state dump for affected volume	none
gluster vol info	none

Description Darrell 2019-11-20 18:53:05 UTC

Created attachment 1638202
libvirtd log for affected VM

Description of problem: A VM locked up on I/O to a Gluster volume accessed through libgfapi in oVirt, then crashed when I attempted to resume it. Interestingly, this seems to happen approximately every 10 days on one volume. The logs show gf_thread_vcreate errors such as:

[2019-11-20 17:48:42.189605] E [MSGID: 101072] [common-utils.c:4030:gf_thread_vcreate] 0-gDBv2-io-threads: Thread creation failed [Resource temporarily unavailable]
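
To check how often this error recurs, the log line above can be picked apart with a regular expression. The following is a minimal sketch, assuming every entry follows the same layout as the single line quoted in this report; the names LOG_RE, parse_gluster_log_line and is_thread_creation_failure are hypothetical helpers, not part of GlusterFS.

```python
import re

# Pattern inferred only from the one log line quoted in this report.
# Other Gluster log entries may be laid out differently (assumption).
LOG_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+"          # [2019-11-20 17:48:42.189605]
    r"(?P<level>[A-Z])\s+"                   # E
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"        # [MSGID: 101072]
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"                   # 0-gDBv2-io-threads:
    r"(?P<message>.*)$"
)


def parse_gluster_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def is_thread_creation_failure(line):
    """True if the line is a gf_thread_vcreate 'Thread creation failed' error."""
    fields = parse_gluster_log_line(line)
    return (
        fields is not None
        and fields["level"] == "E"
        and fields["function"] == "gf_thread_vcreate"
        and fields["message"].startswith("Thread creation failed")
    )
```

Filtering a whole log file through is_thread_creation_failure and collecting the timestamp fields gives a quick view of whether the failures cluster around the roughly 10-day interval.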


Version-Release number of selected component (if applicable):
Gluster 6 (various releases; most recently 6.6)

How reproducible:
Has recurred approximately every 10 days since upgrading to Gluster 6.

Steps to Reproduce:
1. start vm
2. wait ~10 days
3. vm crashes

Actual results:
vm locks up and crashes when attempting to resume

Expected results:
vm does not lock up using gluster volume

Additional info:
This particular volume hosts an RRD database for Observium using rrd-daemon, so there is a heavy write load every 30 minutes. The volume had performance.io-thread-count = 32 when it crashed; I have just turned it up to 64 as a test.
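
"Resource temporarily unavailable" is the EAGAIN error, which thread creation typically returns when a thread or process limit has been reached. One way to see whether a process is drifting toward such a limit is to sample its thread count over time. The sketch below reads the Threads: field from /proc/<pid>/status on Linux. It assumes the process of interest is the QEMU process for the VM, since libgfapi runs inside the client process. Whether the thread count actually grows in this case is not established by this report, and both function names are hypothetical.

```python
def parse_thread_count(status_text):
    """Extract the 'Threads:' value from the text of a /proc/<pid>/status file.

    Returns None if the field is absent.
    """
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def thread_count(pid):
    """Read the current thread count of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_thread_count(status_file.read())
```

Calling thread_count with the PID of the VM's QEMU process at regular intervals, for example from cron, would show whether the count climbs steadily between crashes.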

Comment 1 Darrell 2019-11-20 18:53:44 UTC

Created attachment 1638203
state dump for affected volume

Comment 2 Darrell 2019-11-20 18:54:05 UTC

Created attachment 1638204
gluster vol info

Comment 3 Mohit Agrawal 2020-02-19 14:20:12 UTC

We fixed a leak issue (https://bugzilla.redhat.com/show_bug.cgi?id=1768726) in release 6.7.
I believe you will no longer hit this issue after upgrading Gluster to 6.7.
Kindly upgrade your current Gluster version to 6.7 to resolve it.

Comment 4 Darrell 2020-02-20 04:28:53 UTC

So far the VM that had been crashing regularly has been up and stable for 22 days after upgrading the servers to 6.7, so I think you probably got it. Thanks for the follow-up!