Bug 1774711

Summary: vm paused by io-error, crashed on resume, gf_thread_vcreate errors in logs
Product: [Community] GlusterFS
Reporter: Darrell <budic>
Component: io-threads
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 6
CC: bugs, moagrawa, pasik
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-20 04:28:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
libvirtd log for affected VM none
state dump for affected volume none
gluster vol info none

Description Darrell 2019-11-20 18:53:05 UTC
Created attachment 1638202 [details]
libvirtd log for affected VM

Description of problem: A VM locked up on I/O to a Gluster volume accessed through libgfapi mounts in oVirt, and crashed when I attempted to resume it. Interestingly, this seems to happen approximately every 10 days on one volume. The logs contain gf_thread_vcreate errors like this one:

[2019-11-20 17:48:42.189605] E [MSGID: 101072] [common-utils.c:4030:gf_thread_vcreate] 0-gDBv2-io-threads: Thread creation failed [Resource temporarily unavailable]
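
For anyone scanning logs for this error, the line above can be split into its fields with a short script. This is only a minimal sketch in Python: the regular expression is inferred from this single log line rather than taken from GlusterFS documentation, and the names LOG_PATTERN and parse_gluster_log_line are made up for illustration.

```python
import re

# Field layout inferred from the single log line quoted in this report;
# other GlusterFS log lines may not match (assumption).
LOG_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"(?P<level>[A-Z]) "
    r"\[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<domain>\S+): "
    r"(?P<message>.*)$"
)


def parse_gluster_log_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(text.strip())
    return match.groupdict() if match else None


# The log line from this report, joined back into a single line.
SAMPLE = (
    "[2019-11-20 17:48:42.189605] E [MSGID: 101072] "
    "[common-utils.c:4030:gf_thread_vcreate] 0-gDBv2-io-threads: "
    "Thread creation failed [Resource temporarily unavailable]"
)

fields = parse_gluster_log_line(SAMPLE)
print(fields["function"], "-", fields["message"])
```

"Resource temporarily unavailable" is the standard text for the EAGAIN error, which thread creation returns when the system lacks resources or a limit on the number of threads has been reached.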


Version-Release number of selected component (if applicable):
Gluster 6, various minor versions; most recently seen on 6.6

How reproducible:
Has recurred approximately every 10 days since upgrading to Gluster 6.

Steps to Reproduce:
1. start vm
2. wait ~10 days
3. vm crashes

Actual results:
vm locks up and crashes when attempting to resume

Expected results:
vm does not lock up using gluster volume

Additional info:
This particular volume hosts an RRD database for Observium using rrd-daemon, so there is a heavy write load every 30 minutes. The volume had performance.io-thread-count = 32 when it crashed; I have just raised it to 64 as a test.
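
Since the error is a failed thread creation, one way to watch for a recurrence is to sample the thread count of the process that has the volume open (with libgfapi that is presumably the QEMU process itself) and see whether it climbs over the ~10 days. The following is a rough sketch in Python, assuming a Linux host and the "Threads:" field of /proc/<pid>/status. The function names are hypothetical, the status excerpt is invented for the example, and nothing here is part of GlusterFS or oVirt.

```python
import os


def threads_from_status(status_text):
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def thread_count(pid):
    """Read the current thread count of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return threads_from_status(handle.read())


# Self-contained check against a made-up status excerpt.
example = "Name:\tqemu-kvm\nThreads:\t37\nvoluntary_ctxt_switches:\t10\n"
print(threads_from_status(example))

# On a Linux host, sample the current process as a stand-in for the QEMU PID.
if os.path.exists("/proc/self/status"):
    print(thread_count("self"))
```

Logging this value periodically (for example from cron) next to the VM's PID would show whether the thread count keeps growing between crashes or stays flat.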

Comment 1 Darrell 2019-11-20 18:53:44 UTC
Created attachment 1638203 [details]
state dump for affected volume

Comment 2 Darrell 2019-11-20 18:54:05 UTC
Created attachment 1638204 [details]
gluster vol info

Comment 3 Mohit Agrawal 2020-02-19 14:20:12 UTC
We fixed a leak issue (https://bugzilla.redhat.com/show_bug.cgi?id=1768726) in release 6.7.
I believe you will no longer hit this issue after upgrading Gluster to 6.7.
Kindly upgrade your current Gluster version to 6.7 to resolve it.

Comment 4 Darrell 2020-02-20 04:28:53 UTC
So far the VM that had been crashing regularly has been up and stable for 22 days after upgrading the servers to 6.7, so I think you probably got it. Thanks for the follow-up!