Bug 1933338 - oVirt failed to update OVF due to stale file handle on gluster domain
Summary: oVirt failed to update OVF due to stale file handle on gluster domain
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: vdsm
Classification: oVirt
Component: Gluster
Version: 4.30.46
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Ritesh Chikatwar
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-26 18:47 UTC by Jürgen Walch
Modified: 2021-11-25 07:52 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 06:06:32 UTC
oVirt Team: Gluster
Embargoed:


Attachments
Excerpt from vdsm.log (5.59 KB, text/plain)
2021-02-26 18:47 UTC, Jürgen Walch

Description Jürgen Walch 2021-02-26 18:47:12 UTC
Created attachment 1759567 [details]
Excerpt from vdsm.log

Description of problem:

We are running an oVirt 4.3.10 production cluster with 9 hosts and 5 datastore domains, 4 of which are Gluster domains.
 
oVirt showed the following error messages every hour:

	26.02.2021 02:00:48 VDSM command SetVolumeDescriptionVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()
	26.02.2021 02:00:48 Failed to update OVF disks 5aa438e3-8d22-4b6c-bccf-a843151ca0be, OVF data isn't updated on those OVF stores (Data Center datacenter01, Storage Domain vmstore13).
	26.02.2021 02:00:48 Failed to update VMs/Templates OVF data for Storage Domain vmstore13 in Data Center datacenter01.

Only one domain ("vmstore13") was affected.

Trying to update the OVFs manually from the engine web GUI led to the same result. The VMs with disks on the affected domain were running fine, and snapshots were working. I tried to move the SPM role to another host, which succeeded, but the error messages persisted.

The vdsm log on the SPM host contained something like

	2021-02-26 03:00:57,701+0100 INFO  (jsonrpc/2) [vdsm.api] START setVolumeDescription(sdUUID=u'9f731135-f5d9-4609-9e3b-fa9cae75e314', spUUID=u'33e8dc9e-8bc8-11ea-bd76-00163e741033', imgUUID=u'5aa438e3-8d22-4b6c-bccf-a843151ca0be', volUUID=u'0795e58c-4960-413a-a0b4-e8a6d547fda5', description=u'{"Updated":false,"Last Updated":"Wed Feb 24 17:48:17 CET 2021","Storage Domains":[{"uuid":"9f731135-f5d9-4609-9e3b-fa9cae75e314"}],"Disk Description":"OVF_STORE"}', options=None) from=::ffff:10.70.1.1,46968, flow_id=1f314676, task_id=9101db01-b4f0-447e-a5a9-b6af76278d55 (api:48)
	2021-02-26 03:00:57,712+0100 ERROR (jsonrpc/2) [storage.VolumeManifest] [Errno 116] Stale file handle (fileVolume:155)

for each error. I have attached the relevant part of vdsm.log.
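
For anyone checking whether they are hitting the same condition, the following is a minimal sketch that scans vdsm.log lines for the ERROR entry shown above. It assumes the log line layout seen in this report (timestamp, level, thread, logger, message); the function name find_stale_handle_errors and the abbreviated sample lines are illustrative only and are not part of vdsm.

```python
import re

# Pattern based on the ERROR line quoted in this report; the layout
# (timestamp, level, thread, logger, message) is an assumption taken
# from that single example.
STALE_RE = re.compile(
    r"^\s*(?P<ts>\S+ \S+)\s+ERROR\s+\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+\[Errno 116\] Stale file handle"
)


def find_stale_handle_errors(lines):
    """Return (timestamp, thread, logger) for each stale-handle ERROR line."""
    hits = []
    for line in lines:
        match = STALE_RE.match(line)
        if match:
            hits.append(
                (match.group("ts"), match.group("thread"), match.group("logger"))
            )
    return hits


# Sample input: the INFO line is abbreviated, the ERROR line is quoted
# from the report.
sample = [
    "2021-02-26 03:00:57,701+0100 INFO  (jsonrpc/2) [vdsm.api] START setVolumeDescription(...) (api:48)",
    "2021-02-26 03:00:57,712+0100 ERROR (jsonrpc/2) [storage.VolumeManifest] [Errno 116] Stale file handle (fileVolume:155)",
]

for hit in find_stale_handle_errors(sample):
    print(hit)
```

To scan a real log, pass an open file object instead of the sample list, for example find_stale_handle_errors(open("/var/log/vdsm/vdsm.log")), assuming the default vdsm log location.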

Finally, I managed to fix it by running

	cat /rhev/data-center/mnt/glusterSD/10.70.7.17\:_vmstore13/9f731135-f5d9-4609-9e3b-fa9cae75e314/images/5aa438e3-8d22-4b6c-bccf-a843151ca0be/0795e58c-4960-413a-a0b4-e8a6d547fda5.meta

on the host the Gluster file system was mounted from (vmhost17, IP 10.70.7.17). The first attempt returned "file not found"; repeating the same command succeeded, and the problem went away.
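
The manual workaround above (read the .meta file, fail once, read again) can be sketched as a small retry helper. This is only an illustration of the retry idea, not how vdsm handles the error: the name read_with_retry and its parameters are hypothetical, and whether a second access refreshes the stale handle on a Gluster mount is based solely on what was observed in this report.

```python
import errno
import time


def read_with_retry(path, attempts=3, delay=0.5):
    """Read a file, retrying when access fails with ESTALE or ENOENT.

    'attempts' must be at least 1. Any other OSError is raised immediately.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError as e:
            # Only the two errors seen in this report are treated as
            # transient: "Stale file handle" and "file not found".
            if e.errno not in (errno.ESTALE, errno.ENOENT):
                raise
            last_error = e
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

It would be called with the path of the affected .meta file, for example read_with_retry(meta_path), where meta_path is the full path used in the cat command above.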

Version-Release number of selected component (if applicable):

oVirt 4.3.10

How reproducible:

not reproducible

Additional info:

I posted the problem on the users mailing list, and Nir Soffer asked me to file a bug because "we may need to improve storage monitoring with Gluster to handle [Errno 116] Stale file handle".

Comment 1 RHEL Program Management 2021-06-11 09:36:20 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Gobinda Das 2021-09-06 06:55:29 UTC
We have fixed most of these issues in ovirt-4.4, so it is always better to upgrade to the latest version. I strongly recommend upgrading to ovirt-4.4.

Comment 3 Ritesh Chikatwar 2021-10-12 11:39:49 UTC
Jürgen,

Please let me know if you are facing the same issue with the latest version. If so, please attach the engine and vdsm logs.

Comment 4 Ritesh Chikatwar 2021-10-18 06:06:32 UTC
Closing this bug as we are not able to reproduce it and there is no new info from the reporter. Please feel free to re-open this bug if you encounter the same issue with a newer version.

Comment 5 Jürgen Walch 2021-11-25 07:52:43 UTC
Sorry for the late answer.
We have not updated to oVirt 4.4 yet and will not for at least a few months.

