## Description of problem:

Every evening, the customer snapshots all VMs as part of a Commvault backup process. We noticed a one-off failure to delete a snapshot. On closer inspection, the snapshot creation initially reported success, but we saw the following error in the logs:

```
2019-03-22 01:19:31,835+13 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [...] Failed building DiskImage: candidate can not be null please use static method createGuidFromString
2019-03-22 01:19:31,836+13 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [...] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand' return value '
VolumeInfoReturn:{status='Status [code=0, message=Done]'}
status = INVALID
truesize = 0
apparentsize = 0
children: []
'
```

The snapshot delete was failing because the volume's metadata slot was empty:

```
NONE=####################################################################....
```

## Version-Release number of selected component (if applicable):

ovirt-engine-4.2.8.2-0.1.el7ev.noarch (Tue Feb 12 09:58:45 2019)
vdsm-4.20.46-1.el7ev.x86_64 (Thu Jan 17 07:00:24 2019)
(rhvh-4.2.8.0-0.20190116)

## How reproducible:

The environment takes multiple snapshots through the evening (Commvault), but we have noticed only one failure so far.

## Steps to Reproduce:
1. Take a snapshot to do a backup (Commvault).
2. Delete the snapshot after the backup is done.

## Actual results:

The snapshot 'succeeds', but the metadata information is not there.

## Expected results:

If the snapshot volume create failed, the snapshot should be rolled back.

## Additional info:

Why did the snapshot create go ahead even though the volume create failed? What happened to the metadata? Even if this is a one-off storage issue, the engine should be robust enough to roll back the snapshot after a failed VolumeCreate.
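The empty-slot pattern above can be checked programmatically. The following is a minimal sketch, assuming a metadata slot has already been read from the storage domain's metadata LV as raw bytes; the function name and the example slot contents are hypothetical, not part of VDSM:

```python
# Hypothetical helper (names are mine, not VDSM's): decide whether a
# block-storage volume metadata slot has been cleared. A cleared slot
# reads "NONE=" followed by '#' padding, which is exactly the pattern
# seen in this bug.
def slot_is_empty(slot_bytes: bytes) -> bool:
    text = slot_bytes.rstrip(b"\x00").decode("ascii", errors="replace")
    if not text.startswith("NONE="):
        return False
    padding = text[len("NONE="):].rstrip()
    return padding == "" or set(padding) == {"#"}

# Example slots (fabricated contents for illustration):
cleared = b"NONE=" + b"#" * 100 + b"\x00" * 407
populated = b"DOMAIN=deadbeef\nIMAGE=cafebabe\nEOF\n" + b"\x00" * 476
```

Running `slot_is_empty(cleared)` returns `True`, while the populated slot returns `False`, so a script sweeping all slots could flag volumes whose metadata was never written.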
I was checking this with Marcus; we seem to have two problems here:

1. The metadata for the volume was empty ~20 seconds after volume creation (GetVolumeInfo at 01:19:31 shows empty metadata for a volume created at 01:19:07), so most likely the metadata was never written.
2. The engine went ahead with SnapshotVDSCommand even after seeing the empty metadata for the created volume. SnapshotVDSCommand should not have been sent, because GetVolumeInfoVDSCommand showed a bad volume.

Regarding 1, there seems to be something wrong with the LVM metadata of the SD; we see some random and odd failures in the logs. It's a big SD: 50T with 1300 LVs and a lot of fragmentation.
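Problem 2 amounts to a missing guard in the snapshot flow. Below is an illustrative sketch in Python (the real engine is Java, and all function names here are hypothetical) of the behavior being asked for: verify the new volume before sending the snapshot command, and roll back on a bad volume:

```python
# Illustrative sketch, not engine code: after creating the snapshot
# volume, check the volume info; on a bad volume, delete it and fail
# the snapshot instead of sending SnapshotVDSCommand.
class SnapshotError(Exception):
    pass

def create_snapshot(create_volume, get_volume_info, delete_volume, send_snapshot_cmd):
    vol_id = create_volume()
    info = get_volume_info(vol_id)
    if info.get("status") != "OK":
        # GetVolumeInfo reported status=INVALID in this bug: roll back
        # the half-created volume rather than proceeding.
        delete_volume(vol_id)
        raise SnapshotError("volume %s is invalid; snapshot rolled back" % vol_id)
    send_snapshot_cmd(vol_id)
    return vol_id
```

With fakes wired in, an `INVALID` status triggers the rollback path and raises, while an `OK` status lets the snapshot command through.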
sync2jira
This is most likely related to BZ#1553133, which is fixed in vdsm-4.40.7. If we see this symptom again, we'll reopen and attach the relevant logs.

*** This bug has been marked as a duplicate of bug 1553133 ***