Bug 1231535 - VM block SNAPSHOT disks become illegal after failed Live Delete Snapshot Merge
Summary: VM block SNAPSHOT disks become illegal after failed Live Delete Snapshot Merge
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.1
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.4
Assignee: Greg Padgett
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On: 1213157
Blocks:
 
Reported: 2015-06-14 14:16 UTC by rhev-integ
Modified: 2016-02-10 18:12 UTC
CC List: 14 users

Fixed In Version: ovirt-engine-3.5.4
Doc Type: Bug Fix
Doc Text:
Clone Of: 1213157
Environment:
Last Closed: 2015-09-06 17:09:31 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
ylavi: Triaged+


Attachments
Logs01 (1.15 MB, application/x-gzip)
2015-07-12 11:35 UTC, Aharon Canan


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 40228 0 master NEW core: Fix errors for retry of failed DestroyImageCommand Never
oVirt gerrit 40453 0 None NEW core: Fix errors for retry of failed DestroyImageCommand Never

Comment 1 Allon Mureinik 2015-06-14 14:22:19 UTC
Not sure why this BZ was merged without due process, but setting it to MODIFIED to signify that it is, indeed, merged.

Comment 2 Allon Mureinik 2015-07-05 14:32:05 UTC
Greg, can you please provide the QA with steps to reproduce THIS bug?

Comment 3 Greg Padgett 2015-07-09 23:52:02 UTC
(In reply to Allon Mureinik from comment #2)
> Greg, can you please provide the QA with steps to reproduce THIS bug?

Sure, same steps as the original bug inspiring this one:

Steps to Reproduce:
1. Create a VM with several disks, including block (preallocated and thin) and NFS (preallocated and thin)
2. Start the VM
3. Create 3 snapshots: snsa1, snsa2, snsa3
4. Delete snapshot snsa2; while the snapshot is locked, restart vdsm

Expected results:
If the deletion is successful, the fix works.

Actual results:
The deletion will fail and the disks will become illegal.  Attempts to delete the snapshot again will also fail.  (A scripted sketch of these steps follows below.)
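
For reference, a rough scripted version of the steps above, using the RHEV 3.5 REST API over python-requests. The engine URL, credentials, VM UUID, and sleep times are placeholders, and the vdsm restart in step 4 is still a manual action on the hypervisor; this is only a sketch, not the exact flow QA must use.

# Hypothetical sketch of the reproduction steps, assuming placeholder
# engine URL, credentials, and VM UUID.  The vdsm restart in step 4 is
# done manually on the host (e.g. "service vdsmd restart") while the
# snapshot is still locked.
import time
import requests

ENGINE = "https://rhevm.example.com/api"   # assumed API root for RHEV 3.5
AUTH = ("admin@internal", "password")      # placeholder credentials
VM_ID = "REPLACE-WITH-VM-UUID"             # VM from steps 1-2 (block + NFS disks)
HEADERS = {"Content-Type": "application/xml"}

def create_snapshot(description):
    # Step 3: POST a new snapshot for the running VM.
    body = "<snapshot><description>%s</description></snapshot>" % description
    r = requests.post("%s/vms/%s/snapshots" % (ENGINE, VM_ID),
                      data=body, headers=HEADERS, auth=AUTH, verify=False)
    r.raise_for_status()

def delete_snapshot(snapshot_id):
    # Step 4: DELETE the snapshot; with the VM up this triggers a live merge.
    r = requests.delete("%s/vms/%s/snapshots/%s" % (ENGINE, VM_ID, snapshot_id),
                        auth=AUTH, verify=False)
    r.raise_for_status()

for name in ("snsa1", "snsa2", "snsa3"):
    create_snapshot(name)
    time.sleep(60)                         # crude wait for each snapshot to settle

# Look up snsa2's UUID in the UI or via GET /api/vms/{vm}/snapshots, then:
# delete_snapshot("SNSA2-UUID")            # and restart vdsmd while it is locked

The same flow can be driven from the Admin Portal instead; the important part is that the vdsm restart lands while the snapshot is still locked.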

Comment 4 Greg Padgett 2015-07-09 23:55:45 UTC
(In reply to Greg Padgett from comment #3)
> [...]
Also note that for reproducing this, the type of disk isn't as important as performing multiple deletions.

Comment 5 Aharon Canan 2015-07-12 11:35:24 UTC
Created attachment 1051099 [details]
Logs01

Comment 6 Aharon Canan 2015-07-12 11:35:44 UTC
Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3 steps

screenshot and logs attached.

Comment 7 Greg Padgett 2015-07-16 21:47:23 UTC
(In reply to Aharon Canan from comment #6)
> Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3
> steps
> 
> screenshot and logs attached.

Hi Aharon, I see several communication errors (non-responsive host) in the engine log and some storage-related errors in the vdsm log, which leads me to a couple questions:

1) Did the storage come back up as expected after the hosts were up?
2) Did you attempt to remove the snapshot again after the host was back up?

I didn't emphasize it much in the steps to reproduce, but the original issue left the snapshots in a state where subsequent removal after failure was impossible.  There are some cases (this may be one) where the deletion fails, but it /should/ allow you to remove it after a retry--this is the expected behavior.  Knowing more about the test would help determine if this is truly a bug vs an unfortunate but expected failure case.  Thanks.
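
To make the expected retry behavior concrete, here is a minimal sketch of re-issuing the snapshot delete after a failure. It is not the engine's internal DestroyImageCommand handling; the endpoint, credentials, and UUIDs are the same placeholders as in the earlier sketch.

# Minimal sketch of the retry expectation described above; endpoint,
# credentials, and UUIDs are placeholders.  A real test would also check
# the disk status in the UI or API after each attempt.
import time
import requests

ENGINE = "https://rhevm.example.com/api"
AUTH = ("admin@internal", "password")
VM_ID = "REPLACE-WITH-VM-UUID"
SNAPSHOT_ID = "REPLACE-WITH-SNAPSHOT-UUID"   # e.g. snsa2 from comment #3

def try_delete_snapshot():
    # Re-issue the live merge; returns True if the engine accepted the request.
    r = requests.delete("%s/vms/%s/snapshots/%s" % (ENGINE, VM_ID, SNAPSHOT_ID),
                        auth=AUTH, verify=False)
    return r.ok

# Expected behavior: once storage and hosts are healthy again, one of these
# retries should succeed.  The bug reported here is that retries kept failing
# and the snapshot disks stayed ILLEGAL.
for attempt in range(3):
    if try_delete_snapshot():
        print("delete accepted on attempt %d" % (attempt + 1))
        break
    time.sleep(120)
else:
    print("snapshot delete still failing after retries; matches this bug")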

Comment 8 Aharon Canan 2015-07-20 14:28:50 UTC
(In reply to Greg Padgett from comment #7)
> (In reply to Aharon Canan from comment #6)
> > Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3
> > steps
> > 
> > screenshot and logs attached.
> 
> Hi Aharon, I see several communication errors (non-responsive host) in the
> engine log and some storage-related errors in the vdsm log, which leads me
> to a couple questions:
> 
> 1) Did the storage come back up as expected after the hosts were up?

Yes

> 2) Did you attempt to remove the snapshot again after the host was back up?

Yes

> 
> I didn't emphasize it much in the steps to reproduce, but the original issue
> left the snapshots in a state where subsequent removal after failure was
> impossible.  There are some cases (this may be one) where the deletion
> fails, but it /should/ allow you to remove it after a retry--this is the
> expected behavior.  Knowing more about the test would help determine if this
> is truly a bug vs an unfortunate but expected failure case.  Thanks.

Let me know if you want me to try it again.

Comment 9 Greg Padgett 2015-07-22 15:43:08 UTC
(In reply to Aharon Canan from comment #8)
> (In reply to Greg Padgett from comment #7)
> > 1) Did the storage come back up as expected after the hosts were up?
> Yes
> > 2) Did you attempt to remove the snapshot again after the host was back up?
> Yes
[...]
> Let me know if you want me to try it again.

Thanks.  It sounds like there's a fair chance this is something I haven't seen before, but the prior logs didn't quite have enough for me to go on.  It would be great if you could reproduce it and provide:

- steps/details (including # of disks, snapshots, storage type, etc.)
- engine log
- host log
- engine db dump; OR point me to the environment where I can poke around a little

That should be enough to get started.  (A rough collection sketch follows below.)
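
If it helps, here is a hedged sketch of collecting those artifacts on the engine machine. The log paths are the usual oVirt 3.5 defaults and the engine-backup invocation is an assumption; adjust for the actual environment.

# Sketch of gathering the requested artifacts on the RHEV-M machine.
# Paths are the usual defaults; the vdsm log is collected separately on
# the hypervisor (/var/log/vdsm/vdsm.log).
import shutil
import subprocess

# Engine log.
shutil.copy("/var/log/ovirt-engine/engine.log", "/tmp/engine.log")

# Engine database dump (db scope only) via engine-backup.
subprocess.check_call([
    "engine-backup", "--mode=backup", "--scope=db",
    "--file=/tmp/engine-db-backup.tar", "--log=/tmp/engine-backup.log",
])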

Comment 15 Aharon Canan 2015-08-19 06:53:20 UTC
Following comments #12 and #13, verified.

Comment 16 Eyal Edri 2015-09-06 17:09:31 UTC
RHEV 3.5.4 released.  Closing current release.

