Bug 1231535 - VM block SNAPSHOT disks become illegal after failed Live Delete Snapshot Merge
Summary: VM block SNAPSHOT disks become illegal after failed Live Delete Snapshot Merge
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.1
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.4
Assignee: Greg Padgett
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On: 1213157
Blocks:
 
Reported: 2015-06-14 14:16 UTC by rhev-integ
Modified: 2016-02-10 18:12 UTC
CC List: 14 users

Fixed In Version: ovirt-engine-3.5.4
Doc Type: Bug Fix
Doc Text:
Clone Of: 1213157
Environment:
Last Closed: 2015-09-06 17:09:31 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
ylavi: Triaged+


Attachments
Logs01 (1.15 MB, application/x-gzip)
2015-07-12 11:35 UTC, Aharon Canan


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 40228 0 master NEW core: Fix errors for retry of failed DestroyImageCommand Never
oVirt gerrit 40453 0 None NEW core: Fix errors for retry of failed DestroyImageCommand Never

Comment 1 Allon Mureinik 2015-06-14 14:22:19 UTC
Not sure why this BZ was merged without due process, but setting it to MODIFIED to signify that it is, indeed, merged.

Comment 2 Allon Mureinik 2015-07-05 14:32:05 UTC
Greg, can you please provide the QA with steps to reproduce THIS bug?

Comment 3 Greg Padgett 2015-07-09 23:52:02 UTC
(In reply to Allon Mureinik from comment #2)
> Greg, can you please provide the QA with steps to reproduce THIS bug?

Sure, same steps as the original bug inspiring this one:

Steps to Reproduce:
1. Create a VM with several disks, including block (preallocated and thin) and NFS (preallocated and thin)
2. Start the VM
3. Create 3 snapshots: snsa1, snsa2, snsa3
4. Delete snapshot snsa2; while the snapshot is locked, restart vdsm

Expected results:
If the deletion is successful, the fix works.

Actual results:
The deletion will fail and the disks will become illegal.  Attempts to delete the snapshot again will also fail.  (A scripted sketch of these steps follows below.)
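
For reference, a rough scripted version of the steps above, using the RHEV 3.5 REST API over python-requests. The engine URL, credentials, VM UUID, and sleep times are placeholders, and the vdsm restart in step 4 is still a manual action on the hypervisor; this is only a sketch, not the exact flow QA must use.

# Hypothetical sketch of the reproduction steps, assuming placeholder
# engine URL, credentials, and VM UUID.  The vdsm restart in step 4 is
# done manually on the host (e.g. "service vdsmd restart") while the
# snapshot is still locked.
import time
import requests

ENGINE = "https://rhevm.example.com/api"   # assumed API root for RHEV 3.5
AUTH = ("admin@internal", "password")      # placeholder credentials
VM_ID = "REPLACE-WITH-VM-UUID"             # VM from steps 1-2 (block + NFS disks)
HEADERS = {"Content-Type": "application/xml"}

def create_snapshot(description):
    # Step 3: POST a new snapshot for the running VM.
    body = "<snapshot><description>%s</description></snapshot>" % description
    r = requests.post("%s/vms/%s/snapshots" % (ENGINE, VM_ID),
                      data=body, headers=HEADERS, auth=AUTH, verify=False)
    r.raise_for_status()

def delete_snapshot(snapshot_id):
    # Step 4: DELETE the snapshot; with the VM up this triggers a live merge.
    r = requests.delete("%s/vms/%s/snapshots/%s" % (ENGINE, VM_ID, snapshot_id),
                        auth=AUTH, verify=False)
    r.raise_for_status()

for name in ("snsa1", "snsa2", "snsa3"):
    create_snapshot(name)
    time.sleep(60)                         # crude wait for each snapshot to settle

# Look up snsa2's UUID in the UI or via GET /api/vms/{vm}/snapshots, then:
# delete_snapshot("SNSA2-UUID")            # and restart vdsmd while it is locked

The same flow can be driven from the Admin Portal instead; the important part is that the vdsm restart lands while the snapshot is still locked.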

Comment 4 Greg Padgett 2015-07-09 23:55:45 UTC
(In reply to Greg Padgett from comment #3)
> [...]
Also note that for reproducing this, the type of disk isn't as important as performing multiple deletions.

Comment 5 Aharon Canan 2015-07-12 11:35:24 UTC
Created attachment 1051099 [details]
Logs01

Comment 6 Aharon Canan 2015-07-12 11:35:44 UTC
Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3 steps

screenshot and logs attached.

Comment 7 Greg Padgett 2015-07-16 21:47:23 UTC
(In reply to Aharon Canan from comment #6)
> Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3
> steps
> 
> screenshot and logs attached.

Hi Aharon, I see several communication errors (non-responsive host) in the engine log and some storage-related errors in the vdsm log, which leads me to a couple questions:

1) Did the storage come back up as expected after the hosts were up?
2) Did you attempt to remove the snapshot again after the host was back up?

I didn't emphasize it much in the steps to reproduce, but the original issue left the snapshots in a state where subsequent removal after failure was impossible.  There are some cases (this may be one) where the deletion fails, but it /should/ allow you to remove it after a retry--this is the expected behavior.  Knowing more about the test would help determine if this is truly a bug vs an unfortunate but expected failure case.  Thanks.
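
To make the expected retry behavior concrete, here is a minimal sketch of re-issuing the snapshot delete after a failure. It is not the engine's internal DestroyImageCommand handling; the endpoint, credentials, and UUIDs are the same placeholders as in the earlier sketch.

# Minimal sketch of the retry expectation described above; endpoint,
# credentials, and UUIDs are placeholders.  A real test would also check
# the disk status in the UI or API after each attempt.
import time
import requests

ENGINE = "https://rhevm.example.com/api"
AUTH = ("admin@internal", "password")
VM_ID = "REPLACE-WITH-VM-UUID"
SNAPSHOT_ID = "REPLACE-WITH-SNAPSHOT-UUID"   # e.g. snsa2 from comment #3

def try_delete_snapshot():
    # Re-issue the live merge; returns True if the engine accepted the request.
    r = requests.delete("%s/vms/%s/snapshots/%s" % (ENGINE, VM_ID, SNAPSHOT_ID),
                        auth=AUTH, verify=False)
    return r.ok

# Expected behavior: once storage and hosts are healthy again, one of these
# retries should succeed.  The bug reported here is that retries kept failing
# and the snapshot disks stayed ILLEGAL.
for attempt in range(3):
    if try_delete_snapshot():
        print("delete accepted on attempt %d" % (attempt + 1))
        break
    time.sleep(120)
else:
    print("snapshot delete still failing after retries; matches this bug")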

Comment 8 Aharon Canan 2015-07-20 14:28:50 UTC
(In reply to Greg Padgett from comment #7)
> (In reply to Aharon Canan from comment #6)
> > Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3
> > steps
> > 
> > screenshot and logs attached.
> 
> Hi Aharon, I see several communication errors (non-responsive host) in the
> engine log and some storage-related errors in the vdsm log, which leads me
> to a couple questions:
> 
> 1) Did the storage come back up as expected after the hosts were up?

Yes

> 2) Did you attempt to remove the snapshot again after the host was back up?

Yes

> 
> I didn't emphasize it much in the steps to reproduce, but the original issue
> left the snapshots in a state where subsequent removal after failure was
> impossible.  There are some cases (this may be one) where the deletion
> fails, but it /should/ allow you to remove it after a retry--this is the
> expected behavior.  Knowing more about the test would help determine if this
> is truly a bug vs an unfortunate but expected failure case.  Thanks.

Let me know if you want me to try it again.

Comment 9 Greg Padgett 2015-07-22 15:43:08 UTC
(In reply to Aharon Canan from comment #8)
> (In reply to Greg Padgett from comment #7)
> > 1) Did the storage come back up as expected after the hosts were up?
> Yes
> > 2) Did you attempt to remove the snapshot again after the host was back up?
> Yes
[...]
> Let me know if you want me to try it again.

Thanks.  It sounds like there's a fair chance this is something I haven't seen before, but the prior logs didn't quite have enough for me to go on.  It would be great if you could reproduce it and provide:

- steps/details (including # of disks, snapshots, storage type, etc.)
- engine log
- host log
- engine db dump; OR point me to the environment where I can poke around a little

That should be enough to get started.  (A rough collection sketch follows below.)
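
If it helps, here is a hedged sketch of collecting those artifacts on the engine machine. The log paths are the usual oVirt 3.5 defaults and the engine-backup invocation is an assumption; adjust for the actual environment.

# Sketch of gathering the requested artifacts on the RHEV-M machine.
# Paths are the usual defaults; the vdsm log is collected separately on
# the hypervisor (/var/log/vdsm/vdsm.log).
import shutil
import subprocess

# Engine log.
shutil.copy("/var/log/ovirt-engine/engine.log", "/tmp/engine.log")

# Engine database dump (db scope only) via engine-backup.
subprocess.check_call([
    "engine-backup", "--mode=backup", "--scope=db",
    "--file=/tmp/engine-db-backup.tar", "--log=/tmp/engine-backup.log",
])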

Comment 15 Aharon Canan 2015-08-19 06:53:20 UTC
Following comments #12 and #13, verified.

Comment 16 Eyal Edri 2015-09-06 17:09:31 UTC
RHEV 3.5.4 released.  Closing current release.

