Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1231535

Summary: VM block SNAPSHOT disks become illegal after failed Live Delete Snapshot Merge
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-engine
Assignee: Greg Padgett <gpadgett>
Status: CLOSED CURRENTRELEASE
QA Contact: Aharon Canan <acanan>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.5.1
CC: acanan, alitke, amureini, ecohen, gklein, gpadgett, kgoldbla, lpeer, lsurette, rbalakri, Rhev-m-bugs, tnisan, yeylon, ylavi
Target Milestone: ---
Keywords: ZStream
Target Release: 3.5.4
Flags: ylavi: Triaged+
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version: ovirt-engine-3.5.4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1213157
Environment:
Last Closed: 2015-09-06 17:09:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1213157    
Bug Blocks:    
Attachments:
Logs01 (flags: none)

Comment 1 Allon Mureinik 2015-06-14 14:22:19 UTC
Not sure why this BZ was merged without due process, but setting it to MODIFIED to signify that it is, indeed, merged.

Comment 2 Allon Mureinik 2015-07-05 14:32:05 UTC
Greg, can you please provide the QA with steps to reproduce THIS bug?

Comment 3 Greg Padgett 2015-07-09 23:52:02 UTC
(In reply to Allon Mureinik from comment #2)
> Greg, can you please provide the QA with steps to reproduce THIS bug?

Sure, same steps as the original bug inspiring this one:

Steps to Reproduce:
1. Create a VM with several disks, including block-based preallocated and thin-provisioned disks as well as NFS preallocated and thin-provisioned disks
2. Start the VM 
3. Create 3 snapshots: snsa1, snsa2, snsa3
4. Delete snapshot snsa2; while the snapshot is locked, restart vdsm

Expected results:
If the deletion succeeds, the fix works

Actual results:
The deletion will fail and the disks will become illegal.  Attempts to delete the snapshot again will fail.
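The live merge in step 4 removes one volume from the disk's backing chain. As a minimal sketch of that chain operation (snapshot names from the steps above; the function and chain representation are hypothetical, not vdsm code):

```python
# Illustrative model of a live snapshot merge on a qcow2-style backing
# chain -- a simplified sketch, not oVirt/vdsm code.

def merge_snapshot(chain, victim):
    """Remove `victim` from a backing chain by committing its data into
    its parent and relinking its child, mimicking a live block commit."""
    i = chain.index(victim)
    if i == 0:
        raise ValueError("cannot merge the base volume")
    # The victim's data lands in chain[i-1]; chain[i+1] is rebased onto it.
    return chain[:i] + chain[i + 1:]

chain = ["base", "snsa1", "snsa2", "snsa3", "active"]
chain = merge_snapshot(chain, "snsa2")
print(chain)  # ['base', 'snsa1', 'snsa3', 'active']
```

Restarting vdsm while the snapshot is locked interrupts this relinking mid-flight, which is what left the images marked illegal in this bug.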

Comment 4 Greg Padgett 2015-07-09 23:55:45 UTC
(In reply to Greg Padgett from comment #3)
> [...]
Also note that for reproducing this, the type of disk isn't as important as performing multiple deletions.

Comment 5 Aharon Canan 2015-07-12 11:35:24 UTC
Created attachment 1051099 [details]
Logs01

Comment 6 Aharon Canan 2015-07-12 11:35:44 UTC
Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3 steps

screenshot and logs attached.

Comment 7 Greg Padgett 2015-07-16 21:47:23 UTC
(In reply to Aharon Canan from comment #6)
> Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3
> steps
> 
> screenshot and logs attached.

Hi Aharon, I see several communication errors (non-responsive host) in the engine log and some storage-related errors in the vdsm log, which leads me to a couple questions:

1) Did the storage come back up as expected after the hosts were up?
2) Did you attempt to remove the snapshot again after the host was back up?

I didn't emphasize it much in the steps to reproduce, but the original issue left the snapshots in a state where subsequent removal after failure was impossible.  There are some cases (this may be one) where the deletion fails, but it /should/ allow you to remove it after a retry--this is the expected behavior.  Knowing more about the test would help determine if this is truly a bug vs an unfortunate but expected failure case.  Thanks.
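The distinction drawn here, a retryable failure versus images stuck illegal, can be sketched as a tiny state model (state names and function are hypothetical illustrations, not ovirt-engine logic):

```python
# Minimal state model of the failure mode discussed above -- a sketch,
# not actual engine code.  After an interrupted merge the image should
# stay LEGAL so a retry can succeed; the bug left it ILLEGAL, which
# blocks any further RemoveSnapshot attempt.

LEGAL, ILLEGAL, REMOVED = "LEGAL", "ILLEGAL", "REMOVED"

def attempt_merge(state, interrupted, buggy=False):
    """One RemoveSnapshot attempt against an image in `state`."""
    if state == ILLEGAL:
        return ILLEGAL            # retries refuse illegal images
    if interrupted:
        # Expected: fail but stay retryable.  Buggy: mark illegal.
        return ILLEGAL if buggy else LEGAL
    return REMOVED

# Expected behavior: an interruption, then a successful retry.
s = attempt_merge(LEGAL, interrupted=True)
s = attempt_merge(s, interrupted=False)
assert s == REMOVED

# Buggy behavior: the image goes illegal and retries are stuck.
s = attempt_merge(LEGAL, interrupted=True, buggy=True)
s = attempt_merge(s, interrupted=False)
assert s == ILLEGAL
```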

Comment 8 Aharon Canan 2015-07-20 14:28:50 UTC
(In reply to Greg Padgett from comment #7)
> (In reply to Aharon Canan from comment #6)
> > Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using comment #3
> > steps
> > 
> > screenshot and logs attached.
> 
> Hi Aharon, I see several communication errors (non-responsive host) in the
> engine log and some storage-related errors in the vdsm log, which leads me
> to a couple questions:
> 
> 1) Did the storage come back up as expected after the hosts were up?

Yes

> 2) Did you attempt to remove the snapshot again after the host was back up?

Yes

> 
> I didn't emphasize it much in the steps to reproduce, but the original issue
> left the snapshots in a state where subsequent removal after failure was
> impossible.  There are some cases (this may be one) where the deletion
> fails, but it /should/ allow you to remove it after a retry--this is the
> expected behavior.  Knowing more about the test would help determine if this
> is truly a bug vs an unfortunate but expected failure case.  Thanks.

Let me know if you want me to try it again.

Comment 9 Greg Padgett 2015-07-22 15:43:08 UTC
(In reply to Aharon Canan from comment #8)
> (In reply to Greg Padgett from comment #7)
> > 1) Did the storage come back up as expected after the hosts were up?
> Yes
> > 2) Did you attempt to remove the snapshot again after the host was back up?
> Yes
[...]
> Let me know if you want me to try it again.

Thanks.  It sounds like there's a fair chance this is something I haven't seen before, but the prior logs didn't quite have enough for me to go on.  It would be great if you could reproduce it and provide:

- steps/details (including number of disks, snapshots, storage type, etc.)
- engine log
- host log
- engine DB dump; OR point me to the environment where I can poke around a little

That should be enough to get started.

Comment 15 Aharon Canan 2015-08-19 06:53:20 UTC
Following comments #12 and #13, verified.

Comment 16 Eyal Edri 2015-09-06 17:09:31 UTC
RHEV 3.5.4 released.  Closing current release.