Not sure why this BZ was merged without due process, but setting it to MODIFIED to signify that it is, indeed, merged.
Greg, can you please provide QA with steps to reproduce THIS bug?
(In reply to Allon Mureinik from comment #2)
> Greg, can you please provide QA with steps to reproduce THIS bug?

Sure, the steps are the same as for the original bug that inspired this one:

Steps to Reproduce:
1. Create a VM with several disks, including block preallocated, block thin, NFS preallocated, and NFS thin
2. Start the VM
3. Create 3 snapshots: snsa1, snsa2, snsa3
4. Delete snapshot snsa2; while the snapshot is locked, restart vdsm

Expected results:
The deletion succeeds, which indicates the fix works.

Actual results:
The deletion fails and the disks become illegal. Attempts to delete the snapshot again also fail.
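For reference, this is roughly what those steps look like when driven through the REST API -- a minimal sketch only, with a placeholder engine URL, credentials, and VM name; steps 1-2, waiting for each snapshot to leave the "locked" state, and the vdsm restart itself are not automated here:

```python
# Illustrative sketch only (python-requests against the REST API); the engine
# URL, credentials, and VM name below are placeholders, and waiting for each
# snapshot to unlock between calls is omitted for brevity.
import requests
import xml.etree.ElementTree as ET

ENGINE = 'https://engine.example.com/ovirt-engine/api'
AUTH = ('admin@internal', 'password')
XML = {'Content-Type': 'application/xml'}

# Steps 1-2 are assumed done: a running VM with block/NFS, prealloc/thin disks.
r = requests.get('%s/vms?search=name%%3D%s' % (ENGINE, 'snapshot-test-vm'),
                 auth=AUTH, verify=False)
vm = ET.fromstring(r.content).find('vm').get('id')

# Step 3: create the three snapshots.
for desc in ('snsa1', 'snsa2', 'snsa3'):
    requests.post('%s/vms/%s/snapshots' % (ENGINE, vm),
                  data='<snapshot><description>%s</description></snapshot>' % desc,
                  headers=XML, auth=AUTH, verify=False)

# Step 4: delete snsa2...
snaps = ET.fromstring(requests.get('%s/vms/%s/snapshots' % (ENGINE, vm),
                                   auth=AUTH, verify=False).content)
snsa2 = next(s.get('id') for s in snaps.findall('snapshot')
             if s.findtext('description') == 'snsa2')
requests.delete('%s/vms/%s/snapshots/%s' % (ENGINE, vm, snsa2),
                auth=AUTH, verify=False)
# ...and while the snapshot is still locked, restart vdsm on the host
# (e.g. "service vdsmd restart" on the RHEL 6 hypervisor).
```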
(In reply to Greg Padgett from comment #3)
> [...]

Also note that for reproducing this, the type of disk isn't as important as performing multiple deletions.
Created attachment 1051099 [details] Logs01
Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using the comment #3 steps.

Screenshot and logs attached.
(In reply to Aharon Canan from comment #6)
> Issue reproduced on vt16.1 (rhevm-3.5.4-1.1.el6ev.noarch) using the comment
> #3 steps.
>
> Screenshot and logs attached.

Hi Aharon, I see several communication errors (non-responsive host) in the engine log and some storage-related errors in the vdsm log, which leads me to a couple of questions:

1) Did the storage come back up as expected after the hosts were up?
2) Did you attempt to remove the snapshot again after the host was back up?

I didn't emphasize it much in the steps to reproduce, but the original issue left the snapshots in a state where subsequent removal after failure was impossible. There are some cases (this may be one) where the deletion fails, but it /should/ allow you to remove it after a retry--this is the expected behavior. Knowing more about the test would help determine whether this is truly a bug or an unfortunate but expected failure case. Thanks.
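To make the retry expectation a bit more concrete, here's a minimal sketch (the URL, credentials, and UUIDs are placeholders, not values from your environment): when the first delete fails because vdsm was restarted while the snapshot was locked, simply re-issuing the same delete once the host has settled is expected to succeed; the original bug was that this retry could never succeed.

```python
# Illustrative only; ENGINE, AUTH, VM_ID, and SNAP_ID are placeholders.
import requests

ENGINE = 'https://engine.example.com/ovirt-engine/api'
AUTH = ('admin@internal', 'password')
VM_ID = 'PUT-VM-UUID-HERE'
SNAP_ID = 'PUT-SNSA2-UUID-HERE'

url = '%s/vms/%s/snapshots/%s' % (ENGINE, VM_ID, SNAP_ID)

# First delete: may fail if vdsm is restarted while the snapshot is locked.
first = requests.delete(url, auth=AUTH, verify=False)

if first.status_code not in (200, 202):
    # Expected behavior: once the host is back up and the snapshot is no
    # longer locked, re-issuing the same delete should be accepted.
    retry = requests.delete(url, auth=AUTH, verify=False)
    print(retry.status_code)
```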
(In reply to Greg Padgett from comment #7)
> 1) Did the storage come back up as expected after the hosts were up?

Yes

> 2) Did you attempt to remove the snapshot again after the host was back up?

Yes

> [...] Knowing more about the test would help determine whether this is truly
> a bug or an unfortunate but expected failure case.

Let me know if you want me to try it again.
(In reply to Aharon Canan from comment #8)
> (In reply to Greg Padgett from comment #7)
> > 1) Did the storage come back up as expected after the hosts were up?
> Yes
> > 2) Did you attempt to remove the snapshot again after the host was back up?
> Yes
[...]
> Let me know if you want me to try it again.

Thanks. It sounds like there's a fair chance this is something I haven't seen before, but the prior logs didn't quite have enough for me to go on. It would be great if you could reproduce it and provide:

- steps/details (including number of disks, snapshots, storage type, etc.)
- engine log
- host log
- engine db dump

...or just point me to the environment where I can poke around a little. That should be enough to get started.
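In case it helps, here is a rough sketch of gathering the engine-side pieces, to be run on the engine machine; the paths and database name are the RHEV defaults and are assumptions about your setup, and the host-side /var/log/vdsm/vdsm.log still has to be copied from the hypervisor separately.

```python
# Rough collection helper for the engine machine. Assumes the default RHEV
# locations: /var/log/ovirt-engine/engine.log and an "engine" PostgreSQL
# database accessible to the postgres user (adjust if your setup differs).
import subprocess
import tarfile
import time

stamp = time.strftime('%Y%m%d-%H%M%S')
db_dump = '/tmp/engine-db-%s.dump' % stamp

# Dump the engine database (custom format) as the postgres user.
subprocess.check_call(['su', '-', 'postgres', '-c',
                       'pg_dump -F c -f %s engine' % db_dump])

# Bundle the engine log and the db dump for attaching to the BZ.
with tarfile.open('/tmp/bz-repro-%s.tar.gz' % stamp, 'w:gz') as tar:
    tar.add('/var/log/ovirt-engine/engine.log')
    tar.add(db_dump)
# Note: /var/log/vdsm/vdsm.log must be collected from the hypervisor host.
```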
Following comments #12 and #13, verified.
RHEV 3.5.4 released. Closing current release.