Bug 962549 - VM no longer bootable after snapshot removal
Summary: VM no longer bootable after snapshot removal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.1.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.2.0
Assignee: Eduardo Warszawski
QA Contact: Elad
URL:
Whiteboard: storage
Duplicates: 950111 958242 958405 966022 982971
Depends On:
Blocks: 966085 966721 970623
 
Reported: 2013-05-13 20:39 UTC by Marina Kalinin
Modified: 2022-07-09 07:07 UTC
CC List: 24 users

Fixed In Version: vdsm-4.10.2-22.0.el6ev sf17.2
Doc Type: Bug Fix
Doc Text:
After upgrading to 3.1, a snapshot of a virtual machine created in the older environment could be removed successfully, but the virtual machine would then fail to start. This was due to a failure to tear down the snapshot's volume path on the host storage manager (HSM) prior to merging the snapshot, which left the volume activated on both the storage pool manager (SPM) and the HSM. This update removes unnecessary volume paths and deactivates snapshot volumes after they are deleted, so virtual machines start successfully under these conditions.
Clone Of:
Cloned to: 966085
Environment:
Last Closed: 2013-06-10 20:49:14 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links:
- Red Hat Issue Tracker RHV-47302 (last updated 2022-07-09 07:07:17 UTC)
- Red Hat Knowledge Base (Solution) 374693
- Red Hat Product Errata RHSA-2013:0886 (SHIPPED_LIVE): Moderate: rhev 3.2 - vdsm security and bug fix update (2013-06-11 00:25:02 UTC)
- oVirt gerrit 14869

Description Marina Kalinin 2013-05-13 20:39:36 UTC
Description of problem:
A VM created from a RHEV 2.2 template is no longer bootable after snapshot removal on RHEV 3.1.

Version-Release number of selected component (if applicable):
rhevm-3.1.0-50.el6ev
vdsm-4.9.6-44.2.el6_3
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.3
libvirt-0.10.2-18.el6_4.4

How reproducible:
Always in the customer's environment.

Steps to Reproduce:
1. Take a template created on 2.2 with one preallocated disk.
2. Create a VM from the template (probably on 3.0).
3. Take a snapshot of the VM - snap1.
4. Upgrade the environment to 3.1, so that the iSCSI SD becomes a V3 domain.
5. Create a new snapshot - snap2.
6. Delete snap1.
7. The snapshot deletion operation succeeds.
  
Actual results:
Cannot boot VM.

Expected results:
VM should boot successfully.

Additional info:
1. This might be the same issue as bug #958242, but the current description does not match, so I am opening this bug.
2. No error messages in the logs, except once in libvirtd.log:
~~~
2013-04-29 23:24:12.375+0000: 9466: error : virNetClientProgramDispatchError:174 : Domain not found: no domain with matching uuid '6cfa4510-e859-4f6d-8012-be0d07bc6498'
2013-04-29 23:24:12.376+0000: 9466: debug : virDomainGetXMLDesc:4379 : dom=0x7f36b8017d70, (VM: name=vm_name, uuid=6cfa4510-e859-4f6d-8012-be0d07bc6498), flags=0
2013-04-29 23:24:12.380+0000: 9466: error : virNetClientProgramDispatchError:174 : Domain not found: no domain with matching uuid '6cfa4510-e859-4f6d-8012-be0d07bc6498'
2013-04-29 23:24:12.380+0000: 9466: debug : virDomainFree:2313 : dom=0x7f3730003c00, (VM: name=vm_name, uuid=6cfa4510-e859-4f6d-8012-be0d07bc6498)
~~~

3. The chain in the database is correct and matches the chain on the actual storage:
blank <- template (raw) <- base_img (cow) <- current_img (cow).
- qemu-img info reports the correct chain
- all the LVs are present and available
- the RHEV metadata for the chain is correct
(a sketch of scripting the chain check appears at the end of this comment)

Please advise next steps.
This occurs for the customer on every single VM matching this scenario.
I will provide logs shortly.
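
For reference, the chain check in item 3 can be scripted. The sketch below is illustrative only: it walks the backing chain by calling qemu-img info on each volume and following the reported backing file. The leaf path is the current snapshot volume of the affected chain and must be replaced (and the script run where the LVs are active) in other environments.
~~~
import os
import re
import subprocess

# Leaf volume of the chain in this report; replace for other chains.
LEAF = "/dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/498be3f9-0ab2-4905-9f57-ba53365f5fc4"

# qemu-img prints either
#   backing file: <relative> (actual path: <resolved>)
# or just
#   backing file: <path>
BACKING_RE = re.compile(
    r"^backing file:\s*(?P<rel>\S+)(?:\s+\(actual path:\s*(?P<abs>[^)]+)\))?", re.M)


def backing_of(path):
    """Return the resolved backing file of `path`, or None for a base volume."""
    out = subprocess.check_output(["qemu-img", "info", path]).decode()
    m = BACKING_RE.search(out)
    if not m:
        return None
    backing = m.group("abs") or os.path.join(os.path.dirname(path), m.group("rel"))
    return os.path.normpath(backing)


chain = [LEAF]
while True:
    parent = backing_of(chain[-1])
    if parent is None:
        break
    if not os.path.exists(parent):
        print("warning: backing file %s is not accessible from here" % parent)
        break
    chain.append(parent)

# Print base <- ... <- leaf, matching the chain layout described above.
print(" <- ".join(os.path.basename(p) for p in reversed(chain)))
~~~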

Comment 2 Marina Kalinin 2013-05-13 20:59:37 UTC
More information:

The template:
# qemu-img info /dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/7bedb639-f146-4ce4-aed8-7a40207467fd
image: /dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/7bedb639-f146-4ce4-aed8-7a40207467fd
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 0

lvm info:  7bedb639-f146-4ce4-aed8-7a40207467fd 06fa7078-fff0-4464-8dc1-d6e541d6c9f0    1 -wi-a---  20.00g  -1  -1 253  18                                                     HwDVAD-irc8-U0sv-q1O3-SzL0-py5n-SKfFRd MD_6,PU_00000000-0000-0000-0000-000000000000,IU_995bd003-f6fb-4faf-a35b-9023df2f299a  
dd if=metadata bs=512 skip=6 count=1
DOMAIN=06fa7078-fff0-4464-8dc1-d6e541d6c9f0
VOLTYPE=SHARED
CTIME=1354408764
FORMAT=RAW
IMAGE=995bd003-f6fb-4faf-a35b-9023df2f299a
DISKTYPE=1
PUUID=00000000-0000-0000-0000-000000000000
LEGALITY=LEGAL
MTIME=1354408788
POOL_UUID=
SIZE=41943040
TYPE=PREALLOCATED
DESCRIPTION=_template_4/1/2010 1:26:21 PM_template
EOF

Base disk:
# qemu-img info /dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/cb43ac3e-37e5-402c-be6c-fb3346c99dc0
image:  /dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/cb43ac3e-37e5-402c-be6c-fb3346c99dc0
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 0
cluster_size: 65536
backing file: 
../761f0e7f-5c90-409b-89a4-ca86982aed2d/7bedb639-f146-4ce4-aed8-7a40207467fd (actual path: 
/dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/../761f0e7f-5c90-409b-89a4-ca86982aed2d/7bedb639-f146-4ce4-aed8-7a40207467fd)

lvm info:  cb43ac3e-37e5-402c-be6c-fb3346c99dc0 06fa7078-fff0-4464-8dc1-d6e541d6c9f0    5 -wi-a---  21.00g  -1  -1 253  20                                                     4Rm132-Wrem-dj8u-9kGc-TR9u-GBuu-ZrpKCS MD_16,IU_761f0e7f-5c90-409b-89a4-ca86982aed2d,PU_7bedb639-f146-4ce4-aed8-7a40207467fd 
dd if=metadata bs=512 skip=16 count=1
DOMAIN=06fa7078-fff0-4464-8dc1-d6e541d6c9f0
VOLTYPE=INTERNAL
CTIME=1354414106
FORMAT=COW
IMAGE=761f0e7f-5c90-409b-89a4-ca86982aed2d
DISKTYPE=1
PUUID=7bedb639-f146-4ce4-aed8-7a40207467fd
LEGALITY=LEGAL
MTIME=1367273673
POOL_UUID=
DESCRIPTION=_vm_name_11/6/2012 10:59:40 AM
TYPE=SPARSE
SIZE=41943040
EOF


Last Snapshot:
# qemu-img info /dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/498be3f9-0ab2-4905-9f57-ba53365f5fc4
image: /dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/498be3f9-0ab2-4905-9f57-ba53365f5fc4
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 0
cluster_size: 65536
backing file: 
backing file: ../761f0e7f-5c90-409b-89a4-ca86982aed2d/cb43ac3e-37e5-402c-be6c-fb3346c99dc0 (actual path: /dev/06fa7078-fff0-4464-8dc1-d6e541d6c9f0/../761f0e7f-5c90-409b-89a4-ca86982aed2d/cb43ac3e-37e5-402c-be6c-fb3346c99dc0)

lvm info:  498be3f9-0ab2-4905-9f57-ba53365f5fc4 06fa7078-fff0-4464-8dc1-d6e541d6c9f0    1 -wi-a---   1.00g  -1  -1 253  21                                                     4UrmPb-3ltw-uIuc-Waqm-uJOF-2exS-FmomuC PU_cb43ac3e-37e5-402c-be6c-fb3346c99dc0,MD_109,IU_761f0e7f-5c90-409b-89a4-ca86982aed2d

dd if=metadata bs=512 skip=109 count=1
DOMAIN=06fa7078-fff0-4464-8dc1-d6e541d6c9f0
CTIME=1367273308
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=41943040
VOLTYPE=LEAF
DESCRIPTION=
IMAGE=761f0e7f-5c90-409b-89a4-ca86982aed2d
PUUID=cb43ac3e-37e5-402c-be6c-fb3346c99dc0
MTIME=1367273308
POOL_UUID=
TYPE=SPARSE
EOF
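
The dd reads above pull one 512-byte metadata slot per volume; the slot number is the MD_<n> LVM tag on the volume's LV (MD_6, MD_16 and MD_109 in the lvm info lines). A minimal sketch of the same lookup, assuming the domain's metadata LV is active at /dev/<sdUUID>/metadata and the script runs as root:
~~~
import re
import subprocess

SD_UUID = "06fa7078-fff0-4464-8dc1-d6e541d6c9f0"
LV_NAME = "cb43ac3e-37e5-402c-be6c-fb3346c99dc0"  # the base volume above
SLOT_SIZE = 512  # one slot per volume, as in the dd commands above

# Find the MD_<n> tag on the volume's LV, e.g. MD_16.
tags = subprocess.check_output(
    ["lvs", "--noheadings", "-o", "lv_tags", "%s/%s" % (SD_UUID, LV_NAME)]).decode()
slot = int(re.search(r"MD_(\d+)", tags).group(1))

# Equivalent of: dd if=metadata bs=512 skip=<slot> count=1
with open("/dev/%s/metadata" % SD_UUID, "rb") as md:
    md.seek(slot * SLOT_SIZE)
    block = md.read(SLOT_SIZE)

print(block.split(b"\x00", 1)[0].decode("ascii", "replace"))
~~~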

Comment 17 Lee Yarwood 2013-05-20 08:23:48 UTC
*** Bug 958242 has been marked as a duplicate of this bug. ***

Comment 18 Lee Yarwood 2013-05-20 08:38:24 UTC
*** Bug 958405 has been marked as a duplicate of this bug. ***

Comment 19 Dan Kenigsberg 2013-05-21 09:24:05 UTC
Luckily, we do not have the offending patch http://gerrit.ovirt.org/13610 in 3.1.z, so no need to clone this bug to there.

Comment 20 Michal Skrivanek 2013-05-21 10:34:28 UTC
How did this become 3.1.5 only? This should have been cloned and kept as 3.2. Though, after comment #19, this should only be 3.2; no need for a backport.

Comment 21 Lee Yarwood 2013-05-21 11:28:29 UTC
(In reply to Dan Kenigsberg from comment #19)
> Luckily, we do not have the offending patch http://gerrit.ovirt.org/13610 in
> 3.1.z, so no need to clone this bug to there.

I'm not sure where you pulled this commit from, Dan. I was able to reproduce this issue with 3.1.4, so this is most definitely an issue in 3.1.z.

Comment 22 Dan Kenigsberg 2013-05-21 15:01:39 UTC
(In reply to Lee Yarwood from comment #21)
> (In reply to Dan Kenigsberg from comment #19)
> > Luckily, we do not have the offending patch http://gerrit.ovirt.org/13610 in
> > 3.1.z, so no need to clone this bug to there.
> 
> I'm not sure where you pulled this commit from Dan, I was able to reproduce
> this issue with 3.1.4 so this is most definitely an issue in 3.1.z.

Lee, you are correct. My blaming of change 13610 was premature at best. The passing of objects to teardownVolumePath() (instead of dicts) was introduced back in http://gerrit.ovirt.org/1123/ which is included in rhev-3.1.4.
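
To illustrate the class of bug described here (a hypothetical sketch, not the actual vdsm code): a teardown helper written against plain dicts can silently do nothing once callers start passing drive objects, so the underlying volume is never deactivated on the HSM.
~~~
# Hypothetical illustration only -- not vdsm code.

class Drive(object):
    """Object-style drive, as newer callers might pass."""
    def __init__(self, device, path):
        self.device = device
        self.path = path


def teardown_volume_path(drive):
    # Written when callers passed dicts such as {"device": "disk", "path": ...}.
    if isinstance(drive, dict) and drive.get("device") == "disk":
        print("deactivating %s" % drive["path"])
        return
    # A Drive object falls through here: no error, no teardown.


teardown_volume_path({"device": "disk", "path": "/dev/vg/lv"})  # deactivates
teardown_volume_path(Drive("disk", "/dev/vg/lv"))               # silently skipped
~~~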

Comment 25 Ayal Baron 2013-05-23 20:32:31 UTC
*** Bug 966022 has been marked as a duplicate of this bug. ***

Comment 26 Lee Yarwood 2013-05-28 15:01:42 UTC
As requested by QE, the steps I used to reproduce this on 3.1.4 are (as listed in comments #9 and #11):

- Create a guest on a block storage domain.
- Create a snapshot, can be either online or offline.
- Run this guest on an HSM host (non-SPM).
- Shut down the guest.
- Merge the snapshot.
- Attempt to run the guest on the same HSM host.

The fix should ensure that the LVs for the guest are deactivated once the guest is shut down, so that the dmsetup table mappings are refreshed when the guest is started again and the LVs are reactivated.
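
A minimal sketch of that check, assuming root on the HSM host: after the guest is shut down, only the special domain LVs should remain in the dmsetup table. The VG name is the storage domain UUID (the one below is taken from the verification listings later in this bug), and the list of special LVs comes from those same listings.
~~~
import subprocess

SD_UUID = "4559672f-7163-4bc7-9eee-8b2143f5e4fb"  # storage domain UUID / VG name
# Domain-internal LVs that are expected to stay mapped.
SPECIAL_LVS = {"metadata", "master", "inbox", "outbox", "ids", "leases"}

# device-mapper doubles the dashes inside VG and LV names.
prefix = SD_UUID.replace("-", "--") + "-"

table = subprocess.check_output(["dmsetup", "table"]).decode()
leftovers = []
for line in table.splitlines():
    name = line.split(":", 1)[0]
    if not name.startswith(prefix):
        continue
    lv = name[len(prefix):].replace("--", "-")
    if lv not in SPECIAL_LVS:
        leftovers.append(lv)

if leftovers:
    print("image LVs still mapped after shutdown: %s" % ", ".join(leftovers))
else:
    print("only the special domain LVs remain mapped, as expected")
~~~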

Lee

Comment 27 Elad 2013-05-30 09:50:05 UTC
Verified according to Lee's steps.


The VM is bootable after snapshot removal and the LVs are deactivated.
The dmsetup table is updated once the guest starts.

Checked on RHEVM 3.2 (SF17.2):
rhevm-3.2.0-11.29.el6ev.noarch
vdsm-4.10.2-22.0.el6ev.x86_64


before snapshot creation:
[root@nott-vds1 ~]# dmsetup table |grep 4559672f
4559672f--7163--4bc7--9eee--8b2143f5e4fb-metadata: 0 1048576 linear 253:5 264192
4559672f--7163--4bc7--9eee--8b2143f5e4fb-master: 0 2097152 linear 253:5 6293504
4559672f--7163--4bc7--9eee--8b2143f5e4fb-inbox: 0 262144 linear 253:5 5769216
4559672f--7163--4bc7--9eee--8b2143f5e4fb-outbox: 0 262144 linear 253:5 6031360
4559672f--7163--4bc7--9eee--8b2143f5e4fb-2670a99e--5b4c--4766--af35--13892e593a9a: 0 2097152 linear 253:5 98568192
4559672f--7163--4bc7--9eee--8b2143f5e4fb-ids: 0 262144 linear 253:5 5507072
4559672f--7163--4bc7--9eee--8b2143f5e4fb-3d0978d8--2842--459f--a73e--14151873f1ac: 0 41943040 linear 253:5 54528000
4559672f--7163--4bc7--9eee--8b2143f5e4fb-leases: 0 4194304 linear 253:5 1312768



after snapshot creation:
[root@nott-vds1 ~]# dmsetup table |grep 4559672f
4559672f--7163--4bc7--9eee--8b2143f5e4fb-metadata: 0 1048576 linear 253:5 264192
4559672f--7163--4bc7--9eee--8b2143f5e4fb-master: 0 2097152 linear 253:5 6293504
4559672f--7163--4bc7--9eee--8b2143f5e4fb-inbox: 0 262144 linear 253:5 5769216
4559672f--7163--4bc7--9eee--8b2143f5e4fb-outbox: 0 262144 linear 253:5 6031360
4559672f--7163--4bc7--9eee--8b2143f5e4fb-f7c453b4--5b2e--484d--a266--0c2901c826ec: 0 2097152 linear 253:5 100665344
4559672f--7163--4bc7--9eee--8b2143f5e4fb-2670a99e--5b4c--4766--af35--13892e593a9a: 0 2097152 linear 253:5 98568192
4559672f--7163--4bc7--9eee--8b2143f5e4fb-ids: 0 262144 linear 253:5 5507072
4559672f--7163--4bc7--9eee--8b2143f5e4fb-3d0978d8--2842--459f--a73e--14151873f1ac: 0 41943040 linear 253:5 54528000
4559672f--7163--4bc7--9eee--8b2143f5e4fb-leases: 0 4194304 linear 253:5 1312768



after shutting down the VM:

[root@nott-vds1 ~]# dmsetup table |grep 4559672f
4559672f--7163--4bc7--9eee--8b2143f5e4fb-metadata: 0 1048576 linear 253:5 264192
4559672f--7163--4bc7--9eee--8b2143f5e4fb-master: 0 2097152 linear 253:5 6293504
4559672f--7163--4bc7--9eee--8b2143f5e4fb-inbox: 0 262144 linear 253:5 5769216
4559672f--7163--4bc7--9eee--8b2143f5e4fb-outbox: 0 262144 linear 253:5 6031360
4559672f--7163--4bc7--9eee--8b2143f5e4fb-ids: 0 262144 linear 253:5 5507072
4559672f--7163--4bc7--9eee--8b2143f5e4fb-leases: 0 4194304 linear 253:5 1312768


after snapshot merge and starting the VM:


[root@nott-vds1 ~]# dmsetup table |grep 4559672f
4559672f--7163--4bc7--9eee--8b2143f5e4fb-metadata: 0 1048576 linear 253:5 264192
4559672f--7163--4bc7--9eee--8b2143f5e4fb-master: 0 2097152 linear 253:5 6293504
4559672f--7163--4bc7--9eee--8b2143f5e4fb-inbox: 0 262144 linear 253:5 5769216
4559672f--7163--4bc7--9eee--8b2143f5e4fb-outbox: 0 262144 linear 253:5 6031360
4559672f--7163--4bc7--9eee--8b2143f5e4fb-ids: 0 262144 linear 253:5 5507072
4559672f--7163--4bc7--9eee--8b2143f5e4fb-18159c58--1f63--44dd--b158--ced27842b566: 0 41943040 linear 253:4 264192
4559672f--7163--4bc7--9eee--8b2143f5e4fb-leases: 0 4194304 linear 253:5 1312768

Comment 28 Lee Yarwood 2013-06-04 15:36:47 UTC
*** Bug 950111 has been marked as a duplicate of this bug. ***

Comment 30 errata-xmlrpc 2013-06-10 20:49:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html

Comment 31 Allon Mureinik 2013-07-25 08:07:36 UTC
*** Bug 982971 has been marked as a duplicate of this bug. ***

