Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Attempt to move the disk of a running VM to a different storage domain

Actual results:
Automatic snapshot and disk move fails

Expected results:
Disk moved

Additional info:
Created attachment 894450 [details] VDSM Log
Created attachment 894451 [details] destination vdsm log
Maurice, can you please also add the relevant engine log, the output of "ls -l" at /rhev/data-center/e0e65e47-52c8-41bd-8499-a3e025831215/21484146-1a6c-4a31-896e-da1156888dfc/images on the SPM host, and the output of the tree command under /rhev/data-center on the SPM host?
It seems that the VM did not contain the device with 'domainID': 'e0e65e47-52c8-41bd-8499-a3e025831215', 'volumeID': 'deae7162-1eb7-423e-9115-3e7de542c89c', 'imageID': '21484146-1a6c-4a31-896e-da1156888dfc' (at def _findDriveByUUIDs(self, drive) in vdsm/vm.py)
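For reference, the failing lookup is essentially a match on all three storage UUIDs (a minimal illustrative sketch, not the actual vdsm source; only the method name comes from the code reference above):

    # Sketch of what a _findDriveByUUIDs-style lookup amounts to: the snapshot
    # request identifies a drive by its storage UUIDs, and if no attached drive
    # matches all three identifiers the drive cannot be found and the snapshot
    # is rejected.
    def find_drive_by_uuids(vm_drives, drive_spec):
        """Return the VM drive matching domainID, imageID and volumeID."""
        for drive in vm_drives:
            if (drive.get('domainID') == drive_spec['domainID']
                    and drive.get('imageID') == drive_spec['imageID']
                    and drive.get('volumeID') == drive_spec['volumeID']):
                return drive
        raise LookupError("No such drive: %r" % (drive_spec,))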
(In reply to Maor from comment #4) > It seems that the VM did not contain the device with 'domainID': > 'e0e65e47-52c8-41bd-8499-a3e025831215', 'volumeID': > 'deae7162-1eb7-423e-9115-3e7de542c89c', 'imageID': > '21484146-1a6c-4a31-896e-da1156888dfc' (at def _findDriveByUUIDs(self, > drive) in vdsm/vm.py) So where did this operation come from?
Created attachment 895663 [details] EngineSPM
This looks related to https://bugzilla.redhat.com/show_bug.cgi?id=1009100
(In reply to Allon Mureinik from comment #5)
> (In reply to Maor from comment #4)
> > It seems that the VM did not contain the device with 'domainID':
> > 'e0e65e47-52c8-41bd-8499-a3e025831215', [...]
>
> So where did this operation come from?

The operation came from the live snapshot.
Maurice, you are right, this does look very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1009100

Just to be sure, can you attach older VDSM logs? The attached logs seem to start from 2014-05-10, but the engine error was from 2014-05-09.
Created attachment 896629 [details] EngineSPM
Created attachment 896630 [details] Engine VDSM Log
Created attachment 896631 [details] destination vdsm log
I attached a fresh set of logs from the source and destination
(In reply to Maurice James from comment #13)
> I attached a fresh set of logs from the source and destination

Maurice, from the engine logs it looks like the VM was running on host Titan:

StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=48, mMessage=Snapshot failed]]
2014-05-17 13:50:52,700 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (org.ovirt.thread.pool-6-thread-45) HostName = Titan
2014-05-17 13:50:52,700 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (org.ovirt.thread.pool-6-thread-45) Command SnapshotVDSCommand(HostName = Titan, HostId = 5869805e-5b95-485a-bd8a-07b472d3fcaf, vmId=7f341f92-134a-47e7-b7ed-e7df772806f3) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48

I don't see any log of Titan from the attached files.
Created attachment 896807 [details] SourceVDSM
The VM was running on Titan but the disk is on beetlejuice. VMs move fine; it's the disk I'm having issues with.

(In reply to Maor from comment #14)
> Maurice, from the engine logs it looks like the VM was running on host Titan:
> [...]
> I don't see any log of Titan from the attached files.
(In reply to Maurice James from comment #16)
> The VM was running on Titan but the disk is on beetlejuice. VMs move fine;
> it's the disk I'm having issues with.

As part of the disk live storage migration we create a live snapshot of the VM. The live snapshot operation is performed on the HSM host that the VM is running on. The error I see in the logs is from the snapshot command executed for that VM, so we need to see the error of this operation in the VDSM log of that host.
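Schematically, the engine-side flow is roughly the following (a simplified sketch with hypothetical helper names; only the step ordering and the host roles reflect the flow described above):

    # Simplified sketch of live storage migration, using hypothetical engine
    # helpers. The key point: step 1 runs on the host where the VM is running
    # (Titan here), so its failure has to be diagnosed in that host's vdsm.log.
    def live_migrate_disk(engine, vm, disk, dest_domain):
        # 1. Auto-generated live snapshot on the VM's host; this is the step
        #    failing here, surfacing as SnapshotVDSCommand error code 48.
        engine.create_live_snapshot(host=vm.running_host, vm=vm, disk=disk)
        # 2. Create the image structure on the destination domain (an SPM operation).
        engine.create_image_placeholder(disk, dest_domain)
        # 3. Copy the volume chain and mirror the active layer to the destination.
        engine.sync_image_data(disk, dest_domain)
        # 4. Switch the VM to the new volumes and remove the source image.
        engine.switch_and_delete_source(vm, disk, dest_domain)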
I uploaded the logs from all 3 servers. Do you need anything else?

(In reply to Maor from comment #17)
> As part of the disk live storage migration we create a live snapshot of the VM.
> The live snapshot operation is performed on the HSM host that the VM is running on.
> [...]
Hi Maurice, I don't see the uploaded logs in the bug.

(In reply to Maurice James from comment #18)
> I uploaded the logs from all 3 servers. Do you need anything else?
> [...]
(In reply to Maor from comment #19)
> Hi Maurice, I don't see the uploaded logs in the bug.

They are listed as:

- EngineSPM (358.86 KB, application/gzip), 2014-05-17 13:56 EDT
- Engine VDSM Log (930.63 KB, application/gzip), 2014-05-17 13:56 EDT
- destination vdsm log (242.94 KB, application/gzip), 2014-05-17 13:57 EDT
- SourceVDSM (409.10 KB, application/gzip), 2014-05-18 12:05 EDT

They were uploaded on the 17th and 18th of May.
Hi Francesco, can you please take a look?
Do you think this is the same issue as bug 1009100 (https://bugzilla.redhat.com/show_bug.cgi?id=1009100)?

Thanks,
Maor
(In reply to Maor from comment #21)
> Hi Francesco, can you please take a look?
> Do you think this is the same issue as bug 1009100?

In your test 3.4.1 environment, does live migration work fine?
Created attachment 899965 [details]
New logs

Here is a fresh set of logs from a different system that is having the same problem.
(In reply to Maurice James from comment #23)
> Created attachment 899965 [details]
> New logs
>
> Here is a fresh set of logs from a different system that is having the same
> problem.

Maor/Francesco, please take a look at this.
I had a look at the logs provided in https://bugzilla.redhat.com/show_bug.cgi?id=1096529#c23 and this definitely doesn't seem to be the same case as bz1009100:

ashtivh04_vdsm.log:Thread-582288::ERROR::2014-05-28 08:20:29,333::vm::3915::vm.Vm::(snapshot) vmId=`508f2275-50d3-4fb2-a8e6-06e50c87d0d1`::The base volume doesn't exist: {'device': 'disk', 'domainID': 'b7663d70-e658-41fa-b9f0-8da83c9eddce', 'volumeID': '9e298151-23f7-4e46-8bec-71d644967f96', 'imageID': 'babe7494-bce9-4695-b341-fae61715f9e6'}

At a glance this looks like a storage-related issue.
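Since this points at storage, one quick cross-check is whether the volume actually exists under the image directory on a host (a hypothetical helper, assuming a file-based domain mounted under /rhev/data-center/mnt, matching the "ls -l" / tree output requested earlier; block domains use LVs instead, so this sketch does not apply to them):

    import os

    def volume_path(sd_uuid, img_uuid, vol_uuid, root="/rhev/data-center/mnt"):
        """Return the volume's path if it exists under a mounted file domain."""
        # Walk each mounted storage server and look for
        # <mount>/<sd_uuid>/images/<img_uuid>/<vol_uuid>.
        for mount in os.listdir(root):
            path = os.path.join(root, mount, sd_uuid, "images", img_uuid, vol_uuid)
            if os.path.exists(path):
                return path
        return None

    # UUIDs taken from the error line above:
    print(volume_path("b7663d70-e658-41fa-b9f0-8da83c9eddce",
                      "babe7494-bce9-4695-b341-fae61715f9e6",
                      "9e298151-23f7-4e46-8bec-71d644967f96"))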
forgot to clear my NEEDINFO
I found that all of the VMs I created based on the out-of-the-box "Blank" template would fail live disk migration. I created a new "Default" template, created a VM based on it, and was able to live migrate the disk. I need to be able to change the template a VM is based on, because I already have close to 30 VMs created from the old "Blank" template. Is this possible?
(In reply to Maurice James from comment #27)
> I found that all of the VMs I created based on the out-of-the-box "Blank"
> template would fail live disk migration. [...] I need to be able to change
> the template a VM is based on, because I already have close to 30 VMs
> created from the old "Blank" template. Is this possible?

AFAIK, this is not possible.
Daniel/Maor - let's look into why migrating a disk based on the default template fails.
OK, after repeating the steps that I followed in my prior posts:

1. Stop the VM
2. Export it
3. Delete it (the VM, not the export)
4. Re-import the VM

After following those steps I was able to live migrate the disks without error. I'm not sure why this fixed the problem, but what I can say is that there was a problem with the default "Blank" template. The problem started when I upgraded from 3.3.3 to 3.3.4 and carried over to 3.4.x.
Perhaps there was a problem with the upgrade script from 3.3.3 -> 3.3.4?
I have one more VM left to export. Is there anything that I should compare to the ones that I've already fixed? Maybe we can get a root cause out of it.
This is an automated message: This bug has been re-targeted from 3.4.2 to 3.5.0 since neither priority nor severity were high or urgent. Please re-target to 3.4.3 if relevant.
Maor, Maurice, what's up with this BZ? Are we going anywhere with it?
It could be that the problem is related to what Maurice described regarding the upgrade process that failed.
What was the origin of the problem in the upgrade from 3.3.3 to 3.3.4?

If you still have the VM whose disks you have trouble migrating, can you please try to move the disk again and also attach /var/log/messages and /var/log/libvirt/libvirt.log?
Does it also reproduce on new VMs?
This won't make oVirt 3.5.0, pushing out to 3.5.0. Any news on the needinfo?
I verified disk moves in different scenarios; I suspect this is an upgrade issue that went wrong, but for now, without the requested info, we can't do much about it.
I'm closing the bug for now; please feel free to re-open it once it reproduces and you have the right info.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days