Bug 1259467

Summary: Migration issues Importing Storage Domain no more VM
Product: [oVirt] ovirt-engine
Reporter: Alain <avondra>
Component: BLL.Storage
Assignee: Maor <mlipchuk>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: acanan, amureini, avondra, bugs, ecohen, lsurette, mgoldboi, mlipchuk, rbalakri, tnisan, yeylon
Target Milestone: ovirt-3.6.0-rc
Flags: rule-engine: ovirt-3.6.0+, ylavi: planning_ack+, amureini: devel_ack+, rule-engine: testing_ack+
Target Release: 3.6.0
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-04 11:17:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Engine.log (flags: none)
vdsm logs from Hypervisor 1 (flags: none)
vdsm logs from Hypervisor 2 (flags: none)
Engine.log from the new oVirt Manager 3.5.3 (flags: none)

Description Alain 2015-09-02 16:27:51 UTC
Description of problem:
After migrating from a hosted-engine oVirt 3.5.0-1 to a new physical server running oVirt 3.5.3, everything seemed to work: creating the DC, the cluster, and the Hosts, and importing all the iSCSI Storage Domains (which contain all my VMs). However, all the VMs from the Storage Domains have disappeared...
The only Storage Domain showing a VM under "Import VM" was a 10 GB Test Storage Domain with a test VM created some days ago.
All VMs created when the DC version was 3.4 disappeared during the import.
For info, the DC has been on version 3.5 for about 1 year without any trouble, but something is apparently not correct in my config.
In addition, I can't update my oVirt 3.5.0-1 to any newer version; I always hit issues that prevent the update.


Version-Release number of selected component (if applicable):
oVirt 3.5.0-1 on CentOS 6.7

How reproducible:
On my environment unlikely

Steps to Reproduce:
1. Using oVirt 3.5.0-1
2. Install another server with oVirt 3.5.3 with CentOS 7.1
3. Import iSCSI Storage Domains 
4. Verify the presence of VMs in the "Import VM" tab

Actual results:
No VMs in the "Import VM" tab

Expected results:
All the VMs created on the Storage Domains are present and importable

Additional info:

Comment 1 Tal Nisan 2015-09-03 07:08:02 UTC
Maor, have a look asap please, seems like unregistered entities perhaps?

Comment 2 Alain 2015-09-03 07:35:37 UTC
Yes Tal, it seems to be something like that.
Should I upload vdsm.log or engine.log?

Comment 3 Maor 2015-09-03 07:40:38 UTC
Hi Alain,

There are no logs attached. Can you please upload them again,
including those from the previous (destroyed) engine?

Thanks,
Maor

Comment 4 Maor 2015-09-03 07:43:35 UTC
Alain,

I'm not sure I understood your description.
Was your DC version 3.4 or 3.5? What kind of configuration problem did you have?

Comment 5 Alain 2015-09-03 07:52:54 UTC
Maor,
My DC was migrated to 3.5 about one year ago from 3.4 (maybe I've done something wrong...).
When I try to migrate the oVirt manager to a new one, I can't import VMs from the Storage Domains previously attached to my current DC (version 3.5).
Do you need the entire engine and vdsm logs, or only parts of them (e.g. lines with ERROR or WARNING tags)?

Comment 6 Maor 2015-09-03 07:59:47 UTC
(In reply to Alain from comment #5)
> Maor,
> My DC was migrated to 3.5 about one year ago from 3.4 (maybe I've done
> something wrong...).

What do you mean by migrate?
Can you please describe the steps of this 3.5 upgrade?
Do you still have the logs from this upgrade?
Will it be possible to reproduce this and send the logs?

> When I want to migrate the oVirt manager to a new one, I can't import VMs
> from the Storage Domains previously attached to my actual DC in 3.5 version.
> Do you need the entire engine and vdsm logs or only part of them (i.e with
> ERROR or WARNING tags) ?

Please attach the full engine and vdsm logs (from the destroyed environment and the current environment)

Comment 7 Alain 2015-09-03 08:08:07 UTC
I updated the DC via the Webadmin Portal by selecting the DC, then "Edit Data Center" -> "Compatibility Version", and choosing "3.5".
Did I forget something?

Comment 8 Alain 2015-09-03 08:42:13 UTC
Created attachment 1069657 [details]
Engine.log

These logs come from the current manager, oVirt 3.5.0-1.
It currently runs as a hosted engine, as described in my article.

Comment 9 Alain 2015-09-03 08:48:57 UTC
Created attachment 1069658 [details]
vdsm logs from Hypervisor 1

These are the current logs. I don't have the logs from August 28th, the date of the migration attempt, because I had to restore the hypervisor from an Acronis backup taken on the 27th.

Comment 10 Alain 2015-09-03 08:51:45 UTC
Created attachment 1069659 [details]
vdsm logs from Hypervisor 2

These are the current logs. As with hypervisor 1, I don't have the logs from August 28th, the date of the migration attempt, because I had to restore the hypervisor from an Acronis backup taken on the 27th.

Comment 11 Alain 2015-09-03 08:57:56 UTC
Created attachment 1069660 [details]
Engine.log from the new oVirt Manager 3.5.3

These logs come from the (almost) new manager running oVirt 3.5.3, which failed to import VMs from the imported Storage Domain.
These logs were generated during the migration on August 28th, which started at 1 PM.

Comment 12 Maor 2015-09-03 09:40:33 UTC
Thanks for the logs.

It looks like your previous setup used 4 Storage Domains that had an OVF_STORE disk. Their IDs are:
0fec0486-7863-49bc-a4ab-d2c7ac48258a
1f6dec51-12a6-41ed-9d14-8f0ad4e062d2
7e40772a-fe94-4fb2-94c4-6198bed04a6a
d7b9d7cc-f7d6-43c7-ae13-e720951657c9

It also looks like the VMs are fetched from the OVF_STORE disks as well, for example:
 "[1700470f] Retrieve OVF Entity from storage domain ID 7e40772a-fe94-4fb2-94c4-6198bed04a6a for entity ID 82d1653d-78ad-4859-b9af-8fb02bfdae15, entity name unc-srv-qual03 and VM Type of VM"

Can you please point me to a specific Storage Domain that doesn't offer you the VMs or Templates to import? And, if you remember, the name of a specific VM that you wanted to register?
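
For anyone checking their own engine.log for the same message, a minimal Python sketch follows. The regular expression is inferred only from the single line quoted above, and the function name ovf_entities is made up for illustration; other engine versions may word the message differently.

```python
import re

# Pattern inferred from the one log line quoted in this comment (assumption:
# IDs are lowercase UUIDs and the entity name contains no spaces).
OVF_ENTITY_RE = re.compile(
    r"Retrieve OVF Entity from storage domain ID (?P<domain_id>[0-9a-f-]{36}) "
    r"for entity ID (?P<entity_id>[0-9a-f-]{36}), "
    r"entity name (?P<name>\S+) and VM Type of (?P<type>\w+)"
)


def ovf_entities(lines):
    """Yield (domain_id, entity_name, entity_type) for each matching log line."""
    for line in lines:
        match = OVF_ENTITY_RE.search(line)
        if match:
            yield match.group("domain_id"), match.group("name"), match.group("type")


# The sample is the log line quoted above, joined onto one line.
sample = (
    "[1700470f] Retrieve OVF Entity from storage domain ID "
    "7e40772a-fe94-4fb2-94c4-6198bed04a6a for entity ID "
    "82d1653d-78ad-4859-b9af-8fb02bfdae15, entity name unc-srv-qual03 "
    "and VM Type of VM"
)

for domain_id, name, entity_type in ovf_entities([sample]):
    print(domain_id, name, entity_type)
```

Grouping the results by storage domain ID would show which of the four domains listed above actually returned entities during the import.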

Comment 13 Alain 2015-09-03 10:12:24 UTC
(In reply to Maor from comment #12)
> Thanks for the logs.
> 
> It looks like that in your previous setup you have used 4 Storage Domains
> which had OVF_STORE disk, their ids are:
> 0fec0486-7863-49bc-a4ab-d2c7ac48258a
> 1f6dec51-12a6-41ed-9d14-8f0ad4e062d2
> 7e40772a-fe94-4fb2-94c4-6198bed04a6a
> d7b9d7cc-f7d6-43c7-ae13-e720951657c9
> 
> It also looks that the VMs are fetched from the OVF_STORE disks as well like:
>  "[1700470f] Retrieve OVF Entity from storage domain ID
> 7e40772a-fe94-4fb2-94c4-6198bed04a6a for entity ID
> 82d1653d-78ad-4859-b9af-8fb02bfdae15, entity name unc-srv-qual03 and VM Type
> of VM"
> 
> Can you please point me out to a specific Storage Domain that doesn't
> provide you the VMs or Templates to import? and if you remember also a name
> of a specific VM which you wanted to register?

For instance, the Storage Domain VOL-UNC-PROD-02 and the VM unc-srv-ad1.

Comment 14 Maor 2015-09-03 10:28:49 UTC
It looks like your Hosts were running during the recovery, so the VM unc-srv-ad1 kept running and was picked up as an external VM.
Before recovering your setup, the Hosts must be rebooted, as mentioned in the documentation.

It looks like you tried to import it, but the import failed because there was a running external VM:
"2015-08-28 15:43:38,546 WARN  [org.ovirt.engine.core.bll.ImportVmFromConfigurationCommand] (ajp--127.0.0.1-8702-4) [5ef73a48] CanDoAction of action ImportVmFromConfiguration failed for user admin@internal. Reasons: VAR__ACTION__IMPORT,VAR__TYPE__VM,VM_CANNOT_IMPORT_VM_EXISTS,$VmName external-unc-srv-ad1"

I would suggest that you attach this Storage Domain to a new setup, using new Hosts (or rebooted Hosts), and try to register this VM once again.
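
To spot this kind of failure in a large engine.log, a short Python sketch such as the one below may help. The pattern is inferred only from the WARN line quoted above; it assumes the "Reasons:" list is comma-separated, ends the line, and marks variables with a leading "$". The function name parse_failure is hypothetical.

```python
import re

# Pattern inferred from the single WARN line quoted in this comment.
CAN_DO_ACTION_RE = re.compile(
    r"CanDoAction of action (?P<action>\w+) failed for user (?P<user>\S+)\. "
    r"Reasons: (?P<reasons>.+)$"
)


def parse_failure(line):
    """Return action, user, reason codes and variables, or None if no match."""
    match = CAN_DO_ACTION_RE.search(line.strip())
    if match is None:
        return None
    codes, variables = [], {}
    for reason in match.group("reasons").split(","):
        if reason.startswith("$"):
            # e.g. "$VmName external-unc-srv-ad1" -> {"VmName": "external-unc-srv-ad1"}
            key, _, value = reason[1:].partition(" ")
            variables[key] = value
        else:
            codes.append(reason)
    return {
        "action": match.group("action"),
        "user": match.group("user"),
        "codes": codes,
        "variables": variables,
    }


# The sample is the log line quoted above.
sample = (
    "2015-08-28 15:43:38,546 WARN  "
    "[org.ovirt.engine.core.bll.ImportVmFromConfigurationCommand] "
    "(ajp--127.0.0.1-8702-4) [5ef73a48] CanDoAction of action "
    "ImportVmFromConfiguration failed for user admin@internal. Reasons: "
    "VAR__ACTION__IMPORT,VAR__TYPE__VM,VM_CANNOT_IMPORT_VM_EXISTS,"
    "$VmName external-unc-srv-ad1"
)

failure = parse_failure(sample)
if failure and "VM_CANNOT_IMPORT_VM_EXISTS" in failure["codes"]:
    print("Import blocked by existing VM:", failure["variables"].get("VmName"))
```

A VmName that starts with "external-" in such a line suggests the conflicting VM was auto-detected from a still-running qemu process rather than registered by the user.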

Comment 15 Alain 2015-09-03 10:48:18 UTC
(In reply to Maor from comment #14)
> It looks like your Hosts were running when doing the recover, so what
> happened is that the VM unc-srv-ad1 has been running as an external VM.
> Before doing the recover of your setup, the Hosts must be rebooted as
> mentioned in the documentation.
> 
> It looks that you tried to import it but failed since there was a running
> external VM:
> "2015-08-28 15:43:38,546 WARN 
> [org.ovirt.engine.core.bll.ImportVmFromConfigurationCommand]
> (ajp--127.0.0.1-8702-4) [5ef73a48] CanDoAction of action
> ImportVmFromConfiguration failed for user admin@internal. Reasons:
> VAR__ACTION__IMPORT,VAR__TYPE__VM,VM_CANNOT_IMPORT_VM_EXISTS,$VmName
> external-unc-srv-ad1"
> 
> I would suggest you to try to attach this Storage Domain to a new setup,
> using new Hosts (or rebooted hosts) and try to register this VM once again

That's right, I didn't reboot the hosts. Where is the documentation about that?
I did indeed see some VMs with the prefix "external-", but do you think it's normal that I don't see any VMs in the "Import VM" tab of the Storage Domain?

Comment 16 Maor 2015-09-03 10:59:09 UTC
(In reply to Alain from comment #15)
> (In reply to Maor from comment #14)
> > It looks like your Hosts were running when doing the recover, so what
> > happened is that the VM unc-srv-ad1 has been running as an external VM.
> > Before doing the recover of your setup, the Hosts must be rebooted as
> > mentioned in the documentation.
> > 
> > It looks that you tried to import it but failed since there was a running
> > external VM:
> > "2015-08-28 15:43:38,546 WARN 
> > [org.ovirt.engine.core.bll.ImportVmFromConfigurationCommand]
> > (ajp--127.0.0.1-8702-4) [5ef73a48] CanDoAction of action
> > ImportVmFromConfiguration failed for user admin@internal. Reasons:
> > VAR__ACTION__IMPORT,VAR__TYPE__VM,VM_CANNOT_IMPORT_VM_EXISTS,$VmName
> > external-unc-srv-ad1"
> > 
> > I would suggest you to try to attach this Storage Domain to a new setup,
> > using new Hosts (or rebooted hosts) and try to register this VM once again
> 
> That's right I did'nt reboot the hosts, where is the documentation about
> that ?

see http://www.ovirt.org/Features/ImportStorageDomain#Restrictions:
"In a disaster recovery scenario, if the Host, which the user about to use, was in the environment which was destroyed, it is recommended to reboot this Host before adding it to the new setup. The reason for that is first, to kill any qemu processes which are still running and might be automatically be added as VMs into the new setup, and also to avoid any sanlock issues."


> I saw effectively, some Vms with the prefix external., but do you think it's
> normal that I don't see any VMs in the "Import VM" tag of the Storage Doamin
> ?

Weird. In the logs it looks like you were trying to import them. Can you please try to attach this Storage Domain to a new setup with rebooted hosts, and let me know if you still don't see those VMs in the import subtab?

Comment 17 Alain 2015-09-03 12:30:40 UTC
(In reply to Maor from comment #16)
> (In reply to Alain from comment #15)
> > (In reply to Maor from comment #14)
> > > It looks like your Hosts were running when doing the recover, so what
> > > happened is that the VM unc-srv-ad1 has been running as an external VM.
> > > Before doing the recover of your setup, the Hosts must be rebooted as
> > > mentioned in the documentation.

I am sorry for my last question, of course I know where the documentation is :-)
Just to be complete: must I reboot the host after installing it in the DC, or is it better to also reboot the host before creating it in the new DC?


> > > 
> > > It looks that you tried to import it but failed since there was a running
> > > external VM:
> > > "2015-08-28 15:43:38,546 WARN 
> > > [org.ovirt.engine.core.bll.ImportVmFromConfigurationCommand]
> > > (ajp--127.0.0.1-8702-4) [5ef73a48] CanDoAction of action
> > > ImportVmFromConfiguration failed for user admin@internal. Reasons:
> > > VAR__ACTION__IMPORT,VAR__TYPE__VM,VM_CANNOT_IMPORT_VM_EXISTS,$VmName
> > > external-unc-srv-ad1"
> > > 
> > > I would suggest you to try to attach this Storage Domain to a new setup,
> > > using new Hosts (or rebooted hosts) and try to register this VM once again
> > 
> > That's right I did'nt reboot the hosts, where is the documentation about
> > that ?
> 
> see http://www.ovirt.org/Features/ImportStorageDomain#Restrictions:
> "In a disaster recovery scenario, if the Host, which the user about to use,
> was in the environment which was destroyed, it is recommended to reboot this
> Host before adding it to the new setup. The reason for that is first, to
> kill any qemu processes which are still running and might be automatically
> be added as VMs into the new setup, and also to avoid any sanlock issues."
> 
> 
> > I saw effectively, some Vms with the prefix external., but do you think it's
> > normal that I don't see any VMs in the "Import VM" tag of the Storage Doamin
> > ?
> 
> weird, in the logs it looks like you were trying to import them, can you
> please try to attach this Storage Domain to a new setup with rebooted hosts,
> and let me know if you still don't see those VMs in the import subtab?

Comment 18 Maor 2015-09-04 07:21:55 UTC
(In reply to Alain from comment #17)
> (In reply to Maor from comment #16)
> > (In reply to Alain from comment #15)
> > > (In reply to Maor from comment #14)
> > > > It looks like your Hosts were running when doing the recover, so what
> > > > happened is that the VM unc-srv-ad1 has been running as an external VM.
> > > > Before doing the recover of your setup, the Hosts must be rebooted as
> > > > mentioned in the documentation.
> 
> I am sorry for my last question, of course I know where is the documentation
> :-)
> Just to complete, I must rebbot the host after the installation in the DC or
> is it better to reboot also the host before creating its in the new DC ?

It is better to reboot the Hosts just before you add them to the new setup

> 
> 
> > > > 
> > > > It looks that you tried to import it but failed since there was a running
> > > > external VM:
> > > > "2015-08-28 15:43:38,546 WARN 
> > > > [org.ovirt.engine.core.bll.ImportVmFromConfigurationCommand]
> > > > (ajp--127.0.0.1-8702-4) [5ef73a48] CanDoAction of action
> > > > ImportVmFromConfiguration failed for user admin@internal. Reasons:
> > > > VAR__ACTION__IMPORT,VAR__TYPE__VM,VM_CANNOT_IMPORT_VM_EXISTS,$VmName
> > > > external-unc-srv-ad1"
> > > > 
> > > > I would suggest you to try to attach this Storage Domain to a new setup,
> > > > using new Hosts (or rebooted hosts) and try to register this VM once again
> > > 
> > > That's right I did'nt reboot the hosts, where is the documentation about
> > > that ?
> > 
> > see http://www.ovirt.org/Features/ImportStorageDomain#Restrictions:
> > "In a disaster recovery scenario, if the Host, which the user about to use,
> > was in the environment which was destroyed, it is recommended to reboot this
> > Host before adding it to the new setup. The reason for that is first, to
> > kill any qemu processes which are still running and might be automatically
> > be added as VMs into the new setup, and also to avoid any sanlock issues."
> > 
> > 
> > > I saw effectively, some Vms with the prefix external., but do you think it's
> > > normal that I don't see any VMs in the "Import VM" tag of the Storage Doamin
> > > ?
> > 
> > weird, in the logs it looks like you were trying to import them, can you
> > please try to attach this Storage Domain to a new setup with rebooted hosts,
> > and let me know if you still don't see those VMs in the import subtab?

Comment 19 Alain 2015-09-04 10:14:19 UTC
OK Maor, I plan to make another attempt next week. I will keep you informed of the results.
Thank you
Regards

Comment 20 Maor 2015-09-04 14:41:41 UTC
(In reply to Alain from comment #19)
> Ok Maor, I will plan to make another try on next week, I will keep you
> inform of the results.
> Thank you
> Regards

Thanks, please let me know if you need any help with the process.
I'm changing the severity to undefined for now, until we get more details about the next attempt.

Comment 21 Alain 2015-09-07 08:40:40 UTC
(In reply to Maor from comment #20)
> (In reply to Alain from comment #19)
> > Ok Maor, I will plan to make another try on next week, I will keep you
> > inform of the results.
> > Thank you
> > Regards
> 
> Thanks, please let me know if you need any help on the process.
> I'm changing the severity to undefined for now, until we will get more
> details about the other try.

I will perform the operation tomorrow morning between 9:30 and 12:30.
If I run into serious trouble, I will contact you, if you're not too busy.
Thanks
Regards

Comment 22 Maor 2015-09-07 09:22:04 UTC
(In reply to Alain from comment #21)
> (In reply to Maor from comment #20)
> > (In reply to Alain from comment #19)
> > > Ok Maor, I will plan to make another try on next week, I will keep you
> > > inform of the results.
> > > Thank you
> > > Regards
> > 
> > Thanks, please let me know if you need any help on the process.
> > I'm changing the severity to undefined for now, until we will get more
> > details about the other try.
> 
> I will make the operation tomorrow morning between 9h30 to 12h30.
> If I have a big trouble, I will contact you if you're not too busy.
> Thanks
> Regards

no problem, I will try to be available then

Comment 23 Alain 2015-09-08 07:49:22 UTC
(In reply to Maor from comment #22)
> (In reply to Alain from comment #21)
> > (In reply to Maor from comment #20)
> > > (In reply to Alain from comment #19)
> > > > Ok Maor, I will plan to make another try on next week, I will keep you
> > > > inform of the results.
> > > > Thank you
> > > > Regards
> > > 
> > > Thanks, please let me know if you need any help on the process.
> > > I'm changing the severity to undefined for now, until we will get more
> > > details about the other try.
> > 
> > I will make the operation tomorrow morning between 9h30 to 12h30.
> > If I have a big trouble, I will contact you if you're not too busy.
> > Thanks
> > Regards
> 
> no problem, I will try to be available then

Hi Maor,
I am ready to begin. One last question: do you think I should remove the hosts from the old DC before creating them in the new one?
Thanks
Regards

Comment 24 Maor 2015-09-08 09:08:58 UTC
(In reply to Alain from comment #23)
> (In reply to Maor from comment #22)
> > (In reply to Alain from comment #21)
> > > (In reply to Maor from comment #20)
> > > > (In reply to Alain from comment #19)
> > > > > Ok Maor, I will plan to make another try on next week, I will keep you
> > > > > inform of the results.
> > > > > Thank you
> > > > > Regards
> > > > 
> > > > Thanks, please let me know if you need any help on the process.
> > > > I'm changing the severity to undefined for now, until we will get more
> > > > details about the other try.
> > > 
> > > I will make the operation tomorrow morning between 9h30 to 12h30.
> > > If I have a big trouble, I will contact you if you're not too busy.
> > > Thanks
> > > Regards
> > 
> > no problem, I will try to be available then
> 
> Hi Maor,
> I am ready to begin, last question, do you think I'd better remove the hosts
> from the old DC before create them in the new one ?
> Thnaks
> Regards

yes, please do

Comment 25 Alain 2015-09-08 09:11:28 UTC
(In reply to Maor from comment #24)
> (In reply to Alain from comment #23)
> > (In reply to Maor from comment #22)
> > > (In reply to Alain from comment #21)
> > > > (In reply to Maor from comment #20)
> > > > > (In reply to Alain from comment #19)
> > > > > > Ok Maor, I will plan to make another try on next week, I will keep you
> > > > > > inform of the results.
> > > > > > Thank you
> > > > > > Regards
> > > > > 
> > > > > Thanks, please let me know if you need any help on the process.
> > > > > I'm changing the severity to undefined for now, until we will get more
> > > > > details about the other try.
> > > > 
> > > > I will make the operation tomorrow morning between 9h30 to 12h30.
> > > > If I have a big trouble, I will contact you if you're not too busy.
> > > > Thanks
> > > > Regards
> > > 
> > > no problem, I will try to be available then
> > 
> > Hi Maor,
> > I am ready to begin, last question, do you think I'd better remove the hosts
> > from the old DC before create them in the new one ?
> > Thnaks
> > Regards
> 
> yes, please do

Maor,
That's what I did, but now, after rebooting and installing the hosts, they are all non-responsive and the network doesn't come up, only the loopback interface...

Comment 26 Maor 2015-09-08 15:41:06 UTC
It looks like, since the hosts were not rebooted, the external VMs that were automatically imported into the recovered engine overwrote the existing unregistered entities in the OVF_STORE disk.

The engine should filter out all the external VMs when updating the OVF_STORE disk.
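
The idea can be sketched in a few lines of Python. This is only an illustration of the filtering described above, not the engine's actual (Java) code: the Vm class, its origin field, the "external" value, and the dictionary standing in for the OVF_STORE contents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    vm_id: str
    name: str
    origin: str  # hypothetical values: "ovirt" (managed) or "external"


def vms_for_ovf_update(vms):
    """Keep only the VMs that may be written to the OVF_STORE."""
    return [vm for vm in vms if vm.origin != "external"]


def update_ovf_store(ovf_store, vms):
    """Return a copy of ovf_store updated with the non-external VMs.

    ovf_store maps VM id -> stored VM name. Skipping external VMs means an
    external VM with the same id cannot overwrite an unregistered entity
    that is already stored.
    """
    updated = dict(ovf_store)
    for vm in vms_for_ovf_update(vms):
        updated[vm.vm_id] = vm.name
    return updated


# "vm-1" and "vm-2" are placeholder ids, not real UUIDs from the logs.
stored = {"vm-1": "unc-srv-ad1"}
running = [
    Vm("vm-1", "external-unc-srv-ad1", "external"),
    Vm("vm-2", "new-vm", "ovirt"),
]
result = update_ovf_store(stored, running)
print(result)
```

Without the filter, the entry for "vm-1" would be replaced by the external VM, which matches the symptom in this report: the original VM is no longer offered for import.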

Comment 27 Maor 2015-09-21 11:35:53 UTC
The bug occurs when an external VM is running in the setup and the OVF_STORE is updated with it.

You can reproduce this bug with the following steps:
1. Create a VM with a disk on a Storage Domain
2. Move the Storage Domain to maintenance - At this point the VM will be saved in the OVF_STORE disk
3. Move the Storage Domain back to up again
4. Run the VM - At this point, copy the VM's qemu process command line from the Host to use later

At this point you can DR the setup (or do the following steps):
5. Remove the Storage Domain from the setup
6. Try to run the VM again from the Host - The VM will then be added automatically to the setup as an external VM
7. Import the Storage Domain back to the setup - At this point the Storage Domain should have the original VM as a candidate entity to register to the setup.
8. Stop the external VM and remove it from the setup
9. Try to import the candidate entity from the imported Storage Domain.

Comment 28 Elad 2015-10-15 13:50:17 UTC
When an external VM has the same UUID as an unregistered VM stored in the OVF_STORE disk of the imported storage domain, the OVF upload to the OVF_STORE disk doesn't overwrite that VM with the external one.

Steps I did:
1) Created a domain and VM with disk located in it
2) Deactivated the domain so the OVF_STORE will be updated
3) Activated the domain back
4) Started the VM
5) Stopped ovirt-engine service in the engine
6) On a second RHEVM setup: added the host that has the qemu process of the VM from the first setup and created a new DC with a new domain (master). The VM was reported as an external one 
7) Imported the domain from the first setup to the second one. Activated the domain
8) Deactivated the imported domain
9) Removed the external VM from the setup
10) Activated the imported domain 
11) Registered (imported) the VM 

The VM I registered is the original VM and not the external one

Verified using RHEV-3.6.0-15
rhevm-3.6.0-0.18.el6.noarch
vdsm-4.17.8-1.el7ev.noarch

Comment 29 Sandro Bonazzola 2015-11-04 11:17:14 UTC
oVirt 3.6.0 was released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.