Bug 1290518 - Failed to activate the hosted-engine storage domain during auto-import on NFS
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.HostedEngine
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2.6
Assignee: Roy Golan
QA Contact: Nikolai Sednev
Duplicates: 1293853
Depends On:
Blocks: 1269768
 
Reported: 2015-12-10 17:32 UTC by Roy Golan
Modified: 2016-02-18 11:19 UTC
CC List: 19 users

Last Closed: 2016-02-18 11:19:12 UTC
oVirt Team: SLA
rule-engine: ovirt-3.6.z+
rule-engine: blocker+
mgoldboi: planning_ack+
dfediuck: devel_ack+
mavital: testing_ack+


Attachments
engine vm "sosreport -o ovirt" (1.34 MB, application/x-xz)
2015-12-14 15:29 UTC, Sandro Bonazzola
exception when click on hosted_storage line (108.00 KB, image/png)
2015-12-30 13:39 UTC, Gianluca Cecchi


Links
Red Hat Bugzilla 1293853 (high, CLOSED): Uncaught JS exception selecting unattached hosted_engine storage domain. Last updated 2021-02-22 00:41:40 UTC
oVirt gerrit 51033 (master, MERGED): core: hosted-engine: Lock the sd import exclusively. Last updated 2021-01-05 12:17:35 UTC
oVirt gerrit 51126 (master, MERGED): core: hosted-engine: import the domain only when DC is UP. Last updated 2021-01-05 12:17:32 UTC
oVirt gerrit 51208 (master, MERGED): core: hosted-engine: Add connection details explicitly for NFS. Last updated 2021-01-05 12:17:35 UTC
oVirt gerrit 51414 (ovirt-engine-3.6.2, MERGED): core: hosted-engine: Lock the sd import exclusively. Last updated 2021-01-05 12:18:11 UTC
oVirt gerrit 51415 (ovirt-engine-3.6.2, MERGED): core: hosted-engine: import the domain only when DC is UP. Last updated 2021-01-05 12:17:35 UTC
oVirt gerrit 51416 (ovirt-engine-3.6.2, MERGED): core: hosted-engine: Add connection details explicitly for NFS. Last updated 2021-01-05 12:17:36 UTC
oVirt gerrit 51417 (ovirt-engine-3.6, MERGED): core: hosted-engine: Lock the sd import exclusively. Last updated 2021-01-05 12:17:36 UTC
oVirt gerrit 51418 (ovirt-engine-3.6, MERGED): core: hosted-engine: import the domain only when DC is UP. Last updated 2021-01-05 12:17:36 UTC
oVirt gerrit 51419 (ovirt-engine-3.6, MERGED): core: hosted-engine: Add connection details explicitly for NFS. Last updated 2021-01-05 12:18:13 UTC
oVirt gerrit 51457 (ovirt-engine-3.6, MERGED): core: storage: make storagServerConnectin compensatable. Last updated 2021-01-05 12:18:13 UTC
oVirt gerrit 51478 (ovirt-engine-3.6.2, MERGED): core: storage: make storagServerConnectin compensatable. Last updated 2021-01-05 12:18:14 UTC
oVirt gerrit 51488 (master, MERGED): core: storage: make storagServerConnectin compensatable. Last updated 2021-01-05 12:17:38 UTC

Internal Links: 1293853

Description Roy Golan 2015-12-10 17:32:16 UTC
Description of problem:

With an NFS storage domain, auto-activation of the hosted-engine domain during auto-import failed. The host reports that it cannot connect because it has no connection details.

The error was thrown by StorageHandlingCommandBase:37 -

"Did not connect host '{}' to storage domain '{}' because connection for connectionId '{}' is null."


Apparently the fetch of the connection details from the DB came back empty.

But a manual Activate from the UI was successful. Both actions fetch the connection details from the DB, so it is unclear why only the first attempt fails.
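
For diagnosing this, a query along the following lines should show whether the hosted-engine domain has a matching connection row. This is only a sketch, assuming the standard engine schema (for file domains, storage_domains.storage holds the connection id); a row with a null c.id/c.connection would match the error above:

  $ psql engine -c "select d.storage, d.storage_name, c.id, c.connection
      from storage_domains d
      left join storage_server_connections c on c.id = d.storage
      where d.storage_name = 'hosted_storage';"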


How reproducible:
needs investigation

Steps to Reproduce:
1. install hosted engine, NFS domain
2. after engine starts add a master storage domain and wait for pool to be Active
3. Auto-import should kick in and the ActivateStorageDomain failure should be triggered.

Actual results:
Storage domain is attached to pool but left in Maintenance.

Expected results:
Storage domain should be Active

Comment 1 Roy Golan 2015-12-10 17:33:19 UTC
Can you please upload the log from your engine machine with the error?

Comment 2 Red Hat Bugzilla Rules Engine 2015-12-11 02:52:46 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask the maintainer to set the correct version flags, and only then set the target milestone.

Comment 3 Red Hat Bugzilla Rules Engine 2015-12-11 02:52:46 UTC
Target release should only be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 4 Sandro Bonazzola 2015-12-14 15:29:54 UTC
Created attachment 1105599 [details]
engine vm "sosreport -o ovirt"

Comment 5 cshao 2015-12-22 12:09:30 UTC
RHEV-H QE also hit this issue.

Test version:
rhev-hypervisor7-7.2-20151221.1
ovirt-node-3.6.0-0.24.20151209gitc0fa931.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.3.0-4.el7ev.noarch
ovirt-hosted-engine-setup-1.3.1.3-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.6-1.el7ev.noarch
RHEVM-appliance-20151216.0-1.3.6.ova   

Test steps:
1. TUI clean install rhevh
2. Login rhevh, setup network via dhcp
3. Deploy Hosted Engine
4. After engine starts add a master nfs storage domain and wait for pool to be Active

Test result:
1. Failed to activate Storage Domain hosted_storage.
2. Manually pressing Activate on the domain still failed.

Comment 7 Simone Tiraboschi 2015-12-22 23:54:09 UTC
It also fails here:

engine logs:

2015-12-22 22:43:22,215 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-10) [] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='29d66519-e9b9-4f49-aa68-2089af23a36a', vds='Host[,29d66519-e9b9-4f49-aa68-2089af23a36a]', vmIds='[2d6bd225-db62-40a6-a4ba-894decd280a5]'}), log id: 644451db
2015-12-22 22:43:23,221 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-10) [] FINISH, FullListVDSCommand, return: [{status=Up, nicModel=rtl8139,pv, emulatedMachine=pc, guestDiskMapping={487f9d83-4bcc-4eea-8={name=/dev/vda}, QEMU_DVD-ROM={name=/dev/sr0}}, vmId=2d6bd225-db62-40a6-a4ba-894decd280a5, pid=3796, devices=[Ljava.lang.Object;@42f0d0e2, smp=4, vmType=kvm, displayIp=0, display=vnc, displaySecurePort=-1, memSize=4096, displayPort=5900, cpuType=SandyBridge, spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir, statusTime=4295362380, vmName=HostedEngine, clientIp=, pauseCode=NOERR}], log id: 644451db
2015-12-22 22:43:23,715 INFO  [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (default task-15) [117abb1c] Lock Acquired to object 'EngineLock:{exclusiveLocks='[77d0d6b1-376f-4ee6-a8a3-c998a20e5d69=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2015-12-22 22:43:23,766 INFO  [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-48) [117abb1c] Running command: ActivateStorageDomainCommand internal: false. Entities affected :  ID: 77d0d6b1-376f-4ee6-a8a3-c998a20e5d69 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
2015-12-22 22:43:23,780 INFO  [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-48) [117abb1c] Lock freed to object 'EngineLock:{exclusiveLocks='[77d0d6b1-376f-4ee6-a8a3-c998a20e5d69=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2015-12-22 22:43:23,780 INFO  [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-48) [117abb1c] ActivateStorage Domain. Before Connect all hosts to pool. Time: Tue Dec 22 22:43:23 UTC 2015
2015-12-22 22:43:23,784 WARN  [org.ovirt.engine.core.bll.storage.BaseFsStorageHelper] (org.ovirt.thread.pool-8-thread-50) [117abb1c] Did not connect host '29d66519-e9b9-4f49-aa68-2089af23a36a' to storage domain 'hosted_storage' because connection for connectionId '80c6d414-74bb-44b2-8b06-487323d5aec5' is null.
2015-12-22 22:43:23,784 ERROR [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-48) [117abb1c] Cannot connect storage server, aborting Storage Domain activation.
2015-12-22 22:43:23,785 INFO  [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-48) [117abb1c] Command [id=cb536dd5-3c39-4497-887e-020b4b3b6e1c]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='00000001-0001-0001-0001-000000000374', storageId='77d0d6b1-376f-4ee6-a8a3-c998a20e5d69'}', status='Maintenance'}.
2015-12-22 22:43:23,803 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-48) [117abb1c] Correlation ID: 117abb1c, Job ID: 95adb6f4-ecf3-4bf9-b354-f735dac326ab, Call Stack: null, Custom Event ID: -1, Message: Failed to activate Storage Domain hosted_storage (Data Center Default) by admin@internal
2015-12-22 22:43:39,242 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-58) [] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='29d66519-e9b9-4f49-aa68-2089af23a36a', vds='Host[,29d66519-e9b9-4f49-aa68-2089af23a36a]', vmIds='[2d6bd225-db62-40a6-a4ba-894decd280a5]'}), log id: 197268cb
2015-12-22 22:43:40,248 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-58) [] FINISH, FullListVDSCommand, return: [{status=Up, nicModel=rtl8139,pv, emulatedMachine=pc, guestDiskMapping={487f9d83-4bcc-4eea-8={name=/dev/vda}, QEMU_DVD-ROM={name=/dev/sr0}}, vmId=2d6bd225-db62-40a6-a4ba-894decd280a5, pid=3796, devices=[Ljava.lang.Object;@1e3331e7, smp=4, vmType=kvm, displayIp=0, display=vnc, displaySecurePort=-1, memSize=4096, displayPort=5900, cpuType=SandyBridge, spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir, statusTime=4295379410, vmName=HostedEngine, clientIp=, pauseCode=NOERR}], log id: 197268cb
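
The compensation above leaves the domain attached to the pool but in Maintenance. To verify this from the DB, something along these lines should work; a sketch where the table name storage_pool_iso_map and its columns are inferred from the StoragePoolIsoMapId fields in the log (status is stored as an enum ordinal, not the string 'Maintenance'):

  $ psql engine -c "select storage_pool_id, storage_id, status from storage_pool_iso_map;"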


VDSM logs:

Comment 8 Doron Fediuck 2015-12-27 08:43:14 UTC
*** Bug 1293853 has been marked as a duplicate of this bug. ***

Comment 9 Roy Golan 2015-12-28 13:38:26 UTC
(In reply to shaochen from comment #5)
> RHEV-H QE also met this issue.
> 
> Test version:
> rhev-hypervisor7-7.2-20151221.1
> ovirt-node-3.6.0-0.24.20151209gitc0fa931.el7ev.noarch
> ovirt-node-plugin-hosted-engine-0.3.0-4.el7ev.noarch
> ovirt-hosted-engine-setup-1.3.1.3-1.el7ev.noarch
> ovirt-hosted-engine-ha-1.3.3.6-1.el7ev.noarch
> RHEVM-appliance-20151216.0-1.3.6.ova   
> 
> Test steps:
> 1. TUI clean install rhevh
> 2. Login rhevh, setup network via dhcp
> 3. Deploy Hosted Engine
> 4. After engine starts add a master nfs storage domain and wait for pool to
> be Active
> 
> Test result:
> 1. Failed to activate Storage Domain hosted_storage.
> 2. Manually pressing Activate on the domain still failed.


As a workaround, can you please Destroy the hosted_storage domain? It should be auto-imported again.

Comment 10 cshao 2015-12-29 09:05:41 UTC
(In reply to Roy Golan from comment #9)
> (In reply to shaochen from comment #5)
> > RHEV-H QE also met this issue.
> > 
> > Test version:
> > rhev-hypervisor7-7.2-20151221.1
> > ovirt-node-3.6.0-0.24.20151209gitc0fa931.el7ev.noarch
> > ovirt-node-plugin-hosted-engine-0.3.0-4.el7ev.noarch
> > ovirt-hosted-engine-setup-1.3.1.3-1.el7ev.noarch
> > ovirt-hosted-engine-ha-1.3.3.6-1.el7ev.noarch
> > RHEVM-appliance-20151216.0-1.3.6.ova   
> > 
> > Test steps:
> > 1. TUI clean install rhevh
> > 2. Login rhevh, setup network via dhcp
> > 3. Deploy Hosted Engine
> > 4. After engine starts add a master nfs storage domain and wait for pool to
> > be Active
> > 
> > Test result:
> > 1. Failed to activate Storage Domain hosted_storage.
> > 2. Manually pressing Activate on the domain still failed.
> 
> 
> As a workaround, can you please Destroy the hosted_storage domain? It should
> be auto-imported again.

Yes, the workaround works well.

Comment 11 Gianluca Cecchi 2015-12-30 13:02:34 UTC
Hello,
perhaps I've not understood correctly, but bug 1293853 has been closed as a duplicate of this one.
That bug is about
"Uncaught JS exception selecting unattached hosted_engine storage domain"
but, at least in my case, there is also the engine SD that, after the update to 3.6.1, ends up in state "unattached" (not "maintenance"). And if you try to attach it, the engine VM crashes.

Will the resolution of this bug fix that case too?

Comment 12 Gianluca Cecchi 2015-12-30 13:37:14 UTC
OK, I followed what was suggested in comment #9 and I confirm it works for me too.
So, the initial situation is hosted_storage in state "unattached"; when I click on it I get the exception described in bug 1293853.
I select it and choose "Destroy".
After a few seconds the storage domain reappears, in status "maintenance".
I select it and choose "Activate" and the operation succeeds.
I can then see the engine VM under the "Virtual Machines" tab, initially in state "down" but up after a few more seconds.

NOTE: In the Events tab, when the hosted_storage SD is automatically added, I see this strange message:

"
This Data center compatibility version does not support importing a data domain with its entities (VMs and Templates). The imported domain will be imported without them.
"

The compatibility versions of both the DC and the cluster are 3.6 (this environment was initially installed as 3.6.0).

The only problem is that I still get the exception when I select the hosted_storage line in the Storage tab. See the attachment.

Comment 13 Gianluca Cecchi 2015-12-30 13:39:34 UTC
Created attachment 1110499 [details]
exception when click on hosted_storage line

After updating from 3.6.0 to 3.6.1 and destroying hosted_storage, it is re-added in Maintenance. I activated it, but the exception remains.

Comment 14 Roy Golan 2016-01-03 01:31:11 UTC
(In reply to Gianluca Cecchi from comment #13)
> Created attachment 1110499 [details]
> exception when click on hosted_storage line
> 
> After updating from 3.6.0 to 3.6.1 and destroying hosted_storage, it is
> re-added in Maintenance. I activated it, but the exception remains.

I didn't expect that. Can you share the output of this query?

  $ psql engine -c "select d.storage,c.connection,d.storage_name from storage_server_connections c inner join storage_domains d on c.id = d.storage;"

Comment 15 Gianluca Cecchi 2016-01-03 15:18:51 UTC
Here it is, Roy (I have changed the domain names):
engine=# select d.storage,c.connection,d.storage_name from storage_server_connections c inner join storage_domains d on c.id = d.storage;
               storage                |          connection          |  storage_name
--------------------------------------+------------------------------+----------------
 157b098d-11ba-451c-ba90-e500468a8ef0 | ractor.my.domain:/NFS_DOMAIN | NFS1
 aa1fcdac-9a32-4643-b549-1e77b60ed921 | ractor.my.domain:/ISO_DOMAIN | ISO
 d5db018a-6d51-44b5-b688-3ef8d53d6d83 | ractor.my.domain:/SHE_DOMAIN | hosted_storage
(3 rows)

Comment 16 Gianluca Cecchi 2016-01-07 08:00:09 UTC
Hello Roy,
did the output of the query in comment #15 satisfy your expectations?
One important piece of news: today I connected to the engine again to look at the exception described in the attachment to comment #13, because it seemed a bit different from the initial exception I had in bug 1293853.
But today I don't get any exception at all: going to the Storage pane and then to the hosted_storage domain line, I get nothing, and the same when I navigate through its subpanels.
BTW, the output of the query is still the same as in comment #15.
Is there any job that could have cleared the problem? Do you need engine.log and/or server.log from the engine to check what cleaned up the situation?

Comment 17 Roy Golan 2016-01-07 22:01:51 UTC
(In reply to Gianluca Cecchi from comment #16)
> Hello Roy,
> did the output of the query in comment #15 satisfy your expectations?

I saw it in one of my dev setups while working on this bug, but it's gone now.
If you ever see it again, following [1] to un-obfuscate the error and actually see where the failure is would be excellent and very helpful. Thanks again.

[1] http://www.ovirt.org/OVirt_Engine_Debug_Obfuscated_UI

> One important piece of news: today I connected to the engine again to look
> at the exception described in the attachment to comment #13, because it
> seemed a bit different from the initial exception I had in bug 1293853.
> But today I don't get any exception at all: going to the Storage pane and
> then to the hosted_storage domain line, I get nothing, and the same when I
> navigate through its subpanels.
> BTW, the output of the query is still the same as in comment #15.
> Is there any job that could have cleared the problem? Do you need
> engine.log and/or server.log from the engine to check what cleaned up the
> situation?

Comment 18 Nikolai Sednev 2016-01-21 07:43:22 UTC
Successfully got the HE SD and HE VM auto-imported on a cleanly installed NFS deployment after an NFS data SD was added. The engine was installed using PXE.
Works for me on these components:
Host:
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch
mom-0.5.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.6.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.2.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
Linux version 3.10.0-327.8.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Jan 11 05:03:18 EST 2016

Engine:
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
rhevm-3.6.2.6-0.1.el6.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-reports-setup-3.6.2.4-1.el6ev.noarch
rhevm-reports-3.6.2.4-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
Linux version 2.6.32-573.8.1.el6.x86_64 
(mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Fri Sep 25 19:24:22 EDT 2015

Comment 19 Nikolai Sednev 2016-01-21 07:45:19 UTC
The SD was auto-attached and auto-activated once the data SD was manually added.

Comment 20 Gianluca Cecchi 2016-01-25 13:24:06 UTC
I confirm the installation now works for me too, with CentOS 7.2 and 3.6rc3.
See also here for details:
http://lists.ovirt.org/pipermail/users/2016-January/037452.html

