The Hosted Engine VM does not start anymore. The HA agent daemon logs say:

MainThread::WARNING::2015-02-06 10:12:17,063::hosted_engine::501::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt

and looking at vdsm.log I can see:

Thread-97::DEBUG::2015-02-06 10:12:17,053::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
Thread-97::DEBUG::2015-02-06 10:12:17,053::task::592::Storage.TaskManager.Task::(_updateState) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::moving from state init -> state preparing
Thread-97::INFO::2015-02-06 10:12:17,053::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='b95765f8-c8d1-4641-9da8-7ce6e41ff059', conList=[{'connection': '192.168.1.107:/storage', 'iqn': ',', 'protocol_version': '3', 'kvm': 'password', '=': 'user', ',': '='}], options=None)
Thread-97::DEBUG::2015-02-06 10:12:17,054::hsm::2404::Storage.HSM::(__prefetchDomains) nfs local path: /rhev/data-center/mnt/192.168.1.107:_storage
Thread-97::DEBUG::2015-02-06 10:12:17,055::hsm::2428::Storage.HSM::(__prefetchDomains) Found SD uuids: (u'd326997c-b097-4e67-a1f4-758cfe96a339',)
Thread-97::DEBUG::2015-02-06 10:12:17,055::hsm::2484::Storage.HSM::(connectStorageServer) knownSDs: {d326997c-b097-4e67-a1f4-758cfe96a339: storage.nfsSD.findDomain}
Thread-97::ERROR::2015-02-06 10:12:17,055::task::863::Storage.TaskManager.Task::(_setError) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 870, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2486, in connectStorageServer
    res.append({'id': conDef["id"], 'status': status})
KeyError: 'id'
Thread-97::DEBUG::2015-02-06 10:12:17,055::task::882::Storage.TaskManager.Task::(_run) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::Task._run: 6164cf9c-29b9-43ce-881a-c683481b64c9 (1, 'b95765f8-c8d1-4641-9da8-7ce6e41ff059', [{'kvm': 'password', ',': '=', 'connection': '192.168.1.107:/storage', 'iqn': ',', 'protocol_version': '3', '=': 'user'}]) {} failed - stopping task
Thread-97::DEBUG::2015-02-06 10:12:17,055::task::1214::Storage.TaskManager.Task::(stop) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::stopping in state preparing (force False)
Thread-97::DEBUG::2015-02-06 10:12:17,055::task::990::Storage.TaskManager.Task::(_decref) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::ref 1 aborting True
Thread-97::INFO::2015-02-06 10:12:17,055::task::1168::Storage.TaskManager.Task::(prepare) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::aborting: Task is aborted: u"'id'" - code 100
Thread-97::DEBUG::2015-02-06 10:12:17,055::task::1173::Storage.TaskManager.Task::(prepare) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::Prepare: aborted: 'id'
Thread-97::DEBUG::2015-02-06 10:12:17,056::task::990::Storage.TaskManager.Task::(_decref) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::ref 0 aborting True
Thread-97::DEBUG::2015-02-06 10:12:17,056::task::925::Storage.TaskManager.Task::(_doAbort) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::Task._doAbort: force False
Thread-97::DEBUG::2015-02-06 10:12:17,056::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-97::DEBUG::2015-02-06 10:12:17,056::task::592::Storage.TaskManager.Task::(_updateState) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::moving from state preparing -> state aborting
Thread-97::DEBUG::2015-02-06 10:12:17,056::task::547::Storage.TaskManager.Task::(__state_aborting) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::_aborting: recover policy none
Thread-97::DEBUG::2015-02-06 10:12:17,056::task::592::Storage.TaskManager.Task::(_updateState) Task=`6164cf9c-29b9-43ce-881a-c683481b64c9`::moving from state aborting -> state failed
Thread-97::DEBUG::2015-02-06 10:12:17,056::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-97::DEBUG::2015-02-06 10:12:17,056::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-97::ERROR::2015-02-06 10:12:17,056::dispatcher::79::Storage.Dispatcher::(wrapper) 'id'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1176, in prepare
    raise self.error
KeyError: 'id'

It looks like the HA daemon is passing wrong values to connectStorageServer (a minimal sketch of the failure follows the config dumps below):

conList=[{',': '=', '=': 'user', 'connection': '192.168.1.107:/storage', 'iqn': ',', 'kvm': 'password', 'protocol_version': '3'}]

# rpm -qa | egrep "(vdsm|ovirt-hosted-engine-ha)" | sort
ovirt-hosted-engine-ha-1.3.0-0.0.master.20150126112715.20150126112712.gita3c842d.el7.noarch
vdsm-4.17.0-381.git160954e.el7.x86_64
vdsm-cli-4.17.0-381.git160954e.el7.noarch
vdsm-gluster-4.17.0-381.git160954e.el7.noarch
vdsm-infra-4.17.0-381.git160954e.el7.noarch
vdsm-jsonrpc-4.17.0-381.git160954e.el7.noarch
vdsm-python-4.17.0-381.git160954e.el7.noarch
vdsm-python-zombiereaper-4.16.12-0.el7.noarch
vdsm-xmlrpc-4.17.0-381.git160954e.el7.noarch
vdsm-yajsonrpc-4.17.0-381.git160954e.el7.noarch

# cat /etc/ovirt-hosted-engine/hosted-engine.conf
fqdn=ovirt.home
vm_disk_id=20d55944-7177-4883-979f-a4c5ce789d8d
vmid=7e3d35ae-3611-4d50-a634-1bdbee81e67a
storage=192.168.1.107:/storage
conf=/etc/ovirt-hosted-engine/vm.conf
service_start_time=0
host_id=1
console=vnc
domainType=nfs3
spUUID=b95765f8-c8d1-4641-9da8-7ce6e41ff059
sdUUID=d326997c-b097-4e67-a1f4-758cfe96a339
connectionUUID=c65694a0-6d9b-4bd4-af05-6720c07aa897
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
ca_subject="C=EN, L=Test, O=Test, CN=Test"
vdsm_use_ssl=true
gateway=192.168.1.1
bridge=ovirtmgmt
metadata_volume_UUID=82664f1b-cdfb-4854-9077-2cc6082fc87b
metadata_image_UUID=5d65488d-b5b4-4992-ad58-eb132125d99e
lockspace_volume_UUID=e364cba3-4dcc-4997-8973-534380a0c006
lockspace_image_UUID=007f210b-8de4-471e-bb76-417519423a8b
# The following are used only for iSCSI storage
iqn=
portal=
user=
password=
port=

# cat /etc/ovirt-hosted-engine/vm.conf
vmId=7e3d35ae-3611-4d50-a634-1bdbee81e67a
memSize=4096
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, type:drive},specParams:{},readonly:true,deviceId:e4c5cdbb-4511-4fd9-8c83-10d3d940a721,path:/var/tmp/RHEL-7.0-20140507.0-Server-x86_64-dvd1.iso,device:cdrom,shared:false,type:disk}
devices={index:0,iface:virtio,format:raw,poolID:00000000-0000-0000-0000-000000000000,volumeID:75d72f29-5ec4-498e-913e-f67b4fc72f9c,imageID:20d55944-7177-4883-979f-a4c5ce789d8d,specParams:{},readonly:false,domainID:d326997c-b097-4e67-a1f4-758cfe96a339,optional:false,deviceId:20d55944-7177-4883-979f-a4c5ce789d8d,address:{bus:0x00, slot:0x06, domain:0x0000, type:pci, function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
devices={device:scsi,model:virtio-scsi,type:controller}
devices={nicModel:pv,macAddr:00:16:3e:45:e1:88,linkActive:true,network:ovirtmgmt,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:97eef999-decd-4e2f-b7f2-7cc2fe3c40e6,address:{bus:0x00, slot:0x03, domain:0x0000, type:pci, function:0x0},device:bridge,type:interface}
devices={device:console,specParams:{},type:console,deviceId:59548720-db25-45c5-b076-78c8ea745098,alias:console0}
vmName=HostedEngine
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp=4
cpuType=SandyBridge
emulatedMachine=rhel6.5.0
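For reference, here is a minimal, self-contained Python sketch (not the actual ovirt-hosted-engine-ha or vdsm code) of why the call fails: the traceback shows that connectStorageServer() builds its response from conDef["id"], so a connection dictionary whose keys have collapsed into separator characters (',' and '=') and which is missing 'id' makes it raise KeyError: 'id'. The "well-formed" entry below is only an assumption about the expected shape, with its 'id' taken from connectionUUID in hosted-engine.conf above.

# Illustrative sketch only -- not the actual ovirt-hosted-engine-ha or vdsm code.
# It mimics the failing pattern from the traceback above
# (/usr/share/vdsm/storage/hsm.py:2486) to show why a connection dictionary
# without an 'id' key makes connectStorageServer() raise KeyError: 'id'.

# What the agent actually sent (copied from vdsm.log): the separator characters
# ',' and '=' have become dictionary keys and 'id' is missing, which suggests
# the agent's key/value handling for the connection parameters went wrong.
malformed_con = {
    'connection': '192.168.1.107:/storage',
    'iqn': ',',
    'protocol_version': '3',
    'kvm': 'password',
    '=': 'user',
    ',': '=',
}

# Assumed shape of a well-formed NFS entry; the exact optional keys are a
# guess, but 'id' must be present because the response is built from it.
# The value is the connectionUUID from hosted-engine.conf.
wellformed_con = {
    'id': 'c65694a0-6d9b-4bd4-af05-6720c07aa897',
    'connection': '192.168.1.107:/storage',
    'protocol_version': '3',
    'user': '',
    'password': '',
}


def connect_storage_server(con_list):
    """Mimic the response-building loop that fails in hsm.py."""
    res = []
    for con_def in con_list:
        status = 0  # pretend the mount itself succeeded
        # This is the failing pattern: con_def may have no 'id' key.
        res.append({'id': con_def['id'], 'status': status})
    return res


if __name__ == '__main__':
    print(connect_storage_server([wellformed_con]))   # works
    print(connect_storage_server([malformed_con]))    # raises KeyError: 'id'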
Hi Simone, can you provide me with information on how to verify this bug? I did a deployment with two hosts on NFS storage (apart from the problem with self.storageType, all works fine), and a deployment with one host on iSCSI storage (there is a bug with adding an additional host on iSCSI, but otherwise all works fine). Is that enough, or do you want some additional testing?
It's enough, thanks.
Verified on ovirt-hosted-engine-ha-1.3.0-0.3.beta.git183a4ff.el7ev.noarch
oVirt 3.6.0 was released on November 4th, 2015 and should fix this issue. If problems still persist, please open a new BZ and reference this one.