Description of problem: Hosted-engine over FC: I disconnected the LUN where the HE VM image is located on from the host using LUN masking. Then, when the VM entered 'Paused', I re-mapped the LUN to the host. Waited for few hours and the VM was not resumed. Checked the status of ha-agent and ha-broker and both were active. Checked agent.log and saw the following errors (every few seconds): MainThread::WARNING::2015-07-26 05:01:57,239::agent::200::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '8' MainThread::INFO::2015-07-26 05:01:57,260::hosted_engine::226::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: green-vdsb.qa.lab.tlv.redhat.com MainThread::WARNING::2015-07-26 05:01:57,305::hosted_engine::581::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt MainThread::WARNING::2015-07-26 05:02:12,328::hosted_engine::581::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt MainThread::WARNING::2015-07-26 05:02:27,350::hosted_engine::581::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt MainThread::WARNING::2015-07-26 05:02:42,368::hosted_engine::581::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt MainThread::WARNING::2015-07-26 05:02:57,391::hosted_engine::581::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt MainThread::ERROR::2015-07-26 05:03:12,406::hosted_engine::584::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed trying to connect storage: MainThread::ERROR::2015-07-26 05:03:12,407::agent::197::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed trying to connect storage' - trying to restart agent MainThread::WARNING::2015-07-26 05:03:17,408::agent::200::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '9' MainThread::ERROR::2015-07-26 05:03:17,408::agent::202::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Too many errors occurred, giving up. Please review the log and consider filing a bug. MainThread::INFO::2015-07-26 05:03:17,408::agent::141::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down The LUN is visible to the host via device-mapper: 3514f0c54476012d7 dm-0 XtremIO ,XtremApp size=45G features='0' hwhandler='0' wp=rw `-+- policy='queue-length 0' prio=1 status=active |- 4:0:1:1 sdc 8:32 active ready running `- 4:0:0:1 sdb 8:16 active ready running Version-Release number of selected component (if applicable): Red Hat Enterprise Linux Server release 7.1 (Maipo) ovirt-3.6.0-3 ovirt-hosted-engine-ha-1.3.0-0.0.master.20150615153650.20150615153645.git5f8c290.el7.noarch vdsm-4.17.0-1054.git562e711.el7.noarch ovirt-hosted-engine-setup-1.3.0-0.0.master.20150623153111.git68138d4.el7.noarch libvirt-daemon-1.2.8-16.el7_1.3.x86_64 qemu-kvm-ev-2.1.2-23.el7_1.3.1.x86_64 sanlock-3.2.2-2.el7.x86_64 selinux-policy-3.13.1-23.el7_1.8.noarch How reproducible: Always Steps to Reproduce: 1. Deploy hosted-engine over FC 2. Once engine is up, un-map the LUN where the engine image is located on from the storage server using LUN masking. Wait for the VM to enter 'Pasued' (check with: # vdsClient -s 0 list table) 3. Re-map the LUN to host Actual results: As mentioned above, the VM is not resumed. Also, executing 'hosted-engine --vm-status' does not give any output: [root@green-vdsb ~]# hosted-engine --vm-status [root@green-vdsb ~]# Services are active with errors: [root@green-vdsb ~]# systemctl status ovirt-ha-agent.service ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled) Active: active (running) since Sun 2015-07-26 09:24:11 IDT; 3min 36s ago Process: 19933 ExecStop=/usr/lib/systemd/systemd-ovirt-ha-agent stop (code=exited, status=0/SUCCESS) Process: 17092 ExecStart=/usr/lib/systemd/systemd-ovirt-ha-agent start (code=exited, status=0/SUCCESS) Main PID: 17105 (ovirt-ha-agent) CGroup: /system.slice/ovirt-ha-agent.service └─17105 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent Jul 26 09:24:10 green-vdsb.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent... Jul 26 09:24:11 green-vdsb.qa.lab.tlv.redhat.com systemd-ovirt-ha-agent[17092]: Starting ovirt-ha-agent: [ OK ] Jul 26 09:24:11 green-vdsb.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent. Jul 26 09:25:26 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-agent[17105]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed trying to connect storage: Jul 26 09:25:26 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-agent[17105]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Failed trying to connect storage' - trying to restart agent Jul 26 09:26:46 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-agent[17105]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed trying to connect storage: Jul 26 09:26:46 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-agent[17105]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Failed trying to connect storage' - trying to restart agent [root@green-vdsb ~]# systemctl status ovirt-ha-broker.service ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled) Active: active (running) since Tue 2015-07-14 05:22:52 IDT; 1 weeks 5 days ago Main PID: 6933 (ovirt-ha-broker) CGroup: /system.slice/ovirt-ha-broker.service └─6933 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker Jul 26 04:28:44 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:28:59 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:29:14 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:29:30 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:29:45 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:30:00 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:30:15 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:30:30 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:30:45 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Jul 26 04:31:00 green-vdsb.qa.lab.tlv.redhat.com ovirt-ha-broker[6933]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'get-stats servic...ne.metadata' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in han... Hint: Some lines were ellipsized, use -l to show in full. VM remains Paused: [root@green-vdsb ~]# virsh -r list Id Name State ---------------------------------------------------- 5 HostedEngine paused Expected results: VM should be resumed from EIO once the storage is reachable again Additional info: sosreport: http://file.tlv.redhat.com/ebenahar/sosreport-green-vdsb.qa.lab.tlv.redhat.com-20150726092958.tar.xz
We had the same issue with hosted engine over iSCSI: https://bugzilla.redhat.com/show_bug.cgi?id=1157421
Hi Adam, this seems to be related to your fix for iSCSI and has nothing to do with hosted engine directly. We have no means to resume the VM and think that it needs to be handled on VDSM side.
This is an automated message. oVirt 3.6.0 RC1 has been released. This bug has no target release and still have target milestone set to 3.6.0-rc. Please review this bug and set target milestone and release to one of the next releases.
this is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted to next week, Nov 4th 2015. Please review this bug and if not a blocker, please postpone to a later release. All bugs not postponed on GA release will be automatically re-targeted to - 3.6.1 if severity >= high - 4.0 if severity < high
oVirt 3.6.0 has been released on November 4th, re-targeting to 3.6.1 since this bug has been marked as high severity
I am finally getting around to looking at this. Trying to recreate.
Adam, any updates?
Elad, can you try to reproduce on the next 3.6.2 build? With the latest changes in the import of hosted engine this problem might be solved
Once we'll have a build with the fix for BZ #1293928, I'll give it a try
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Unmapped the LUN of the HE SD. agent.log: MainThread::ERROR::2016-01-21 15:07:49,528::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Request failed: failed to read metadata: [Errno 5] Input/output error' - trying to restart age nt Re-mapped the LUN. HE VM got resumed after a few minutes: MainThread::INFO::2016-01-21 15:25:43,235::ovf_store::101::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:c9cafc30-5e81-4867-9b21-d500984af48e, volUUID:198aba47-9c7d-4a1a-a99b-a f00f4cf644d MainThread::INFO::2016-01-21 15:25:43,562::ovf_store::101::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:8c8decb7-9475-4875-8a3a-3ca9bcf2d9f1, volUUID:48e4a035-e075-46ab-b7e0-1 0eab76f9a80 MainThread::INFO::2016-01-21 15:25:43,699::ovf_store::110::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2016-01-21 15:25:43,700::ovf_store::117::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/64af5aa9-06b8-4d43-ae2b-fc7c12572c3c/images/8c8decb7-9475-4875-8a3a-3ca9bcf2d9f1/48e4a035-e075-46ab-b7e0-10eab76f9a80 MainThread::INFO::2016-01-21 15:25:43,714::config::225::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Found an OVF for HE VM, trying to convert MainThread::INFO::2016-01-21 15:25:43,719::config::230::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Got vm.conf from OVF_STORE MainThread::INFO::2016-01-21 15:25:43,719::hosted_engine::1150::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM MainThread::INFO::2016-01-21 15:25:43,736::hosted_engine::1162::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean MainThread::INFO::2016-01-21 15:25:43,736::hosted_engine::1111::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start` MainThread::INFO::2016-01-21 15:25:44,150::hosted_engine::1117::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout: 8158a77c-0f8d-4e32-ae45-3f0f6dae9aed Status = WaitForLaunch nicModel = rtl8139,pv statusTime = 4309809680 emulatedMachine = rhel6.5.0 pid = 0 vmName = HostedEngine devices = [{'index': '0', 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'volumeID': '73551778-3dfc-41b3-9ba8-3aac15fd5bf9', 'imageID': 'a8723540-5e1d-4c10-8a1a-77267412745e', 'readonly': 'false', 'domainID': '64af5aa9-06b8-4d43-ae2b-fc7c12572c3c', 'deviceId': 'a8723540-5e1d-4c10-8a1a-77267412745e', 'poolID': '00000000-0000-0000-0000-000000000000', 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'nicModel': 'pv', 'macAddr': '00:16:3E:76:D5:D5', 'network': 'ovirtmgmt', 'deviceId': '6dbe0a82-1616-41f7-939d-f71fd77141de', 'address': 'None', 'device': 'bridge', 'type': 'interface'}, {'index': '2', 'iface': 'ide', 'readonly': 'true', 'deviceId': '8c3179ac-b322-4f5c-9449-c52e3665e0ae', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller', 'deviceId': '1ad871a0-c1dd-49fe-8771-e476e015bb7d', 'address': {'slot': '0x04', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}}, {'device': 'usb', 'type': 'controller', 'deviceId': '2ddea97b-1cb0-4ae1-be38-17dd5f3529a0', 'address': {'slot': '0x01', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x2'}}, {'device': 'ide', 'type': 'controller', 'deviceId': 'd8e76ebf-81a4-4330-be43-bd4964d1441c', 'address': {'slot': '0x01', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x1'}}, {'device': 'virtio-serial', 'type': 'controller', 'deviceId': '25f2ee45-e245-4253-9256-b5dd0579f62d', 'address': {'slot': '0x05', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}}] guestDiskMapping = {} vmType = kvm displaySecurePort = -1 memSize = 4096 displayPort = -1 clientIp = spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir smp = 2 displayIp = 0 display = vnc MainThread::INFO::2016-01-21 15:25:44,150::hosted_engine::1118::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: MainThread::INFO::2016-01-21 15:25:44,150::hosted_engine::1130::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost MainThread::INFO::2016-01-21 15:25:44,159::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1453382744.16 type=state_transition detail=EngineStart-EngineStarting hostname='green-vdsb.qa.lab.tlv.redhat.com' MainThread::INFO::2016-01-21 15:25:44,322::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? sent MainThread::INFO::2016-01-21 15:25:44,322::hosted_engine::615::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM Tested over FC using: Host: ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch libgovirt-0.3.3-1.el7_2.1.x86_64 ovirt-vmconsole-1.0.0-1.el7ev.noarch ovirt-host-deploy-1.4.1-1.el7ev.noarch ovirt-setup-lib-1.0.1-1.el7ev.noarch ovirt-vmconsole-host-1.0.0-1.el7ev.noarch ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch vdsm-jsonrpc-4.17.18-0.el7ev.noarch vdsm-python-4.17.18-0.el7ev.noarch vdsm-hook-vmfex-dev-4.17.18-0.el7ev.noarch vdsm-cli-4.17.18-0.el7ev.noarch vdsm-yajsonrpc-4.17.18-0.el7ev.noarch vdsm-xmlrpc-4.17.18-0.el7ev.noarch vdsm-4.17.18-0.el7ev.noarch vdsm-infra-4.17.18-0.el7ev.noarch Engine: ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch ovirt-host-deploy-1.4.1-1.el6ev.noarch ovirt-vmconsole-1.0.0-1.el6ev.noarch ovirt-host-deploy-java-1.4.1-1.el6ev.noarch rhevm-setup-plugin-ovirt-engine-common-3.6.2.6-0.1.el6.noarch ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch rhevm-setup-plugin-ovirt-engine-3.6.2.6-0.1.el6.noarch ovirt-setup-lib-1.0.1-1.el6ev.noarch rhevm-setup-plugin-websocket-proxy-3.6.2.6-0.1.el6.noarch rhevm-vmconsole-proxy-helper-3.6.2.6-0.1.el6.noarch rhevm-spice-client-x86-msi-3.6-6.el6.noarch rhevm-lib-3.6.2.6-0.1.el6.noarch rhevm-cli-3.6.0.0-1.el6ev.noarch rhevm-webadmin-portal-3.6.2.6-0.1.el6.noarch rhevm-tools-3.6.2.6-0.1.el6.noarch rhevm-iso-uploader-3.6.0-1.el6ev.noarch rhevm-doc-3.6.0-2.el6eng.noarch rhevm-backend-3.6.2.6-0.1.el6.noarch rhevm-setup-3.6.2.6-0.1.el6.noarch rhevm-spice-client-x64-cab-3.6-6.el6.noarch rhevm-userportal-3.6.2.6-0.1.el6.noarch rhevm-image-uploader-3.6.0-1.el6ev.noarch rhevm-branding-rhev-3.6.0-3.el6ev.noarch rhevm-sdk-python-3.6.2.1-1.el6ev.noarch rhevm-log-collector-3.6.0-1.el6ev.noarch rhevm-dependencies-3.6.0-1.el6ev.noarch rhevm-setup-plugin-ovirt-engine-common-3.6.2.6-0.1.el6.noarch rhevm-dbscripts-3.6.2.6-0.1.el6.noarch rhevm-setup-plugins-3.6.1-2.el6ev.noarch rhevm-spice-client-x64-msi-3.6-6.el6.noarch rhevm-restapi-3.6.2.6-0.1.el6.noarch rhevm-setup-plugin-ovirt-engine-3.6.2.6-0.1.el6.noarch rhevm-3.6.2.6-0.1.el6.noarch rhevm-setup-base-3.6.2.6-0.1.el6.noarch rhevm-extensions-api-impl-3.6.2.6-0.1.el6.noarch rhevm-websocket-proxy-3.6.2.6-0.1.el6.noarch rhevm-setup-plugin-vmconsole-proxy-helper-3.6.2.6-0.1.el6.noarch rhevm-spice-client-x86-cab-3.6-6.el6.noarch